VEN Troubleshooting

This topic describes some important system administration considerations on Windows, useful tools, and a generalized set of actions to troubleshoot VEN operations.

Windows: Enable Base Filtering Engine (BFE)

Windows BFE is a Windows subsystem that determines which packets should be allowed to the network stack. BFE is enabled by default. If you disable BFE on your Windows workload, all packets are sent to the TCP/IP stack, bypassing BFE, which can result in different behavior from one system to another. In the worst case, all ingress and egress packets are dropped.

If you have disabled BFE on your Windows workload, re-enable it.

Linux: ignored_interface

The Linux ignored_interface inhibits PCE policy updates.

Transitioning an enforced workload's interface from or to ignored_interface might drop the dynamic, long-lived connections maintained by the system.

When a VEN interface is placed in the ignored_interface list, flow state over that interface is no longer kept by conntrack. (The conntrack table on Linux stores information about network connections.) If the connection on TCP port 8444 to the PCE is reinitialized, any arriving packets from the PCE are dropped because they have no state in conntrack.

The VEN heartbeat eventually restores connections, but meanwhile the VEN implements any policy sent by lightning bolt from the PCE.
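
To see whether the PCE connection still has state, you can look for port 8444 entries in the conntrack table. The following is a minimal sketch, assuming the conntrack-tools package is installed on the workload. The function name pce_conntrack_entries is hypothetical and is not part of the VEN.

```shell
#!/bin/sh
# Hypothetical helper (not a VEN tool): reads conntrack entries on stdin and
# prints those that mention the given destination port (default 8444).
# The surrounding spaces keep dport=8444 from matching dport=84440.
# Exits non-zero when no entry matches, which is how grep reports "no match".
pce_conntrack_entries() {
    port="${1:-8444}"
    grep -F " dport=${port} "
}

# Usage (as root; assumes the conntrack command is available):
#   conntrack -L -p tcp 2>/dev/null | pce_conntrack_entries 8444
```

An empty result while the interface is in the ignored_interface list is consistent with the behavior described above, although this sketch does not distinguish the original direction of a flow from its reply direction.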

VEN Troubleshooting Tools

Illumio Xpress provides the following tools for VEN connectivity checking and troubleshooting VEN issues on workloads:

  • A VEN connectivity checking tool called venconch for workloads is available on the Illumio Support site.
  • A VEN compatibility checking feature is available in the PCE web console for paired workloads. See VEN Compatibility Check in the VEN Installation and Upgrade Guide.

Commands to Obtain Firewall Snapshot

Run the following commands on the workload to get a copy of the logs and configured firewall settings.

Linux

  • iptables-save
  • ipset -L
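
When you need to keep the snapshot for later comparison or attach it to a support case, the two Linux commands can be wrapped so that their output lands in timestamped files. This is a sketch only: the function name snapshot_firewall and the IPTABLES_SAVE_CMD and IPSET_LIST_CMD override variables are hypothetical conveniences, not VEN settings.

```shell
#!/bin/sh
# Hypothetical helper (not a VEN tool): writes the iptables and ipset state
# to timestamped files in the given directory (default: current directory)
# and prints the two file paths.
# IPTABLES_SAVE_CMD and IPSET_LIST_CMD are illustrative overrides only.
snapshot_firewall() {
    outdir="${1:-.}"
    stamp="$(date -u +%Y%m%dT%H%M%SZ)"
    mkdir -p "$outdir" || return 1
    # Left unquoted on purpose so that "ipset -L" splits into command and flag.
    ${IPTABLES_SAVE_CMD:-iptables-save} > "$outdir/iptables-$stamp.txt" || return 1
    ${IPSET_LIST_CMD:-ipset -L} > "$outdir/ipset-$stamp.txt" || return 1
    printf '%s\n' "$outdir/iptables-$stamp.txt" "$outdir/ipset-$stamp.txt"
}

# Usage (as root): snapshot_firewall /tmp/fw-snapshot
```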

Windows

  • netsh wfp show state

Solaris

  • ipfstat -ionv

AIX

  • ipfstat -ionv

Troubleshooting Tips

Connectivity Issues

Perform the following actions to identify why a workload is unreachable, cannot reach other workloads, or cannot communicate with the PCE:

  • Determine whether all workloads are unable to communicate or only a subset is reported as disconnected. If the PCE reports that all workloads are offline, check whether the PCE is reachable from the workloads.
  • If only a subset of workloads is down, check for differences in network configuration between those workloads and the connected ones, and whether those differences make the PCE unreachable.
  • Check whether any workloads that are unable to communicate are located behind NAT devices or firewalls, or in remote data centers.
  • Ensure that outbound TCP ports 443 and 444 on the workloads are open to the PCE.
  • If running in a public cloud instance:
    • For AWS, ensure security groups permit TCP ports 443 and 444.
    • For Azure, ensure that Endpoints are configured to allow traffic.
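
The port checks in the list above can be scripted from a Linux workload. The following is a minimal sketch, assuming a netcat build that supports the -z and -w options. The function names probe_port and check_pce_ports are hypothetical, and pce.example.com is a placeholder for your PCE hostname.

```shell
#!/bin/sh
# Default probe: succeeds if a TCP connection to host:port opens within
# 5 seconds. Assumes a netcat build that supports -z and -w.
probe_port() {
    nc -z -w 5 "$1" "$2" >/dev/null 2>&1
}

# Hypothetical helper (not a VEN tool): probes each listed port on the host
# and prints one line per port. Returns non-zero if any port is unreachable.
check_pce_ports() {
    host="$1"
    shift
    failed=0
    for port in "$@"; do
        if probe_port "$host" "$port"; then
            printf '%s:%s reachable\n' "$host" "$port"
        else
            printf '%s:%s unreachable\n' "$host" "$port"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (pce.example.com is a placeholder):
#   check_pce_ports pce.example.com 443 444
```

A port reported as unreachable only shows that the TCP connection did not open from this workload. It does not identify whether a security group, a NAT device, or a firewall along the path is responsible.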

VEN Process Issues

Check the status of the VEN-specific processes and ensure that they are running and active:

  • Linux: Run /opt/illumio/illumio-ven-ctl status
  • Windows: Run Get-Service in PowerShell

Ensure the following processes are running and active:

  • Linux: venAgentManager, venPlatformHandler, venAgentLManager, VtapServer, and AgentMonitor
  • Windows: venAgentLogMgrSvc, venPlatformHandler, venVtapServerSvc, and ilowfp
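
On Linux, the comparison against the expected process list can be automated. This sketch takes the process names from the Linux list above and checks them against a process listing supplied on standard input. The function name missing_ven_processes is hypothetical, and the match is a plain substring search, which is an assumption about how the names appear in the listing.

```shell
#!/bin/sh
# Hypothetical helper (not a VEN tool): reads a process listing on stdin and
# prints every expected VEN process name that does not appear in it.
# Returns non-zero if at least one expected process is missing.
missing_ven_processes() {
    listing="$(cat)"
    missing=0
    for name in venAgentManager venPlatformHandler venAgentLManager VtapServer AgentMonitor; do
        # Fixed-string substring match against the whole listing.
        if ! printf '%s\n' "$listing" | grep -qF "$name"; then
            printf '%s\n' "$name"
            missing=1
        fi
    done
    return "$missing"
}

# Usage: ps -eo args | missing_ven_processes
```

The usage line lists full command lines with ps -eo args because ps -eo comm can truncate long names such as venPlatformHandler.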

Errors in the VEN Logs

Review the VEN log files to find any errors generated by the system (sudo required):

  • Logs in Data_Dir/log directory

    To look for any errors in the log files, execute grep -ir ERROR *

    To check for firewall updates, view the platform.log file. Look for logs related to firewall updates; for example:

    2014-07-26T22:20:41Z INFO:: Enforcement mode is: XXXX
    2014-07-26T22:20:41Z INFO:: Is fw update yes
    2014-07-26T22:20:41Z INFO:: Is ipset update yes
    2014-07-26T22:20:41Z INFO:: saved fw-json
  • Check heartbeat logs for records related to update messages from the PCE. See the following example heartbeats:

    2014-07-26T22:43:12Z Received HELLO from EventService.
    2014-07-26T22:43:12Z Sent ACK to EventService.
    Events (firewall updates and so on):
    2014-07-26T22:34:11Z Received EVENT from EventService.
    2014-07-26T22:34:11Z Added EVENT from EventService to PLATFORM handler thread message queue
  • Check the firewall rules for outbound traffic on ports 443 and 444; for example:

    iptables-save | grep 443 | grep allow_out
    	-A tcp_allow_out -d 54.185.43.60/32 -p tcp -m multiport --dports 443 -m conntrack --ctstate NEW -j NFLOG --nflog-prefix "0x800000000000025f " --nflog-threshold 1
    	-A tcp_allow_out -d 54.185.43.60/32 -p tcp -m multiport --dports 443 -m conntrack --ctstate NEW -j ACCEPT
    	-A tcp_allow_out -d 204.51.153.0/27 -p tcp -m multiport --dports 443 -m conntrack --ctstate NEW -j NFLOG --nflog-prefix "0x8000000000000265 " --nflog-threshold 1
    	-A tcp_allow_out -d 204.51.153.0/27 -p tcp -m multiport --dports 443 -m conntrack --ctstate NEW -j ACCEPT
    iptables-save | grep 444 | grep allow_out
    	-A tcp_allow_out -d 54.185.43.60/32 -p tcp -m multiport --dports 444 -m conntrack --ctstate NEW -j NFLOG --nflog-prefix "0x8000000000000266 " --nflog-threshold 1
    	-A tcp_allow_out -d 54.185.43.60/32 -p tcp -m multiport --dports 444 -m conntrack --ctstate NEW -j ACCEPT
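
The log review steps above can be condensed into a quick summary for a single log file. This is a sketch under stated assumptions: the function name summarize_ven_log is hypothetical, and the "Is fw update" string is taken from the platform.log example above, so other VEN versions might word it differently.

```shell
#!/bin/sh
# Hypothetical helper (not a VEN tool): summarizes one VEN log file.
# Prints the number of lines containing "error" (case-insensitive, like
# grep -i) and the most recent "Is fw update" line, or "none" if absent.
summarize_ven_log() {
    logfile="$1"
    if [ ! -r "$logfile" ]; then
        printf 'cannot read %s\n' "$logfile" >&2
        return 1
    fi
    printf 'error_lines=%s\n' "$(grep -ci 'error' "$logfile")"
    last="$(grep -F 'Is fw update' "$logfile" | tail -n 1)"
    printf 'last_fw_update=%s\n' "${last:-none}"
}

# Usage (sudo might be required to read the logs):
#   summarize_ven_log Data_Dir/log/platform.log
```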

Policy Sync Might Require Reboot

Persistent errors with policy sync on a workload can be cleared by rebooting the VEN.

Event Viewer Stops Logging

After you upgrade the VEN, Event Viewer can stop logging. As a result, the support report does not include windows_evt_application, windows_evt_system, and the system directory (for example, msinfo32). To avoid the issue, close Event Viewer before upgrading the VEN, and reopen it after the upgrade.