VEN Troubleshooting
This topic describes important system administration considerations on Windows and Linux, useful tools, and a general set of actions for troubleshooting VEN operations.
Windows: Enable Base Filtering Engine (BFE)
Windows BFE is a Windows subsystem that determines which packets are allowed to reach the network stack. BFE is enabled by default. If you disable BFE on your Windows workload, all packets are sent to the TCP/IP stack, bypassing BFE, which can result in behavior that differs from one system to another. In the worst case, all ingress and egress packets are dropped.
If you have disabled BFE on your Windows workload, re-enable it.
Linux: ignored_interface
On Linux, ignored_interface inhibits PCE policy updates.
Transitioning an enforced workload's interface to or from ignored_interface might drop the dynamic, long-lived connections maintained by the system.
When a VEN interface is placed in the ignored_interface list, flow state over that interface is no longer kept by conntrack. (The conntrack table on Linux stores information on network connections.) If the connection on TCP port 8444 to the PCE is reinitialized, any arriving packets from the PCE are dropped, because the packets do not have any state in conntrack.
The VEN heartbeat eventually restores connections, but meanwhile the VEN implements any policy sent by lightning bolt from the PCE.
VEN Troubleshooting Tools
Illumio Xpress provides the following tools for VEN connectivity checking and troubleshooting VEN issues on workloads:
- A VEN connectivity checking tool called venconch for workloads is available on the Illumio Support site.
- A VEN compatibility checking feature is available in the PCE web console for paired workloads. See VEN Compatibility Check in the VEN Installation and Upgrade Guide.
Commands to Obtain Firewall Snapshot
Run the following commands on the workload to get a copy of the logs and configured firewall settings.
Linux
iptables-save
ipset -L
Windows
netsh wfp show state
Solaris
ipfstat -ionv
AIX
ipfstat -ionv
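The per-platform commands above can be collected into a small lookup, which is convenient when the same collection script runs on several operating systems. The following is only a sketch: the function and dictionary names are hypothetical, and the keys assume the names that Python's platform.system() reports ("SunOS" for Solaris). It returns the command strings and does not run them.

```python
import platform

# Commands copied from the list above. The keys are the OS names that
# platform.system() is assumed to report; "SunOS" stands for Solaris.
SNAPSHOT_COMMANDS = {
    "Linux": ["iptables-save", "ipset -L"],
    "Windows": ["netsh wfp show state"],
    "SunOS": ["ipfstat -ionv"],
    "AIX": ["ipfstat -ionv"],
}


def snapshot_commands(system_name=None):
    """Return the firewall snapshot commands for an OS name.

    With no argument, the name of the current OS is used. An unknown
    name yields an empty list rather than an error.
    """
    if system_name is None:
        system_name = platform.system()
    return list(SNAPSHOT_COMMANDS.get(system_name, []))
```

Calling snapshot_commands() with no argument on the workload selects the entry for the local operating system.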
Troubleshooting Tips
Connectivity Issues
Perform the following actions to identify why a workload is unreachable, cannot reach other workloads, or cannot communicate with the PCE:
- Determine whether all workloads are unable to communicate or only a subset is reported as disconnected. If the PCE reports that all workloads are offline, check whether the PCE is reachable from the workloads.
- If a subset of workloads is down, check for differences in network configuration between those workloads and the ones that are connected, and whether those differences are making the PCE unreachable.
- Check if any workloads that are unable to communicate are located behind NAT devices, firewalls, or remote data centers.
- Ensure that outbound TCP ports 443 and 444 on the workloads are open to the PCE.
- If running in a public cloud instance:
  - For AWS, ensure that security groups permit TCP ports 443 and 444.
  - For Azure, ensure that Endpoints are configured to allow traffic.
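The port checks in the list above can be approximated from a workload with a plain TCP connection attempt. This is a minimal sketch, not a replacement for venconch: the function names are hypothetical, and a successful connection only shows that the port is reachable, not that the VEN can authenticate to the PCE.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False


def check_pce_ports(host, ports=(443, 444), timeout=3.0):
    """Map each port to whether it accepted a TCP connection."""
    return {port: can_connect(host, port, timeout) for port in ports}
```

For example, check_pce_ports("pce.example.com") returns a dictionary keyed by port, where pce.example.com is a placeholder for your PCE hostname. A False value for either port points to a firewall, NAT, or security group between the workload and the PCE.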
VEN Process Issues
Check the status of the VEN-specific processes and ensure that they are running and active:
- Linux: Run /opt/illumio/illumio-ven-ctl status
- Windows: Execute get-service in PowerShell
Ensure the following processes are running and active:
- Linux: venAgentManager, venPlatformHandler, venAgentLManager, VtapServer, and AgentMonitor
- Windows: venAgentLogMgrSvc, venPlatformHandler, venVtapServerSvc, and ilowfp
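Comparing the expected process names against what is actually running is easy to script. The sketch below assumes you have already gathered the running names yourself (for example, from the status output); the function name is hypothetical, and the two lists simply repeat the names given above.

```python
# Process and service names as listed above.
LINUX_PROCESSES = [
    "venAgentManager",
    "venPlatformHandler",
    "venAgentLManager",
    "VtapServer",
    "AgentMonitor",
]
WINDOWS_SERVICES = [
    "venAgentLogMgrSvc",
    "venPlatformHandler",
    "venVtapServerSvc",
    "ilowfp",
]


def missing_processes(expected, running):
    """Return the expected names that do not appear in running, in order."""
    running_set = set(running)
    return [name for name in expected if name not in running_set]
```

An empty result means every expected process was found; any name returned is a candidate for closer inspection in the VEN logs.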
Errors in the VEN Logs
Review the VEN log files to find any errors generated by the system (sudo required):
- Logs are in the Data_Dir/log directory. To look for any errors in the log files, execute:
  grep -ir ERROR *
  To check for firewall updates, view the platform.log file. Look for logs related to firewall updates; for example:
  2014-07-26T22:20:41Z INFO:: Enforcement mode is: XXXX
  2014-07-26T22:20:41Z INFO:: Is fw update yes
  2014-07-26T22:20:41Z INFO:: Is ipset update yes
  2014-07-26T22:20:41Z INFO:: saved fw-json
- Check heartbeat logs for records related to update messages from the PCE. See the following example heartbeats:
  2014-07-26T22:43:12Z Received HELLO from EventService.
  2014-07-26T22:43:12Z Sent ACK to EventService.
  Events (firewall updates, and so on):
  2014-07-26T22:34:11Z Received EVENT from EventService.
  2014-07-26T22:34:11Z Added EVENT from EventService to PLATFORM handler thread message queue
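The log checks above can also be scripted when many workloads need review. The following sketch assumes that platform.log lines look like the sample entries shown here; the function names and the regular expression are illustrative and are not part of the VEN. The error search is case-insensitive, like grep -i.

```python
import re

# Matches sample entries such as "INFO:: Is fw update yes".
# The pattern is an assumption based on the example log lines only.
UPDATE_PATTERN = re.compile(r"Is (fw|ipset) update (\w+)")


def error_lines(lines):
    """Return the lines that contain "error" in any letter case."""
    return [line.rstrip("\n") for line in lines if "error" in line.lower()]


def firewall_update_flags(lines):
    """Return the last reported value for the fw and ipset update flags."""
    flags = {}
    for line in lines:
        match = UPDATE_PATTERN.search(line)
        if match:
            flags[match.group(1)] = match.group(2)
    return flags
```

Both functions accept any iterable of strings, so an open file object for platform.log can be passed directly.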
To confirm that outbound rules for TCP ports 443 and 444 are present on a Linux workload, filter the iptables-save output; for example:
iptables-save | grep 443 | grep allow_out
-A tcp_allow_out -d 54.185.43.60/32 -p tcp -m multiport --dports 443 -m conntrack --ctstate NEW -j NFLOG --nflog-prefix "0x800000000000025f " --nflog-threshold 1
-A tcp_allow_out -d 54.185.43.60/32 -p tcp -m multiport --dports 443 -m conntrack --ctstate NEW -j ACCEPT
-A tcp_allow_out -d 204.51.153.0/27 -p tcp -m multiport --dports 443 -m conntrack --ctstate NEW -j NFLOG --nflog-prefix "0x8000000000000265 " --nflog-threshold 1
-A tcp_allow_out -d 204.51.153.0/27 -p tcp -m multiport --dports 443 -m conntrack --ctstate NEW -j ACCEPT
iptables-save | grep 444 | grep allow_out
-A tcp_allow_out -d 54.185.43.60/32 -p tcp -m multiport --dports 444 -m conntrack --ctstate NEW -j NFLOG --nflog-prefix "0x8000000000000266 " --nflog-threshold 1
-A tcp_allow_out -d 54.185.43.60/32 -p tcp -m multiport --dports 444 -m conntrack --ctstate NEW -j ACCEPT
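Reading rule output like the sample above by eye gets tedious on workloads with many rules. A possible sketch for extracting the destinations that have an ACCEPT rule for a given port follows. The function names are hypothetical, the text is assumed to come from iptables-save, and only comma-separated port lists are handled (port ranges such as 443:445 are not).

```python
import shlex


def option_value(tokens, flag):
    """Return the token that follows flag, or None if flag is absent."""
    if flag in tokens:
        index = tokens.index(flag)
        if index + 1 < len(tokens):
            return tokens[index + 1]
    return None


def accepted_destinations(rules_text, chain, port):
    """List destinations with an ACCEPT rule for port in the given chain."""
    destinations = []
    for line in rules_text.splitlines():
        tokens = shlex.split(line)
        # Only "-A <chain> ..." lines are rules appended to the chain.
        if tokens[:2] != ["-A", chain]:
            continue
        ports = option_value(tokens, "--dports") or option_value(tokens, "--dport")
        if option_value(tokens, "-j") != "ACCEPT" or ports is None:
            continue
        if str(port) in ports.split(","):
            destinations.append(option_value(tokens, "-d"))
    return destinations
```

The NFLOG lines in the sample are skipped because their target is not ACCEPT, so only rules that actually permit traffic are reported.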
Policy Sync Might Require Reboot
Persistent errors with policy sync on a workload can be cleared by rebooting the VEN.
Event Viewer Stops Logging
After you upgrade the VEN, Event Viewer can stop logging, so that the support report does not include windows_evt_application, windows_evt_system, and the system directory (for example, msinfo32). To avoid this issue, close Event Viewer before upgrading the VEN, then reopen it after the upgrade.