PCE Health

The Public Stable Health Check API displays health information about a 4X2 Supercluster or a PCE virtual appliance.

NOTE:

This API is only available for Illumio Core PCE installed on-premises and is not available for Illumio Cloud customers.

About PCE Health API

With this API, you can see the following health information: 

  • How long the PCE has been running, its runlevel, and overall health (normal, warning, or error).
  • Each node hostname, IP address, uptime, runlevel, and whether the PCE software is running properly.
  • Each node type (core or data), and which data node is the database replica and which is the primary database. The replication delay for the database replica is also displayed.
  • Information about PCE service alerts, such as the number of degraded or failed services in the cluster, so you can see where service failures have occurred.

PCE Health API Method

Functionality                   HTTP    URI
Check the health of the PCE.    GET     [api_version]/health

Check PCE Health

URI to Check PCE Health

GET [api_version]/health

Curl Command to Check PCE Health

curl -i -X GET https://pce.my-company.com:8443/api/v2/health -H 'Accept: application/json' -u $KEY:$TOKEN

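
The same request can be prepared from a script. The following is a minimal Python sketch using only the standard library. It assumes the API key and token are sent as HTTP Basic credentials, as the curl -u option does. The function name build_health_request and the placeholder credentials are illustrative and not part of the PCE API.

```python
import urllib.request
from base64 import b64encode


def build_health_request(base_url: str, api_key: str, api_token: str) -> urllib.request.Request:
    """Build a GET request for [api_version]/health with HTTP Basic credentials."""
    url = base_url.rstrip("/") + "/health"
    credentials = b64encode(f"{api_key}:{api_token}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        method="GET",
        headers={
            "Accept": "application/json",
            "Authorization": f"Basic {credentials}",
        },
    )


if __name__ == "__main__":
    # Placeholder values; substitute your own PCE URL and API credentials.
    request = build_health_request(
        "https://pce.my-company.com:8443/api/v2", "my_api_key", "my_api_token"
    )
    print(request.get_method(), request.full_url)
    # To send it, pass the request to urllib.request.urlopen() and
    # decode the JSON body with json.load().
```

Building the request separately from sending it keeps the URL and header handling easy to check without a reachable PCE.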
PCE Health Response Properties
Property   Description   Type
status

Current health status of the PCE. Possible values: 

  • normal: When PCE health is in a normal state, it means: 
    • All required services are running.
    • All nodes are running.
    • CPU usage of all nodes is less than 95%.
    • Memory usage of all nodes is less than 95%.
    • Disk usage of all nodes is less than 95%.
    • Database replication lag is less than or equal to 30 seconds.
  • warning: When PCE health is in a warning state, it means: 
    • One or more nodes are unreachable.
    • One or more optional services are missing, or one or more required services have been degraded.
    • The CPU usage of any node is greater than or equal to 95%.
    • Memory usage of any node is greater than or equal to 95%.
    • Disk usage of any node is greater than or equal to 95%.
    • Database replication lag is greater than 30 seconds.
  • critical: A PCE is considered to be in a critical state when one or more required services are missing.
    If a PCE enters a critical state, it might not be possible to authenticate to the PCE or get an API response depending on which services are missing from the PCE.
String
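
The status rules listed above can be read as a small decision procedure. The Python sketch below restates them for illustration only. It is not the PCE's own implementation, and the ClusterSnapshot type and its field names are hypothetical inputs invented for this example.

```python
from dataclasses import dataclass
from typing import List

THRESHOLD_PERCENT = 95  # CPU, memory, and disk threshold from the list above
MAX_LAG_SECONDS = 30    # replication lag threshold from the list above


@dataclass
class ClusterSnapshot:
    """Hypothetical input type; these fields are not part of the API response."""
    cpu_percents: List[float]
    memory_percents: List[float]
    disk_percents: List[float]
    replication_lag_seconds: float
    unreachable_nodes: int = 0
    required_services_missing: int = 0
    required_services_degraded: int = 0
    optional_services_missing: int = 0


def classify(snapshot: ClusterSnapshot) -> str:
    """Return normal, warning, or critical according to the documented rules."""
    # A missing required service makes the PCE critical regardless of other metrics.
    if snapshot.required_services_missing > 0:
        return "critical"
    usage = snapshot.cpu_percents + snapshot.memory_percents + snapshot.disk_percents
    over_threshold = any(percent >= THRESHOLD_PERCENT for percent in usage)
    if (
        over_threshold
        or snapshot.unreachable_nodes > 0
        or snapshot.required_services_degraded > 0
        or snapshot.optional_services_missing > 0
        or snapshot.replication_lag_seconds > MAX_LAG_SECONDS
    ):
        return "warning"
    return "normal"
```

Note the boundary cases: usage of exactly 95% counts as a warning, while a replication lag of exactly 30 seconds is still normal.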
type

The type of PCE. Possible values:

  • standalone: Indicates that this PCE is an on-premises 2x2 or 4x2 PCE cluster.
  • leader: Indicates that this PCE is the leader of a Supercluster.
  • member: Indicates that this PCE is a member of a Supercluster.
String
fqdn

The fully qualified domain name (FQDN) of the PCE.

String
available_seconds

The length of time that this PCE has been available, measured in seconds.

Number
notifications

Health warnings related to the PCE, which contain the following properties: 

  • status: Severity status of this notification. Possible values include: normal, warning, or critical.
  • token: Description of the notification.
  • message: Notification message.
Array
listen_only_mode_enabled_at

Indicates when listen-only mode was enabled for this PCE.

For information about enabling or disabling listen-only mode for a PCE, see the PCE Administration Guide.

String
nodes

The nodes that comprise your PCE cluster.
For each node of your PCE, this API call returns the following properties: 

  • hostname: The node hostname.
  • ip_address: The node IP address.
  • runlevel: (Number) The current runlevel of the PCE software on the node.
    For more information about runlevels and their usage, see the PCE Administration Guide.
  • uptime_seconds: Number of seconds since this node was last restarted.
  • cpu: Percentage of the node CPU being used.
    Includes the following two sub-properties: 
    • status: Either normal, warning, or critical.
    • percent: (Number) Percentage of the node CPU being used.
  • disk: Percentage of the node's disk that is being used.
    Includes the following two sub-properties: 
    • status: Either normal, warning, or critical.
    • percent: (Number) Percentage of the node disk being used.
  • memory: Percentage of the node's memory that is being used.
    Includes the following two sub-properties: 
    • status: Either normal, warning, or critical.
    • percent: (Number) Percentage of the node memory being used.
  • services: The status of all PCE services running on the node.
    Possible status for PCE services include: 
    • running: The service is fully running and operational.
    • not running: The service has stopped running.
    • partial: The service is running but in a partial state.
    • optional
    • unknown
  • generated_at: Timestamp when this information was generated.
Array
network

PCE 2x2 or 4x2 Deployment

For a PCE 2x2 or 4x2 deployment, the network property provides latency information between the database primary and database replica data nodes in your PCE for policy and traffic data.

This property also indicates which data node in your PCE is the database primary and which is the database replica.

This type of database replication is called intracluster in the REST API.

Sub-properties include: 

replication: The category of properties that provide database replication latency information for a PCE cluster. (For a PCE Supercluster, this information is provided for each PCE in the Supercluster.)

  • type: Type of replication. intracluster for a PCE 2x2 or 4x2 deployment.
  • details: Includes the following properties: 
    • database_name: Either agent for policy data or traffic for traffic data.
    • primary_fqdn: The FQDN of the database primary node.
    • replica_fqdn: The FQDN of the database replica node.

  • value: The amount of replication lag between the database primary and the database replica for both policy and traffic data.
    • status: Either normal, warning, or critical.
    • lag_seconds: The amount of lag measured in seconds between the primary and replica databases for both policy and traffic data.

Supercluster Deployment

If you have deployed a PCE Supercluster, the PCE health call also returns information about the database replication between the PCE you are currently logged into and all other PCEs in the Supercluster.

In a Supercluster deployment, the security policy provisioned on the leader is replicated to all other PCEs in the Supercluster. Additionally, all PCEs in the Supercluster (leader and members) replicate copies of each workload's context, such as IP addresses, to all other PCEs in the Supercluster.

This other type of database replication for a Supercluster is called intercluster in the REST API, and information is provided for all PCEs in the Supercluster.

Properties include: 

replication: The category of properties that provide database replication latency information for a PCE cluster.

  • type: Type of replication. intercluster for a PCE Supercluster deployment.
  • details: Includes the following properties: 
    • fqdn: The FQDN of the primary database of the other PCEs listed in this section.
  • value: The amount of replication lag between the PCE you are logged into and one of the other PCEs in the Supercluster.
    • status: Either normal, warning, or critical.
    • lag_seconds: The amount of lag measured in seconds between the PCE you are logged into and the other PCE listed in this section.
Array
generated_at

The timestamp of when the information was generated.

String
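
The replication entries described under the network property can be flattened into a short report. The Python sketch below assumes a single PCE health object that has already been decoded from JSON into a dictionary. The helper names replication_lags and lagging are illustrative, and the 30-second limit is taken from the status description above.

```python
MAX_LAG_SECONDS = 30  # lag above this value is described as a warning condition


def replication_lags(health: dict) -> list:
    """Return one summary row per entry in network.replication."""
    rows = []
    for entry in health.get("network", {}).get("replication", []):
        details = entry.get("details", {})
        value = entry.get("value", {})
        rows.append({
            "type": entry.get("type"),
            # intracluster entries carry database_name; intercluster entries carry fqdn
            "target": details.get("database_name") or details.get("fqdn"),
            "status": value.get("status"),
            "lag_seconds": value.get("lag_seconds"),
        })
    return rows


def lagging(rows: list) -> list:
    """Keep only the rows whose lag exceeds the documented 30-second limit."""
    return [
        row for row in rows
        if row["lag_seconds"] is not None and row["lag_seconds"] > MAX_LAG_SECONDS
    ]


if __name__ == "__main__":
    # Made-up sample data shaped like the properties described above.
    sample = {
        "network": {
            "replication": [
                {"type": "intracluster",
                 "details": {"database_name": "agent"},
                 "value": {"status": "normal", "lag_seconds": 0}},
                {"type": "intercluster",
                 "details": {"fqdn": "pce2.example.com"},
                 "value": {"status": "warning", "lag_seconds": 45}},
            ]
        }
    }
    for row in lagging(replication_lags(sample)):
        print(row["type"], row["target"], row["lag_seconds"])
```

Reading both database_name and fqdn lets one helper cover intracluster and intercluster entries.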

PCE Health Response

Example response returned from the PCE Health API.

[
    {
        "status": "normal",
        "type": "standalone",
        "fqdn": "pce.mycompany.com",
        "available_seconds": 84133,
        "notifications": [],
        "listen_only_mode_enabled_at": null,
        "nodes": [
            {
                "hostname": "pce_core1.mycompany.com",
                "ip_address": "192.0.1.0",
                "type": "core",
                "runlevel": 5,
                "uptime_seconds": 2051301,
                "cpu": {
                    "status": "normal",
                    "percent": 7
                },
                "disk": [
                    {
                        "location": "disk",
                        "value": {
                            "status": "normal",
                            "percent": 17
                        }
                    }
                ],
                "memory": {
                    "status": "warning",
                    "percent": 85
                },
                "services": {
                    "status": "normal",
                    "services": {
                        "running": [
                            "agent_background_worker_service",
                            "agent_service",
                            "agent_traffic_service",
                            "auditable_events_service",
                            "collector_service",
                            "ev_service",
                            "executor_service",
                            "fluentd_source_service",
                            "login_service",
                            "memcached",
                            "node_monitor",
                            "search_index_service",
                            "server_load_balancer",
                            "service_discovery_server",
                            "traffic_worker_service",
                            "web_server",
                            "nfc_service"
                        ]
                    }
                },
                "generated_at": "2020-03-03T19:38:52+00:00"
            },
            {
                "hostname": "pce_core2.mycompany.com",
                "ip_address": "192.0.2.0",
                "type": "core",
                "runlevel": 5,
                "uptime_seconds": 2051226,
                "cpu": {
                    "status": "normal",
                    "percent": 7
                },
                "disk": [
                    {
                        "location": "disk",
                        "value": {
                            "status": "normal",
                            "percent": 16
                        }
                    }
                ],
                "memory": {
                    "status": "warning",
                    "percent": 81
                },
                "services": {
                    "status": "normal",
                    "services": {
                        "running": [
                            "agent_background_worker_service",
                            "agent_service",
                            "agent_traffic_service",
                            "auditable_events_service",
                            "collector_service",
                            "ev_service",
                            "executor_service",
                            "fluentd_source_service",
                            "login_service",
                            "memcached",
                            "node_monitor",
                            "search_index_service",
                            "service_discovery_server",
                            "traffic_worker_service",
                            "web_server"
                        ]
                    }
                },
                "generated_at": "2020-03-03T19:38:30+00:00"
            },
            {
                "hostname": "pce_core3.mycompany.com",
                "ip_address": "192.0.3.0",
                "type": "core",
                "runlevel": 5,
                "uptime_seconds": 2051192,
                "cpu": {
                    "status": "normal",
                    "percent": 7
                },
                "disk": [
                    {
                        "location": "disk",
                        "value": {
                            "status": "normal",
                            "percent": 16
                        }
                    }
                ],
                "memory": {
                    "status": "warning",
                    "percent": 90
                },
                "services": {
                    "status": "normal",
                    "services": {
                        "running": [
                            "agent_background_worker_service",
                            "agent_service",
                            "agent_traffic_service",
                            "auditable_events_service",
                            "collector_service",
                            "ev_service",
                            "executor_service",
                            "fluentd_source_service",
                            "login_service",
                            "memcached",
                            "node_monitor",
                            "search_index_service",
                            "service_discovery_server",
                            "traffic_worker_service",
                            "web_server"
                        ]
                    }
                },
                "generated_at": "2020-03-03T19:38:48+00:00"
            },
            {
                "hostname": "pce_core4.mycompany.com",
                "ip_address": "192.0.4.0",
                "type": "core",
                "runlevel": 5,
                "uptime_seconds": 2051136,
                "cpu": {
                    "status": "normal",
                    "percent": 6
                },
                "disk": [
                    {
                        "location": "disk",
                        "value": {
                            "status": "normal",
                            "percent": 16
                        }
                    }
                ],
                "memory": {
                    "status": "warning",
                    "percent": 84
                },
                "services": {
                    "status": "normal",
                    "services": {
                        "running": [
                            "agent_background_worker_service",
                            "agent_service",
                            "agent_traffic_service",
                            "auditable_events_service",
                            "collector_service",
                            "ev_service",
                            "executor_service",
                            "fluentd_source_service",
                            "login_service",
                            "memcached",
                            "node_monitor",
                            "search_index_service",
                            "server_load_balancer",
                            "service_discovery_server",
                            "traffic_worker_service",
                            "web_server"
                        ]
                    }
                },
                "generated_at": "2020-03-03T19:38:51+00:00"
            },
            {
                "hostname": "pce_datae0.mycompany.com",
                "ip_address": "192.0.5.0",
                "type": "data0",
                "runlevel": 5,
                "uptime_seconds": 2051052,
                "cpu": {
                    "status": "normal",
                    "percent": 41
                },
                "disk": [
                    {
                        "location": "disk",
                        "value": {
                            "status": "normal",
                            "percent": 19
                        }
                    }
                ],
                "memory": {
                    "status": "normal",
                    "percent": 26
                },
                "services": {
                    "status": "normal",
                    "services": {
                        "running": [
                            "agent_traffic_redis_cache",
                            "agent_traffic_redis_server",
                            "citus_database_service",
                            "database_monitor",
                            "database_service",
                            "fileserver_service",
                            "flow_analytics_service",
                            "fluentd_data_service",
                            "node_monitor",
                            "service_discovery_server",
                            "set_server_redis_server",
                            "traffic_query_service"
                        ]
                    }
                },
                "generated_at": "2020-03-03T19:38:21+00:00"
            },
            {
                "hostname": "pce_datae1.mycompany.com",
                "ip_address": "192.0.6.0",
                "type": "data1",
                "runlevel": 5,
                "uptime_seconds": 2050979,
                "cpu": {
                    "status": "normal",
                    "percent": 2
                },
                "disk": [
                    {
                        "location": "disk",
                        "value": {
                            "status": "normal",
                            "percent": 21
                        }
                    }
                ],
                "memory": {
                    "status": "normal",
                    "percent": 21
                },
                "services": {
                    "status": "normal",
                    "services": {
                        "running": [
                            "agent_traffic_redis_cache",
                            "citus_database_replica_service",
                            "database_monitor",
                            "database_replica_service",
                            "fileserver_replica_service",
                            "flow_analytics_service",
                            "fluentd_data_service",
                            "node_monitor",
                            "service_discovery_agent",
                            "traffic_query_service"
                        ]
                    }
                },
                "generated_at": "2020-03-03T19:38:02+00:00"
            }
        ],
        "network": {
            "replication": [
                {
                    "type": "intracluster",
                    "details": {
                        "database_name": "agent",
                        "primary_fqdn": "bkhorram-qa-6node-v0-pce-1-dbase0"
                    },
                    "value": {
                        "status": "normal",
                        "lag_seconds": 0
                    }
                },
                {
                    "type": "intracluster",
                    "details": {
                        "database_name": "traffic",
                        "primary_fqdn": "bkhorram-qa-6node-v0-pce-1-dbase0"
                    },
                    "value": {
                        "status": "normal",
                        "lag_seconds": 0
                    }
                }
            ]
        },
        "generated_at": "2020-03-03T19:38:52+00:00"
    }
]
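
A response shaped like the example above can be scanned for nodes whose resource status is not normal. The Python sketch below follows the structure of the example response, in which the top level is a list of PCE objects and disk is a list of entries with location and value. The function name nodes_needing_attention is illustrative, and the sample is a trimmed copy of the first node in the example.

```python
def nodes_needing_attention(response: list) -> list:
    """Return (hostname, resource, status, percent) for every non-normal resource."""
    findings = []
    for pce in response:  # the top level is a list of PCE health objects
        for node in pce.get("nodes", []):
            checks = {
                "cpu": node.get("cpu", {}),
                "memory": node.get("memory", {}),
            }
            # In the example response, disk is a list of {location, value} entries.
            for disk in node.get("disk", []):
                checks["disk:" + disk.get("location", "unknown")] = disk.get("value", {})
            for resource, check in checks.items():
                if check and check.get("status") != "normal":
                    findings.append(
                        (node.get("hostname"), resource, check.get("status"), check.get("percent"))
                    )
    return findings


if __name__ == "__main__":
    sample = [{
        "status": "normal",
        "nodes": [{
            "hostname": "pce_core1.mycompany.com",
            "cpu": {"status": "normal", "percent": 7},
            "disk": [{"location": "disk", "value": {"status": "normal", "percent": 17}}],
            "memory": {"status": "warning", "percent": 85},
        }],
    }]
    print(nodes_needing_attention(sample))
    # prints [('pce_core1.mycompany.com', 'memory', 'warning', 85)]
```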