Appliance Troubleshooting


This section provides information to help you use logs and daemon information and provides useful commands for appliance troubleshooting. The AppGate ZTNA appliance runs a customized version of Ubuntu 24. The information below is based on standard Linux but does include some information that is more AppGate ZTNA centric where relevant. The User/device troubleshooting section has more information, particularly in respect to Gateways.

Health warnings and errors

The Collective performs about 50 Site, appliance, and functional healthchecks. The results of these healthchecks are shown in the dashboard in the Sites widget or the appliances widget. From there you can get to the Appliance Health Details, where any warnings or errors will be shown. It is important to take any corrective actions before user access is impacted. The table below suggests some actions to perform when your Collective reports that it is unhealthy:

Source

Error Level

Urgency

Message

Action to be taken

Appliance

Offline

High

Current Controller cannot reach the appliance on appliance-hostname:443

Verify the Appliance is running. Verify that the hostnames of the Controller(s) are resolvable on the appliance and that port 443 is open to the Controller(s). Verify the time on the appliance. SSH to the appliance and run: "nc -zv controller-hostname 443". That should return succeeded if the Collective is in TCP-SPA mode. If not, check the network firewall for 'Man In the Middle' interference.

Appliance

Error

High

I/O Stalled

This error indicates an underlying storage issue. Disk I/O that stalls for several seconds indicates either an availability issue or a capacity problem in the underlying storage system. Please verify the hardware diagnostics or check your hypervisor stack for storage issues.

Appliance

Error

Low

I/O Error

This error is caused by the underlying storage system of the hardware or hypervisor. Verify that the storage is working correctly, run storage diagnostics, and verify potential read or write issues on the underlying disk system. I/O errors could lead to data corruption.

Appliance

Error

Medium

Geoip database missing

The appliance cannot download geoip data from either https://bin.appgate-sdp.com or https://updates.maxmind.com. Make sure your appliance has access to the internet and DNS is working correctly. If no geoip data is required or no external server connection is allowed, it can be disabled on the Settings > Global Settings page.

Appliance

Error

Medium

Failed to read ntp status

NTP status cannot be verified. Go to the appliance and verify if the command `sudo ntpq -np` returns any errors. Most likely the appliance has a DNS or connectivity issue, as it cannot receive the current time from the configured NTP servers.

Appliance

Error

High

Failed to perform Healthcheck for <appliance name>

The healthcheck service is not running on this appliance. Check cz-configd logs for more info.

Appliance

Error

Medium

Not connected to any Controller

The appliance is not able to reach any of the Controllers in the Collective. Make sure the appliance can reach Controller TCP port <default 443>. If UDP-SPA is enabled, make sure it can connect to UDP ports 53 and 443 of the Controllers. Also, ensure that the time is the same on all the appliances.

Appliance

Error

High

Customization error

The appliance has a broken customization script. Download the appliance logs and verify the logs_by_daemon/cz-customization.log file.

Appliance

Error

High

Stuck initializing cloud instance

The appliance is expecting cloudinit information and is not receiving it. Verify your network settings and check with your cloud provider that it is sending the cloudinit information. Additionally, make sure DHCP settings for DNS and the default Gateway are enabled, as they are typically required in most platforms to receive cloudinit information.

Appliance

Error

Medium

The following services are not running:

This error is generated when certain daemons are not started when they should be. Sign in to the appliance and verify the status of the daemon: `sudo systemctl status <daemon name>`

Appliance

Error

Medium

High volume usage <name> [X%]

The specific volume on the disk is >90% full. Check what is taking up space and remove files that are not required, such as old core dumps.

Controller

Error

High

Unable to connect dbd instance

The Controller cannot reach the database daemon. Check cz-dbs for status and contact support.

Controller

Error

High

Unexpected state for running Controller

Please contact support.

Controller

Error

High

IP Pool <name> has X Ips allocated out of Y (Error on 90+% usage)

This Controller is running out of IPs from the IP pool. Check the IP pool page in the admin UI under Identity > IP Pools. Compare the currently used IPs with the total size of that pool. If the number of used IPs is almost equal to the total size, you can either change the lease time to a lower number (to clear out some IPs), or you can add additional ranges. When adding additional ranges, make sure those ranges are routed properly for each Site that is not using SNAT. If the number of currently used IPs is at least 50% lower than the total size, the reason is probably that your users can reach only one of the Controllers. Fix the connectivity issue and the new Controller will be able to assign the unused IPs.

Gateway

Error

High

Failed to query cz-sessiond for status

The cz-sessiond daemon does not seem to be working correctly. Try restarting the daemon with `sudo systemctl restart cz-sessiond`.

Gateway

Error

High

Very high number of active connections

The connection tracking table has reached 95%. Once it reaches 100%, your users might experience dropped application sessions. Check the conntrack settings on the appliance with `sudo sysctl -a | grep nf_conntrack` and compare the _max value with the _count value. If needed, the max value can be raised with `sudo sysctl -w net.nf_conntrack_max=<New Value>`. In 6.1 and 6.2 it requires a customization. In later versions, check the cz-config command to change conntrack limits.

Portal

Error

High

Failed to query cz-nginx@urlaccess for status

Run `sudo systemctl status cz-nginx@urlaccess` to check that it is started correctly.

Portal

Error

High

cz-nginx@urlaccess: Shared memory size is not enough to save all the HTTP up Action objects + the auxiliary data

Add more memory, or reduce the number of HTTP up Actions.

LogServer

Error

Medium

Opensearch is down or starting up.

Run `sudo systemctl status cz-opensearch` to check that it is started correctly.

LogServer/LogForwarder

Error

Medium

cz-logd: Unable to connect to elasticsearch

LogServer or LogForwarder is unable to communicate with Elasticsearch. Check to see if Elasticsearch is down.

LogServer/LogForwarder

Error

Medium

cz-logd: Unable to prepare POST request for inserting data into elasticsearch

Unable to post inserts into Elasticsearch. Check connectivity to the http port of Elasticsearch. Check that the configuration of Elasticsearch matches the configuration of the LogForwarder.

LogServer/LogForwarder

Error

Medium

cz-logd: Unable to create health request for elasticsearch

LogForwarder or LogServer is unable to query health information from Elasticsearch. Check the connectivity and configuration of Elasticsearch, or in the case of a LogServer, if the Elasticsearch service is running at all.

LogServer/LogForwarder

Error

Medium

cz-logd: Unable to create index into elasticsearch, got status: X

LogServer/LogForwarder is unable to create indexes in Elasticsearch. Check the Elasticsearch configuration.

LogServer/LogForwarder

Error

Medium

cz-logd: Elasticsearch status is not green/yellow

The Elasticsearch status seen by the LogServer/LogForwarder is neither green nor yellow. Check the Elasticsearch server status and restore it to green/yellow.

LogForwarder

Error

Medium

cz-logd: Unable to get stream, does it exist?, streamname: X, details: Y

LogForwarder is configured with a stream that does not exist in AWS. Check configured name or create it in AWS.

LogForwarder

Error

Medium

cz-logd: Unable to get delivery stream, does it exist?, streamname: %v, details: %v

LogForwarder is unable to get delivery stream. Check IAM roles for the Kinesis output.

LogForwarder

Error

Medium

cz-logd: Could not compile filters, X

LogForwarder has been configured with a filter that does not compile. Check the configuration of filters for the LogForwarder.

LogForwarder

Error

Medium

cz-logd: No credials provided

LogForwarder, Kinesis output: no credentials or faulty credentials provided. Check the Kinesis LogForwarder configuration.

LogForwarder

Error

Medium

cz-logd: Unable to get AWS region, details: X

LogForwarder is unable to get info about an AWS region. Ensure that the correct region is configured for the Kinesis output.

LogForwarder

Error

Medium

cz-logd: Unable to create AWS session, details: X

LogForwarder is unable to create an AWS session using the AWS SDK.

Appliance

Warning

Low

Geoip database was last updated X days ago

The appliance missed receiving the latest geoip data from https://bin.appgate-sdp.com or https://updates.maxmind.com. Make sure your appliance has access to the internet and DNS is working correctly. You can manually force an update using `sudo /etc/cron.daily/geoIpDbUpdate --force`

Appliance

Warning

Medium

High volume usage <name> [X%]

The specific volume on the disk is >75% full. Check what is taking up space and remove files that are not required, such as old core dumps.

Appliance

Warning

Medium

Certificate with subject <name> for <appliance name> has expired. You must replace it now if it's in use.

When an appliance certificate has expired, this warning will appear and the appliance will stop accepting connections. Certificates have renewed automatically since version 6.1, so this message would appear only if the appliance was offline.

Appliance

Warning

Medium

Certificate with subject <name> for <appliance name> is expiring. You must replace it before <date>.

There is a 30 day warning when an appliance certificate is about to expire. Press the appliance renew certificate option in the System > Appliances menu. Renewing the certificate will restart all services on this appliance.

Appliance

Warning

Medium

The following services have debug logs enabled:

Running with debug logs enabled may harm performance. Switch back to normal logs as soon as possible.

Appliance

Warning

Low

Configuration from Controller is incompatible with this appliance

The configuration from the Controller does not match the configuration format of this appliance. This might be because of a version incompatibility.

Appliance

Warning

Low

cz-ffwd: Unable to connect to X@X (X)

The appliance is unable to establish a websocket connection to the LogServer/LogForwarder. Check connectivity between appliances (default TCP port 443, UDP ports 53 and 443).

Appliance

Warning

Low

This system has a CD drive attached

The appliance is running on VMware and still has a CD drive attached. Go to the VMware console and remove the attached CD drive from the virtual machine.

Controller

Warning

High

There are more X than your license allows.

You have exceeded the number of users your license allows. Only the first licensed users will be granted access.

Controller

Warning

High

IP Pool <name> is too small to be utilized by this Controller.

This error occurs if you assign an IP pool that has fewer IPs than the number of Controllers. For example, a /30 with 6 Controllers will give this error.

Controller

Warning

High

Controller appliance certificate does not include the Client profile DNS name <name>. You must renew it to allow Client connections to this Controller.

The Client Profile DNS name included in the Client profile is not present as a SAN in the appliance's certificate. The certificate can be renewed from the Appliances page in the admin UI.  

Controller

Warning

Medium

X of the user licenses are in use.

You are almost running out of user licenses. Contact support or sales to update your license count.

Controller

Warning

Medium

X of the Portal licenses are in use.

You are almost out of Portal licenses. Contact support or sales to update your license count.

Controller

Warning

Medium

X of the service licenses are in use.

You are almost out of service licenses. Contact support or sales to update your license count.

Controller

Warning

Medium

Controller is running in maintenance mode

The Controller is running in maintenance mode due to an ongoing upgrade. If the upgrade failed and your node is still in maintenance mode, you can take it out of maintenance with the following command: `sudo cz-config set -j Controller/maintenance false`. Use this command only when the upgrade has been cancelled.

Controller

Warning

Medium

Database node not replicating

This Controller is unable to replicate the database with another Controller. Check the connectivity between the two controllers. Bi-directional connectivity is required.

Controller

Warning

Medium

IP Pool <name> has X Ips allocated out of Y (warning between 75-90% usage)

This Controller is running out of IPs from the IP pool. First, check the IP pool page in the admin UI under Identity > IP Pools. Compare the currently used IPs with the total size of that pool. If the number of used IPs is almost equal to the total size, you can either change the lease time to a lower number (to clear out IPs), or you can add additional ranges. When adding additional ranges, make sure those ranges are routed properly for each Site that is not using SNAT. If the number of currently used IPs is at least 50% lower than the total size, the reason is probably that your users can reach only one of the Controllers. Fix the connectivity issue and the new Controller will be able to assign the unused IPs.

Controller

Warning

Low

BDR conflict

This warning occurs if different Controllers have conflicting versions of the data, typically after a temporary network connectivity issue between Controllers. Most of these conflicts are resolved automatically by accepting the latest update of the record. You can run the following command to resolve the remaining conflicts: `sudo cz-config bdr resolve-conflicts`. If the warning keeps appearing, please contact support.

Controller

Warning

Low

The following Entitlements are using deprecated Risk Based Access feature. Please migrate them to Condition Based Access. <entitlement name>, <entitlement name>

The same functionality can be achieved using Conditions and checking the risk score criteria.

LogForwarder

Warning

Medium

cz-logd: Not connected to X

An output named X in the LogForwarder configuration is not connected. Check connectivity from the LogForwarder to the log destination.

LogForwarder

Warning

Medium

cz-logd: Unable to perform log-retention, check access or elasticsearch status. Details: X

LogForwarder has an Elasticsearch output configured, but it is unable to remove indexes in Elasticsearch. Check the connectivity or configuration of Elasticsearch.

LogForwarder

Warning

Medium

cz-logd: Failed to send logs to X

Connectivity for an HTTP-based output is not working. Check connectivity from the LogForwarder to output destination X.

LogForwarder

Warning

Medium

cz-logd: Kinisis was unable to handle all records, not enough shards?

LogForwarder has Kinesis configured for output. The Kinesis output is being throttled, so additional shards might need to be configured.

LogForwarder

Warning

Medium

cz-logd: Kinesis error codes: %v

LogForwarder has Kinesis configured for output, but AWS is returning an error code. In many cases these are IAM based errors that need to be fixed in the AWS configuration.

LogForwarder

Warning

Medium

cz-logd: Firehose was unable to handle all records, throughput exceeded?

LogForwarder has Kinesis-firehose configured and is being throttled. More resources need to be configured on the AWS side to handle the load.

LogForwarder

Warning

Medium

cz-logd: unable to connect to LogForwarding destination %s (%s) (TLS) %v

LogForwarder has TCP based output configured. Check connectivity towards configured output.

LogForwarder

Warning

Medium

cz-logd: tcp output (%s) is slow, incoming amount of logs exceeds outgoing amout

The amount of generated logs is larger than the amount that is being sent. This might indicate a slow destination SIEM, a large amount of logs are being generated by the appliance, or a very slow connection to the destination SIEM.

Gateway

Warning

High

Gateway appliance certificate does not include the profile hostname <name>. You must renew it to allow client connections to this gateway.

The profile hostname included in the configuration is not present as a SAN in the appliance's certificate. The certificate can be renewed from the Appliances page in the admin UI.

Gateway

Warning

High

cz-sessiond: High watermark event queue

Gateway is struggling to keep up with fw-rules generation and sessions. Make sure you are running version 6.1.x or later. Other options are to add more Gateways to handle the load, refactor Entitlements to be less demanding, or change dynamic rules to be more static.

Gateway

Warning

High

cz-sessiond: No revocation has been received for X secs

Gateway to Controller communication is not working. The Controller is unable to push the revocation list to the Gateway, and the Gateway is unable to pull it from the Controller.

Gateway

Warning

Medium

The following DNS names are unstable: <DNS names>

The DNS names used in the Entitlements are not always returning the same answer. This causes the FW rules to be updated all the time, and could also lead to different IPs being resolved on the Client vs the Gateways. To solve this, create a DNS Policy for the DNS name or domains and add it to the DNS Forwarder configuration. Then replace the dns://<name> in the Entitlement with a *.domain.com<Domain name>. This will make sure the DNS result sent to the Client is the same as the one used by the Gateway. This is needed to address public DNS names.

Gateway

Warning

Medium

cz-sessiond: The following applications are reported as unhealthy: X, Y, Z

Gateway has flagged the applications X, Y, Z as unhealthy. See the manual for details of the App Monitoring feature.

Connector

Warning

Low

Connector client X: Waiting for configuration Connection failed.

Client can't connect to the Controller. Check the Global Client Profile DNS name and make sure the connector can resolve it.  

Run Commands (admin UI)

Run Commands will open the Remote Commands window.

There are eight limited remote commands available which can be run on this appliance, thus avoiding any immediate requirement to SSH to remote machines to perform basic diagnostics.

•ip address show

•dig

•ip route show

•netcat

•ntpq

•ping

•tcpdump

•traceroute

Most of the commands have a Timeout field that accepts a value in seconds.

NOTE

The max number of concurrently running commands allowed is five.

Daemon Log commands (SSH)

journalctl

To see live logs:

journalctl -f

To show a specific service:

journalctl -u cz-configd

journalctl -u cz-vpnd@0  (vpnd instance number 0)

journalctl -u "cz-vpnd@*"  (vpnd all instances)

journalctl -u appgatedriver@<client name> (Connector Client tun driver)

journalctl -u appgateservice@<client name> (Connector Client service)

journalctl -u cz-nginx@admin (admin/API interface)

journalctl -u cz-nginx@urlaccess (HTTP up Action type)

journalctl -u cz-nginx@portal (Portal)

journalctl -u cz-nginx@main

To show only logs at or above a certain priority:

journalctl -p warning

To show logs in reverse order:

journalctl -r

To show logs since last boot:

journalctl -b

To show logs in a specific time range:

journalctl --since="2017-06-01 12:17:16" --until="2017-06-02"

All the above flags can be combined. For more information, see: http://manpages.ubuntu.com/manpages/xenial/man1/journalctl.1.html
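As an illustration of combining the flags above, the following sketch wraps journalctl in a small helper. The function name show_unit_logs and the DRY_RUN variable are hypothetical and exist only for this example; the journalctl options used (-u, -p, --since, -r, --no-pager) are standard ones.

```shell
# Show logs for one unit, filtered by priority and start time, newest first.
# Arguments: unit name, minimum priority (default: warning),
#            start time (default: "1 hour ago").
# With DRY_RUN=1 the command line is printed instead of executed.
show_unit_logs() {
    unit="$1"
    priority="${2:-warning}"
    since="${3:-1 hour ago}"
    # Rebuild the positional parameters as the journalctl command line.
    set -- journalctl -u "$unit" -p "$priority" --since "$since" -r --no-pager
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `show_unit_logs cz-sessiond err today` would list today's cz-sessiond messages of priority err or higher, in reverse order.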

SYSLOG

journalctl should normally be sufficient for looking at the logs.

However, if the binary logs are corrupt, you can fall back to /var/log/syslog to get logs (/var/log/upstart was used in earlier versions of the product).

Saving logs

AppGate ZTNA has two types of log records: daemon logs and audit logs. Daemon logs are used to examine the workings of the AppGate ZTNA system and audit logs are used to record the actions performed by the system.

Logs are automatically saved on the appliance. Logs can be downloaded from System > Appliances for examination locally or copied to another machine using the secure copy command: scp or sftp.

Debug log level

The types of system events that are stored in the appliance's Debug Log depend on the appliance Debug log level setting.
For troubleshooting purposes, you may wish to run the appliance in 'DEBUG' mode. If you run the appliance in DEBUG mode, remember to reset the Debug Log Level to a lower mode once the appliance is running satisfactorily to ensure optimal performance of the system.

For details of changing Debug log levels, refer to: System Logs > Debug Logs

Troubleshooting commands (SSH)

You can use the SSH command line to run the following troubleshooting commands:

NOTE

There is a list of more cz-config commands in the cz-setup and cz-config commands section.

Current version

cat /usr/share/cz-image/version  

Current state of the appliance

cat /mnt/state/last-state

Up time and system load

uptime
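The load averages printed by uptime are easier to interpret relative to the number of CPUs. The following is a minimal sketch under that assumption: the helper name load_per_cpu is hypothetical, and it relies on /proc/loadavg and nproc being available, as they are on standard Linux systems.

```shell
# Divide a load average by a CPU count and print the result with two decimals.
# (load_per_cpu is a hypothetical helper name used for illustration.)
load_per_cpu() {
    awk -v load="$1" -v cpus="$2" 'BEGIN { printf "%.2f\n", load / cpus }'
}

# The 1-minute load average is the first field of /proc/loadavg.
if [ -r /proc/loadavg ] && command -v nproc >/dev/null 2>&1; then
    read -r one_min rest < /proc/loadavg
    echo "1-minute load per CPU: $(load_per_cpu "$one_min" "$(nproc)")"
fi
```

As a rough guide, a value that stays above 1.00 suggests that more work is queued than the CPUs can process.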

Reboot the appliance (Some Clients may need to reconnect)

sudo reboot

Restart appliance services in lieu of 'reboot' command

sudo service cz-configd restart

Restart an arbitrary daemon

sudo service <daemon> restart

Bypass root (requires access to GRUB menu)

Reboot the appliance. Press 'e' to edit the GRUB menu. Append cz-login=root to the kernel line as shown below.  Then issue 'Ctrl-x' as listed in the grub menu instructions.

GRUB bootloader configuration showing root and login parameters for system initialization.

Collect appliance diagnostics

sudo cz-system-info [--full]

This command will collect a full set of appliance diagnostic information, including license usage, and save it to /tmp/cz-system-info.txt.gz. The option --full dumps additional information about the state of certain processes. The file will be owned by root. Alternatively, running the commands below will make the file readable by all users, including the cz user, which may make it easier to scp or sftp it from the appliance.

sudo cz-system-info; sudo chmod a+r /tmp/cz-system-info.txt.gz

It can be downloaded from there using SCP; example command:

scp -i ~/keys/mykey cz@controller.myco.com:/tmp/cz-system-info.txt.gz ~/Desktop/cz-system-info.txt.gz

Detect termination of Client-Appliance connections

cz-config set -j reportBogusTLS/enabled true

Detects MitM issues caused by firewalls or network devices terminating the connection from the Client. When a TLS connection lacks the expected extensions, the appliance raises a five-minute warning that includes the source IP:port to help identify misconfigured devices.

NOTE

This will cause a lot of activity in the system during normal operations. It is therefore recommended to use this command only when troubleshooting.

Remove a core dump (and warning in dashboard)

Core dumps are stored under /mnt/data/core. Use sudo rm to remove any files from there.
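Old core dumps are one of the causes named for the "High volume usage" health messages, which are raised as a warning above 75% and as an error above 90%. The following sketch applies the same thresholds to the output of df -P. The function names classify_usage and report_volumes are hypothetical, and the parsing assumes mount points without spaces.

```shell
# Map a usage percentage to the levels used by the health checks:
# above 90 is an error, above 75 is a warning, anything else is ok.
classify_usage() {
    pct="$1"
    if [ "$pct" -gt 90 ]; then
        echo error
    elif [ "$pct" -gt 75 ]; then
        echo warning
    else
        echo ok
    fi
}

# Read `df -P` output on stdin and print "<mount> <pct>% <level>" per volume.
report_volumes() {
    # Skip the header line; column 5 is the capacity, column 6 the mount point.
    awk 'NR > 1 { sub(/%/, "", $5); print $5, $6 }' |
    while read -r pct mount; do
        echo "$mount $pct% $(classify_usage "$pct")"
    done
}

# Usage on an appliance:
#   df -P | report_volumes
```

Volumes reported as warning or error are the ones to inspect first for files that are no longer required.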

Update the geoIP database now

sudo /etc/cron.daily/geoIpDbUpdate

View the memory available and used on the appliance

free -mh

View the processes, for example those using the most CPU

htop

View running processes

ps -aux or ps -aux | grep <pid>

Alternatively, issue:

ps -ef

View the current system firewall rules

iptables -L -v

For IPv6:

ip6tables -L -v

View network interface addresses

ip addr show

ip link show (link information)

For IPv6:

ip -6 addr show

ip -6 link show

View routes

ip route show

For IPv6:

ip -6 route show

View configuration files

Daemons: The configuration file for each AppGate ZTNA daemon is stored under /etc/cz-<daemon-name>. For example:

  • sessiond: configuration is available under /etc/cz-sessiond

  • Nginx: configuration is stored under the directory: /etc/nginx

  • Rsyslog: configuration is stored under /etc/rsyslog.conf

System: the current combined system configuration is stored under /mnt/state/config/current and contains the following files:

  • local.json:  The appliance local configuration.

  • remote.json: The appliance remote configuration.

To view these files use the commands jql (local) and jqr (remote) from anywhere in the file system. The previous combined system configuration is stored under /mnt/state/config/previous

These advanced tools are detailed here to help recover from situations when add/remove of a Controller has failed. Please contact support before trying to use any of these tools.

sudo cz-config <option>

bdr status [--show-parted-nodes --exclude-raft --json --table-fmt]

Display human readable BDR status for all controller databases

--show-parted-nodes

Show also the nodes already parted when showing the status

--exclude-raft

Exclude RAFT status

--json

Output JSON

--table-fmt

Table format when not outputting JSON. Use psql for compatibility with terminals like putty (VALUES: fancy_grid, psql)

bdr force-single-controller-ready

Force appliance to single_controller_ready state, use with caution

bdr force-appliance-ready

Force appliance to appliance_ready state, use with caution

bdr clear-barrier

Clear all BDR barriers, use with caution

bdr remove-node-record REMOVE_NODE_RECORD

Forcefully remove a node from BDR on the current node

bdr update-bdr-group

Take the current BDR leave barrier in the name of a dead node

bdr enable-ip-allocation

Enable IP allocation for current node

bdr disable-ip-allocation

Disable IP allocation for current node

bdr repartition-ip-allocations

Re-partition IP allocations to match current controllers

bdr resolve-conflicts [--run-local-node --keep-conflict-history]

--run-local-node

Run the clean-up query on this node only; by default it runs on every node

--keep-conflict-history

Don't delete the conflict history

bdr --help

Access help

System internals (excluding LogServer) showing Daemons

Diagram illustrating appliance architecture with components like Controller, Gateway, and Client connections.

Purple

Represents the communication between appliances inside the Collective. All communication occurs over port 443 using mTLS.

Blue

Represents all user metadata flows. The Client connects on 443 mTLS to the Controller and/or Gateway, where the initial packet is handled by spaD and proxyD. Subsequent packets are then routed using unix domain sockets.

Red

Represents user application traffic that travels over the mTLS tunnel.

Black

Internal appliance traffic between daemons.

Green

Represents appliance-initiated traffic for DNS resolution, API connectivity, or ARP.