The AppGate ZTNA system generates metrics for SNMP and Prometheus in a unified way internally: when a metric is collected, it is programmatically formatted for both types of output. So even though Prometheus is the primary method, there is (almost) always an exactly matching metric available in SNMP.
SNMP
Simple Network Management Protocol [SNMP] is an application-layer protocol in the Transmission Control Protocol/Internet Protocol (TCP/IP) suite, originating in the 1980s. It was designed specifically to manage and monitor network elements.
The SNMP server in AppGate ZTNA exposes metrics in the form of variables organized in a management information base [MIB], which describes the metrics relevant to the SDP system. These metrics can then be remotely queried by monitoring applications.
Understanding the AppGate MIB
AppGate ZTNA provides an SNMP MIB (Management Information Base) which includes a rich set of metrics that can be used to monitor the health of the appliances in the Collective.
To use SNMP, the SNMP server must first be enabled in Appliances.
The Appgate MIB can be downloaded from Utilities.
The information in the MIB is effectively a mirror of the Prometheus metrics.
There are four top-level Object Identifiers in the Appgate MIB, which relate to the first part of the Prometheus metric name. Under each identifier, a sequence is defined for each metric. For details of the individual metrics, see Prometheus Metrics, where each one is briefly explained. The list of Object Types within each sequence can look very long, but it mostly reflects the labels and values used in Prometheus. For example, the Portal DNS Proxy metric has two labels, type: {a, aaaa} and result: {success, cache, nodata, timeout, notfound}, plus the value of the metric itself. So in the MIB there are three object types for this metric: type, result and count.
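As a rough illustration of how the four top-level identifiers relate to metric names, the sketch below routes a Prometheus metric name to its branch by its first underscore-separated token. The prefixes `apn`, `gw`, `ctr` and `ptl` are assumptions inferred from the identifier names sdpApn, sdpGw, sdpCtr and sdpPtl, and the metric names used are hypothetical; check the downloaded MIB and Prometheus Metrics for the real names.

```python
# Assumed mapping from Prometheus metric-name prefix to the top-level MIB
# branch. The prefixes are inferred from the identifier names, not confirmed.
BRANCHES = {
    "apn": ("sdpApn", 1),  # Appliance metrics
    "gw": ("sdpGw", 2),    # Gateway metrics
    "ctr": ("sdpCtr", 3),  # Controller metrics
    "ptl": ("sdpPtl", 4),  # Portal metrics
}


def branch_for(metric_name: str) -> str:
    """Return the top-level MIB branch for a metric, e.g. 'sdpGw (sdp.2)'."""
    prefix = metric_name.split("_", 1)[0]
    if prefix not in BRANCHES:
        raise ValueError(f"no top-level branch for prefix {prefix!r}")
    name, arc = BRANCHES[prefix]
    return f"{name} (sdp.{arc})"


# Hypothetical metric name, used only for illustration.
print(branch_for("ptl_dns_proxy_queries"))  # sdpPtl (sdp.4)
```

The OIDs are shown relative to `sdp`, because the full enterprise prefix is defined in the MIB file itself.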
Below is a short summary of the most salient items found in the MIB:
sdpApn OBJECT IDENTIFIER ::= { sdp 1 }
Appliance-related metrics, such as CPU usage, memory, disk, network I/O, image size and status, include much of the same information found in the dashboard.
SPA (Single Packet Authorization) is the front door to the appliances from the Internet, and these metrics are provided to monitor it. From them you can verify that firewalls are letting all SPA traffic through. It may also be possible to tell whether an appliance is being targeted by a bot trying to get past the SPA layer, or whether a DoS attack is underway.
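Spotting a bot or DoS attack usually means watching rates rather than raw totals. A minimal sketch, assuming a counter of rejected SPA packets can be polled (the counter and the 50-per-second threshold are hypothetical; a real threshold should come from each appliance's normal baseline), converts two polls into a per-second rate and flags unusually high values.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One poll of a monotonically increasing counter."""
    timestamp: float  # seconds since some fixed point
    value: int


def per_second_rate(prev: Sample, curr: Sample) -> float:
    """Per-second increase between two polls, treating a drop as a counter reset."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("samples must be in chronological order")
    delta = curr.value - prev.value
    if delta < 0:  # counter restarted (e.g. service restart): count from zero
        delta = curr.value
    return delta / elapsed


def spa_looks_suspicious(prev: Sample, curr: Sample,
                         max_per_second: float = 50.0) -> bool:
    """Flag when rejected SPA packets arrive faster than a chosen threshold."""
    return per_second_rate(prev, curr) > max_per_second


# Hypothetical readings of a "rejected SPA packets" counter taken 60 s apart.
print(spa_looks_suspicious(Sample(0, 1_000), Sample(60, 7_000)))  # True
```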
Basic Proxy_protocol metrics are provided if this option is in use.
Snat metrics are useful when multiple aliases are being used on the protected network.
audit_logs reports on the logd service running on each appliance, effectively giving a view of the flow of audit log records. This can quickly reveal a break in the flow of audit logs before any records are permanently lost or a disk becomes full.
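One way to catch a break in the flow is to alert when the record counter stops moving. The sketch below assumes a hypothetical counter of audit records handled by logd, polled at a fixed interval; because a quiet appliance may legitimately produce no records for a while, the number of polls to wait is a tuning choice.

```python
def audit_flow_stalled(readings: list[int], polls: int = 3) -> bool:
    """True if the counter did not increase at any of the last `polls` intervals.

    `readings` are successive values of a counter of audit records handled by
    logd, oldest first (the counter itself is a hypothetical example).
    """
    if len(readings) < polls + 1:
        return False  # not enough history to decide
    recent = readings[-(polls + 1):]
    return all(curr <= prev for prev, curr in zip(recent, recent[1:]))


print(audit_flow_stalled([10, 20, 20, 20, 20]))  # True
print(audit_flow_stalled([10, 20, 30, 30]))      # False
```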
audit_events reports the number of log records written for each record type. Although this duplicates other parts of the MIB to some extent, it makes it possible to monitor specific events of interest. For instance, the total number of password_authentication_failed events could be measured over the previous hour.
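Measuring "events in the previous hour" from a cumulative counter means summing the increases between polls and coping with counter resets. A minimal sketch, assuming the password_authentication_failed count is polled every 15 minutes (the readings are invented):

```python
def increase(readings: list[int]) -> int:
    """Total increase across successive counter readings, tolerating resets.

    Similar in spirit to PromQL's increase(), but without its extrapolation
    to the edges of the time window.
    """
    total = 0
    for prev, curr in zip(readings, readings[1:]):
        # A drop means the counter restarted from zero, so count curr itself.
        total += curr - prev if curr >= prev else curr
    return total


# Hypothetical readings of the password_authentication_failed count,
# polled every 15 minutes over the previous hour; the counter reset once.
print(increase([100, 104, 110, 3, 7]))  # 17
```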
sdpGw OBJECT IDENTIFIER ::= { sdp 2 }
xxx_resolver comes in several types (DNS, AWS, GCP). System performance can suffer if name resolution is slow or fails frequently. These metrics allow DNS queries or API calls to be monitored using hits, misses and timeouts, so any issues with name resolution can be quickly identified and fixed.
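Hits, misses and timeouts are most useful as ratios over a polling interval. The sketch below assumes the three values are increases over the same interval (not lifetime totals), and the 5% timeout threshold is an arbitrary example.

```python
def resolver_ratios(hits: int, misses: int, timeouts: int) -> dict[str, float]:
    """Share of misses and timeouts among all lookups in one polling interval."""
    total = hits + misses + timeouts
    if total == 0:
        return {"miss_ratio": 0.0, "timeout_ratio": 0.0}
    return {"miss_ratio": misses / total, "timeout_ratio": timeouts / total}


def resolver_unhealthy(hits: int, misses: int, timeouts: int,
                       max_timeout_ratio: float = 0.05) -> bool:
    """Flag a resolver whose timeout share exceeds the chosen threshold."""
    return resolver_ratios(hits, misses, timeouts)["timeout_ratio"] > max_timeout_ratio


print(resolver_ratios(80, 10, 10))     # {'miss_ratio': 0.1, 'timeout_ratio': 0.1}
print(resolver_unhealthy(80, 10, 10))  # True
```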
DNS_forwarder is an additional type of name resolver.
Sessions in Gateways relate to decision making: who should get access to what, and when. These decisions are controlled by 'events'. The metrics provide a way to monitor how well the Gateways are coping with the number of events they have to handle. This becomes more important as the number of users on a Gateway rises, because events are scheduled per user.
Vpn provides various metrics, such as the total number of sessions, firewall rules and memory. These are effectively a measure of Gateway utilization and would normally be used to trigger auto-scaling (adding or removing Gateways).
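A scaling trigger can be as simple as comparing average VPN sessions per Gateway against two thresholds. This is a sketch only: the thresholds are hypothetical, and a real deployment would feed the decision into whatever orchestration adds or removes Gateways.

```python
def scaling_decision(sessions_per_gateway: list[int],
                     scale_out_above: float = 800,
                     scale_in_below: float = 300) -> int:
    """Return +1 to add a Gateway, -1 to remove one, or 0 to leave capacity as is.

    The gap between the two thresholds (hysteresis) stops the system from
    flapping between adding and removing Gateways around a single limit.
    """
    if not sessions_per_gateway:
        return 0
    average = sum(sessions_per_gateway) / len(sessions_per_gateway)
    if average > scale_out_above:
        return 1
    if average < scale_in_below and len(sessions_per_gateway) > 1:
        return -1  # never scale in below one Gateway
    return 0


print(scaling_decision([900, 850]))  # 1
print(scaling_decision([100, 200]))  # -1
```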
Http reports statistics relating to the use of URL Access.
Ha provides metrics for the HA calls made on the protected network relating to Gateway failover. This might give some indication of the size, frequency and timing of failover events.
sdpCtr OBJECT IDENTIFIER ::= { sdp 3 }
Client and Admin include Authentication, Authorization, Certificate Signing Requests, handling of Remedy Actions and Device registrations. As well as the number of operations, the average processing time is provided. When the Controller becomes overly busy, these times grow and corrective action should be taken, such as increasing the number of vCPUs allocated.
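Because processing times grow as the Controller gets busier, a simple check is to compare recent averages against a baseline recorded under normal load. A minimal sketch; the 2x factor and the sample values are hypothetical.

```python
from statistics import mean


def controller_overloaded(recent_avg_ms: list[float], baseline_ms: float,
                          factor: float = 2.0) -> bool:
    """True if recent average processing times exceed `factor` x the baseline."""
    if not recent_avg_ms:
        return False
    return mean(recent_avg_ms) > factor * baseline_ms


# Hypothetical average authentication times (ms) from the last few polls,
# compared with a baseline measured while the Controller was lightly loaded.
print(controller_overloaded([250, 300, 280], baseline_ms=100))  # True
```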
Ip_pool and License metrics are available.
Database relates to the size and state of the database the Controller relies on.
There are also some Java process monitoring options.
sdpPtl OBJECT IDENTIFIER ::= { sdp 4 }
The Portal is a hosting platform for (web) Clients. These metrics provide a way to monitor the number and state of the Clients provisioned on the Portal, which can be useful to ensure you do not run short of available Clients.
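Running short of Clients can be caught by watching the share that is still free. The sketch assumes per-state counts of provisioned Clients; the state names "available" and "in_use" and the 10% threshold are hypothetical.

```python
def clients_running_low(counts_by_state: dict[str, int],
                        min_free_fraction: float = 0.1) -> bool:
    """True if fewer than `min_free_fraction` of provisioned Clients are available.

    The state name "available" is a hypothetical label for idle Clients.
    """
    total = sum(counts_by_state.values())
    if total == 0:
        return True  # nothing provisioned at all
    return counts_by_state.get("available", 0) / total < min_free_fraction


print(clients_running_low({"available": 3, "in_use": 47}))  # True
```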
The DNS Proxy lives in the Portal. It forwards the Client's DNS requests according to the DNS settings for each user. The information provides a way to monitor the quantity and results of the DNS queries.
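The DNS Proxy metric is the example used earlier to explain how Prometheus labels become MIB object types. The sketch below assumes each (type, result) label combination becomes one row holding type, result and count, then computes the share of timed-out queries. The counts are invented.

```python
from itertools import product
from typing import NamedTuple

# Label values of the Portal DNS Proxy metric, as listed for the MIB.
TYPES = ("a", "aaaa")
RESULTS = ("success", "cache", "nodata", "timeout", "notfound")


class Row(NamedTuple):
    """One MIB-style row: the two label values plus the metric value."""
    type: str
    result: str
    count: int


def to_rows(samples: dict[tuple[str, str], int]) -> list[Row]:
    """One row per (type, result) pair; missing combinations are reported as 0."""
    return [Row(qtype, result, samples.get((qtype, result), 0))
            for qtype, result in product(TYPES, RESULTS)]


def timeout_share(rows: list[Row]) -> float:
    """Fraction of all DNS Proxy queries that timed out."""
    total = sum(row.count for row in rows)
    timeouts = sum(row.count for row in rows if row.result == "timeout")
    return timeouts / total if total else 0.0


# Hypothetical counts for illustration only.
rows = to_rows({("a", "success"): 120, ("a", "timeout"): 4, ("aaaa", "nodata"): 36})
print(len(rows))            # 10
print(timeout_share(rows))  # 0.025
```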
