Audit logs


Audit logs keep a record of the transactions performed by the Collective. They provide a definitive record of all accesses and administrative actions, sufficient for an audit trail to be followed through the system. They do not drill down to the same level of detail as daemon logs. Even though you will see audit logs from cz-configd such as appliance_status_changed, these should not be considered an adequate means of monitoring the health of a Collective.

Audit logs are generated by the appliances but are not queried or viewed locally. They are typically aggregated by the LogServer or the LogForwarder and often exported.

Log handling

Audit logs are written to a local database within each Appliance by logd. Logd manages the storage and can hold up to 100,000 audit log records; this should not be considered log storage - rather, logs must be sent on to an appropriate (external) system for permanent storage and analysis. This limit can be increased if required using the SSH command line cz-config command. In the event that logs cannot be sent on (or the sending rate over the network is insufficient), a warning will be issued on the dashboard when 40% of the allowed storage is full.

When sending logs there are three options: via the LogServer, via the LogForwarder, or through Rsyslog. Irrespective of the option chosen, audit logs are always sent to Rsyslog so that they can be exported from the appliance (in real time) along with the daemon logs. Unlike the daemon logs, audit logs are stored in syslog to ensure that the integrity of the audit log trail is not broken during outages, upgrades, etc. This storage has a daily limit and a maximum duration of 7 days. There is also an additional 1GB buffer per configured destination. However, neither of these should be considered guaranteed log storage; if no Rsyslog destination is available then these logs will be lost.

Logs are written based on quantity and rate - both parameters should be considered carefully when planning any deployment. The main generator of audit logs will be IP access audit logs from vpnd. These are generated when a user connects to a (protected) host. There is a default interval for ongoing IP access connections such that a new record is generated for each open connection once every 120 seconds. This can mean a lot of records - 5000 users with 10 connections to 4 Sites for 8 hours would produce almost 50M audit log records! For this reason it is possible to change the IP access audit log interval per Site in Sites > General.
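The 50M figure above follows directly from the default 120-second repeat interval. A minimal sketch of the arithmetic (the function name and inputs are purely illustrative):

```python
# Rough estimate of IP access audit log volume, assuming the default
# 120-second repeat interval for ongoing connections. All figures are
# illustrative; real traffic patterns will vary.
def estimate_records(users, connections_per_user, sites, hours, interval_s=120):
    """Return the approximate number of IP access audit records generated."""
    open_connections = users * connections_per_user * sites
    records_per_connection = (hours * 3600) // interval_s
    return open_connections * records_per_connection

# 5000 users x 10 connections x 4 Sites over an 8-hour day:
print(estimate_records(5000, 10, 4, 8))  # 48000000 records
```

Doubling the interval per Site halves this total, which is why the interval setting deserves attention during capacity planning.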

As well as increasing the time interval, setting the value to 0 disables IP access logging for that Site. This can be very useful when combined with other Site-based settings, for instance when:

  • Route all traffic through tunnel is being used and a specific Site is the destination for all the user's default Gateway traffic. This traffic is likely to be very uncontrolled, with many web pages opening tens of embedded links, all generating audit logs.

  • Alternative default route for all user traffic is being used and all the user's traffic is being directed to a different default gateway. This is sometimes done when traffic needs to be filtered or processed by some other network appliance, which may well also be performing audit logging.

Log Format

The log records themselves are always sent in JSON format:

{
    "version": 2,
    "date": "<MMM dd HH:mm:ss>",
    "timestamp": "<ISO 8601 UTC>",
    "hostname": "<hostname>",
    "daemon": "cz-controllerd",
    "log": {
        "authentication_type": "Client",
        "claims_token_id": "<GUID>",
        "client_ip": "<IP>",
        "collective_id": "<GUID>",
        "collective_name": "<name>",
        "distinguished_name": "CN=<device ID>,CN=<name>,OU=<OU>",
        "distinguished_name_device_id": "<device ID>",
        "distinguished_name_ou": "<OU>",
        "distinguished_name_user": "<name>",
        "id": "<GUID>",
        "timestamp": "<ISO 8601 UTC>",
        "user_claims": {
            "ag": {
                "deviceId": "<GUID>",
                "distinguishedName": "CN=<device ID>,CN=<name>,OU=<OU>",
                "identityProviderId": "<GUID>",
                "identityProviderName": "<name>",
                "loginTime": "<ISO 8601 UTC>",
                "passwordWarning": false,
                "username": "<name>"
            },
            "emails": [
                "admin@email.com"
            ],
            "firstName": "<name>",
            "id": "<GUID>",
            "lastName": "<surname>",
            "Tags": [
                "builtin"
            ],
            "username": "<username>"
        },
        "version": 22
    }
}

They include the collective_id field, which is useful when several Collectives have their logs fed into one external SIEM. There is also the hostname field (log_source on LogServer/OpenSearch/Elasticsearch), which uses the appliance's hostname and records which appliance generated the log record. When auto-scaling scripts are being used, this field may have a somewhat random-looking value as it is generated from the instance IDs created by the cloud provider.
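As a sketch of how the collective_id and hostname fields can be used on the receiving side, the snippet below indexes exported records by that pair so logs consolidated from several Collectives can be told apart. The sample record is abbreviated and purely illustrative; field names follow the JSON format shown above:

```python
import json

# Illustrative, abbreviated audit record in the JSON format shown above.
sample = json.dumps({
    "version": 2,
    "hostname": "gw-eu-1",                     # which appliance wrote the record
    "daemon": "cz-controllerd",
    "log": {"collective_id": "collective-a"},  # placeholder for the real GUID
})

def index_record(record_json, index):
    """File one JSON audit record under its (collective_id, hostname) pair."""
    rec = json.loads(record_json)
    key = (rec["log"].get("collective_id"), rec.get("hostname"))
    index.setdefault(key, []).append(rec)

index = {}
index_record(sample, index)
# index now maps ("collective-a", "gw-eu-1") to the matching records
```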

For some complete examples of how selected audit logs are formatted, refer to Audit log flow.

For details of all the fields included in every type of audit log refer to Audit log detail.

Log persistence modes

The storage itself offers a choice of three audit log persistence modes: Guaranteed, Default, or Performance. These options can be selected in Global settings.

In Default mode, logd (which uses a database to manage the writing of the audit logs to disk) batches records in 1-second lots. This ensures that at any time the number of unwritten (potentially lost) logs will never exceed 1 second's worth, plus any write delay in the disk sub-system (perhaps another 1 second).

Performance logging works exactly the same way as Default, but instead of writing to a physical disk, RAM is used. This makes writing very fast indeed and reduces the IOPS loading on the disk sub-system. You should ensure your system has sufficient RAM (a maximum of 1GB is used) before enabling this option.

If Guaranteed logging is enabled then all the daemons in the Appgate SDP system will demand that each audit log record is written before allowing the associated action; the daemons wait for an 'ack' from logd before proceeding. The daemons attempt to connect to logd multiple times, and logd attempts to write to disk multiple times. The effect of this is that the writing of each audit log record is all but guaranteed (on a performant system). The downside of Guaranteed logging is that on a slow system daemons may wait for up to 5 seconds (the give-up time) before continuing. Clearly this could have a very significant performance impact on the system. For this reason we recommend only enabling this feature if there is a specific audit or security requirement mandating it. We also suggest it is only used on hardware appliances or where you have some control over the disk system performance (can specify SSD with high IOPS).

LogServer

Appgate SDP includes a built-in LogServer function, using OpenSearch. The LogServer is an appliance that collects logs from the other members of the Collective, providing an audit trail of actions and user access. Only one LogServer can be deployed, therefore an HA configuration is not supported. Its primary use case is to help customers during initial set up, configuration, evaluation and initial deployment. It is also suitable for use in production environments for certain smaller-scale deployments.

Audit Logs using LogServer

Audit logs are fed to the LogServer and to Rsyslog where they combine with other logs.

Using the internal LogServer

Diagram illustrating log management flow between appliances and external servers.

The audit logs are not intended to live on or be viewed on each appliance, so they will be sent to the LogServer (if configured). This process happens in real time, so there should never be a significant number of (unsent) audit logs on an appliance at any time.

The internal LogServer is based on OpenSearch, which can use significant resources, so it is recommended that it is run on a separate appliance for production usage.

For enterprise usage it is recommended not to rely on the internal LogServer as the only means of collecting and reviewing audit logs. It is not available in an HA configuration, so if the LogServer is lost for a reason such as disk failure, then the log records are also lost.

When a LogServer is configured it will receive logs from all Sites (all the appliances in the Collective). The audit logs are sent using an HTTPS-based protocol on port 443. This ensures that log records have integrity and are secure. There may be situations when the audit logs cannot be sent for a period of time, such as:

  • when RSYSLOG and LogServer are both configured, they both need to be reachable before the logs are sent from the appliance. This guarantees the consistency of the system logs

  • when the LogServer or RSYSLOG destination is taken offline, for instance during an upgrade

  • during a network outage

In these situations, cz-logd can store up to 100,000 audit log records (the default setting), which will be sent as soon as the problem is resolved.

You can access the LogServer UI (OpenSearch) from the Controller admin UI under Users & Devices > Audit Logs. The LogServer does NOT need to be running on that Controller. When using the inbuilt LogServer, the logs are stored in the LogServer's database but there is no replication of data. The backup utility will create a copy of the current snapshot of the audit logs, which can then be saved off-appliance.

Viewing Audit Logs

One of the key requirements when viewing Audit Logs is to be able to track a user or device history. In Appgate SDP there is no ”session id” since we have a distributed system with log records being collected from different appliances in the Collective. Instead we use “distinguished name”, which consists of machine+username+idp; this is the unique identifier which will be included in all Audit Log records no matter how many Controllers or Gateways are being used by the user/device. "Collective ID" is also recorded in each audit log record, so when logs have been consolidated from several different Collectives it is possible to easily identify the source.
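Since the distinguished name is the stable identifier across appliances, a user or device trail can be reconstructed by filtering exported records on that field and sorting by timestamp. A minimal sketch (the records, hostnames, and helper are illustrative; in practice the records would come from your SIEM export):

```python
# Illustrative exported audit records; distinguished_name is machine+username+idp
# as described above, and is constant for one user/device across appliances.
records = [
    {"timestamp": "2024-01-01T09:00:00Z", "hostname": "ctl-1",
     "log": {"distinguished_name": "CN=dev1,CN=alice,OU=ldap"}},
    {"timestamp": "2024-01-01T09:05:00Z", "hostname": "gw-1",
     "log": {"distinguished_name": "CN=dev1,CN=alice,OU=ldap"}},
    {"timestamp": "2024-01-01T09:01:00Z", "hostname": "gw-2",
     "log": {"distinguished_name": "CN=dev2,CN=bob,OU=ldap"}},
]

def user_trail(records, distinguished_name):
    """Return one user/device's records in time order, across all appliances."""
    hits = [r for r in records
            if r["log"].get("distinguished_name") == distinguished_name]
    # ISO 8601 UTC timestamps sort correctly as strings.
    return sorted(hits, key=lambda r: r["timestamp"])

trail = user_trail(records, "CN=dev1,CN=alice,OU=ldap")
# trail spans ctl-1 and gw-1 for the same device, in timestamp order
```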

LogServer retention period

When you create a LogServer you can set the Audit Log Retention period in days. This defaults to the last 30 days of logs - meaning the rest will be deleted. If a longer period is set, then logs will be saved until the disk is full, after which no more logs will be written (on the LogServer).

Using an external RSYSLOG server

Each appliance also has the ability to export all the logs (audit and daemon) using Rsyslog. If you are using the LogServer then it is not possible to also have a LogForwarder function within the Collective, so Rsyslog is the only means of exporting the audit logs.

To export the appliance logs using Rsyslog, you will need to configure the Rsyslog Destinations for each appliance. For more information and examples of the configuration see Appliances > Miscellaneous. It is also possible to configure secure log transfer via RSYSLOG, but this requires manual configuration on both ends.

When the LogServer is used in production environments it must meet the parameters outlined below for full support:

LogServer - production use operational parameters

There are six key areas which define the envelope of LogServer operation:

  1. Appliance - The LogServer should NOT co-habit with a Controller in a production environment. It should ideally be deployed as a separate appliance.

  2. Days – The LogServer is not designed to handle large record-sets. The audit log retention days should be set to 45 days maximum in production use; beyond this, the internal data storage representation can make the system very slow to use.

  3. Data – The maximum size of the audit log data set should not exceed 100GB otherwise the operation of the Appgate SDP system may be adversely affected (for instance during backups).

  4. Search – The LogServer requires CPU and RAM when performing searches etc. The minimum requirement for production use is 4vCPU and 16GB RAM. Performing large or complex searches while in production can impact the ability of the LogServer to process incoming production data.

  5. Data rate – The LogServer is the single point to which all log records are sent - it is therefore a bottleneck. To ensure the LogServer is able to handle the volume of log records sent by multiple Gateways:

    • the LogServer must use a fast (write speed) disk, such as SSD on hardware or its high-performance virtual or cloud equivalent.

    • the IP access audit log interval (set in Sites) has a default of 120s repeat interval which should not be reduced.

    • the maximum user count supported for production use is ~1000 users.

  6. Integrity – The LogServer is not suitable in environments which have a compliance requirement to permanently retain audit log records.

    • HA - the LogServer is not available in an HA configuration. While there is some buffering in the system (during, say, upgrades), you must use an external means of backing up LogServer data to obtain failure resiliency.

    • Backup – The Appgate SDP system supports the backing up of audit log records. If a restore is required, then all records since the backup will be overwritten and lost.
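The data-rate constraint in point 5 can be sanity-checked with a quick back-of-envelope calculation at the ~1000-user production ceiling, using the default 120-second IP access interval. The connections-per-user figure here is an assumption for illustration only:

```python
# Back-of-envelope estimate of the sustained IP access record rate the
# LogServer must absorb, using the default 120 s repeat interval.
# connections_per_user is an assumed workload figure, not a product default.
def records_per_second(users, connections_per_user, interval_s=120):
    return users * connections_per_user / interval_s

rate = records_per_second(1000, 10)  # ~83 records/s from IP access logs alone
```

Other audit log types (authentication, administrative actions, etc.) add to this baseline, which is why a fast write disk is required.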

LogForwarder

For large-scale enterprise deployments the LogForwarder appliance should be used; it handles logs within the Appgate SDP Collective, and exports them to a suitable external enterprise class logging system using a choice of protocols.

LogForwarders can be configured for HA operation using two or more appliances. They can be deployed to export the logs by Site to different destinations. Multiple export protocols can be specified at the same time including one for the ELK stack. This means that if there is an ongoing requirement to retain the ELK stack (effectively a copy of the LogServer) in an enterprise environment then one can be deployed outside of the Appgate SDP Collective (for example, running in AWS) and the logs forwarded there whilst also exporting the log data into an enterprise-class logging system.  

Audit Logs using LogForwarder

Audit logs are fed to the LogForwarder (or, when the appliance is itself a LogForwarder, directly to the external log server) and to Rsyslog, where they combine with other logs. The destination for each appliance can be seen in the dashboard by clicking on its Status.

Using LogForwarders

Diagram illustrating log management flow between appliances and external servers.

The audit logs are not intended to live on or be viewed on each appliance, so they will be forwarded to the LogForwarder (if configured). This process happens in real time, so there should never be a significant number of (unsent) audit logs on an appliance at any time. The LogForwarder itself is also designed as a forwarding service (based on the same log forwarding daemon used in all the appliances).

For enterprise usage it is recommended to use 2 or more LogForwarders. Every appliance knows about the LogForwarders configured for its Site and will connect to one that is responsive, spreading the load across the LogForwarders based on their health, connectivity and throughput capacity. This HA capability means that during an upgrade another LogForwarder will immediately take over and continue forwarding logs.

LogForwarders can co-habit with other appliance functions such as Controllers or Gateways. Siting them with Gateways would be a good model to use especially if the log server being targeted was residing on the protected network behind the Gateway.

When a LogForwarder is configured you must specify for which Sites it is going to forward logs. Separate HA LogForwarders can be specified within one Collective; this means different Sites can forward their logs to different end destinations. The appliances on the specified Sites then send their logs using an HTTPS based protocol on port 443 (ensuring log records have integrity and are secure).

NOTE

A LogForwarder always includes its own logs when forwarding logs to their final destination.

There may be situations when the Audit logs cannot be sent for a period of time such as:

  • when RSYSLOG and LogForwarder are both configured, they both need to be reachable before the logs are sent from the appliance. This guarantees the consistency of the system logs

  • when the LogForwarder or RSYSLOG destination is taken offline, for instance during an upgrade

  • during a network outage

In these situations, cz-logd can store up to 100,000 audit log records (the default setting), which will be sent as soon as the problem is resolved. An SSH command line cz-config command is available to change this value if required.

Once the LogForwarder has received audit logs, they are forwarded to an external log system outside the Collective. The logs can be sent in a number of different ways, which are configured as part of the LogForwarder function. Some of these options include the ability to filter the logs before they are forwarded; these advanced filtering options allow only very specific log records to be forwarded.

Viewing Audit Logs

One of the key requirements when viewing Audit Logs is to be able to track a user or device history. In Appgate SDP there is no ”session id” since we have a distributed system with log records being collected from different appliances in the Collective. Instead we use “distinguished name”, which consists of machine+username+idp; this is the unique identifier which will be included in all Audit Log records no matter how many Controllers or Gateways are being used by the user/device. "Collective ID" is also recorded in each audit log record, so when logs have been consolidated from several different Collectives it is possible to easily identify the source.

Using an external RSYSLOG server

Each appliance also has the ability to export all the logs (audit and daemon) using Rsyslog. It is not recommended to use this unless you have a specific requirement to export the daemon logs - instead just configure the LogForwarder to export the logs using syslog.

Migrations

If you have started using the LogServer (maybe on a Controller) during initial deployment and now want to migrate to a different appliance (LogServer or LogForwarder), this can be done seamlessly without losing any existing audit logs.

If you are migrating from LogServer to your first LogForwarder then afterwards you will be able to add additional LogForwarders either for HA operation or because you want to distribute audit logs differently according to the Site.

See LogServer migrations for instructions.