Tunnel architectures

Understanding Tunnel architectures within the AppGate ZTNA Collective

Site is an AppGate ZTNA term that defines the environment that the AppGate ZTNA system will protect; it should not be confused with a physical site. An AppGate ZTNA Site may represent one or more hosts in a subnet, a DMZ, a geographical location, or a business unit. Sites will typically include one or more Gateways (refer to HA) to handle the client tunnels connected to that Site. Unlike traditional VPN solutions, clients can connect simultaneously to multiple Sites in one Collective.

To make configuring a new Collective easier, a built-in default Site that uses EBR is already configured. For a new Collective you can simply edit this default Site; to begin with, you may only want to add a DNS resolver and leave everything else at the default settings.

Every entitlement you configure must relate to a default Site (that entitlement’s home). This will allow the multi-tunnel network driver to set up a tunneled connection to each Site that the user has an entitlement token for. Once a connection is established to a Gateway, the client sets up the client routes to send the traffic down the appropriate tunnel. The Gateway will then route the tunneled traffic on to the protected network.

The tunnel

AppGate ZTNA supports both TCP and UDP for the tunnel. This choice does not affect the protocols supported within the tunnel.

Clients establish a secure tunneled connection to an available Gateway on each Site based on preset weighting. The multi-tunnel network driver is assigned an IP address from the IP pool, so tunneled client-to-Gateway connections appear like any other network-connected device. The tunnel supports many protocols, such as TCP, UDP, GRE, and ICMP, as well as both up and down traffic. This allows for the deployment of more complex systems such as IP telephony.

When setting up the tunnels, the default MTU for the appliances' external NICs is used, which is 1500.

On mobile operating systems, setting up the tunneled connection(s) involves invoking one instance of the operating system's VPN service for each Site. Running multiple VPN services is costly, so consideration should be given to minimizing the number of Sites when using mobile clients.

NOTE
On iOS and iPadOS you should not use more than five Sites as this is likely to exceed the allowed memory limit for the VPN services.

TLSv1.3 with fallback to TLSv1.2 is the default mode. Other modes include the option of DTLS when the need arises.

Using DTLS for the tunnel protocol

The tunnel protocol chosen does not affect the supported protocols within the tunnel from client to Gateway.

DTLS runs over UDP and streams packets, so there is no guarantee of packet delivery and no retries in the case of failure. This can improve performance on a high-latency network where minimal delay matters more than packet loss, such as when the system is used for VoIP traffic.

Consider the following points when using DTLS.

If you want SPA functionality on Gateways, you need to use Check UDP (and TCP) SPA mode. In Check TCP SPA mode, SPA will still work for the TLS-based connections such as Client-to-Controller, but it will do nothing for the DTLS tunnel to the Gateways.
AppGate ZTNA does not support fragmentation of internal control packets. Because claims and entitlement tokens are sent inside the tunnel, the maximum token size supported is 16 KiB. This sets a limit on the number of entitlements that can be configured for a given user on a DTLS-enabled Site. For reference, an Action (in an entitlement) requires about 20 bytes. The tokens are compressed by up to 90%, so this is quite a high limit that will only be encountered in extreme cases. The way around this limitation is to split the traffic between separate DTLS and TLS tunnels connecting to different protected hosts on the same Site.
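As a rough back-of-the-envelope check of that limit, the sketch below estimates how many Actions fit in a 16 KiB token. It assumes the token contains nothing but Actions of about 20 bytes each and ignores claims and any other overhead, which the text does not quantify. The function name `max_actions` and its parameter are hypothetical, chosen only for this illustration.

```python
# Illustrative estimate only: real tokens also carry claims and other
# metadata, so the practical limit will be lower than these figures.
MAX_TOKEN_BYTES = 16 * 1024   # 16 KiB limit on a DTLS-enabled Site
BYTES_PER_ACTION = 20         # approximate size of one Action


def max_actions(compression_percent: int) -> int:
    """Return how many Actions fit when the token shrinks by the given percentage.

    compression_percent=90 is the best case quoted in the text;
    compression_percent=0 means the token is not compressed at all.
    """
    if not 0 <= compression_percent < 100:
        raise ValueError("compression_percent must be between 0 and 99")
    remaining_percent = 100 - compression_percent
    # Integer arithmetic avoids floating-point rounding surprises.
    return (MAX_TOKEN_BYTES * 100) // (BYTES_PER_ACTION * remaining_percent)


if __name__ == "__main__":
    for pct in (0, 50, 90):
        print(f"{pct}% compression: about {max_actions(pct)} Actions")
```

Under these assumptions the limit is in the region of several hundred Actions with no compression and several thousand at the best-case compression, which is consistent with the limit only being reached in extreme cases.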

Tunneled connections are designed around the Software Defined Perimeter network model. But there are also a number of more advanced connectivity options that are supported by AppGate ZTNA for use in more complicated environments.

Tunnel Connection Models

Gateway and Site configurations include advanced connectivity options that allow the tunneled connections to be handled in specific ways to meet requirements, such as when performing VPN replacement. They can also be used to meet other requirements such as ensuring optimal geographic routing for the tunnels. These should help when overlaying AppGate ZTNA into environments which were not originally designed according to the SDP model.

Apart from the Software Defined Perimeter network model, these connectivity options should be considered for advanced use cases only. Even then, some of the models require enabling in policies to allow assignment criteria to be used to limit the number of users who might have their connections handled differently (for example, system admins).

Software Defined Perimeter network model

The entitlement tokens that the client receives include the (public) Client Hostname/IP of the Gateways (120.33.17.10,11,12,20) configured in Secure Tunnel Settings. Most users connect to only the 111.x Site, but a couple connect to both the 111.x Site and the 112.x Site. By default, Disable Source NAT on Gateways is not enabled, so Gateways will perform source NAT from the 192.168.100.64-254 address range to the 172.17.111.x and 172.17.112.20 IP addresses (shown in black) of the respective Gateways. Each user will automatically be assigned a unique source port per TCP/UDP stream, so this provides immediate compatibility with most network environments.

Network diagram illustrating connections between devices and IP addresses in two subnets.

Using an external load-balancer

External load balancers offer some advanced HA features, similar to those built into the system, and they can be used in front of both the Controller and Gateway. However, two of the system's core security features, SPA and mTLS, place significant limitations on the use of a number of load-balancer features, and other more advanced HA features are not available at all.

When an external load-balancer is used, UDP-TCP SPA mode cannot be enabled as the load-balancer won't be able to handle the UDP and TCP traffic. As load-balancers do not have any equivalent cloaking feature, they will also now be visible on the internet. This defeats one of the core principles defined in the Software Defined Perimeter model.

There is also a big impact on the use of an external load-balancer when using TCP SPA mode in conjunction with mTLS. These two core security features are designed to protect you from man-in-the-middle attacks, which unfortunately makes any application layer load-balancer unusable. This means the load balancer must work exclusively on DNS or on the TCP/IP layer.

Using round-robin DNS is a possibility, where you can edit the Client Hostname/IP (Gateway) or the Profile DNS name (Controller), perhaps issuing different ones to two groups of users in two different locations. The DNS service will choose one of the IP addresses based on round-robin or specific predefined criteria such as geo-location. The round-robin DNS service needs to know the appliance is up and running, otherwise the client can be directed to an unavailable appliance. Configure the load balancer to use the Healthcheck Server to probe each appliance and ensure it is available. In this scenario, the client has no knowledge of the next IP address (it has no list), so if it receives a time-out message from the connected Controller then the user would be unable to authenticate.

A TCP/IP layer load balancer is the other possibility, where you should set the DNS record for Client Hostname/IP (Gateway) or the Profile DNS name (Controller) to point to the load balancer. This will be the only hostname/IP address the client connects to. Traffic is forwarded to one of the appliances based on predefined criteria which ideally includes knowing the appliance is up and running, otherwise the client will be directed to an unavailable appliance. The load balancer should use the Healthcheck Server to probe each appliance and ensure it is available. In this scenario, the client has no knowledge of the next IP address (it has no list), so if it receives a time-out message from the connected Controller then the user would be unable to authenticate.
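The availability check described in both scenarios can be sketched as a round-robin selector that skips appliances failing a health probe. This is a minimal illustration and not how any particular load balancer or DNS service is implemented. The probe is passed in as a function so that nothing is assumed about the Healthcheck Server's address or response format, and the names `healthy_round_robin` and `is_healthy` are hypothetical.

```python
from typing import Callable, Iterable, Iterator, Optional


def healthy_round_robin(
    appliances: Iterable[str],
    is_healthy: Callable[[str], bool],
) -> Iterator[Optional[str]]:
    """Yield the next healthy appliance in round-robin order.

    `is_healthy` stands in for a probe against each appliance's
    Healthcheck Server. Yields None when no appliance passes the probe.
    """
    ring = list(appliances)
    index = 0
    while True:
        for offset in range(len(ring)):
            candidate = ring[(index + offset) % len(ring)]
            if is_healthy(candidate):
                # Resume from the appliance after the one just chosen.
                index = (index + offset + 1) % len(ring)
                yield candidate
                break
        else:
            # Every appliance failed the probe (or the list is empty).
            yield None


if __name__ == "__main__":
    down = {"gateway-2"}  # pretend this appliance is failing its probe
    picker = healthy_round_robin(
        ["gateway-1", "gateway-2", "gateway-3"],
        lambda name: name not in down,
    )
    print([next(picker) for _ in range(4)])
```

Because the client holds no list of alternative addresses, skipping unavailable appliances at this layer is the only protection against directing it to an appliance that is down.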

Care also needs to be taken if the load balancer is being used with Controllers and is set up to direct all traffic to the primary Controller in the first instance. In this case the IP pool sizing will need to be significantly increased because of the way the IP pools are split between all the Controllers.

Instead of an external load balancer, AppGate recommends the use of the internal HA mechanisms for both Controller and Gateway which are designed to optimize scalability, security, and performance.

A diagram showing connections between servers, an external load balancer, and user devices.

Accessing Gateways directly using their internal IP addresses

There is an optional setting for Connect via Local Network in the Gateway's Secure Tunnel Settings which allows the Gateway's Local Client Hostname/IP (172.17.111.10,11,12) to be added.

When the Use local network option has been enabled in the Site, the Controller watches for clients connecting from the list of Public IPs for local network users. When a user or device connects to the Controller from one of these public IP addresses, the Controller assumes the user is on the local LAN. It will issue entitlement tokens which include the Local Client Hostname/IP(s) (172.17.111.10,11,12) instead of the Client Hostname/IP(s). Clients will now access the Gateways on that Site directly, and not have their traffic hair-pinned on the firewall.
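The Controller's decision can be pictured with the following sketch, which uses Python's standard `ipaddress` module. It is only an illustration of the logic described above: the function name is hypothetical, the 203.0.113.0/29 office range is a documentation address chosen for the example, and accepting CIDR entries in the list of public IPs is an assumption of this sketch.

```python
import ipaddress
from typing import Iterable, List


def gateway_addresses_for_token(
    client_source_ip: str,
    local_user_public_ips: Iterable[str],
    client_hostnames: Iterable[str],
    local_client_hostnames: Iterable[str],
) -> List[str]:
    """Choose which Gateway addresses go into the entitlement token.

    If the client reached the Controller from one of the public IPs
    listed for local network users, return the Local Client
    Hostname/IPs; otherwise return the public Client Hostname/IPs.
    """
    source = ipaddress.ip_address(client_source_ip)
    on_local_lan = any(
        source in ipaddress.ip_network(entry, strict=False)
        for entry in local_user_public_ips
    )
    return list(local_client_hostnames if on_local_lan else client_hostnames)


if __name__ == "__main__":
    public = ["120.33.17.10", "120.33.17.11", "120.33.17.12"]
    local = ["172.17.111.10", "172.17.111.11", "172.17.111.12"]
    office_ranges = ["203.0.113.0/29"]  # hypothetical office egress range
    print(gateway_addresses_for_token("203.0.113.4", office_ranges, public, local))
    print(gateway_addresses_for_token("198.51.100.7", office_ranges, public, local))
```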

A diagram showing connections between devices with internal IP addresses and interfaces.

Accessing the nearest Geolocated Site

Some organizations still operate with a WAN, linking different locations with external access to the WAN from several defined places. Their users might also be remotely accessing these different locations depending on their geographies. To support this scenario, AppGate ZTNA has an Override with nearest Site option that can be used.

However, with Disable Source NAT on Gateways enabled, the clients' IP addresses visible via the connected Site are those given by the IP Pool (based on IdP settings) and will be the same on all Sites. This does not matter if a true Software Defined Perimeter model is being implemented, as all Sites are assumed to be independent and not WAN connected. This combination presents two challenges and one point that requires some clarification. In the example below, some clients need access to A, B, C, and D via the 112.x Site rather than the usual 111.x Site because of their location.

Assigning Entitlements

By default, entitlements are assigned to one Site only. In this case A, B, C, and D are assigned to the 111.x Site. When the client connects to the 112.x Site, the entitlements would therefore not be available.

There are two ways of overriding this default entitlement assignment when the client needs to connect to the 112.x Site:

Assign just one policy for geo-location use. In the Policy, override the entitlements using Override with nearest Site under Override Site. This will force A, B, C, and D to be assigned to the nearest geo-located Site. Geo-located Sites must be enabled in Sites by checking Use for nearest Site selection.
Use the geo-location claim to assign either a policy for the 111.x location or another for the 112.x location. Override the entitlements in the latter policy to the 112.x Site using Override by selecting from list under Override Site. This will force A, B, C, and D to be assigned to this 112.x Site instead of the default 111.x Site.

Static routes

The static routing for the clients' address range (192.168.100.64-254) can be set in only one place on the WAN, in this case via the 111.x Site. Clients that connect to the 112.x Site will still be using their 192.168.100.64-254 address, therefore all their return traffic will be routed to the wrong Site.

The IP Pool mapping feature in Sites allows a mapped IP address to be allocated (from the range 192.168.101.1-63) for use on the 112.x Site. The allocation option is used in this case to reduce the size of the required IP pool, as the number of users on the 112.x Site is much smaller than on the 111.x Site. This means the user will always appear as coming from the 192.168.101.1-63 address range on the 112.x Site. A separate static route for the 192.168.101.1-63 address range can now be set via the 112.x Site, ensuring the user's return traffic is routed correctly.
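The allocation option can be pictured as handing each connecting client the next free address from the smaller mapped range. The sketch below is a simplified model of that idea, not the product's implementation. The class and method names are hypothetical, and first-free allocation order is an assumption.

```python
import ipaddress
from typing import Dict, List


class MappedPoolAllocator:
    """Allocate mapped addresses from a small range, one per client tunnel IP."""

    def __init__(self, first: str, last: str) -> None:
        start = int(ipaddress.IPv4Address(first))
        end = int(ipaddress.IPv4Address(last))
        if end < start:
            raise ValueError("last must not be lower than first")
        self._free: List[str] = [
            str(ipaddress.IPv4Address(value)) for value in range(start, end + 1)
        ]
        self._by_client: Dict[str, str] = {}

    def allocate(self, client_tunnel_ip: str) -> str:
        """Return the mapped address for a client, allocating one if needed."""
        if client_tunnel_ip in self._by_client:
            return self._by_client[client_tunnel_ip]
        if not self._free:
            raise RuntimeError("mapped IP pool exhausted")
        mapped = self._free.pop(0)
        self._by_client[client_tunnel_ip] = mapped
        return mapped

    def release(self, client_tunnel_ip: str) -> None:
        """Return a client's mapped address to the pool when it disconnects."""
        mapped = self._by_client.pop(client_tunnel_ip, None)
        if mapped is not None:
            self._free.append(mapped)


if __name__ == "__main__":
    pool = MappedPoolAllocator("192.168.101.1", "192.168.101.63")
    print(pool.allocate("192.168.100.100"))
    print(pool.allocate("192.168.100.200"))
```

Because only 63 mapped addresses exist for up to 191 tunnel addresses, this model only works when far fewer users connect to the 112.x Site at once than to the 111.x Site, which is the situation the text describes.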

Connect via Local Network

There is one more thing that should be considered: what if you are using Override with nearest Site and Connect via Local Network together? In both cases the Controller modifies the entitlement tokens before they are sent to the Client, and the resulting behavior is well-defined.

When a client signs in from a local IP, the usual local IP change is applied and the Gateway's public hostname is overwritten.
The nearest Site geolocation calculation is skipped, and the Controller makes the assumption that the local Site is the nearest one. So no further changes are made.
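The precedence between the two options can be sketched as follows. How the Controller actually computes the nearest Site is not described here, so the great-circle (haversine) distance used below is purely an assumption for illustration, as are the function names and the coordinates in the usage example.

```python
import math
from typing import Dict, Optional, Tuple

Coordinates = Tuple[float, float]  # (latitude, longitude) in degrees


def haversine_km(a: Coordinates, b: Coordinates) -> float:
    """Great-circle distance between two points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (
        math.sin((lat2 - lat1) / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def site_for_override(
    client_location: Coordinates,
    nearest_site_candidates: Dict[str, Coordinates],
    local_site: Optional[str] = None,
) -> str:
    """Pick the Site used for the override.

    If the client signed in from a local IP, the local Site is assumed to
    be the nearest one and the geolocation calculation is skipped.
    """
    if local_site is not None:
        return local_site
    return min(
        nearest_site_candidates,
        key=lambda name: haversine_km(client_location, nearest_site_candidates[name]),
    )


if __name__ == "__main__":
    sites = {"111.x": (51.5, -0.1), "112.x": (40.7, -74.0)}  # hypothetical locations
    print(site_for_override((48.85, 2.35), sites))
    print(site_for_override((48.85, 2.35), sites, local_site="112.x"))
```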

A diagram showing routing tables and IP allocations for multiple devices and tunnels.

Accessing alternative WAN-conjoined Sites in the event of an outage using a fallback Site

Some organizations still operate with a WAN linking different locations with external access to the WAN from several defined places. Users might normally have access provisioned by location depending on their geographies. In this example, all the clients usually access A, B, C, and D via the 111.x Site. The 111.x Site has a fallback Site specified, in this case the 112.x Site. Two of the clients have Use fallback Site enabled in their Policy; these might be the sysadmins responsible for getting the 111.x Site back up and running. To prevent the 112.x Site being overloaded, only these two can fall back to the 112.x Site in the event that there is a problem at the 111.x Site.

The fallback process is controlled by the Client, so if the Controller’s Site was also affected then this should not matter. After a minimum period of 30 seconds the client will decide when a Site is unresponsive and switch to the fallback Site.
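A simplified view of the client-side decision is sketched below. The text only states that the period is a minimum of 30 seconds, so treating 30 seconds as the exact switching threshold is an assumption of this sketch, and the function and parameter names are hypothetical.

```python
from typing import Optional

MIN_UNRESPONSIVE_SECONDS = 30  # minimum period before a Site is considered unresponsive


def site_after_outage(
    primary_site: str,
    fallback_site: Optional[str],
    use_fallback_enabled: bool,
    seconds_unresponsive: float,
) -> str:
    """Return the Site the client should use for its entitlements.

    The client only switches when the primary Site has a fallback Site
    specified, the user's Policy has Use fallback Site enabled, and the
    primary Site has been unresponsive for at least the minimum period.
    """
    can_fall_back = fallback_site is not None and use_fallback_enabled
    if can_fall_back and seconds_unresponsive >= MIN_UNRESPONSIVE_SECONDS:
        return fallback_site
    return primary_site


if __name__ == "__main__":
    # A sysadmin with Use fallback Site enabled, 45 seconds into an outage.
    print(site_after_outage("111.x", "112.x", True, 45))
    # A regular user without the Policy setting stays on the primary Site.
    print(site_after_outage("111.x", "112.x", False, 45))
```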

Because users will only appear on the WAN once, there is no issue with the same IP address appearing in different locations. This option will work best when Disable Source NAT on Gateways is not enabled, as each user will automatically be assigned a unique source port per TCP/UDP stream on the fallback Site's Gateway(s) IP address, and network routing will already be in place for this. If Disable Source NAT on Gateways is enabled, then the IP Pool mapping feature in Sites should be used so unique routes can be set for each Site (see the next example).

Network diagram showing connections and fallback site policy for users across devices.

Accessing WAN-conjoined Sites simultaneously

With Disable Source NAT on Gateways enabled, the clients' IP addresses visible on the protected network are those given by the IP Pool (based on IdP settings) and will be the same on all Sites. This does not matter if a true Software Defined Perimeter model is being implemented, as all Sites are assumed to be independent and not WAN connected. For conjoined Sites you want to avoid multiple Gateways responding to ARP requests for the tunnel IP address range (192.168.100.64-254).

In this example, users connect to the 111.x Site and to the 112.x Site because they need access to all the hosts A, B, C, and D. If host C were initiating a connection to a user (for instance, IP telephony), then the router would make an ARP request for, say, 192.168.100.100. If the user had the same IP address on both Sites, then it would get two replies. The IP Pool mapping feature in Sites allows a mapped IP address to be one-to-one translated (from 192.168.100.64-254 to 192.168.101.64-254) for use on the 112.x Site. The translation option is used in this case as most users will be connecting to both Sites, so the size of the two IP pools will need to be the same. This means the user will only appear to be coming from 192.168.100.100 in one place (the 111.x Site). This configuration also allows you to route traffic for a specific purpose to a specific Site, for instance to ensure a specific QoS target is achieved for IP telephony.
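The one-to-one translation amounts to keeping each address's offset within the range and moving it to the mapped range. The sketch below shows that arithmetic with Python's standard `ipaddress` module. The function name and default arguments are hypothetical and simply reuse the ranges from this example.

```python
import ipaddress


def translate_one_to_one(
    client_ip: str,
    source_first: str = "192.168.100.64",
    source_last: str = "192.168.100.254",
    mapped_first: str = "192.168.101.64",
) -> str:
    """Translate a tunnel IP to its mapped IP by preserving its offset in the range."""
    address = int(ipaddress.IPv4Address(client_ip))
    low = int(ipaddress.IPv4Address(source_first))
    high = int(ipaddress.IPv4Address(source_last))
    if not low <= address <= high:
        raise ValueError(f"{client_ip} is outside the source range")
    mapped_base = int(ipaddress.IPv4Address(mapped_first))
    return str(ipaddress.IPv4Address(mapped_base + (address - low)))


if __name__ == "__main__":
    # The user is 192.168.100.100 on the 111.x Site and the mapped
    # address below on the 112.x Site, so only one Gateway answers ARP
    # for each address.
    print(translate_one_to_one("192.168.100.100"))
```

Unlike the allocation option, this mapping needs no per-client state, but both ranges must be the same size.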

A diagram showing devices A, B, C, and D accessing WAN-conjoined Sites.