FortiGate :: HA Troubleshooting
FortiGates are capable of a few methods of High Availability. This post will help you troubleshoot issues with the FortiGate Cluster Protocol (FGCP) in Active/Active or Active/Passive configurations.
What is HA?
High Availability is a feature that provides redundancy for your firewall. This is achieved by cabling each FortiGate with the same network connections (FGCP supports 2-4 cluster units) and connecting the units to one another with an HA or sync cable. You then configure the HA settings under System > Config > HA, and this configuration should match on all units. In both A/A and A/P, one unit is selected as the primary cluster unit. In A/P the primary handles all traffic; in A/A it still manages all connections but offloads some inspection work to the subordinate units. I will not go into any more detail about the setup of HA, or the difference between A/A and A/P, in this post.
It’s important to note that the FortiGate's configuration is effectively duplicated across all members; what differs is the ownership of a virtual MAC address. It is this MAC address that the FortiGate uses when sending packets or responding to ARP requests. Only the primary unit owns this MAC. In the event of a failover, whichever unit becomes primary also takes over ownership of the virtual MAC address and sends a gratuitous ARP to inform neighbouring switches.
How is the primary cluster unit determined?
A useful thing to know when troubleshooting HA issues is how the FortiGates determine which unit will become the primary. The order of priority is as follows:
- Which unit has the most connected, monitored interfaces (which interfaces are monitored is selected in the HA configuration)
- Which unit has the highest uptime (there needs to be a fairly substantial difference, >5 minutes)
- Which unit has the highest configured priority (this is configured within the HA configuration)
- Finally, which unit has the highest serial number
At each step, if the value for one device is higher than the other, that device becomes the primary unit. If the values are equal (or, for uptime, too close to count), the comparison moves on to the next step.
FortiGates within an HA cluster communicate with one another constantly. They verify the status of each firewall's interfaces by exchanging hello packets, and the primary also shares its connection table to allow for stateful failover. This communication happens over TCP port 703 with a default interval of 200ms.
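The selection order above can be sketched in a few lines of Python. This is only an illustration of the comparison order described in this post, not FortiOS code: the `Unit` class, its field names and the 300-second margin (taken from the ">5 minutes" note above) are assumptions, and serial numbers are simply compared as strings.

```python
from dataclasses import dataclass

# Assumption: the uptime gap must exceed roughly 5 minutes to count.
UPTIME_MARGIN_SECONDS = 300


@dataclass
class Unit:
    """Hypothetical view of one cluster member, for illustration only."""
    serial: str
    monitored_links_up: int
    uptime_seconds: int
    priority: int


def elect_primary(a: Unit, b: Unit) -> Unit:
    """Return the unit that wins, checking each criterion in order."""
    # 1. Most connected, monitored interfaces.
    if a.monitored_links_up != b.monitored_links_up:
        return a if a.monitored_links_up > b.monitored_links_up else b
    # 2. Highest uptime, but only if the gap is larger than the margin.
    if abs(a.uptime_seconds - b.uptime_seconds) > UPTIME_MARGIN_SECONDS:
        return a if a.uptime_seconds > b.uptime_seconds else b
    # 3. Highest configured priority.
    if a.priority != b.priority:
        return a if a.priority > b.priority else b
    # 4. Highest serial number (compared as a string here, an assumption).
    return a if a.serial > b.serial else b


fw1 = Unit("FGXXXXXXXXX1", monitored_links_up=2, uptime_seconds=1000, priority=128)
fw2 = Unit("FGXXXXXXXXX2", monitored_links_up=2, uptime_seconds=1100, priority=200)
winner = elect_primary(fw1, fw2)
print(winner.serial)
```

In this example both units have the same number of monitored links and their uptimes are only 100 seconds apart, so the decision falls through to priority and the second unit wins.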
Multiple HA Clusters in the same LAN segment
If you have multiple clusters within the same L2 network, you must consider how the virtual MAC for a cluster is calculated. The virtual MAC is made up of the following components:
- group-id_hex = the HA group ID converted into hexadecimal
- vcluster_integer = 0 for virtual cluster* 1 and 2 for virtual cluster 2
- idx = the interface index
*A virtual cluster is only relevant when you have differing primaries for different Virtual Domains (VDOMs), which are out of the scope of this post.
Because none of these inputs is unique to a particular unit, two clusters with the same group-id will generate the same virtual MACs. It is therefore essential that the group-id is different between separate clusters within the same LAN segment.
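To make the collision risk concrete, here is a small Python sketch that builds a virtual MAC from the three components listed above. The layout it uses, `00-09-0f-09-<group-id_hex>-<vcluster_integer><idx>`, is an assumption based on the format commonly documented for older FortiOS releases, and the function name and input limits are mine. Check the result against "Current_HWaddr" on your own units before relying on it.

```python
def ha_virtual_mac(group_id: int, vcluster_integer: int, idx: int) -> str:
    """Build an FGCP-style virtual MAC (assumed layout, see lead-in)."""
    # Assumption: group ID fits in one byte, as on older FortiOS releases.
    if not 0 <= group_id <= 255:
        raise ValueError("group_id must be between 0 and 255")
    # 0 for virtual cluster 1, 2 for virtual cluster 2 (per the list above).
    if vcluster_integer not in (0, 2):
        raise ValueError("vcluster_integer must be 0 or 2")
    # Assumption: idx is a single hex digit in this simplified sketch.
    if not 0 <= idx <= 15:
        raise ValueError("idx must be between 0 and 15")
    # The last byte is the vcluster digit followed by the interface index.
    last_byte = (vcluster_integer << 4) | idx
    return f"00-09-0f-09-{group_id:02x}-{last_byte:02x}"


# Two separate clusters left on the same group ID produce identical MACs.
cluster_a = ha_virtual_mac(group_id=0, vcluster_integer=0, idx=5)
cluster_b = ha_virtual_mac(group_id=0, vcluster_integer=0, idx=5)
print(cluster_a, cluster_a == cluster_b)

# Changing the group ID on one cluster removes the clash.
print(ha_virtual_mac(group_id=10, vcluster_integer=0, idx=5))
```

Nothing in the calculation depends on the hardware, which is why the group-id is the only thing keeping two clusters apart on a shared segment.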
The following section is a list of useful commands to help you troubleshoot issues within a FortiGate HA cluster, along with a brief description of what they show and in some cases, examples of output.
Lab-FG # execute ha manage ?
<id> please input peer box index.
<1> Subsidiary unit FGXXXXXXXXXX
When in an HA cluster, all units share the same interface IPs, so the subordinate units are not accessible through their usual interfaces. There are ways around this (dedicated management ports or VDOMs), which are out of the scope of this post. The above command allows you to access the CLI of the subordinate units from the primary.
Lab-FG # diagnose sys ha status
The above command will give you an overview of your HA cluster. It will show you the members in the cluster, their serial numbers and which unit is currently the primary unit.
Lab-FG # diagnose hardware deviceinfo nic <interface_name>
This command will show you the “Current_HWaddr” and “Permanent_HWaddr” – the former being the HA virtual MAC address that the primary will use.
Lab-FG # diagnose sys ha showcsum
This will show you the checksum for the configuration on the appliance it is run on. The output should match on all appliances. If it does not, there may be a sync issue amongst the cluster members.
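If you have more than two units, comparing checksums by eye gets tedious. The Python sketch below shows one way to spot differences once you have copied the values out of each unit's output by hand. It does not parse real CLI output: the dictionary layout, the unit names, the section labels and the checksum strings are all placeholders of my own.

```python
def find_checksum_mismatches(checksums_by_unit: dict) -> list:
    """Return the section labels whose checksum differs between units."""
    # Collect every section label seen on any unit.
    all_labels = set()
    for sums in checksums_by_unit.values():
        all_labels.update(sums)

    mismatched = []
    for label in sorted(all_labels):
        # A label missing on one unit shows up as None and counts as a mismatch.
        values = {sums.get(label) for sums in checksums_by_unit.values()}
        if len(values) > 1:
            mismatched.append(label)
    return mismatched


# Placeholder values, typed in by hand from each unit (not real output).
collected = {
    "primary": {"global": "aa11", "root": "bb22", "all": "cc33"},
    "subordinate": {"global": "aa11", "root": "ff99", "all": "dd44"},
}
print(find_checksum_mismatches(collected))
```

An empty list means every section matches across the units you supplied. Anything else tells you which part of the configuration to look at first.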
Lab-FG # diagnose debug application hatalk 255
Lab-FG # diagnose debug application hasync 255
Lab-FG # diagnose debug enable
The above commands will show you a live debug of the communication between cluster units.
Lab-FG # execute ha synchronize all
If you are encountering sync issues, the above command will manually force a sync between cluster units. It’s worth then checking the checksums again to ensure everything has synced.
Hopefully this post will assist you in resolving HA issues between your FortiGates. Although the above commands may well come in handy, the first step is of course to double-check your configuration, ensuring the HA config matches on all units. I will be writing some more posts soon on other FortiGate topics as well as other vendors.