
Highly Available Virtual IP with Keepalived

High Availability

If you want a system to be highly available, it is not sufficient to simply run multiple nodes of your application, because if one goes down you still have to tell your clients to go to another one. There are a few options to do so:

  • Application Retry: You could program your client application to try the servers one by one until one responds. That introduces latency whenever the first one is down, and there is no load balancing, meaning all requests are handled by the first node until it fails.
  • External Load Balancing: You could also use an external load balancer like HAProxy or NGINX. This gives you load balancing, and if one of your nodes goes down, the load balancer simply redirects all traffic to the remaining nodes. But what if the load balancer itself goes down? Unless the load balancer is highly available as well, you have just moved the single point of failure one layer up.
  • Keepalived: The third option is my preferred one, because it has no external dependencies. The trade-off is that you get no load balancing.

Keepalived

Keepalived uses the Virtual Router Redundancy Protocol (VRRP) to achieve high availability. Every node gets a priority, and the one with the highest priority becomes the MASTER node. The master node listens on the virtual IP and receives all the traffic.
All the other nodes (called BACKUP) constantly listen for the master's advertisements to check if it is still there. If it stops responding, the backup node with the highest priority steps in, claims the virtual IP and thereby receives the traffic.
As soon as the original master comes back up, all nodes agree to hand the VIP back to it.
This ensures that your clients can always connect to one of your nodes without any external dependencies. One downside is that you cannot load-balance traffic between the nodes. You are also limited to 255 VRRP router IDs per network.
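
If you want to see these advertisements on the wire, you can capture the VRRP traffic (IP protocol 112) on the LAN interface; ens18 is just the interface name used in the examples below:

$ tcpdump -ni ens18 ip proto 112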

Installation

Install Packages

First of all we need the keepalived package. It is available in the standard repositories of every RHEL-based Linux distribution.
You can install it like this:

$ dnf install keepalived
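
Afterwards you can check that the binary is present and see which release you got (older releases may lack some of the options shown later):

$ keepalived --version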

Master Configuration

The next step is to configure the master node.

To do so, open the configuration file with a text editor. If it already contains an example configuration, you may safely delete it.

$ vim /etc/keepalived/keepalived.conf
vrrp_instance Postgres_VIP {
	state MASTER
	interface ens18
	virtual_router_id 51
	priority 255
	advert_int 1
	unicast_src_ip 10.0.1.62
	unicast_peer {
		10.0.1.63
		10.0.1.64
	}
	authentication {
		auth_type PASS
		auth_pass PostgresReplication
	}
	virtual_ipaddress {
		10.0.2.100/8
	}
}

The parameters explained:

  • vrrp_instance defines an individual instance of the VRRP protocol running on an interface. Just choose a unique name.
  • state defines the initial state that the instance should start in (MASTER or BACKUP).
  • interface defines the interface that VRRP runs on. This should be your LAN interface.
  • virtual_router_id is the unique identifier. It may not be used twice in the same network.
  • priority is the advertised priority of this node. Every node should have a unique priority. The node with the highest priority will receive the VIP.
  • advert_int specifies the frequency that advertisements are sent at (1 second, in this case).
  • authentication specifies the information necessary for servers participating in VRRP to authenticate with each other. In this case, a simple password is defined.
  • virtual_ipaddress defines the IP addresses (there can be multiple) that VRRP is responsible for, in other words your VIP.
  • unicast_src_ip is the IP of this node.
  • unicast_peer lists the IPs of the other nodes.

Backup Configuration

Next we will create the configuration on the backup nodes.
It will be mostly identical to the master node, with a few key differences:

  • The IP addresses have to be changed: unicast_src_ip is the backup node itself and unicast_peer lists the other two nodes.
  • The priority should be lower than the master's, and every node should get its own unique priority.
  • The initial state is BACKUP instead of MASTER.

If there is an example configuration in the file, you can delete it.

$ vim /etc/keepalived/keepalived.conf
vrrp_instance Postgres_VIP {
	state BACKUP
	interface ens18
	virtual_router_id 51
	priority 254
	advert_int 1
	unicast_src_ip 10.0.1.63	# this backup node's own IP
	unicast_peer {			# the other two nodes
		10.0.1.62
		10.0.1.64
	}
	authentication {
		auth_type PASS
		auth_pass PostgresReplication
	}
	virtual_ipaddress {
		10.0.2.100/8
	}
}
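
On both nodes you can let keepalived parse the file before starting anything; recent keepalived releases offer a config-test mode that reports syntax errors:

$ keepalived --config-test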

Firewall

If you are using firewalld you will need to allow the VRRP traffic between the nodes. VRRP is not a TCP or UDP service but its own IP protocol (number 112), so it is allowed as a protocol rather than a port:

$ firewall-cmd --add-protocol=vrrp
$ firewall-cmd --runtime-to-permanent

Start Service

Now that everything is configured we can start the keepalived service on all nodes. At this point the master node will start listening on the virtual IP.

$ systemctl enable --now keepalived
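
To verify that the master really picked up the address, list the IPs on the VRRP interface; the VIP should show up on the master and on none of the backups:

$ ip -4 addr show dev ens18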

Failover Test

To test the failover, I will disconnect the network on the master and look at the logs of one of the backup nodes.
You can see the log below.

$ journalctl -f | grep Keepalived_vrrp
Dec  9 11:05:38 postgres-02 Keepalived_vrrp[7675]: (Postgres_VIP) Entering MASTER STATE
Dec  9 11:05:38 postgres-02 Keepalived_vrrp[7675]: (Postgres_VIP) setting VIPs.  
Dec  9 11:05:38 postgres-02 Keepalived_vrrp[7675]: (Postgres_VIP) Sending/queueing gratuitous ARPs on ens18 for 10.0.2.100

The backup node is entering the master state and activating the vip. Now all the traffic goes through this node.
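
From a client's perspective (any machine in the same network) nothing should change during the failover; a continuous ping against the VIP is a simple way to watch this, with at most a few lost packets while the backup takes over:

$ ping 10.0.2.100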

As soon as the master comes back up, it will detect that a lower-priority node is acting as master and reclaim control. That is when the backup server deactivates the VIP so the master can take over again.

$ journalctl -f | grep Keepalived_vrrp
Dec  9 11:05:37 postgres-01 Keepalived_vrrp[7675]: (Postgres_VIP) received lower priority (100) advert from 10.0.1.63 - discarding
Dec  9 11:05:38 postgres-01 Keepalived_vrrp[7675]: (Postgres_VIP) received lower priority (100) advert from 10.0.1.63 - discarding
Dec  9 11:05:38 postgres-01 Keepalived_vrrp[7675]: (Postgres_VIP) Receive advertisement timeout
Dec  9 11:05:38 postgres-01 Keepalived_vrrp[7675]: (Postgres_VIP) Entering MASTER STATE
Dec  9 11:05:38 postgres-01 Keepalived_vrrp[7675]: (Postgres_VIP) setting VIPs.  
Dec  9 11:05:38 postgres-01 Keepalived_vrrp[7675]: (Postgres_VIP) Sending/queueing gratuitous ARPs on ens18 for 10.0.2.100

Because of that the backup will switch back to the backup state.

$ journalctl -f | grep Keepalived_vrrp
Dec  9 11:01:39 postgres-02 Keepalived_vrrp[7724]: (Postgres_VIP) Entering BACKUP STATE (init)

Scripts

By default the VIP stays on the master as long as the server itself is up.
If you want the failover to also depend on a specific port or service, you can use a check script like the one below.

To do so, we declare a vrrp_script called apiserver and use curl as the test command. In this case I'm checking the Kubernetes API server on port 6443.

vrrp_script apiserver {
	script  "/usr/bin/curl -s -k https://localhost:6443/healthz -o /dev/null"
	interval 5	# check every 5 seconds
	timeout  2	# fail if it takes longer than 2 seconds
	rise     5	# claim master state after 5 successful checks
	fall     2	# remove master state after 2 failed checks
	user     root	# execute checks as root
}

vrrp_instance ServerCenter {
        state MASTER
        interface ens192
        virtual_router_id 55
        priority 30
        advert_int 1
        unicast_src_ip 10.0.0.246
        unicast_peer {
                10.0.1.4
                10.0.1.51
        }
        authentication {
                auth_type PASS
                auth_pass svcentPass
        }
        virtual_ipaddress {
                10.0.0.191/22
        }
        track_script {
                apiserver 	# use the script from above
        }
}

To activate it we need to reload the service.

$ systemctl reload keepalived
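
If you want to check what the script sees, you can run the same command manually and inspect its exit code; keepalived only looks at the exit status, where 0 counts as success and anything else as a failure:

$ /usr/bin/curl -s -k https://localhost:6443/healthz -o /dev/null; echo $?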

Now your service should be highly available, and if one node fails another one will take over.


Jannik Rehkemper

I'm a professional Linux administrator and hobby programmer. My training as an IT professional started in 2019 and ended in 2022. Since 2023 I have been working as a Linux administrator.