Setting up IP failover with Heartbeat and Pacemaker on Ubuntu Lucid


One of Zivtech's clients recently asked us to renovate their server setup, with a focus on improving its availability.

This post will cover the highly available load balancer setup that currently serves 7 sites, with another 4 in the pipeline. Basically, we want to run the sites through two Linux-based software load balancer boxes, and be able to keep the sites up even if one of them fails.

The ingredients:

  • Two Ubuntu Lucid boxes, standard stripped down server install. We'll call them Bart and Lisa, because that's how server naming rolls at Zivtech
  • Heartbeat
  • Pacemaker
  • A service IP per site (an IP address that is not the primary IP of the box it lives on, so it can be moved)

Heartbeat and Pacemaker are the real stars here. Pacemaker is the "brains" of the two, providing the high level control of our little cluster, while Heartbeat does the dirty work - it provides the low level inter-node communication and command framework.

Getting started with Heartbeat and Pacemaker

All of the packages we need are available in Ubuntu repositories, so we can just apt-get them on both boxes.

root@bart:~# apt-get install heartbeat pacemaker

The first place to go for configuration information is the Linux-HA User's Guide, followed by the excellent Pacemaker configuration documentation.

The first step is to create an initial Heartbeat configuration and an auth key on one of the boxes, copy both to the other, and restart the service on each node.
root@bart:~# cat /etc/ha.d/ha.cf
autojoin none      # don't auto-discover nodes; list them explicitly
node bart          # cluster members (names must match uname -n)
node lisa
bcast eth0         # send heartbeats via broadcast on eth0
warntime 3         # warn after 3 seconds without a heartbeat from a peer
deadtime 6         # declare a peer dead after 6 seconds
initdead 60        # be patient (60 seconds) while nodes boot
keepalive 1        # send a heartbeat every second
crm respawn        # run the Pacemaker CRM, restarting it if it dies
root@bart:~# scp /etc/ha.d/ha.cf lisa:/etc/ha.d/
root@bart:~# ( echo -ne "auth 1\n1 sha1 "; \
dd if=/dev/urandom bs=512 count=1 | openssl md5 ) \
> /etc/ha.d/authkeys
root@bart:~# chmod 0600 /etc/ha.d/authkeys
root@bart:~# scp /etc/ha.d/authkeys lisa:/etc/ha.d/
root@bart:~# ssh lisa chmod 0600 /etc/ha.d/authkeys
root@bart:~# /etc/init.d/heartbeat restart
root@bart:~# ssh lisa /etc/init.d/heartbeat restart
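
Before involving Pacemaker at all, you can ask Heartbeat itself whether the nodes see each other. The heartbeat package ships a small cl_status utility for exactly this (the output below is illustrative):

root@bart:~# cl_status listnodes
lisa
bart
root@bart:~# cl_status nodestatus lisa
active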

If all went well, both nodes of the cluster should be up and know about each other. To check, use crm_mon, one of the insanely helpful command line tools that come with the packages.

root@bart:~# crm_mon -1 | grep Online
Online: [ bart lisa ]

Ok, now we move on to setting up the IP addresses we would like Heartbeat and Pacemaker to keep available. There are a few ways to change the configuration of your cluster, but the easiest is the crm tool (its documentation is excellent). Note that this is the cluster configuration, not the configuration of a single node - you only need to make changes on one node, and they will propagate to the other automatically.

To Pacemaker, an IP address is a resource (it can handle many other types of resources too, but that's beyond the scope of this post). Assuming we have two sites, with IP addresses of 192.0.2.10 and 192.0.2.20 respectively (placeholder addresses from the TEST-NET-1 documentation range - substitute your own), we'd tell Pacemaker about them like this:

root@bart:~# crm configure
crm(live)configure# primitive site_one_ip IPaddr params ip="192.0.2.10" cidr_netmask="24" nic="eth0"
crm(live)configure# primitive site_two_ip IPaddr params ip="192.0.2.20" cidr_netmask="24" nic="eth0"
crm(live)configure# commit
crm(live)configure# exit

To check that your configuration is what you think it is as you go, you can use crm configure show. Right now, your configuration should look similar to this:

root@lisa:~# crm configure show
node $id="9f5e6cd6-2b75-4445-8d75-43c1a34fe431"
node $id="fb502469-691b-4021-b504-7da7a188bc63"
primitive site_one_ip ocf:heartbeat:IPaddr \
params ip="192.0.2.10" cidr_netmask="24" nic="eth0"
primitive site_two_ip ocf:heartbeat:IPaddr \
params ip="192.0.2.20" cidr_netmask="24" nic="eth0"

The next thing we want to do is tell Pacemaker on which node we'd prefer the IPs to live when both nodes are up. We do this by setting a "location" for each IP resource. In the following configuration, we tell Pacemaker to keep one IP on each node when they are both up, which is a fairly typical setup for a pair of load balancers:

root@bart:~# crm configure
crm(live)configure# location site_one_ip_pref site_one_ip 100: bart
crm(live)configure# location site_two_ip_pref site_two_ip 100: lisa
crm(live)configure# commit
crm(live)configure# exit

Checking the configuration, we should see two lines like this added to our output of crm configure show:

location site_one_ip_pref site_one_ip 100: bart
location site_two_ip_pref site_two_ip 100: lisa

Nearly there! Now that Pacemaker knows about our IP address resources and where they should live, we can set up monitoring, which is the real meat of this exercise:

root@bart:~# crm configure
crm(live)configure# monitor site_one_ip 40s:20s
crm(live)configure# monitor site_two_ip 40s:20s
crm(live)configure# commit
crm(live)configure# exit
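
The 40s:20s shorthand means a 40 second monitor interval with a 20 second timeout. If you prefer to keep everything in one place, the same operation can alternatively be declared inline when the primitive is defined - a sketch, using a placeholder address:

crm(live)configure# primitive site_one_ip ocf:heartbeat:IPaddr \
    params ip="192.0.2.10" cidr_netmask="24" nic="eth0" \
    op monitor interval="40s" timeout="20s"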

Done! Now, to test your setup, power off one of your nodes (or pull out its network cable) and watch the other one pick up its IP. Reconnect the node, and Pacemaker will honour your preferences and send the failed-over IP address back to the original node.
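
If pulling cables on live machines makes you nervous, the crm shell offers a gentler test: putting a node into standby migrates its resources to its partner, and bringing it back online lets the location preferences pull them home again:

root@bart:~# crm node standby bart
root@bart:~# crm_mon -1
root@bart:~# crm node online bart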

Now that we have a highly available load-balancing layer, the next post in this series will show you how to set up Apache to balance requests between backend servers running PHP and Drupal.
