Keepalived Loadbalancer

Introduction

This page briefly describes how to set up keepalived as a loadbalancer.

The examples on this page are specific to Debian, but it should be possible to adapt them to other distributions.

The keepalived mode used here is Direct Routing (DR). A variant using tunneling is described further below.

The keepalived node(s) and the nodes they are loadbalancing for must be on the same network segment, and there must be no firewall between those hosts. Be careful: some virtualization systems do loadbalancing by themselves.

For more information, see www.keepalived.org or the keepalived.conf man page (man keepalived.conf), which is much more helpful than the web pages.

Some terminology:

  • The node(s) running keepalived are called keepalived nodes or loadbalancer nodes.
  • The nodes keepalived balances the load across, for example OX nodes or Galera nodes, are called server nodes.

Software installation on the keepalived node

Install the packages:

# apt-get install keepalived ipvsadm

Keepalived requires some kernel modules to be loaded. They are loaded by the ipvsadm service, so we enable it using dpkg-reconfigure:

# dpkg-reconfigure ipvsadm

Answer the first question ("load ... at boot") with "Yes" and then choose "backup" for "Daemon method".
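
To confirm that the IPVS core module ip_vs is actually loaded after this step, you can look for it in /proc/modules, the file that lsmod reads. The following is a minimal sketch with a hypothetical helper named module_loaded; the optional second argument only exists so the function can be tried against a saved copy of that file. If IPVS is compiled into the kernel instead of being built as a module, it will not show up there.

```shell
# Hypothetical helper: succeed if kernel module NAME appears in the first
# column of the module list. The list defaults to /proc/modules (what
# lsmod reads); the optional second argument replaces it, e.g. for tests.
module_loaded() {
    awk -v m="$1" '$1 == m { found = 1 } END { exit !found }' "${2:-/proc/modules}"
}

# Example on the keepalived node:
#   module_loaded ip_vs && echo "IPVS module is loaded"
```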

Enable IP forwarding on the keepalived node by adding the following line to /etc/sysctl.conf:

net.ipv4.ip_forward=1

Apply the setting either by rebooting or by running sysctl -w net.ipv4.ip_forward=1 (sysctl -p reloads all of /etc/sysctl.conf instead).
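
Editing /etc/sysctl.conf by hand is enough. If the step is scripted, it helps to make the change idempotent so that repeated runs do not pile up duplicate lines. The following is a minimal sketch, assuming a hypothetical helper named ensure_sysctl (it is not part of Debian or keepalived).

```shell
# Hypothetical helper: make FILE contain exactly one active "KEY=VALUE"
# line. Existing uncommented assignments of KEY are dropped and the
# desired setting is appended at the end of the file.
ensure_sysctl() {
    file=$1 key=$2 value=$3
    [ -r "$file" ] || { echo "ensure_sysctl: cannot read $file" >&2; return 1; }
    # Escape the dots in the key so that they match literally in the regex.
    ekey=$(printf '%s' "$key" | sed 's/\./\\./g')
    tmp=$(mktemp) || return 1
    # Copy every line except active assignments of KEY ("k=v" or "k = v").
    # grep exits 1 when nothing is left to copy (fine); 2 means an error.
    rc=0
    grep -v -E "^[[:space:]]*${ekey}[[:space:]]*=" "$file" > "$tmp" || rc=$?
    if [ "$rc" -gt 1 ]; then rm -f "$tmp"; return 1; fi
    printf '%s=%s\n' "$key" "$value" >> "$tmp"
    # Rewrite FILE in place (keeps owner and permissions), then clean up.
    cat "$tmp" > "$file" && rm -f "$tmp"
}
```

Usage on the keepalived node, as root: ensure_sysctl /etc/sysctl.conf net.ipv4.ip_forward 1 && sysctl -p. Commented lines such as "#net.ipv4.ip_forward=1" are left alone.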

Networking adjustments on the server nodes

The server nodes need the loadbalancer IP configured on some network device so that the server processes can bind to it.

However, in the case of Galera, creating a fully configured "alias" device is a bad idea: the Galera nodes would then pick the loadbalancer IP as the primary IP of the node and use it, for example, for full state transfers (SST). A node attempting an SST would then connect to the loadbalancer on the SST port, which fails because nothing listens on that port on the loadbalancer.

If we instead create a dummy device and only assign an IP to it (without setting flags like UP), Galera can bind to the IP but will not use it as its primary IP. Such a configuration can be created with the following trick: add some pre-up, post-up, pre-down and post-down lines to the /etc/network/interfaces file as follows:

allow-hotplug eth0
iface eth0 inet dhcp
    pre-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
    pre-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
    post-up ip addr add 10.20.29.174/32 dev dummy0
    pre-down ip addr del 10.20.29.174/32 dev dummy0
    post-down echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
    post-down echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce

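Here, 10.20.29.174 is the loadbalancer IP; adjust it to your environment. The arp_ignore and arp_announce settings keep the server nodes from answering ARP requests for the loadbalancer IP and from using it as the source address of their own ARP requests, so that traffic for the loadbalancer IP keeps arriving at the keepalived node. The dummy0 device must exist for the post-up line to succeed; loading the dummy kernel module (modprobe dummy) creates it.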
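
After bringing eth0 up on a server node, the ARP settings can be verified by reading back the two /proc files that the pre-up lines write. The following is a minimal sketch with a hypothetical function name, check_dr_arp; its optional argument replaces /proc so that the function can be tried against a fake directory tree.

```shell
# Hypothetical check for a server node: verify the ARP settings written by
# the pre-up lines (arp_ignore=1, arp_announce=2). The optional argument
# replaces /proc and only exists so the function can run on a fake tree.
check_dr_arp() {
    conf="${1:-/proc}/sys/net/ipv4/conf/all"
    rc=0
    ignore=$(cat "$conf/arp_ignore" 2>/dev/null) || ignore=
    announce=$(cat "$conf/arp_announce" 2>/dev/null) || announce=
    if [ "$ignore" != 1 ]; then
        echo "arp_ignore is '$ignore', expected 1" >&2
        rc=1
    fi
    if [ "$announce" != 2 ]; then
        echo "arp_announce is '$announce', expected 2" >&2
        rc=1
    fi
    return "$rc"
}

# Example on a server node:
#   check_dr_arp && ip addr show dev dummy0
```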

Configuration example: HTTP

Keepalived configuration file

Create the file /etc/keepalived/keepalived.conf with the following content (adapt the network addresses):

global_defs {
    router_id OX
}

vrrp_sync_group OX_GROUP {
    group {
        # must match the name of the vrrp_instance below
        OX_VRRP
    }
}

vrrp_instance OX_VRRP {
    state BACKUP
    interface eth0
    garp_master_delay 10
    virtual_router_id 10
    priority 101
    nopreempt
    advert_int 1
    authentication {
        auth_type AH   # or the simpler PASS
        auth_pass 1234 # example password '1234'
    }
    virtual_ipaddress {
        10.20.30.77/24 brd 10.20.30.255 dev eth0 # virtual service IP 10.20.30.77
    }
    virtual_ipaddress_excluded {
    }
}

virtual_server_group OX_HTTP {
        10.20.30.77 80         # virtual ip and port 80
}

virtual_server_group OX_OL_PUSH {
        10.20.30.77 44335      # VIP VPORT
}

virtual_server group OX_HTTP {
    delay_loop 3
    lvs_sched  rr
    lvs_method DR
    protocol   TCP
    virtualhost 10.20.30.77

    real_server 10.20.30.123 80 {
        weight 1
        inhibit_on_failure
        HTTP_GET {
            url {
                path /servlet/TestServlet
                status_code 200
            } 
            connect_port 80
            connect_timeout 10
        }
    }

    real_server 10.20.30.124 80 {
        weight 1
        inhibit_on_failure
        HTTP_GET {
            url {
                path /servlet/TestServlet
                status_code 200
            }
            connect_port 80
            connect_timeout 10
        }
    }
}

virtual_server group OX_OL_PUSH {
    delay_loop 3
    lvs_sched  rr
    lvs_method DR
    protocol   UDP

    real_server 10.20.30.123 44335 {
        weight 1
        inhibit_on_failure
        # health check: TCP connect to port 9999 on the real server
        TCP_CHECK {
            connect_port 9999
            connect_timeout 5
        }
    }

    real_server 10.20.30.124 44335 {
        weight 1
        inhibit_on_failure
        # health check: TCP connect to port 9999 on the real server
        TCP_CHECK {
            connect_port 9999
            connect_timeout 5
        }
    }
}
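
The group block of a vrrp_sync_group must list the names of vrrp_instance blocks, and a typo there is easy to miss. As a hypothetical lint sketch (not a keepalived feature, and no substitute for keepalived's own parser), the following shell function reports sync-group members that are not defined as a vrrp_instance in the same file. It assumes the layout used above, with one member name per line inside group { ... }.

```shell
# Hypothetical lint helper: list members of a vrrp_sync_group "group { }"
# block that have no matching vrrp_instance in the same file. Assumes one
# member name per line. Exits 1 if any member is undefined.
check_sync_groups() {
    awk '
        { sub(/#.*/, "") }                 # strip "#" comments
        $1 == "vrrp_instance"    { defined[$2] = 1 }
        $1 == "vrrp_sync_group"  { in_sync = 1; next }
        in_sync && $1 == "group" { in_group = 1; next }
        in_group && $1 == "}"    { in_group = 0; in_sync = 0; next }
        in_group && NF > 0       { listed[$1] = 1 }
        END {
            bad = 0
            for (name in listed)
                if (!(name in defined)) {
                    print "sync group member without vrrp_instance: " name
                    bad = 1
                }
            exit bad
        }
    ' "$1"
}
```

Usage: check_sync_groups /etc/keepalived/keepalived.conf prints nothing and succeeds when every listed member is defined.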

For the server nodes (here the OX nodes): the networking adjustments from the previous section have not been tested with this configuration, where 10.20.30.77 is the loadbalancer IP. They should work; if not, take a look at the old version of the networking configuration.

Configuration example: Keepalived for Galera Loadbalancing

Keepalived configuration

In this example we have the following networking information:

  • loadbalancer IP
    • 10.20.29.174 as round-robin for the read requests
    • 10.20.29.175 as active-passive for the write requests
  • Three galera nodes: 10.20.29.140, 10.20.29.142, 10.20.29.138

Then the keepalived configuration file /etc/keepalived/keepalived.conf looks as follows:

global_defs {
  # This should be unique.
  router_id galera-lb
}

vrrp_instance mysql_pool {
  # The interface we listen on.
  interface eth0

  # The default state: one node should be MASTER, the others should be set to BACKUP.
  state MASTER
  priority 101

  # This should be the same on all participating load balancers.
  virtual_router_id 19

  # Set the interface whose status to track to trigger a failover.                   
  track_interface {           
    eth0
  }

  # Password for the loadbalancers to share.
  authentication {
    auth_type PASS
    auth_pass Twagipmiv3
  }

  # These are the IP addresses that float between the loadbalancers.
  virtual_ipaddress {
   10.20.29.174/32 dev eth0
   10.20.29.175/32 dev eth0
  }
}

# Here we add the virtual mysql read node
virtual_server 10.20.29.174 3306 {
  delay_loop 6
  # Round robin, but you can use whatever fits your needs.
  lb_algo rr

  lb_kind DR
  protocol TCP

  # For each server add the following. 
  real_server 10.20.29.140 3306 {
    weight 10
    MISC_CHECK {
      misc_path "/etc/keepalived/galera-checker.pl 10.20.29.140"
      misc_timeout 5
    }
  }
  real_server 10.20.29.142 3306 {
    weight 10
    MISC_CHECK {
      misc_path "/etc/keepalived/galera-checker.pl 10.20.29.142"
      misc_timeout 5
    }
  }
  real_server 10.20.29.138 3306 {
    weight 10
    MISC_CHECK {
      misc_path "/etc/keepalived/galera-checker.pl 10.20.29.138"
      misc_timeout 5
    }
  }
}

# Here we add the virtual mysql write node
virtual_server 10.20.29.175 3306 {
  delay_loop 6
  # Round robin, but you can use whatever fits your needs.
  lb_algo rr
  # The following two options implement the active-passive behavior.
  persistence_timeout 1800
  # make sure all OX nodes are included in that netmask
  persistence_granularity 255.255.255.0

  lb_kind DR
  protocol TCP

  # For each server add the following. 
  real_server 10.20.29.140 3306 {
    weight 10
    MISC_CHECK {
      misc_path "/etc/keepalived/galera-checker.pl 10.20.29.140"
      misc_timeout 5
    }
  }
  real_server 10.20.29.142 3306 {
    weight 10
    MISC_CHECK {
      misc_path "/etc/keepalived/galera-checker.pl 10.20.29.142"
      misc_timeout 5
    }
  }
  real_server 10.20.29.138 3306 {
    weight 10
    MISC_CHECK {
      misc_path "/etc/keepalived/galera-checker.pl 10.20.29.138"
      misc_timeout 5
    }
  }
}
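
For the write IP, IPVS masks each client's source address with the persistence_granularity netmask, and all clients whose masked addresses are equal are sent to the same real server until the persistence entry times out (persistence_timeout, here 1800 seconds). That is why all OX nodes must fall inside one such network. The shell function below only illustrates the masking arithmetic and does not talk to IPVS; mask_ip is a hypothetical name.

```shell
# Illustration only: compute (client IP AND netmask), the key IPVS uses to
# group clients for persistent services. The subshell body keeps the IFS
# change local to the function.
mask_ip() (
    IFS=.
    # Split "a.b.c.d" and "m.n.o.p" into eight positional parameters.
    set -- $1 $2
    printf '%d.%d.%d.%d\n' $(( $1 & $5 )) $(( $2 & $6 )) $(( $3 & $7 )) $(( $4 & $8 ))
)
```

For example, with the netmask 255.255.255.0, the clients 10.20.29.33 and 10.20.29.201 both map to 10.20.29.0 and therefore end up on the same Galera node.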

Here we have configured a Galera-specific node health checker. This is a custom Perl script which requires the DBD::mysql Perl module for database access:

# apt-get install libdbd-mysql-perl

The script is expected in /etc/keepalived/galera-checker.pl and looks like this:

#!/usr/bin/perl

# dominik.epple@open-xchange.com, 2013-06-10

use strict;
use warnings;

#
# config section
#
our $username="some_db_user";
our $password="some_db_pass";
our $debug=0;

our %checks=(
  #"wsrep_cluster_size" => "3",
  "wsrep_ready" => "ON",
  "wsrep_local_state" => "4" # Synced
);
#
# config section end
#

our $host=$ARGV[0] or die "usage: $0 <IP of galera node>"; 

use DBI;
our $dbh = DBI->connect("DBI:mysql:;host=$host", $username, $password
                   ) || die "Could not connect to database: $DBI::errstr";

our $results = $dbh->selectall_hashref("show status like '%wsrep%'", 'Variable_name') or die "Error trying to selectall_hashref";

our %cr=();

foreach my $id (keys %$results) {
  $::cr{$id}=$results->{$id}->{"Value"};
}

$dbh->disconnect();

for my $k (keys %checks) {
  if(exists $::cr{$k}) {
    if($::checks{$k} ne $::cr{$k}) {
      print STDERR "$0: warning: mismatch in $k: expected $::checks{$k}, got $::cr{$k}\n";
      exit(1);
    }
    else {
      print STDERR "$0: info: match in $k: expected $::checks{$k}, got $::cr{$k}\n" if($::debug);
    }
  }
  else {
    print STDERR "$0: warning: no check result for $k (want $::checks{$k})\n";
  }
}

exit(0);
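
keepalived's MISC_CHECK runs the configured command and treats exit status 0 as success and any non-zero status as failure (when misc_dynamic is not set). For comparison, here is a sketch of the same check in shell, assuming the mysql command-line client is installed on the keepalived node. The names wsrep_ok, galera_check, DB_USER and DB_PASS are hypothetical, and the Perl script above remains what the configuration references. Unlike the Perl script, this variant also fails when a status variable is missing.

```shell
# Sketch only: shell variant of the Galera check.
# Read "Variable_name Value" lines (as printed by mysql -N -B) on stdin and
# succeed only if wsrep_ready is ON and wsrep_local_state is 4 (Synced).
# A missing variable, or empty input, counts as a failure.
wsrep_ok() {
    awk '
        $1 == "wsrep_ready"       { ready = $2 }
        $1 == "wsrep_local_state" { state = $2 }
        END { exit !(ready == "ON" && state == "4") }
    '
}

# Hypothetical wrapper: query node $1. DB_USER and DB_PASS are placeholder
# environment variables, not keepalived settings. If mysql cannot connect,
# wsrep_ok sees no input and fails, so the node is marked down.
galera_check() {
    mysql -N -B -h "$1" -u "$DB_USER" --password="$DB_PASS" \
        -e "SHOW STATUS LIKE 'wsrep%'" | wsrep_ok
}
```

Passing the password with --password makes it visible in the process list; a MySQL option file (--defaults-extra-file) avoids that.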

There is also a /usr/bin/clustercheck script shipped with some Galera flavors. The approach described on our HAproxy page is to wrap it via xinetd into a small web service, which keepalived could then query using an HTTP_GET check. This sounds more elegant, but it does not work: keepalived cannot handle the script's output and reports an error like "Keepalived_healthcheckers: Read error with server [10.20.29.210]:9200: Connection reset by peer".

Using tunneling (lb_kind TUN) instead of direct routing (lb_kind DR)

Assuming you have configured keepalived in DR mode as described above, the following changes are required for TUN mode instead of DR mode.

On the keepalived nodes, change lb_kind to TUN in the keepalived configuration file (lvs_method in the HTTP example). Restart keepalived if it is running.

On the server nodes, the networking adjustments need to be changed: we now configure the tunl0 tunnel device instead of the dummy0 device. Additionally, since traffic now actually flows over this interface (whereas the dummy0 device was only configured so that MySQL could see its IP), the link must be set UP.

In summary, the /etc/network/interfaces file on the server nodes needs to look like this:

allow-hotplug eth0
iface eth0 inet dhcp
    pre-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
    pre-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
    post-up ip link set up tunl0
    post-up ip addr add 10.20.29.174/32 dev tunl0
    pre-down ip addr del 10.20.29.174/32 dev tunl0
    pre-down ip link set down tunl0
    post-down echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
    post-down echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce
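
The DR and TUN stanzas differ only in the device name and in whether the link is brought up. As a hypothetical sketch, server_stanza prints either variant for a given loadbalancer IP so that it can be reviewed before being pasted into /etc/network/interfaces. It only reproduces the lines shown on this page; it does not, for example, load the ipip kernel module that provides tunl0.

```shell
# Hypothetical generator for the server-node eth0 stanza.
# MODE "dr" uses dummy0 and leaves the link down; "tun" uses tunl0 and
# sets the link up. The second argument is the loadbalancer IP.
server_stanza() {
    mode=$1 vip=$2
    case $mode in
        dr)  dev=dummy0 ;;
        tun) dev=tunl0 ;;
        *)   dev= ;;
    esac
    if [ -z "$dev" ] || [ -z "$vip" ]; then
        echo "usage: server_stanza dr|tun LOADBALANCER_IP" >&2
        return 2
    fi
    conf=/proc/sys/net/ipv4/conf/all
    printf '%s\n' "allow-hotplug eth0" "iface eth0 inet dhcp"
    printf '    pre-up echo 1 > %s/arp_ignore\n' "$conf"
    printf '    pre-up echo 2 > %s/arp_announce\n' "$conf"
    [ "$mode" = tun ] && printf '    post-up ip link set up %s\n' "$dev"
    printf '    post-up ip addr add %s/32 dev %s\n' "$vip" "$dev"
    printf '    pre-down ip addr del %s/32 dev %s\n' "$vip" "$dev"
    [ "$mode" = tun ] && printf '    pre-down ip link set down %s\n' "$dev"
    printf '    post-down echo 0 > %s/arp_ignore\n' "$conf"
    printf '    post-down echo 0 > %s/arp_announce\n' "$conf"
}
```

Usage: server_stanza tun 10.20.29.174 prints the stanza above; server_stanza dr 10.20.29.174 prints the dummy0 variant from the networking section.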

Adding a second Keepalived node for redundancy

This is optional.

With a single keepalived node, we have a single point of failure. It is possible to add a second keepalived node which communicates with the first one and transitions from backup state to master state if the first node fails.

This is tested with Galera.

To set up a second keepalived node as described above, create a keepalived node identical to the first one, with the following changes to the configuration file /etc/keepalived/keepalived.conf:

  • Change the router_id (to the hostname, for example)
  • Change the state to BACKUP
  • Change the priority to something lower than the master's priority (e.g. 100)

Make sure the virtual_router_id and the authentication information are the same on the backup keepalived node as on the master keepalived node.

Now the backup node notices when the master goes down and takes over. When the master comes back, it takes over again (automatic failback).
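
The three configuration changes listed above can also be scripted. The following is a minimal sketch, assuming a hypothetical function make_backup_conf and the one-setting-per-line layout of the Galera example. It rewrites router_id, state MASTER and priority and leaves virtual_router_id and the authentication block untouched. The router ID is inserted into a sed expression, so it should be a plain hostname-like string.

```shell
# Hypothetical helper: read the master's keepalived.conf on stdin and print
# a backup variant with a new router_id, state BACKUP and a lower priority
# (default 100). Assumes one "keyword value" setting per line.
make_backup_conf() {
    new_id=$1 new_prio=${2:-100}
    if [ -z "$new_id" ]; then
        echo "usage: make_backup_conf ROUTER_ID [PRIORITY] < master.conf" >&2
        return 2
    fi
    # "virtual_router_id" is not matched: the patterns require the keyword
    # right after the leading whitespace.
    sed -e "s/^\([[:space:]]*router_id\)[[:space:]].*/\1 $new_id/" \
        -e "s/^\([[:space:]]*state\)[[:space:]][[:space:]]*MASTER/\1 BACKUP/" \
        -e "s/^\([[:space:]]*priority\)[[:space:]].*/\1 $new_prio/"
}
```

Usage: make_backup_conf lb2 < keepalived.conf.master > keepalived.conf, then compare both files with diff before installing the result on the second node.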

Keepalived monitoring

ipvsadm -Ln -t $LOADBALANCER_IP:$LOADBALANCER_PORT
ipvsadm -Ln -t $LOADBALANCER_IP:$LOADBALANCER_PORT --stats
ipvsadm -Ln -t $LOADBALANCER_IP:$LOADBALANCER_PORT --rate
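
The ipvsadm -Ln output lists each real server with its forwarding method, weight and connection counters. With inhibit_on_failure (as in the HTTP example), a real server that fails its health check stays in the list with weight 0; without it (as in the Galera example), the server is removed. The following is a minimal sketch that summarizes the weights, assuming the usual column layout of ipvsadm -Ln (RemoteAddress:Port, Forward, Weight, ActiveConn, InActConn); ipvs_weights is a hypothetical name.

```shell
# Sketch: read "ipvsadm -Ln" output on stdin, print "<real server> <weight>"
# for each real server line, then the number of servers with weight 0.
# The header line "-> RemoteAddress:Port ..." is skipped because its
# fourth column ("Weight") is not numeric.
ipvs_weights() {
    awk '
        $1 == "->" && $4 ~ /^[0-9]+$/ {
            print $2, $4
            if ($4 == 0) zero++
        }
        END { print "zero-weight servers:", zero + 0 }
    '
}
```

Usage: ipvsadm -Ln -t 10.20.29.174:3306 | ipvs_weights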