Keepalived Loadbalancer
Introduction
This page contains a basic description of how to set up an LVS (Linux Virtual Server) / ipvsadm / keepalived based loadbalancer for MySQL (Galera) loadbalancing.
While the setup is more involved than simple user-space daemons and suffers from more constraints / requirements, the resulting solution is the cleanest with regards to high level design, and the most robust and best performing MySQL loadbalancing solution we are aware of.
The instructions on this page have been worked out and tested on Debian (latest verified version: 8.9). It should be possible to transfer this information to other distributions / versions.
LVS is a Linux kernel module and has been included in the mainline kernel since roughly the 2.4 series in 2003 (see http://www.linuxvirtualserver.org). Most of the available documentation seems very outdated; however, this code is part of the standard upstream Linux kernel and as such perfectly maintained. It is, however, tricky to find recent reference documentation or howtos.
The project homepage http://www.linuxvirtualserver.org/ has some applicable information, in particular on the wiki. There is also a HOWTO at http://www.austintek.com/LVS/LVS-HOWTO/HOWTO/index.html which has proven useful while writing this article. But above all, consult the manpages for ipvsadm and keepalived and the references therein; they are up to date and precise.
Some terminology:
- The keepalived node(s) are called keepalived node or loadbalancer node.
- The nodes the loadbalancer node is loadbalancing for, for example OX nodes or Galera nodes, are called server nodes or database nodes.
High Level Design
The solution consists of several components.
The main component is a set of kernel modules which implement the actual loadbalancing / forwarding functionality (ip_vs, ip_vs_rr, and some more).
There is a command line tool called ipvsadm to manage the loadbalancing configuration of the kernel.
It is possible to run an ipvsadm daemon which allows synchronization of connection states to a standby / slave ipvsadm/LVS instance, so that on failover "most" connections can stay intact. This is out of scope of this document. It is mentioned here so you are aware of it and do not confuse it with the keepalived daemon (see below).
An LVS/ipvsadm loadbalancer can run standalone, i.e. without further "management" software on top. This is helpful for setup and testing. However, for production it lacks the functionality to health-check the loadbalancing targets (i.e. database servers) and adjust the loadbalancer tables accordingly. To do this, a separate user-space instance / daemon is required, and this is the functionality provided by keepalived.
Routing methods
LVS provides several modes of routing. We will describe Direct Routing (DR) and Tunneling (TUN) here. There are more routing methods available which might be interesting in special cases, but they are not covered in this document.
When unsure, follow the TUN path. It seems more robust in certain environments than the DR method.
Direct Routing
Direct Routing works by taking a packet addressed to the loadbalancer's virtual / loadbalancer IP, replacing the target MAC with the MAC of the designated target server, and re-sending it.
This requires the servers to accept packets for the given IP, so they need the corresponding IP configured on some local loopback / dummy device. It must be ensured that the servers do not answer ARP requests for the given IP. Otherwise there is a race condition on whether the server's or the loadbalancer's ARP response is received first by a client, leading to unwanted results. This is called the ARP problem in the documentation, and many possible solutions are given there; however, with current kernels the method explained below works reliably.
Response packets are sent directly from the server to the client; they do not go through the loadbalancer, but appear to come from a source whose IP does not match the MAC address.
In addition to the requirement to be able to configure additional "secondary" IPs on the involved machines, this method also requires that no involved networking component (routers, virtualization hypervisors, etc.) discards packets which seem "forged" (e.g. IPs not matching MACs). This is typically not a problem in "classical" networking infrastructures, but it is becoming more and more problematic in modern virtualized / cloud infrastructures.
Tunneling
The tunneling method works by the loadbalancer encapsulating the packet in an IPIP tunnel packet and sending it to the corresponding server.
It also requires that the servers have the virtual / loadbalancer IP configured locally, but here on a tunl device. We have to address the same ARP problem as explained in the Direct Routing section above, with the same solution. We also have the situation that answers go directly from the servers to the clients, not passing through the loadbalancer.
The Tunneling method generally works better in modern virtualized / cloud environments.
NAT method
We have not worked out / tested a NAT based setup yet, but it sounds promising for even more restrictive cloud environments, where routers typically reject packets with mismatching IPs/MACs. Feedback welcome.
Software installation on the loadbalancer node
Packages are installed from standard repos using
# apt-get install keepalived
This will install the required dependencies like ipvsadm etc.
Contrary to earlier Debian releases, there is currently no requirement to configure any special service for loading kernel modules and such. In older Debian versions (like Squeeze) the /etc/default/{ipvsadm,keepalived} files needed some tweaking to get the kernel modules loaded (automatic loading seemed to fail). This is no longer true; if working on an old (historical!) Debian version, you may have to investigate here.
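To verify that the relevant kernel modules are available and loaded on the loadbalancer node, something like the following should work (a quick sanity check, not strictly required for the setup):
lsmod | grep ip_vs
# if nothing shows up, the module can be loaded manually
modprobe ip_vs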
Also not required, although claimed in some places, is configuring IPv4 forwarding. If experimenting with other routing methods, this may become required; it is not required with DR or TUN.
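Should you ever experiment with a setup that does need it (again, not needed for the DR or TUN configurations described here), IPv4 forwarding can be enabled with:
echo 1 > /proc/sys/net/ipv4/ip_forward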
Configuration
The configuration examples given below assume a setup like
10.0.0.1 database server / galera node 1
10.0.0.2 database server / galera node 2
10.0.0.3 database server / galera node 3
10.0.0.4 loadbalancer primary IP
10.0.0.5 database client, e.g. OX middleware node
10.0.0.10 loadbalancer virtual IP for reading (round-robin)
10.0.0.11 loadbalancer virtual IP for writing (persistent routing / dedicated write node)
Note: with DR and TUN, it is not possible to change the port numbers on routing; thus, for each loadbalancer endpoint, the loadbalancer needs an additional virtual IP. (It is not possible to configure them on different ports on the same (e.g. primary) IP of the loadbalancer.)
Manual configuration / testing
Networking adjustments on the server nodes
The server nodes need the loadbalancer virtual IP(s) configured on some network device in order for the server processes to be able to bind on this device.
For DR, it seems natural to configure a dummy device. For TUN, you need a tunl device.
For testing, you can do it manually on the given nodes:
# for TUN
ip link set up tunl0
ip addr add 10.0.0.10/32 brd 10.0.0.10 dev tunl0
ip addr add 10.0.0.11/32 brd 10.0.0.11 dev tunl0
# for DR
ip addr add 10.0.0.10/32 brd 10.0.0.10 dev dummy0
ip addr add 10.0.0.11/32 brd 10.0.0.11 dev dummy0
Then, you solve the ARP Problem by
echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
Loadbalancer
The loadbalancer also needs the virtual IPs configured as secondary IPs:
ip addr add 10.0.0.10/32 dev eth0
ip addr add 10.0.0.11/32 dev eth0
Then the loadbalancer endpoints themselves can be configured with ipvsadm:
# For TUN
# Round-Robin / read instance
/sbin/ipvsadm -A -t 10.0.0.10:3306 -s rr
/sbin/ipvsadm -a -t 10.0.0.10:3306 -r 10.0.0.1 -i -w 10
/sbin/ipvsadm -a -t 10.0.0.10:3306 -r 10.0.0.2 -i -w 10
/sbin/ipvsadm -a -t 10.0.0.10:3306 -r 10.0.0.3 -i -w 10
# Persistent / write instance
/sbin/ipvsadm -A -t 10.0.0.11:3306 -s rr -p 86400 -M 0.0.0.0
/sbin/ipvsadm -a -t 10.0.0.11:3306 -r 10.0.0.1 -i -w 10
/sbin/ipvsadm -a -t 10.0.0.11:3306 -r 10.0.0.2 -i -w 10
/sbin/ipvsadm -a -t 10.0.0.11:3306 -r 10.0.0.3 -i -w 10
# For DR
# Round-Robin / read instance
/sbin/ipvsadm -A -t 10.0.0.10:3306 -s rr
/sbin/ipvsadm -a -t 10.0.0.10:3306 -r 10.0.0.1 -g -w 10
/sbin/ipvsadm -a -t 10.0.0.10:3306 -r 10.0.0.2 -g -w 10
/sbin/ipvsadm -a -t 10.0.0.10:3306 -r 10.0.0.3 -g -w 10
# Persistent / write instance
/sbin/ipvsadm -A -t 10.0.0.11:3306 -s rr -p 86400 -M 0.0.0.0
/sbin/ipvsadm -a -t 10.0.0.11:3306 -r 10.0.0.1 -g -w 10
/sbin/ipvsadm -a -t 10.0.0.11:3306 -r 10.0.0.2 -g -w 10
/sbin/ipvsadm -a -t 10.0.0.11:3306 -r 10.0.0.3 -g -w 10
Note: you need to restart the MySQL service after the networking adjustments; otherwise, the MySQL daemon will not accept packets with the virtual IP as the target IP. This has cost quite a few people a lot of wasted time.
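On Debian this typically boils down to something like the following (assuming the service is named mysql in your installation):
service mysql restart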
Note: if lazy, you can test with one server node, and extend the configuration later to all three nodes.
Note: to view the current LVS configuration, use
# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.0.0.10:mysql rr
  -> 10.0.0.1:mysql               Tunnel  10     0          0
  -> 10.0.0.2:mysql               Tunnel  10     0          0
  -> 10.0.0.3:mysql               Tunnel  10     0          0
TCP  10.0.0.11:mysql rr persistent 86400
  -> 10.0.0.1:mysql               Tunnel  10     0          0
  -> 10.0.0.2:mysql               Tunnel  10     0          0
  -> 10.0.0.3:mysql               Tunnel  10     0          0
Note: to stop / start over, use ipvsadm -C.
Note: you can use ipvsadm -S / ipvsadm -R for easier iterative testing (see manpage).
# ipvsadm -S
-A -t 10.0.0.10:mysql -s rr
-a -t 10.0.0.10:mysql -r 10.0.0.1:mysql -i -w 10
-a -t 10.0.0.10:mysql -r 10.0.0.2:mysql -i -w 10
-a -t 10.0.0.10:mysql -r 10.0.0.3:mysql -i -w 10
-A -t 10.0.0.11:mysql -s rr -p 86400
-a -t 10.0.0.11:mysql -r 10.0.0.1:mysql -i -w 10
-a -t 10.0.0.11:mysql -r 10.0.0.2:mysql -i -w 10
-a -t 10.0.0.11:mysql -r 10.0.0.3:mysql -i -w 10
# ipvsadm -S > ipvsadm.conf
# ipvsadm -R < ipvsadm.conf
Testing
You should then be able to verify functionality from the client / OX middleware node with something like (omitting authentication command line arguments for brevity)
# while true; do mysql -h10.0.0.10 -B -N -e "select @@hostname;"; sleep 1; done
db3
db2
db1
db3
db2
db1
[...]
^C
# while true; do mysql -h10.0.0.11 -B -N -e "select @@hostname;"; sleep 1; done
db3
db3
db3
db3
db3
db3
[...]
^C
If it does not work:
- Remember you need to restart the MySQL server after networking adjustments
- Try to use tcpdump to find out on which node (loadbalancer or server) your TCP packets actually arrive (see the example below)
- Use arp -a to verify the server nodes did not advertise the virtual IP addresses with their MAC
- Verify that the usual candidates like iptables (off by default on Debian; may vary in your installation), selinux/apparmor (if using SLES or RHEL), or additional firewalls are not spoiling your testing
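For example, a quick way to check where packets actually end up (the interface name eth0 is an assumption, adjust to your setup; with TUN the forwarded traffic arrives at the server nodes IPIP-encapsulated):
# on the loadbalancer node
tcpdump -ni eth0 tcp port 3306
# on a server node, when using TUN
tcpdump -ni eth0 ip proto 4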
Please verify the manual setup before proceeding to the persistent / production configuration.
Persistent / production configuration
Networking adjustments on the server nodes
It is possible to attach the configuration to /etc/network/interfaces:
# TUN example
# existing eth0 configuration
auto eth0
iface eth0 inet static
    address 10.0.0.XYZ
    netmask 255.255.255.0
    # add the following
    pre-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
    pre-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
    post-up ip link set up tunl0
    post-up ip addr add 10.0.0.10/32 brd 10.0.0.10 dev tunl0
    post-up ip addr add 10.0.0.11/32 brd 10.0.0.11 dev tunl0
    pre-down ip addr del 10.0.0.11/32 dev tunl0
    pre-down ip addr del 10.0.0.10/32 dev tunl0
    pre-down ip link set down tunl0
    post-down echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
    post-down echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce
# DR example
# existing eth0 configuration
auto eth0
iface eth0 inet static
    address 10.0.0.XYZ
    netmask 255.255.255.0
    # add the following
    pre-up echo 1 > /proc/sys/net/ipv4/conf/all/arp_ignore
    pre-up echo 2 > /proc/sys/net/ipv4/conf/all/arp_announce
    post-up ip addr add 10.0.0.10/32 brd 10.0.0.10 dev dummy0
    post-up ip addr add 10.0.0.11/32 brd 10.0.0.11 dev dummy0
    pre-down ip addr del 10.0.0.11/32 dev dummy0
    pre-down ip addr del 10.0.0.10/32 dev dummy0
    post-down echo 0 > /proc/sys/net/ipv4/conf/all/arp_ignore
    post-down echo 0 > /proc/sys/net/ipv4/conf/all/arp_announce
Keepalived configuration (health checks skipped)
Note: keepalived will manage the secondary IPs, so there is no need to hard-wire them in /etc/network/interfaces or the like. Rather, deconfigure any potentially manually configured secondary IPs from the previous manual testing.
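For example, on the loadbalancer node (assuming the addresses were added manually as shown above):
ip addr del 10.0.0.10/32 dev eth0
ip addr del 10.0.0.11/32 dev eth0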
Create a config file /etc/keepalived/keepalived.conf for basic functionality testing like
global_defs {
    # This should be unique.
    router_id galera-lb
}

vrrp_instance mysql_pool {
    # The interface we listen on.
    interface eth0
    # The default state; one should be MASTER, the others should be set to BACKUP.
    state MASTER
    priority 101
    # This should be the same on all participating load balancers.
    virtual_router_id 19
    # Set the interface whose status to track to trigger a failover.
    track_interface {
        eth0
    }
    # Password for the loadbalancers to share.
    authentication {
        auth_type PASS
        auth_pass Twagipmiv3
    }
    # This is the IP address that floats between the loadbalancers.
    virtual_ipaddress {
        10.0.0.10/32 dev eth0
        10.0.0.11/32 dev eth0
    }
}

# Here we add the virtual mysql read node
virtual_server 10.0.0.10 3306 {
    delay_loop 6
    # Round robin, but you can use whatever fits your needs.
    lb_algo rr
    lb_kind TUN
    protocol TCP
    # For each server add the following.
    real_server 10.0.0.1 3306 {
        weight 10
    }
    real_server 10.0.0.2 3306 {
        weight 10
    }
    real_server 10.0.0.3 3306 {
        weight 10
    }
}

# Here we add the virtual mysql write node
virtual_server 10.0.0.11 3306 {
    delay_loop 6
    # Round robin, but you can use whatever fits your needs.
    lb_algo rr
    lb_kind TUN
    protocol TCP
    # the following two options implement the active-passive behavior
    persistence_timeout 86400
    # make sure all OX nodes are included in that netmask
    persistence_granularity 0.0.0.0
    # For each server add the following.
    real_server 10.0.0.1 3306 {
        weight 10
    }
    real_server 10.0.0.2 3306 {
        weight 10
    }
    real_server 10.0.0.3 3306 {
        weight 10
    }
}
The file should be self-explanatory if you followed the manual configuration explanations above. The only unexpected things are directives like state and priority, which will be explained below for the multi-keepalived setup.
The example has been using TUN; for DR, just replace TUN by DR in the virtual_server definitions.
After a service keepalived restart you should be able to execute the same client connectivity tests as shown above. (Remember to cleanly unconfigure your manual setup beforehand in order not to measure false success.)
Keepalived configuration (with health checks)
We can configure health checks in keepalived.conf:
global_defs {
    # This should be unique.
    router_id galera-lb
}

vrrp_instance mysql_pool {
    # The interface we listen on.
    interface eth0
    # The default state; one should be MASTER, the others should be set to BACKUP.
    state MASTER
    priority 101
    # This should be the same on all participating load balancers.
    virtual_router_id 19
    # Set the interface whose status to track to trigger a failover.
    track_interface {
        eth0
    }
    # Password for the loadbalancers to share.
    authentication {
        auth_type PASS
        auth_pass Twagipmiv3
    }
    # This is the IP address that floats between the loadbalancers.
    virtual_ipaddress {
        10.0.0.10/32 dev eth0
        10.0.0.11/32 dev eth0
    }
}

# Here we add the virtual mysql read node
virtual_server 10.0.0.10 3306 {
    delay_loop 6
    # Round robin, but you can use whatever fits your needs.
    lb_algo rr
    lb_kind TUN
    protocol TCP
    # For each server add the following.
    real_server 10.0.0.1 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/checker.pl 10.0.0.1"
            misc_timeout 5
        }
    }
    real_server 10.0.0.2 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/checker.pl 10.0.0.2"
            misc_timeout 5
        }
    }
    real_server 10.0.0.3 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/checker.pl 10.0.0.3"
            misc_timeout 5
        }
    }
}

# Here we add the virtual mysql write node
virtual_server 10.0.0.11 3306 {
    delay_loop 6
    # Round robin, but you can use whatever fits your needs.
    lb_algo rr
    lb_kind TUN
    protocol TCP
    # the following two options implement the active-passive behavior
    persistence_timeout 86400
    # make sure all OX nodes are included in that netmask
    persistence_granularity 0.0.0.0
    # For each server add the following.
    real_server 10.0.0.1 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/checker.pl 10.0.0.1"
            misc_timeout 5
        }
    }
    real_server 10.0.0.2 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/checker.pl 10.0.0.2"
            misc_timeout 5
        }
    }
    real_server 10.0.0.3 3306 {
        weight 10
        MISC_CHECK {
            misc_path "/etc/keepalived/checker.pl 10.0.0.3"
            misc_timeout 5
        }
    }
}
The config file is the same as before, except for the MISC_CHECK sections.
We now need a checker.pl script which checks the Galera replication status and exits 0 if everything is fine and 1 if not.
Such a sample script is given below. Consult your DBAs on the checks to be performed, i.e. when a node should be considered "available" and when not.
#!/usr/bin/perl
# dominik.epple@open-xchange.com, 2013-06-10

use strict;
use warnings;

#
# config section
#
our $username="checker";
our $password="aicHupdakek3";
our $debug=0;
our %checks=(
    #"wsrep_cluster_size" => "3",
    "wsrep_ready" => "ON",
    "wsrep_local_state" => "4" # Synced
);
#
# config section end
#

our $host=$ARGV[0] or die "usage: $0 <IP of galera node>";

use DBI;
our $dbh = DBI->connect("DBI:mysql:;host=$host", $username, $password) || die "Could not connect to database: $DBI::errstr";

our $results = $dbh->selectall_hashref("show status like '%wsrep%'", 'Variable_name') or die "Error trying to selectall_hashref";

our %cr=();
foreach my $id (keys %$results) {
    $::cr{$id}=$results->{$id}->{"Value"};
}
$dbh->disconnect();

for my $k (keys %checks) {
    if(exists $::cr{$k}) {
        if($::checks{$k} ne $::cr{$k}) {
            print STDERR "$0: warning: mismatch in $k: expected $::checks{$k}, got $::cr{$k}\n";
            exit(1);
        }
        else {
            print STDERR "$0: info: match in $k: expected $::checks{$k}, got $::cr{$k}\n" if($::debug);
        }
    }
    else {
        print STDERR "$0: warning: no check result for $k (want $::checks{$k})\n";
    }
}
exit(0);
The script requires as a dependency the installation of the Perl MySQL interface module:
# apt-get install libdbd-mysql-perl
Don't forget to configure a corresponding DB user for the checker script. Execute on a Galera node:
CREATE USER 'checker'@'%' IDENTIFIED BY 'aicHupdakek3';
FLUSH PRIVILEGES;
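Before relying on keepalived to run it, you can test the checker manually (assuming the script has been installed as /etc/keepalived/checker.pl and made executable):
# chmod +x /etc/keepalived/checker.pl
# /etc/keepalived/checker.pl 10.0.0.1; echo $?
An exit status of 0 means the node is considered healthy; 1 means keepalived would take it out of the pool.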
Adding a second Keepalived node for redundancy
With a single keepalived node we have a single point of failure. It is possible to add a second keepalived node which communicates with the first keepalived node and transitions from backup state to master state upon failure of the first node.
To set up a second keepalived node as described above, create a keepalived node identical to the first one, with the following changes to the configuration file /etc/keepalived/keepalived.conf:
- Change the router_id (to the hostname, for example)
- Change the state to BACKUP
- Change the priority to something lower than the master's priority (e.g. 100)
Make sure the virtual_router_id and the authentication information are the same on the backup keepalived node as on the master keepalived node; a sketch of the relevant part is shown below.
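For illustration, the differing part of the backup node's configuration could look like this (a sketch based on the example values above; the router_id value galera-lb-2 is just a placeholder):
global_defs {
    # unique per node, e.g. this node's hostname
    router_id galera-lb-2
}
vrrp_instance mysql_pool {
    interface eth0
    # was MASTER on the first node
    state BACKUP
    # lower than the master's 101
    priority 100
    # must be identical on both nodes
    virtual_router_id 19
    # ... authentication, virtual_ipaddress etc. identical to the master node
}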
Now the backup node will notice the master going down and take over. Automatic failback also happens.
Keepalived will automatically manage the secondary IPs, so there is no need for any additional clustering software like corosync/pacemaker etc.
Keepalived monitoring
ipvsadm -Ln -t $LOADBALANCER_IP:$LOADBALANCER_PORT
ipvsadm -Ln -t $LOADBALANCER_IP:$LOADBALANCER_PORT --stats
ipvsadm -Ln -t $LOADBALANCER_IP:$LOADBALANCER_PORT --rate
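For example, with the read endpoint used in this article (--stats shows connection / packet / byte counters, --rate the corresponding per-second rates):
ipvsadm -Ln -t 10.0.0.10:3306 --stats
# continuously watch the counters
watch -n1 ipvsadm -Ln --stats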