Template:OXLoadBalancingClustering Database

Revision as of 09:02, 16 September 2015 by Dominik.epple (talk | contribs) (Loadbalancer options)

Overview

You can choose between Galera or two-sided Master/Slave ("Master/Master") replication.

Galera database setup

OX only supports the "Percona XtraDB Cluster 5.5" flavor of the Galera database.


Installation

Debian systems

The following has been adjusted to work with Wheezy, but works similar with Squeeze, only the repo paths need adjustments.

To install the software, we first need to configure the repository and its build key, update our sources lists and install the packages:

gpg --keyserver  hkp://keys.gnupg.net --recv-keys 1C4CBDCDCD2EFD2A
gpg -a --export CD2EFD2A | apt-key add -

cat >/etc/apt/sources.list.d/percona.list <<EOF
deb http://repo.percona.com/apt wheezy main
deb-src http://repo.percona.com/apt wheezy main
EOF

apt-get update
apt-get install percona-xtradb-cluster-client-5.5 percona-xtradb-cluster-server-5.5 percona-xtrabackup

RHEL 6 systems

Should also apply to CentOS 6.

First, disable selinux, iptables, ip6tables. (Galera does not run with selinux. Using iptables and ip6tables should work if you configure it correctly, but documentation thereof is out of scope of this document.) Reboot.

Percona XtraDB Cluster relies on socat which is not shipped by RHEL. We need to install from a different source. The epel repository can be used for that.

yum install epel-release
yum install socat

The installation command itself needs to be a composite remove, install command since yum is not clever enough to resolve the conflicts itself, so we need to tell it how.

wget http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm
yum localinstall percona-release-0.0-1.x86_64.rpm
yum shell
remove mysql-libs
install Percona-XtraDB-Cluster-server-55 Percona-XtraDB-Cluster-client-55
run
quit

Once this is all done, don't forget to run the update command to get the latest Percona packages.

yum update

Configuration

my.cnf configuration file

Galera needs also a my.cnf configuration file. Usually MySQL expects this file to be located at /etc/mysql/my.cnf. But the Percona packages don't ship any; on purpose: https://bugs.launchpad.net/percona-server/+bug/673844

Thus, you need to obtain / install / create one on your own. Make sure it has no settings which are forbidden for Galera. This includes the query_cache (it must not be enabled with Galera) and probably other settings which would contradict the settings explained in the next section.

Default location for my.cnf file based on different Linux Distros.

  • If you are using Debian Linux file is located at /etc/mysql/my.cnf location
  • If you are using Red Hat Linux/Centos Linux/SLES Linux file is located at /etc/my.cnf location


Make sure you apply standard tunings for your memory size, number of allowed connections, and stuff.

We assume in the following that the my.cnf file has a directive like !includedir /etc/mysql/conf.d, such that you can put additional config files ending with .cnf there.

A sample my.cnf file serving as a starting point is provided here: My.cnf. Make sure you read the whole article and adjust that file to suit your needs before actually using it.

Caveat: we found out that Galera performs suboptimal when using innodb_flush_log_at_trx_commit=1. We leave up to you to assess whether this is a no-go for your environment or not. For Galera reasonable values for this parameter are innodb_flush_log_at_trx_commit=2 or ...=0. Make sure to read the documentataion of this parameter and that you understand its implication before using it.

Furthermore, you need to set the "datadir" configurable in the my.cnf file, even if you are on the default and do not want to change it. Some SST methods depend the setting being explicitly present in the configuration file.

wsrep.cnf configuration file

The Galera configuration then happens in a section called "wsrep", "write set replication", which is the internal name for the replication mechanism Galera is based on. A sample /etc/mysql/conf.d/wsrep.cnf file looks like:

[mysqld]
# the following lines are required for galera:
binlog_format=ROW
default_storage_engine=InnoDB
innodb_autoinc_lock_mode=2
innodb_locks_unsafe_for_binlog=1
query_cache_size=0
query_cache_type=0
bind-address=0.0.0.0
wsrep_provider=/usr/lib64/libgalera_smm.so
# NOTE: on Wheezy, use this path:
# wsrep_provider=/usr/lib/libgalera_smm.so
# the following lines need to be adjusted to your environment ... CHANGE THE PASSWORD! :-)
wsrep_cluster_name="my_wsrep_cluster"
wsrep_cluster_address="gcomm://<GALERA_NODE1_IP>,<GALERA_NODE2_IP>,<GALERA_NODE3_IP>"
# Note that xtraback-v2 is the latest version and is the new default,
# while xtrabackup will also work but will be soon deprecated.
wsrep_sst_method=xtrabackup-v2
wsrep_sst_auth=wsrep:5ojijmedUg8
# It is recommended to run Galera in synchronous mode, which makes it possible
# to disable the OX builtin database replication monitor.
# Default is semi-synchronous mode. To enable synchronous mode, use
wsrep_causal_reads=1

When you adjusted those files, make sure they are identical on all nodes.

The replication user will be created later when the DB is running on the first node.

Cluster startup

Whenever not all nodes of a Galera cluster are running (like before starting the cluster for the very first time), the first Galera node needs to get started with the wsrep_cluster_address parameter overridden to the value "gcomm://" in order to denote that the node shall not try to join an existing cluster (which would inevitably fail now, because no other cluster nodes are running yet), but to bootstrap the cluster instead. This override can most conveniently done on the command line, instead of editing to wsrep.cnf file to and fro.

So, for the first node, the startup command is

mysqld_safe --wsrep_cluster_address=gcomm:// &

You should then verify the Galera module is loaded properly using

mysql -e "show status like 'wsrep%';"

You should verify some settings like

| wsrep_local_state_comment  | Synced                               |
| wsrep_cluster_size         | 1                                    |
| wsrep_cluster_status       | Primary                              |
| wsrep_connected            | ON                                   |
| wsrep_provider_name        | Galera                               |
| wsrep_provider_vendor      | Codership Oy <info@codership.com>    |
| wsrep_provider_version     | 2.8(r162)                            |
| wsrep_ready                | ON                                   |

Now you need to create the database user (we will use the same username and password as we defined in the previous section when setting up wsrep.cnf file) for the replication on this first node:

# create wsrep user: in mysql shell:
CREATE USER 'wsrep'@'localhost' IDENTIFIED BY '5ojijmedUg8';
GRANT RELOAD, LOCK TABLES, REPLICATION CLIENT ON *.* TO 'wsrep'@'localhost';
FLUSH PRIVILEGES;

The Galera peers can then be started on the nodes 2 and 3 using

mysqld_safe &

Since the standard service startup scripts cannot account for this special treatment, we recomment not to use them.

You can check the status of the Galera cluster using

mysql -e "show status like 'wsrep%';"

The output is lengthy. The most relevant fields are given as follows:

+----------------------------+----------------------------------------------------------------------+
| Variable_name              | Value                                                                |
+----------------------------+----------------------------------------------------------------------+
| wsrep_local_state_comment  | Synced                                                               |
| wsrep_incoming_addresses   | <GALERA_NODE1_IP>:3306,<GALERA_NODE2_IP>:3306,<GALERA_NODE3_IP>:3306 |
| wsrep_cluster_size         | 3                                                                    |
| wsrep_cluster_status       | Primary                                                              |
| wsrep_connected            | ON                                                                   |
| wsrep_ready                | ON                                                                   |
+----------------------------+----------------------------------------------------------------------+

Troubleshooting

The logs are helpful. Always.

Common mistakes are listed below.

If the Galera module does not get loaded at all:

  • Configuration settings in my.cnf which are incompatible to Galera
  • Wrong path of the shared object providing the Galera plugin in wsrep.cnf (wsrep_provider)

If the first node starts, but the second / third nodes can not be added to the cluster:

  • User for the replication not created correctly on the first Galera node

Notes about configuring OX for use with Galera

Write requests

Open-Xchange supports Galera as database backend only in the configuration where all writes are directed to one Galera node. For availability, it makes sense to not configure one Galera node's IP address directly, but rather employ some HA solution which offers active-passive functionality. Options therefore are discussed below.

Read requests

Read requests can be directed to any node in the Galera cluster. Our standard approach is to recommend to use a loadbalancer to implement round-robin over all nodes in a Galera cluster for the read requests. But you can also chose to use a dedicated read node (the same node, or a different node, than the write node). Each of the approaches has its own advantages.

  • Load balancer based setup: Read requests get distributed round-robin between the Galera nodes. Theoretically by distributing the load of the read requests, you benefit from lower latencies and more throughput. But this has never been benchmarked yet. For a discussion of available loadbalances, see next section. OX-wise, in this configuration, you have two alternatives:
    • The Galera option wsrep_causal_reads=1 option enables you to configure OX with its replication monitor disabled (com.openexchange.database.replicationMonitor=false in configdb.properties). This is the setup which seems to perform best according to our experience as turning off the replication monitor reduces the commits on the DB and thus the write operations per second on the underlying storage significantly, which outweights the drawback from having higher commit latency due to fully synchronous mode.
    • Alternatively, you can run Galera with wsrep_causal_reads=0 when switching on OX builtin replication monitor. This is also a valid setup.
  • Use a designated floating IP for the read requests: This eliminates the need of a load balancer. With this option you will not gain any performance, but the quantitative benefit is unclear anyhow.
  • Use the floating IP for the writes also for the reads: In this scenario, you direct all database queries only to one Galera node, and the other two nodes are only getting queries in case of a failure of that node. In this case, you can even use wsrep_causal_reads=0 while still having OX builtin replication monitor switched off. However we do not expect this option to be superior to the round-robin loadbalancer approach.

Loadbalancer options

While the JDBC driver has some round-robin load balancing capabilities built-in, we don't recommend it for production use since it lacks possibilities to check the Galera nodes health states.

Loadbalancers used for OX -> Galera loadbalancing should be able to implement active-passive instances for the write requests, and active-active (round-robin) instances for the read requests. (If they cannot implement active-passive, you can still take a floating IP therefore.) Furthermore it is required to configure node health checks not only on the TCP level (by a simple connect), but to query the Galera health status periodically, evaluating Galera WSREP status variables. Otherwise split-brain scenarios or other bad states cannot be detected. For an example of such an health check, see the our documentation for setting up a software loadbalancer using keepalived (linked below).

Some customers use loadbalancing appliances. It is important to check that if the (virtual) infrastructure offers "loadbalancer" instances that they satisfy the given requirements. Often this is not the case. In particular, a simple "DNS round robin" approach is not viable.

If you want to create your own loadbalancers based on Linux, we usually recommend LVS (Linux Virtual Servers) controlled by Keepalived. LVS is a set of kernel modules implementing a L4 loadbalancer which performs quite well. Keepalived is a userspace daemon to control LVS rules, using health checks to reconfigure LVS rules if required. Keepalived / LVS requires one (or, for availability, two) dedicated linux nodes to run on. This can be a disadvantage for some installations, but usually, it pays off. Furthermore it has some requirements on the infrastructure, like being able to configure secondary IP addresses. This can be a show stopper for some (virtual / cloud) infrastructures. We gathered some configuration information on Keepalived here.

In case where the Keepalived based approach is not feasible due to its requirements on the infrastructure, it is also possible to use a HAproxy based solution where HAproxy processes run on each of the OX nodes, configured for one round-robin and one active/passive instance. OX is then connecting to the local HAproxy instances. It is vital to configure HAproxy timeouts different from the defaults, otherwise HAproxy will kill active DB connections, causing errors. One design flaw of that approach is that the different HAproxy instances do not speak to each other, so that it can happen that different instances have a different understanding of node health status, which could lead to situations violating the "one write node" requirement. Furthermore in large installations the number of HAproxy instances can get quite large. Some configuration hints for HAproxy are available here.

Master/Master database setup

This section describes the setup process "Master/Master replication" for new Open-Xchange database cluster. During configuration and initialization, other database operations must be prohibited.

The Master/Master replication is a vice versa setup of Master/Slave configurations. This means each server is afterwards the slave of the other.

Server IPs in the example are 1.1.1.1 and 9.9.9.9

Startup both database machines and install the mysql server packages

$ apt-get install mysql-server

During the installation, a dialog will show up to set a password for the MySQL 'root' user.

Open the MySQL configuration file on both servers:

$ vim /etc/mysql/my.cnf

Modify or enable the following configuration options in the mysqld-section, use 1 as ${unique Number} on the server 1.1.1.1 and 2 for 9.9.9.9:

bind-address            = 0.0.0.0
server-id               = ${unique Number}
log_bin                 = /var/log/mysql/mysql-bin.log
binlog_format           = statement
max_allowed_packet      = 16M
  • bindaddress specifies the network address where MySQL is listening for network connections. Since the MySQL slave and both Open-Xchange Servers are dedicated machines it is required to have the master accessible through the network.
  • server-id is just a unique number within a environment with multiple MySQL servers. It needs to be unique for each server in a replication cluster.
  • log-bin enables the MySQL binary log which is required for Master/Master replication. In general every statement triggered at the database is stored there to get distributed through the database cluster.

To apply the configuration changes, restart the MySQL servers.

$ /etc/init.d/mysql restart

Then login to MySQL with the credentials given at the MySQL installation process

$ mysql -u root -p
Enter password:

First Master configuration

Choose one server to start with as the first Master (here we use 1.1.1.1).

Create a MySQL user with rights "REPLICATION". This account is used by the MySQL slave to fetch database updates. In this example, the username is "replication":

 mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'9.9.9.9' IDENTIFIED BY 'secret';

Verify that the MySQL daemon writes a binary log and note the log Position and File name:

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000001 |     1111 |              |                  |
+------------------+----------+--------------+------------------+

First Slave configuration

On 9.9.9.9, set the MySQL system user as owner of the binary log that has just been copied to the slave.

$ chown mysql:adm /var/log/mysql/*

Configure MySQL on 9.9.9.9 to use 1.1.1.1 as Master Server. (Use the actual log File name and Position which you just obtained with the command SHOW MASTER STATUS on 1.1.1.1. as explained above.)

mysql> CHANGE MASTER TO MASTER_HOST='1.1.1.1', MASTER_USER='replication', MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000001', MASTER_LOG_POS=1111;

Start the MySQL slave replication

mysql> START SLAVE;

And check the status

mysql> SHOW SLAVE STATUS\G;

"Slave_IO_Running" and "Slave_SQL_Running" should be set to "yes". Furthermore "Read_Master_Log_Pos" should be counting and "Seconds_Behind_Master" should be approaching the 0 mark.

Second Master configuration

This means, the first Master/Slave Replication is working and the "reverse" replication needs to be prepared. Please now create the replication user on 9.9.9.9:

 mysql> GRANT REPLICATION SLAVE ON *.* TO 'replication'@'1.1.1.1' IDENTIFIED BY 'secret';

Verify that the MySQL daemon writes a binary log and remember the log Position:

mysql> SHOW MASTER STATUS;
+------------------+----------+--------------+------------------+
| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+----------+--------------+------------------+
| mysql-bin.000009 |      9999|              |                  |
+------------------+----------+--------------+------------------+

Second Slave configuration

1.1.1.1 is now the slave in this context and 9.9.9.9 is the master. Log in to 1.1.1.1

Configure MySQL on 1.1.1.1 to use 9.9.9.9 as Master Server. Use the remembered log and file position from 1.1.1.1.

mysql> CHANGE MASTER TO MASTER_HOST='9.9.9.9', MASTER_USER='replication', MASTER_PASSWORD='secret', MASTER_LOG_FILE='mysql-bin.000009', MASTER_LOG_POS=9999;

start the MySQL slave replication

mysql> START SLAVE;

and check the status

mysql> SHOW SLAVE STATUS\G;

"Slave_IO_Running" and "Slave_SQL_Running" should be set to "yes". Furthermore "Read_Master_Log_Pos" should be counting and "Seconds_Behind_Master" should be approaching the 0 mark.

Also check the syslog if the replication has been sucessfully started

$ tail -fn20 /var/log/syslog
Jul 26 19:03:45 dbslave mysqld[4718]: 090726 19:03:45 [Note] Slave I/O thread: connected to master 'replication@1.1.1.17:3306',  replication started in log 'mysql-bin.000001' at position 10000

Testing Master/Master

On 1.1.1.1, create a new database in MySQL:

mysql> CREATE DATABASE foo;

Verify the database to als be available on 9.9.9.9 afterwards:

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| foo                |
| mysql              |
+--------------------+

Delete the new database on 9.9.9.9:

mysql> DROP DATABASE foo;

Check if the database has also been removed on 1.1.1.1

mysql> SHOW DATABASES;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
+--------------------+

Creating Open-Xchange user

Now setup access for the Open-Xchange Server database user 'openexchange' to configdb and the oxdb for both groupware server addresses. These databases do not exist yet, but will be created during the Open-Xchange Server installation.

Note: The IPs in this example belong to the two different Open-Xchange Servers, please adjust them accordingly.

mysql> GRANT ALL PRIVILEGES ON *.* TO 'openexchange'@'10.20.30.213' IDENTIFIED BY 'secret';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'openexchange'@'10.20.30.215' IDENTIFIED BY 'secret';