PostgreSQL HA with pgpool-II - Part 5

This is the fifth part of the tutorial.

Don't be lazy - start from PostgreSQL HA with pgpool-II - Part 1, or you won't be able to catch up.

In this part we'll deal with configuring pgpool-II and installing pgpoolAdmin. Again, the main resource I've used is the pgpool-II manual, but here I'll provide the essence.

5.1 Preparing PostgreSQL for pgpool-II

Let's first see which scripts / extensions we'll install and use. In the previous part of this tutorial we prepared the following scripts / extensions:

  • insert_lock.sql - According to the original documentation, this script is used when pgpool-II runs in replication mode, but we will use master/slave mode instead. As far as I understand, that means we don't need it, and I won't install it.
  • pgpool-regclass.sql / pgpool_regclass.control extension - According to the original documentation, it is needed only if you are using a PostgreSQL version prior to 9.4, so we won't install it.
  • pgpool-recovery.sql / pgpool_recovery.control extension - According to the original documentation, it is needed for online recovery. I'm still not 100% sure whether it is actually needed with replication slots, but I will install it.
  • pgpool_adm.sql / pgpool_adm.control extension - We'll install it. Note: although the original documentation says it should be installed on every PostgreSQL server used by pgpool-II, in our case (master/slave streaming replication) it can be installed on the primary server only - as we already know, we cannot modify the read-only standby server anyway.

As you can see, except for insert_lock.sql, each feature can be installed either via the SQL script (e.g. pgpool-recovery.sql) or via the extension (e.g. pgpool_recovery.control). You can use either, but not both. Here I'll first show how to use the SQL scripts, and then how to install the extensions. Before starting, a reminder: when creating a new database, PostgreSQL uses the existing template1 database as a template. That means that by installing a particular script / extension on the template1 database, we ensure it will also be present in any future databases. But if you already have databases created before the feature was installed on template1 - you should install the feature on those databases as well.

Here I'll show how to install features of interest by using SQL scripts:

# Navigate to SQL scripts directory:
cd /etc/postgresql/9.5/main/sql
# Execute scripts:
sudo -u postgres psql -f pgpool-recovery.sql template1
sudo -u postgres psql -f pgpool_adm.sql template1

Or the same thing by using extensions:

sudo -u postgres psql template1
=# CREATE EXTENSION pgpool_recovery;
=# CREATE EXTENSION pgpool_adm;
=# \q

5.2. Preparing Scripts

pgpool-II is able to decide when failover should be performed, but it doesn't actually know how to perform it. For this reason we need to create a failover script that pgpool-II will use to actually perform the failover. Similarly, pgpool-II needs a recovery script. But when it comes to scripts, there's always an infinite number of ways to accomplish the task. Basically, the failover script should simply create the trigger file (explained in PostgreSQL HA with pgpool-II - Part 2) on the newly promoted primary server. Similarly, the recovery script should perform all the steps described in PostgreSQL HA with pgpool-II - Part 2 related to the standby server and establishing replication. You can (but don't have to) check the following resources to see how it is done:

Personally, I wasn't fully satisfied with any of these, so I'll do it my own way, relying on the scripts we've already created in PostgreSQL HA with pgpool-II - Part 3.

5.2.1. failover.sh

As the name implies, this script should perform failover. As we already know, that is easy to do - we simply create a trigger file on the server which should take over the primary role. Here's an example script (which I've picked up from some of the resources enumerated above, and which we won't actually use):

#!/bin/bash -x
FALLING_NODE=$1         # %d
OLDPRIMARY_NODE=$2      # %P
NEW_PRIMARY=$3          # %H
PGDATA=$4               # %R

if [ "$FALLING_NODE" = "$OLDPRIMARY_NODE" ]; then
    if [ $UID -eq 0 ]
    then
        su postgres -c "ssh -T postgres@$NEW_PRIMARY touch $PGDATA/trigger"
    else
        ssh -T postgres@$NEW_PRIMARY touch $PGDATA/trigger
    fi
    exit 0;
fi;
exit 0;

The script obviously does what needs to be done - it creates the trigger file on the failover target. You may ask: why not use it then? Well, there are two things I don't like about this script:

  • It does not deal with the old primary in any way. As we know, it can be dangerous if the old primary server comes back still thinking that it's the primary. It is true that we can disable the old primary server by some other script or some other means, but I believe the best place to implement this is the same script that performs the failover (to prevent forgetting this step).
  • Another reason is that this script performs failover only, not full promotion (see PostgreSQL HA with pgpool-II - Part 2) of the server to the primary role. I want to perform full promotion immediately.

Let's see the script that we'll actually use:

#!/bin/bash
# By Fat Dragon, 05/25/2016
# Performs failover by promoting the new primary server.

if [ $# -ne 5 ]
then
    echo "failover falling_node oldprimary_node new_primary replication_password trigger_file"
    exit 1
fi

FALLING_NODE=$1         # %d
OLDPRIMARY_NODE=$2      # %P
NEW_PRIMARY=$3          # %H
REPL_PASS=$4
TRIGGER_FILE=$5

echo "failover.sh FALLING_NODE: ${FALLING_NODE}; OLDPRIMARY_NODE: ${OLDPRIMARY_NODE}; NEW_PRIMARY: ${NEW_PRIMARY}; at $(date)" >> /etc/postgresql/9.5/main/replscripts/exec.log

if [ "$FALLING_NODE" = "$OLDPRIMARY_NODE" ]; then
    if [ $UID -eq 0 ]
    then
        sudo -u postgres ssh -T postgres@$NEW_PRIMARY /etc/postgresql/9.5/main/replscripts/promote.sh -f -p $REPL_PASS -d $OLDPRIMARY_NODE
    else
        ssh -T postgres@$NEW_PRIMARY /etc/postgresql/9.5/main/replscripts/promote.sh -f -p $REPL_PASS -d $OLDPRIMARY_NODE
    fi
    exit 0;
fi;

exit 0;

At first glance this script is very similar to the previous one, but the key differences are:

  • The new script logs every execution. This will help us understand at which moment each script is executed, and thus bridge the enormous gap in the official documentation.
  • Instead of simply creating the trigger file at the new primary, this script executes the promote.sh script we created in PostgreSQL HA with pgpool-II - Part 3, which performs the full promotion. Also note that the -d flag with the old primary server is specified, meaning that promote.sh will also try to disable the old primary server.
  • The new script drops the PGDATA argument used in the first script, since it is not needed here. On the other hand, it introduces two new arguments (REPL_PASS and TRIGGER_FILE). Later on this page we'll see how to instruct pgpool-II to pass these parameters when calling failover.sh.
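The guard at the top of failover.sh - act only when the node that failed is the old primary - can be captured in a tiny function and exercised on its own. This is a sketch for illustration, not part of the actual script:

```shell
# Returns success (0) only when the failed node is the old primary,
# i.e. only then should promotion be triggered.
is_primary_failure() {
    local falling_node="$1" old_primary="$2"
    [ "$falling_node" = "$old_primary" ]
}

# A standby failing (node 1 down, primary is node 0) must NOT trigger promotion:
if is_primary_failure 1 0; then echo "promote"; else echo "ignore"; fi   # prints: ignore
# The primary itself failing (node 0 down, primary is node 0) must:
if is_primary_failure 0 0; then echo "promote"; else echo "ignore"; fi   # prints: promote
```

This is why a standby going down is harmless from failover.sh's point of view: the script simply exits 0 without touching anything.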

The last thing to do is to ensure file ownership and permissions:

chown postgres:postgres /etc/pgpool2/3.5.2/failover.sh
chmod 0700 /etc/pgpool2/3.5.2/failover.sh

5.2.2. recovery_1st_stage.sh

I've already mentioned that the official pgpool-II documentation is poor, but when it comes to the recovery script - it gets even worse! I'll probably dedicate another post to enumerating some important omissions of the official documentation. Here I'll list my conclusions, based on painful hours of research:

  • The standby recovery script should be specified in the recovery_1st_stage_command key of the pgpool.conf file. It is explained below.
  • The recovery_1st_stage_command script is not customizable in terms of input arguments. There are a few input arguments, and all of them are predetermined.
  • According to the documentation, recovery_1st_stage_command must reside in the PostgreSQL data directory (/var/lib/postgresql/9.5/main), for security reasons.
  • The current primary server is not passed in the input arguments. I spent significant time trying to figure out how to get this information within the script, until I realized that the script always executes on the primary server, so we can get the primary server's hostname from the $HOSTNAME environment variable. pgpool-II team, thanks for not sharing this with us! If you continue this way you'll end up developing a toy for yourself.

Despite my best efforts there are still some things I don't know at the moment, and that I'll discover by logging every script execution and then turning different servers on and off. For example:

  • When is recovery_1st_stage_command executed? Is it executed automatically by pgpool-II in some circumstances, or only when kicked off by an administrator? pgpool-II team, please don't tell us! You'll spoil the surprise if you do.

But my frustrations aside, let's see the actual file I've finally come up with:

#!/bin/bash
# By Fat Dragon, 05/25/2016
# Recovers a standby server.

if [ $# -ne 3 ]
then
    echo "recovery_1st_stage datadir remote_host remote_datadir"
    exit 1
fi

PGDATA=$1
REMOTE_HOST=$2
REMOTE_PGDATA=$3

PORT=5433

echo "recovery_1st_stage.sh - PGDATA: ${PGDATA}; REMOTE_HOST: ${REMOTE_HOST}; REMOTE_PGDATA: ${REMOTE_PGDATA}; at $(date)" >> /etc/postgresql/9.5/main/replscripts/exec.log

hostnamelower=$(echo "$HOSTNAME" | tr '[:upper:]' '[:lower:]')
remotelower=$(echo "$REMOTE_HOST" | tr '[:upper:]' '[:lower:]')

if [ "$hostnamelower" = "$remotelower" ]; then
	echo "Cannot recover myself."
	exit 1
fi

echo "Checking if primary info file exists..."
if [ ! -f /var/lib/postgresql/9.5/main/primary_info ]; then
	echo "Primary info file not found."
	exit 1
fi

echo "Reading additional data from primary info file..."
source /var/lib/postgresql/9.5/main/primary_info

if [ ! -e "$TRIGGER_FILE" ]; then
	echo "Trigger file not found."
	exit 1
fi

if [ -e "$STANDBY_FILE" ]; then
	echo "Standby file found."
	exit 1
fi

if [ $UID -eq 0 ]
then
	sudo -u postgres ssh -T postgres@$REMOTE_HOST /etc/postgresql/9.5/main/replscripts/initiate_replication.sh -f -t $TRIGGER_FILE -s $STANDBY_FILE -H $HOSTNAME -P $PORT -u $REPL_USER -p $REPL_PASSWORD
else
	ssh -T postgres@$REMOTE_HOST /etc/postgresql/9.5/main/replscripts/initiate_replication.sh -f -t $TRIGGER_FILE -s $STANDBY_FILE -H $HOSTNAME -P $PORT -u $REPL_USER -p $REPL_PASSWORD
fi

exit 0;

Explanation:

  • The script logs every execution so that we can bridge the documentation gap;
  • Checks if the $REMOTE_HOST argument refers to the primary host itself, and exits with an error if it does;
  • Checks for the primary info file created by the promote.sh script (explained in PostgreSQL HA with pgpool-II - Part 3), and reads additional data (REPL_USER, REPL_PASSWORD, TRIGGER_FILE and STANDBY_FILE) from it;
  • Checks that the trigger and standby files are in the expected state (the trigger file must exist, while the standby file must not), and exits with an error if something is wrong;
  • Executes the initiate_replication.sh script (explained in PostgreSQL HA with pgpool-II - Part 3) on $REMOTE_HOST through SSH, impersonating the postgres user if necessary.
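The script sources primary_info as a plain shell fragment. Judging by the variables read from it, the file written by promote.sh presumably looks something like the mock below - the concrete values here are made up for illustration (the standby file path in particular is hypothetical):

```shell
# Create a mock primary_info in a temp dir and read it the same way
# recovery_1st_stage.sh does (`.` is the portable equivalent of `source`).
mock=$(mktemp -d)
cat > "$mock/primary_info" <<'EOF'
REPL_USER=replication
REPL_PASSWORD=myreplicationpassword
TRIGGER_FILE=/etc/postgresql/9.5/main/im_the_master
STANDBY_FILE=/etc/postgresql/9.5/main/im_slave
EOF

. "$mock/primary_info"
echo "$REPL_USER"        # prints: replication
echo "$TRIGGER_FILE"     # prints: /etc/postgresql/9.5/main/im_the_master
```

Because the file is sourced directly, anything promote.sh writes there must be valid shell - one KEY=value assignment per line, no spaces around `=`.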

You might have noticed that I included several checks in this script before the action is performed. I did so because executing this script against the primary server would be dangerous - it would destroy the primary server and thus the whole cluster.

Another thing you might have noticed is that I haven't used pg_start_backup and pg_stop_backup, which are often used in similar scripts you can find online. As far as I know, these calls are needed when the backup is copied manually with rsync, not when the pg_basebackup command is used (in my case it is used internally by the initiate_replication.sh script). I believe that if they were needed, the PostgreSQL team would have included them in pg_basebackup, right?
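For reference, the call initiate_replication.sh makes internally is roughly of the following shape - the hostname and user below are placeholders, and the authoritative version is in Part 3. The -X stream flag is what makes pg_start_backup/pg_stop_backup unnecessary, since pg_basebackup manages backup mode itself:

```shell
PRIMARY_HOST=IT-RDBMS01            # placeholder: current primary host
REPL_USER=replication              # placeholder: replication role name
PGDATA=/var/lib/postgresql/9.5/main

# -X stream : stream WAL while the base backup runs (self-contained backup)
# -R        : write a minimal recovery.conf for the new standby
CMD="pg_basebackup -h $PRIMARY_HOST -p 5433 -U $REPL_USER -D $PGDATA -X stream -R -v"
echo "$CMD"
```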

Finally let's ensure script ownership and permissions:

chown postgres:postgres /var/lib/postgresql/9.5/main/recovery_1st_stage.sh
chmod 0700 /var/lib/postgresql/9.5/main/recovery_1st_stage.sh

5.2.3. pgpool_remote_start

It is another not-strictly-necessary script. It is called by pgpool-II after recovery of a standby server is finished, and its purpose is to start the database. In our case the postgresql service is automatically started by the initiate_replication.sh script called from recovery_1st_stage.sh, so a new script isn't strictly needed. But I will create a trivial script that ensures the postgresql service is running:

#!/bin/bash
# By Fat Dragon, 05/25/2016
# Starts PostgreSQL at remote server.

if [ $# -ne 2 ]
then
    echo "pgpool_remote_start remote_host remote_datadir"
    exit 1
fi

REMOTE_HOST=$1
REMOTE_DATADIR=$2

echo "pgpool_remote_start - REMOTE_HOST: ${REMOTE_HOST}; at $(date)" >> /etc/postgresql/9.5/main/replscripts/exec.log

if [ $UID -eq 0 ]
then
	sudo -u postgres ssh -T postgres@$REMOTE_HOST service postgresql start
else
	ssh -T postgres@$REMOTE_HOST service postgresql start
fi

exit 0

In this script I've used service postgresql start to ensure that PostgreSQL is started. In other resources you'll often see pg_ctl used for this purpose. In my case there's no difference between the two. You can learn more about the differences in my other post Managing PostgreSQL Process on Ubuntu - service, pg_ctl and pg_ctlcluster.

Again we'll ensure file ownership and permissions:

chown postgres:postgres /var/lib/postgresql/9.5/main/pgpool_remote_start
chmod 0700 /var/lib/postgresql/9.5/main/pgpool_remote_start

5.3. Configuring pgpool-II

Once we have PostgreSQL prepared and all the scripts in place, we can finally start with configuring pgpool-II.

5.3.1. postgresql.conf

We need to slightly change PostgreSQL's main configuration file, so add the following line at the end of postgresql.conf:

pgpool.pg_ctl = '/usr/lib/postgresql/9.5/bin/pg_ctl'

Adding this line allows us to use the pgpool_pgctl function (which will call the executable we've specified here). Please confirm that the path I've provided is valid in your case - it must point to an existing pg_ctl file.
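A quick way to confirm the path before going further is a simple executable check - a sketch, nothing more:

```shell
# Verify that the configured pg_ctl path points to an executable file.
check_pg_ctl() {
    [ -x "$1" ] || { echo "pg_ctl not found at: $1" >&2; return 1; }
    echo "ok: $1"
}

# On your database servers this should print "ok: ..."; adjust the path
# to match your PostgreSQL installation if it doesn't.
check_pg_ctl /usr/lib/postgresql/9.5/bin/pg_ctl || true
```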

5.3.2. pcp.conf

Let's continue with another easy part - the pcp.conf file. This file is used by the pgpool-II control interface for authentication, meaning that in this file you specify who can access the pgpool-II control interface. During the installation of pgpool-II (PostgreSQL HA with pgpool-II - Part 4), we created a sample file (/etc/pgpool2/3.5.2/pcp.conf.sample). Let's copy the sample file to create the version we'll actually use:

cp /etc/pgpool2/3.5.2/pcp.conf.sample /etc/pgpool2/3.5.2/pcp.conf

The next thing to do is to add one or more lines in the following format:

username:[password encrypted in md5]

where username should be replaced with the actual username, and the part in square brackets with the md5-encrypted password. You can use the pg_md5 command to encrypt passwords. Let me show an example: I'll create user "admin" with password "pa55w0rd". The first thing to do is md5-encrypt the password by executing:

pg_md5 pa55w0rd
97bf34d31a8710e6b1649fd33357f783

The second line is the result, of course. Now I'll use this result and add the following line to pcp.conf file:

admin:97bf34d31a8710e6b1649fd33357f783

And that's it. You should do the same for your user(s) and password(s).
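As far as I know, pg_md5 (without options) simply computes the MD5 hex digest of its argument, so you can build the same pcp.conf line with standard tools as well - a sketch using md5sum:

```shell
USERNAME=admin
PASSWORD=pa55w0rd

# MD5-hash the password exactly as typed (printf avoids a trailing newline).
HASH=$(printf '%s' "$PASSWORD" | md5sum | awk '{print $1}')

# A ready-to-paste pcp.conf line:
echo "$USERNAME:$HASH"
```

Whichever tool you use, the result must be a 32-character lowercase hex string; anything else and pcp authentication will fail.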

If you add the postgres user here (or some other superuser), I have to tell you that I couldn't make it work without adding the following lines to the pg_hba.conf file on both servers:

host    all             postgres        10.1.10.150/32          trust
host    all             postgres        10.1.10.155/32          trust

I couldn't make it work with the md5 method - only with trust. Keep in mind that this can be a significant security weakness.

5.3.3. pgpool.conf

Well, this one will cause more pain, but let's start anyway. We'll begin by copying the template file we created while installing pgpool-II:

cp /etc/pgpool2/3.5.2/pgpool.conf.sample-stream /etc/pgpool2/3.5.2/pgpool.conf

We've selected the pgpool.conf.sample-stream template because it is prepared for master/slave streaming replication (our scenario). Now we'll adjust it, bit by bit. I assume that config values not mentioned in this section are left unchanged, but if you want (and you know what you are doing) - you can change them too.

Let's start with connection settings:

listen_addresses = '*'
port = 5432
socket_dir = '/var/run/postgresql'

I'll skip the values that are obvious and comment only on the ones worth noting:

  • port - As you might remember, while installing PostgreSQL (PostgreSQL HA with pgpool-II - Part 2), we moved its usual port 5432 to 5433 in order to reserve the former for pgpool-II. Well, now we are using it as planned.
  • socket_dir - I've selected /var/run/postgresql not only because it is recommended in the template file, but also because the same directory is set as default PID file directory in /etc/default/pgpool2 file (see PostgreSQL HA with pgpool-II - Part 4).

pgpool communication manager connection settings:

pcp_listen_addresses = '*'
pcp_port = 9898
pcp_socket_dir = '/var/run/postgresql'

These are all defaults except for the last one (pcp_socket_dir), which is again set this way for the same reasons as socket_dir.

In backend connection settings we'll actually specify our PostgreSQL instances:

backend_hostname0 = 'IT-RDBMS01'
backend_port0 = 5433
backend_weight0 = 1
backend_data_directory0 = '/var/lib/postgresql/9.5/main'
backend_flag0 = 'ALLOW_TO_FAILOVER'
  
backend_hostname1 = 'IT-RDBMS02'
backend_port1 = 5433
backend_weight1 = 1
backend_data_directory1 = '/var/lib/postgresql/9.5/main'
backend_flag1 = 'ALLOW_TO_FAILOVER'

A configuration option worth noting here is backend_weight (0 and 1). It is used in load balancing and lets you specify how the load should be distributed. For example, if you set backend_weight1 = 0, the second node won't be used in load balancing at all. If you want the first node to get twice as many queries as the second, you can specify backend_weight0 = 2 and backend_weight1 = 1. Effectively this means that about 66.7% of the queries will be sent to the first node, and about 33.3% to the second. Nevertheless, don't forget that only read-only queries are subject to load balancing. All write queries have to be sent to the primary node anyway.
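The share each backend receives is simply its weight divided by the sum of all weights. A quick sketch of the arithmetic for the 2:1 example above:

```shell
# Read-query share for backend i = weight_i / sum(all weights).
W0=2
W1=1
SHARE0=$(awk -v w0="$W0" -v w1="$W1" 'BEGIN { printf "%.1f", 100 * w0 / (w0 + w1) }')
SHARE1=$(awk -v w0="$W0" -v w1="$W1" 'BEGIN { printf "%.1f", 100 * w1 / (w0 + w1) }')
echo "node0: ${SHARE0}%  node1: ${SHARE1}%"   # node0: 66.7%  node1: 33.3%
```

Only the ratios matter, so weights 2/1, 4/2 and 0.2/0.1 all produce the same distribution.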

In FILE LOCATIONS section of the file we'll set:

pid_file_name = '/var/run/postgresql/pgpool.pid'

In REPLICATION MODE section of the file we'll leave default values. Recheck the following:

replication_mode = off

In LOAD BALANCING MODE section we'll leave defaults, ensuring that:

load_balance_mode = on

In MASTER/SLAVE MODE once again we'll leave default values. Ensure that:

master_slave_mode = on
master_slave_sub_mode = 'stream'
sr_check_period = 5
sr_check_user = 'postgres'
sr_check_password = 'changeit'

Notes:

  • sr_check_user - I'm not sure if it has to be postgres;
  • sr_check_password - Well yeah, change it.

In HEALTH CHECK section of the file set the following:

health_check_period = 5
health_check_timeout = 0
health_check_user = 'postgres'
health_check_password = 'p0579r35'

Explanations:

  • health_check_period - By setting a non-zero value we are turning on health checks and automatic failover. Setting the value to 5 means that the health check will be performed every 5 seconds.
  • health_check_user - Does not have to be postgres. If you pick another user, make sure it has read permissions on the postgres database (or the database specified in health_check_database).
  • health_check_password - health_check_user's password.

In FAILOVER AND FAILBACK section put the following:

failover_command = '/etc/pgpool2/3.5.2/failover.sh %d %P %H myreplicationpassword /etc/postgresql/9.5/main/im_the_master'

Let's explain this piece by piece:

  • /etc/pgpool2/3.5.2/failover.sh - it's failover.sh script file we've created above;
  • %d %P %H - are placeholders which tell pgpool-II that the script needs the following arguments:

    • %d - Backend ID of the failed (detached) node;
    • %P - Old primary node ID;
    • %H - Hostname of the new master node.
  • myreplicationpassword - is the replication user's password in my case. Put your own here, don't use mine.
  • /etc/postgresql/9.5/main/im_the_master - is the full path of the trigger file.

Note that the arguments are ordered exactly as failover.sh (created above) expects them.
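To make the wiring concrete, here is a toy illustration - not pgpool-II's actual code - of how the %-placeholders in failover_command get substituted before the script is invoked:

```shell
FAILOVER_COMMAND='/etc/pgpool2/3.5.2/failover.sh %d %P %H myreplicationpassword /etc/postgresql/9.5/main/im_the_master'

# Values pgpool-II would supply at failover time (example: node 0, the old
# primary, went down and IT-RDBMS02 is the new master):
FAILED_NODE_ID=0
OLD_PRIMARY_ID=0
NEW_MASTER_HOST=IT-RDBMS02

# Substitute each placeholder with its runtime value:
EXPANDED=$(printf '%s' "$FAILOVER_COMMAND" | sed \
    -e "s/%d/$FAILED_NODE_ID/" \
    -e "s/%P/$OLD_PRIMARY_ID/" \
    -e "s/%H/$NEW_MASTER_HOST/")
echo "$EXPANDED"
```

The expanded command line is exactly the five-argument invocation failover.sh validates at its top (`$# -ne 5`).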

In ONLINE RECOVERY section of the file put the following:

recovery_user = 'postgres'
recovery_password = 'pa55w0rd'
recovery_1st_stage_command = 'recovery_1st_stage.sh'
recovery_2nd_stage_command = ''

Explanations:

  • recovery_user - this time it has to be the postgres user;
  • recovery_password - you'll know what to do...;
  • recovery_1st_stage_command - is set to the recovery_1st_stage.sh script we created above. Note that this time only the script name is used, without path or parameters. (To remind you: the script has to be placed in the PostgreSQL data directory for security reasons.)

In WATCHDOG section of the file ensure:

use_watchdog = on
trusted_servers = 'DC1,DC2'

Here I need to explain the trusted_servers setting. To my big surprise, there's not a single example online with this option set to anything other than an empty value, although in my opinion this setting is very important. As the comment in the pgpool.conf file itself says, the list of servers specified there is used to confirm network connectivity.

Let's for a moment forget this setting (leave it empty as everyone else does), and consider what will happen in our scenario (two machines, each hosting one pgpool-II and one PostgreSQL instance) if one of the nodes (machines) loses its network connection. Both pgpool-II instances will lose the connection to the other pgpool-II instance and to the PostgreSQL instance on the other machine, but they'll still have a connection to their local PostgreSQL instance. Accordingly, both pgpool-II instances (even the one that actually lost the connection) can conclude that the other is down, and promote themselves to the active pgpool-II instance. Even worse, both will conclude that the PostgreSQL instance on the other machine is dead and that the local one should be promoted to primary - and perform failover and promotion! When the connection is re-established, we'll end up with a very bad situation: two primary PostgreSQL instances. A disaster caused by one node losing network connectivity for just 10 seconds or so!

To prevent this we need to specify the trusted_servers setting so that both pgpool-II instances can reason something like: "Hey, I can't reach the other pgpool-II or one of the backend databases - they are probably dead, let's promote a new primary! But wait, I also can't reach any of the trusted servers, meaning the other pgpool-II might be OK and I'm the one who lost the connection, so I won't change anything." Assuming this feature is correctly implemented in pgpool-II - it is a lifesaver in the described scenario.

To conclude: put a couple of stable, pingable servers in this configuration key. In my case I've put my domain controllers (DC1 and DC2), but you'll have to put servers from your own network.
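The decision logic this setting enables can be sketched as follows. This is a toy model of the reasoning, not pgpool-II internals: the peer being unreachable only counts as "peer is down" if the trusted servers are still reachable.

```shell
# peer_reachable: 1/0, trusted_reachable: number of trusted servers we can ping.
# Prints "promote" only when the peer looks dead AND our own uplink is alive.
decide() {
    local peer_reachable="$1" trusted_reachable="$2"
    if [ "$peer_reachable" -eq 0 ] && [ "$trusted_reachable" -gt 0 ]; then
        echo "promote"      # peer really seems dead: take over
    else
        echo "hold"         # either peer is fine, or *we* lost the network
    fi
}

decide 0 2   # prints: promote  (peer down, trusted servers reachable)
decide 0 0   # prints: hold     (we can't reach anything: we are the isolated one)
decide 1 2   # prints: hold     (peer is alive)
```

The middle case is exactly the split-brain trap described above: with an empty trusted_servers list there is no "hold" branch.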

There is more to set in the same section:

wd_hostname = 'IT-RDBMS01'
wd_port = 9000
wd_priority = 2

We've come to the part where the settings will differ between nodes. Explanations:

  • wd_hostname - Hostname or IP address of this watchdog, meaning that on IT-RDBMS01 this value will be IT-RDBMS01, and on IT-RDBMS02 it'll be IT-RDBMS02. Just to remind you: IT-RDBMS01 and IT-RDBMS02 are the hostnames in my case - in yours they'll be different.
  • wd_port - It'll be the same on both nodes. Let's leave it at default value (9000).
  • wd_priority - Priority of this watchdog in leader election. The higher value wins, meaning that if there are two watchdogs (two pgpool-II instances), the active (master) instance will be the one with the higher wd_priority value. In my case I'll set the higher priority on the node which hosts the primary PostgreSQL instance (IT-RDBMS01). This way I'll decrease the network communication needed.

The corresponding settings on my other host (IT-RDBMS02) are:

wd_hostname = 'IT-RDBMS02'
wd_port = 9000
wd_priority = 1

And there's more to set in the same section:

wd_ipc_socket_dir = '/var/run/postgresql'
delegate_IP = '10.1.10.159'

Explanations:

  • wd_ipc_socket_dir - set this way for the same reasons as socket_dir and pcp_socket_dir above;
  • delegate_IP - the virtual IP address that the active (master) pgpool-II instance brings up, and the address your clients should connect to.

And now I have to admit one thing: we won't finish configuring pgpool-II in this part of the tutorial. A few things have to wait for the next one, PostgreSQL HA with pgpool-II - Part 6, where we'll finish with pgpool-II (hopefully) and install pgpoolAdmin. The following configuration values will be configured in the next part: if_cmd_path and arping_path.

But no, it's not over yet. There's more to set in this very section:

wd_lifecheck_method = 'heartbeat'
wd_interval = 3
wd_heartbeat_port = 9694

The heartbeat settings also require us to point at the other pgpool instance, and they will differ between nodes, of course. In my case, on IT-RDBMS01 it will be:

heartbeat_destination0 = 'IT-RDBMS02'
heartbeat_destination_port0 = 9694

and on IT-RDBMS02 it will be:

heartbeat_destination0 = 'IT-RDBMS01'
heartbeat_destination_port0 = 9694

Still in the same section, we also need to configure the "other pgpool-II" settings. Again, these differ between our two nodes. In my case, on the IT-RDBMS01 host:

other_pgpool_hostname0 = 'IT-RDBMS02'
other_pgpool_port0 = 5432
other_wd_port0 = 9000

and similarly on IT-RDBMS02:

other_pgpool_hostname0 = 'IT-RDBMS01'
other_pgpool_port0 = 5432
other_wd_port0 = 9000

Where to Go Next?

I believe you're tired of everything - but believe me, I'm sick of everything! You've spent a few hours on this tutorial, while I spent more than a month gathering everything needed and putting it together. But both of us have to be patient a little longer. In PostgreSQL HA with pgpool-II - Part 6 we'll hopefully finish the pgpool-II configuration and install pgpoolAdmin.