PostgreSQL HA with pgpool-II - Part 2

This is the second part of the tutorial. It would be smart to start from PostgreSQL HA with pgpool-II - Part 1, but it's your call.

In this part we'll go through installing PostgreSQL and configuring replication.

2.1. Infrastructure

Just to remind you: we'll use two servers, in my case Ubuntu 14.04:

FQDN                      IP           Purpose 1                    Purpose 2
IT-RDBMS01.itenlight.com  10.1.10.150  Primary PostgreSQL instance  Active pgpool-II instance
IT-RDBMS02.itenlight.com  10.1.10.155  Standby PostgreSQL instance  Standby pgpool-II instance

Virtual IP that will be used is 10.1.10.159.

Keep in mind that the PostgreSQL team recommends that all servers included in replication be as similar as possible, "at least from the database's point of view".

2.2. Installing PostgreSQL

This installation should be done on both servers, of course. At the moment of this writing the official Ubuntu PostgreSQL packages are still for version 9.3, and we would like to go with a newer version (at least 9.4, since some significant improvements regarding replication were introduced there). For this reason the first thing to do is to add the PostgreSQL apt repository. It is well described at the PostgreSQL wiki, but for your convenience I will repeat it here:

sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt/ $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'

Install prerequisites, repository key, and PostgreSQL itself:

apt-get install wget ca-certificates
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | apt-key add -
apt-get update
apt-get upgrade
apt-get install postgresql-9.5 postgresql-9.5-pgpool2

A few notes:

  • If you are using sudo, you'll need to prefix all the lines from the previous snippet with sudo, except for the second line, where sudo is needed by the second command and should be placed after the pipe character (|).
  • The postgresql-9.5-pgpool2 package is not needed for establishing replication, but it will be needed in the third part of this tutorial, when we install pgpool-II. Unfortunately for me, the existence of this package isn't mentioned in the pgpool-II manual, so I spent a lot of time trying to compile it from source. But you, lucky reader, will skip this pain.
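To make the sudo placement concrete, here is how the key-import line from the snippet above looks when run as a non-root user (sudo goes after the pipe, because it is apt-key that needs root privileges, not wget):

```shell
# download the repository signing key and import it; only apt-key needs root
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
```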

Once the database is installed, it is good practice to change the postgres user's password:

sudo -u postgres psql
ALTER USER postgres WITH PASSWORD 'newpassword';
\q

In the rest of the tutorial I will continue with the default cluster created during installation of PostgreSQL. If you want, you can create a new cluster by using the initdb command. Note that the term cluster used here has a different meaning than the one we've used so far (and will use in the rest of this tutorial): here it refers to a "collection of databases that are managed by a single server instance". Unfortunate and confusing terminology introduced by PostgreSQL, but we have to adapt.

By default during package installation PostgreSQL creates the following directories on Ubuntu:

  • /etc/postgresql/9.5/main - configuration files like postgresql.conf are placed there, so we'll call it configuration directory;
  • /var/lib/postgresql/9.5/main - where the actual data is (and will be) stored, so we'll call it data directory;
  • /usr/lib/postgresql/9.5 - where PostgreSQL binaries are installed. It isn't important for us, but we'll name it installation directory.

2.3. Configuring Replication

The main resource for me in this part was the Streaming Replication article at the PostgreSQL wiki, but I've also peeked a few times at this article and this pgpool.net article. The last one is pretty old, dealing with PostgreSQL 9.0 (way before the new replication features), but it was useful for comparing things. Anyway, you won't need to peek anywhere else besides this very article you're enjoying so much right now.

This procedure should also be done on both servers. Let's start with creating a user named replication with the REPLICATION privilege:

CREATE ROLE replication WITH REPLICATION PASSWORD 'reppassword' LOGIN;

Obviously, replication will be performed using the previously created account. In some cases (e.g. the pg_basebackup command used below) you won't be able to specify the password interactively. For this reason you need to create a .pgpass file and store the password there. The password file resides in the user's home directory, but in the case of the postgres user it is not /home/postgres as you might expect. For security reasons its home directory is /var/lib/postgresql instead. So you need to create or modify the /var/lib/postgresql/.pgpass file and ensure that it contains the following line:

*:*:*:replication:reppassword

The first three asterisks denote any host, any port, and any database. The last two fields are the username and password, respectively. Basically, we've just allowed the postgres OS user to connect as the replication database user without being prompted for a password. The password file requires strict permissions, so we also need to execute:

chown postgres:postgres /var/lib/postgresql/.pgpass
chmod 0600 /var/lib/postgresql/.pgpass
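If you prefer to create the file and set its permissions in one go, a minimal sketch (assuming the default /var/lib/postgresql home directory used above) could look like this:

```shell
# append the replication credentials line and lock down the file permissions
echo '*:*:*:replication:reppassword' | sudo tee -a /var/lib/postgresql/.pgpass > /dev/null
sudo chown postgres:postgres /var/lib/postgresql/.pgpass
sudo chmod 0600 /var/lib/postgresql/.pgpass
```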

The password file is needed on the standby server, but it won't do any harm if you create it on both servers. By the way, we had to do this whole password file thing only because the PostgreSQL team wants us to suffer; everything would be much easier if pg_basebackup could simply be called with the password specified. But no. They decided to implement some pretty useless flags (e.g. --password and --no-password), but not an option to actually specify the password. Why? They would probably answer "for security reasons", but the truth is that they simply want you to suffer.

Next change the following entries in postgresql.conf file:

listen_addresses = '*'
port = 5433

Again, a few notes:

  • The PostgreSQL instance does not have to listen on all IP addresses. Strictly speaking, it does not have to listen on the virtual IP address, but it has to listen on the main server's IP so that the pgpool-II instance installed on the other server can access it, and it should listen on the localhost address if the pgpool-II instance installed on the same server accesses it that way. Anyway, there's no harm in setting PostgreSQL to listen on all available addresses.
  • Note that I've changed the default PostgreSQL port (5432) to 5433. I did it because I want to use 5432 for the pgpool-II instance, so that all outside-world clients can connect to pgpool-II using this well-known port. If you don't want to set the ports this way - you don't have to, of course.

Add/change the following entries in pg_hba.conf file:

host  replication     replication     10.1.10.150/32          md5
host  replication     replication     10.1.10.155/32          md5
host  all             postgres        10.1.0.0/16             md5

Notes:

  • The first two lines allow the replication user to access the database from the specified IP addresses (change the actual IP addresses appropriately). Strictly speaking, not both pg_hba.conf files have to contain both lines: the file on the primary server needs only the second line (the standby's address), while the file on the standby server needs only the first line (the primary's address). But again, there's no harm in having the same file with both lines on both servers.
  • The third line is not needed for establishing replication, but I've added it so that I can access the server with the postgres account from my local network, to be able to administer it remotely. You can skip this line if you want. If you keep it - change the IP network appropriately, of course.
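Changes to pg_hba.conf don't require a full restart - a reload is enough. One way to do it, from a superuser psql session:

```sql
-- re-read postgresql.conf and pg_hba.conf without restarting the server
SELECT pg_reload_conf();
```

Alternatively, sudo service postgresql reload from the shell has the same effect on Ubuntu.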

2.3.1. Configuring Primary Server

2.3.1.1. Replication Slots

This is the point where the tough part starts, and where we must part ways with many resources, including the most important one - the PostgreSQL wiki. That's because we are choosing to take a slightly different (and better) direction - we'll use so-called replication slots. This feature was introduced in PostgreSQL 9.4, and it is intended for logical replication (not to be explained here), but it can also be used with the streaming replication we are planning to implement. You can read more about the technology here and here, but you don't have to - I already have, and I'll present the essence here.

First, let me introduce the new technology as briefly as possible. To do that I first need to explain shortly how log-shipping replication works: basically, it transfers the transaction log (WAL files) from the primary to the standby server, and the standby uses these WAL files to reconstruct the database state. In this type of replication the standby is basically in a constant recovery state (constantly recovering itself by reading new WAL files). Every once in a while the primary server frees its pg_xlog directory by deleting old WAL files. The problem with such replication arises when the standby server gets too far behind the primary server (for example, after a long period of being down). When the standby tries to catch up again, it can't get the WAL files because they have been deleted, meaning that replication fails. Prior to replication slots the problem was solved by one of two means (or both combined, as, for example, in the PostgreSQL wiki):

  • By defining the minimal number of WAL files kept (the wal_keep_segments parameter in postgresql.conf). Basically, we were able to set this parameter high enough so that the primary server keeps WAL files long enough for the standby to catch up.
  • Instead of deleting WAL files - archiving and storing them in a place where the standby can access them (the archive_mode and archive_command parameters in postgresql.conf).

Replication slots introduce a new approach: they basically allow the primary server to be aware of each standby and its replication state, and to keep WAL files exactly as long as needed - no more, no less. With this technology the primary server will retain WAL files basically forever, waiting for the standby to pick them up (unless the primary itself goes down due to pg_xlog overflow). It means that we can simply turn a standby server on weeks after it went down, and it will catch up without any additional intervention on our side. On the other hand, it also means that if we give up on some standby for good, we have to tell the primary server; otherwise it will go down sooner or later. For this reason replication slots are neither created nor deleted automatically. We have to create the slot before connecting a standby, and we also have to delete the slot after giving up on a particular standby.

Replication slots make our life easier when it comes to recovery after longer delays - there's no need for manual re-synchronization. But they also take away one interesting feature that was available with WAL archiving - so-called point-in-time recovery. With WAL archiving we were able not only to restore a new standby to the current state of the primary server, but also to restore it to the state of the database at any earlier moment (for example, before you accidentally deleted some table). With replication slots this is not possible; a standby can only be restored to the current state of the primary server.

In my case I'll go with replication slots, but I'll also provide instructions for those who decide to go with WAL archiving.

If you go with replication slots, you need to create the replication slot first. To do that, execute the following command on the primary server:

SELECT * FROM pg_create_physical_replication_slot('it_rdbms02');

I've named the slot it_rdbms02 (obviously, to correspond to the hostname of my standby server), but you can name it as you want. Also, on the primary server you need to additionally change postgresql.conf as follows:

wal_level = hot_standby
max_replication_slots = 3
max_wal_senders = 3

Notes:

  • The first line tells the primary server to write enough information to the WAL to support replicas running in hot_standby mode (streaming replication).
  • In the second line I've set the maximal number of replication slots to 3, although I will use only one for now.
  • The third line defines the maximal number of concurrent connections from standby servers.

Finally start (or restart) PostgreSQL.
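Once the server is up, you can verify that the slot was created (and, later, whether a standby is consuming it) by querying the pg_replication_slots system view:

```sql
-- 'active' becomes true once a standby is attached to the slot
SELECT slot_name, slot_type, active FROM pg_replication_slots;
```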

2.3.1.2. WAL Archiving

If you've chosen WAL archiving instead of replication slots, change postgresql.conf on the primary server as follows:

wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 32
archive_mode    = on
archive_command = 'cp %p /path_to/archive/%f'

Start (or restart) PostgreSQL.
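One refinement worth considering, taken from the example in the PostgreSQL documentation: make archive_command refuse to overwrite an existing file, so a misconfiguration can't silently destroy already-archived WAL segments:

```
archive_command = 'test ! -f /path_to/archive/%f && cp %p /path_to/archive/%f'
```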

2.3.2. Configuring Standby Server

The first and very important step is to stop PostgreSQL server.

The next thing to do is to delete everything from the PostgreSQL data directory, including any tablespace directories. The PostgreSQL wiki explains how you can do this, but I prefer the shortcut - deleting the data directory itself. After that you will execute the pg_basebackup command in order to get the initial state from the primary server. All of this is accomplished by executing the following commands as the postgres user:

sudo -i -u postgres
cd /var/lib/postgresql/9.5
rm -rf main
pg_basebackup -v -D main -R -P -h 10.1.10.150 -p 5433 -U replication
logout

Explanation:

  • The first line enters postgres user impersonation session;
  • The second navigates to data directory's parent folder;
  • The third deletes data directory;
  • The fourth initiates importing data from primary server to newly created data directory main;
  • The last exits postgres user impersonation session.

Add/change the following lines in the postgresql.conf file:

hot_standby = on
hot_standby_feedback = on

Explanations:

  • The first line tells the standby server that it will be used for read-only queries (load balancing).
  • The second line prevents "pruning of tuples whose removal would cause replication conflicts", whatever that means. At the moment I'm not sure if it relates only to replication slots or can be used with WAL archives also, but I suggest setting it to 'on' in either case.

2.3.2.1. recovery.conf for Replication Slots

You need to create or change the recovery.conf file in the data directory so that it contains the following:

standby_mode = 'on'
primary_slot_name = 'it_rdbms02'
primary_conninfo = 'host=10.1.10.150 port=5433 user=replication password=reppassword'
trigger_file = '/etc/postgresql/9.5/main/im_the_master'

Explanations:

  • The first line specifies that the server should be started as a standby;
  • The second line tells the server that replication slots will be used, and the slot name it should use (must be the same as defined above while creating the slot);
  • The third line represents a connection string which is used by the standby server to connect with the primary (change IP address, port and password appropriately);
  • The fourth line specifies a trigger file (mentioned in PostgreSQL HA with pgpool-II - Part 1) whose presence causes streaming replication to end - meaning failover. You can define any path and name for the trigger file. I've selected the configuration directory as the location (since it is the first place an administrator usually checks), and the descriptive name im_the_master. Since the actual primary server (as configured above) does not contain a recovery.conf file, it doesn't have to contain the trigger file either in order to be the primary server. But for consistency I suggest you always have this file on the primary server - this way its role is obvious at first glance. Of course, you should not allow more than one server from the same cluster to have this file.

Start standby server.
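After the standby starts, you can confirm on the primary server that it is connected and streaming by checking the pg_stat_replication view:

```sql
-- one row per connected standby; state should be 'streaming'
SELECT client_addr, state, sent_location, replay_location
FROM pg_stat_replication;
```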

2.3.2.2. recovery.conf for WAL Archiving

The recovery.conf file in this case is similar to the one used in the replication slots scenario, with a few changes:

standby_mode = 'on'
primary_conninfo = 'host=10.1.10.150 port=5433 user=replication password=reppassword'
trigger_file = '/etc/postgresql/9.5/main/im_the_master'
restore_command = 'cp /path_to/archive/%f "%p"'

Note that restore_command must point to the same location as archive_command defined in postgresql.conf on primary server above.

Start standby server.

2.4. Testing Replication

Before actually testing the replication you can first check the postgresql service status. If the replication is running you should get the following output on Ubuntu (and something similar on other OSes):

# Primary server:
service postgresql status
9.5/main (port 5433): online
  
# Standby server:
service postgresql status
9.5/main (port 5433): online,recovery

As already mentioned, in replication the standby server is always in recovery state.
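You can check the same thing from within psql: pg_is_in_recovery() returns false on the primary and true on the standby:

```sql
-- false on the primary server, true on the (constantly recovering) standby
SELECT pg_is_in_recovery();
```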

The next test is obvious, and actually proves that the replication works. First we'll create a temporary database on the primary server:

sudo -u postgres psql
postgres=# CREATE DATABASE replicationtest;
CREATE DATABASE
postgres=# \l

(The third line is not a command you should enter, but the response you get from executing the command on the second line.) The last command (\l) lists the existing databases, so you'll get:

                                     List of databases
      Name       |  Owner   | Encoding |   Collate   |    Ctype    |   Access privileges
-----------------+----------+----------+-------------+-------------+-----------------------
 postgres        | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 replicationtest | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 |
 template0       | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
                 |          |          |             |             | postgres=CTc/postgres
 template1       | postgres | UTF8     | en_US.UTF-8 | en_US.UTF-8 | =c/postgres          +
                 |          |          |             |             | postgres=CTc/postgres
(4 rows)

You can close this list by pressing q key. Now you should get the same list of databases on the standby server, by executing:

sudo -u postgres psql
postgres=# \l

You can also try to delete the newly created database on the standby server by executing:

postgres=# DROP DATABASE replicationtest;
ERROR:  cannot execute DROP DATABASE in a read-only transaction

Obviously we cannot delete the database on the standby server, and this is OK. Let's try the same command on the primary server:

postgres=# DROP DATABASE replicationtest;
DROP DATABASE

On the primary server deletion obviously succeeded. You can recheck database list on both servers to confirm that test database is absent.

2.5. How to Recover Standby Server?

Depending on what happened with the standby server after its failure there are two scenarios:

  1. If the standby server is repaired without losing its content, meaning that the old data is still there, you can simply connect the repaired standby server and turn it on. It will synchronize automatically after some time.
  2. If the old standby server is lost, and a new, freshly installed one will take its place, the first thing you need to do on the primary server is to delete the replication slot which was used by the old standby server, and create a new slot for the new standby server. Then you need to configure the new standby server the same way you configured the old one, following the exact same steps described above.
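For the second scenario, the slot housekeeping on the primary server could look like this (it_rdbms03 is a hypothetical slot name for the replacement standby - use whatever matches your naming scheme):

```sql
-- free the WAL retained for the lost standby, then prepare a slot for the new one
SELECT pg_drop_replication_slot('it_rdbms02');
SELECT * FROM pg_create_physical_replication_slot('it_rdbms03');
```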

2.6. How to Recover Primary Server?

Well, the sad news is that you cannot do this. The primary server cannot be recovered.


I would really like to see your face right now! You probably think that I'm fooling you, but no, it's true: the primary server cannot be recovered. But it is also true that we wouldn't ever want to recover it anyway. The trick is that the question "How to recover the primary server?" is wrong. The right one would be: "What to do when the primary server fails?". So let's start again, this time using the right question:

2.6. What to Do when Primary Server Fails?

The first thing to do when the primary server fails (if it is not already done by some tool such as pgpool-II) is to promote the standby server to the primary role.

You can easily make the standby server take over the primary role - simply by creating the trigger file. But you should be aware that there is more to do, sooner or later. I'll refer to these remaining steps as full promotion.
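Creating the trigger file is a one-liner; with the recovery.conf shown earlier it would be:

```shell
# the standby notices this file and promotes itself to a writable primary
sudo -u postgres touch /etc/postgresql/9.5/main/im_the_master
```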

2.6.1. Failover vs Full Promotion

When failover is performed (by creating the trigger file), the failover server starts to behave as the primary, but it still isn't a full primary server; it's more like a TDY (temporary duty) primary server. Let me explain. The new server becomes writable, and the cluster behaves normally when seen from the outside world. But the new server isn't capable of accepting standby servers (existing or new ones). It means that all other standby servers (if any) won't be used in any way, as if they had failed together with the old primary server. In order to join them back (as well as any new standby server), we need to fully promote the TDY primary server. Full promotion basically means:

  • Deleting the recovery.conf file;
  • Changing the postgresql.conf file appropriately for the new role (as described on this page) and restarting the PostgreSQL service;
  • Creating replication slots for the standby server(s) (as described on this very page).
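Put together, a full promotion on the new primary could be sketched like this. The paths assume the default Ubuntu layout used throughout this tutorial, postgresql.conf must be adjusted by hand as described in section 2.3.1.1 before restarting, and the slot name it_rdbms01 is a hypothetical example for the repaired old primary rejoining as a standby:

```shell
# remove recovery.conf so the server no longer considers itself a standby
sudo -u postgres rm /var/lib/postgresql/9.5/main/recovery.conf
# restart to apply the primary-role postgresql.conf settings
sudo service postgresql restart
# create a slot for each standby that will be (re)joined, e.g.:
sudo -u postgres psql -p 5433 -c "SELECT * FROM pg_create_physical_replication_slot('it_rdbms01');"
```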

Long story short - full promotion must take place sooner or later, and we can choose when to perform it. As usual, there are some pros and cons:

  • Failover itself does not require a postgresql service restart, meaning that no additional downtime is introduced. On the other side, full promotion does require the service to be restarted, which will cause a minimal additional interruption (probably no more than one second). In my case this up-to-one-second interruption is acceptable, but someone else may decide that it is better to wait a few hours and do it during a low-load period (night time, for example). Nevertheless, don't forget that we've already had a few seconds of downtime - the period between the old primary server's failure and the failover procedure. Even if it is done automatically by pgpool-II, it is not instant (some time is needed for pgpool-II to decide that failover should take place).
  • On the other side, failover without full promotion has a huge disadvantage: as long as you are running without full promotion (and thus without at least one standby server), your system is in a so-called degraded state, meaning that there is no fallback - if the failover server fails, you'll end up with data loss.

To conclude: my decision is to perform full promotion immediately, and to join at least one standby as soon as possible.

There's another thing you should be aware of: even when the primary server is fully promoted, you should be cautious about joining standby servers if you plan to join more than one. Don't forget that each standby server will actually be restored from scratch (all its previous data will be deleted). It means that if you have a lot of data, synchronization can take a while and put some load on your network infrastructure. I'm not sure if this is handled internally by PostgreSQL in some smart prioritizing way, but if not, you can get performance degradation due to network overload. For this reason I suggest joining only one standby immediately, and all the others later, during a low-load period, one by one, waiting for the current one to fully synchronize before starting the next.

But I Insist on Keeping the Same Server as Primary!

If for any reason you really want to have the same server as the primary again after it is repaired (for example, if it has slightly better hardware), you can achieve this. But first you need to rebuild it as a standby, and give it some time to synchronize all the data. After that you can promote it back to primary by intentionally killing the current primary server. Then you'll have to repeat all the steps to join the killed server as a standby again.

Where to Go Next?

Well, it depends on your success so far. If your replication doesn't work - start again from the top of this page, or even from the start of this tutorial. But if your replication works as expected - you're lucky! Then the next step for us hard-working people is BEER TIME!!!


Tomorrow we'll continue with PostgreSQL HA with pgpool-II - Part 3, where we'll automate the procedure explained here.