PostgreSQL HA with pgpool-II - Part 6

This is the sixth (and hopefully the last) part of the tutorial. Other parts are:

Start from the first part or give up!

6.1 Additional Packages

We need to install few more packages:

$ apt-get install iputils-arping apache2 php5 libapache2-mod-php5 php5-pgsql

First, I must admit that I'm not sure if iputils-arping is the right choice. Watchdog needs arping command, but there are two packages in Ubuntu apt store that are offering it: iputils-arping (I've used above), and arping. I'm not 100% sure which should be installed, and I selected the first one only because it understands flags that are used in pgpool.conf template file (arping_cmd = 'arping -U $_IP_$ -w 1'). If you install arping package, arping command will complain about -U flag.

Apache and PHP are needed for pgpoolAdmin we'll install later.

6.2 ip and arping

If you remember from the last part, we've left two configuration options from pgpool.conf file (if_cmd_path and arping_path) for the next (this) part. Let's explain what is the issue with these: In order to be able to manipulate virtual IP pgpool-II needs to be able to execute ip and arping commands. But there's a catch: these commands are requiring root access, and as you might remember pgpool-II service runs under postgres user which doesn't have root permissions. It is true that we could let the service run as root, but again it wouldn't solve the problem - since we'll install and use pgpoolAdmin (which runs under Apache), www-data user (it is Apache user on Ubuntu) also needs to be able to execute these commands.

There are several ways to accomplish this, and many times mentioned pgpool-II Tutorial [watchdog in master-slave mode] from pgpool.net uses copying command binaries to user's home and changing permissions appropriately. Nevertheless, the tutorial also says:

"Note that explained above should be used for tutorial purpose only. In the real world you'd better create setuid wrapper programs to execute ifconfig and arping. This is left for your exercise."

Well guyz, thanks for exercising me! But it would be much more helpful if you've actually showed how it should be done. Thanks for the tutorial that shows how it shouldn't be done!

Once again I'm left alone to find a way. I've already mentioned that there're a lot of ways to accomplish this, and I've select one, not necessarily the best or the easiest.

6.2.1 sudoers file

The first thing I'll do is to allow postgres and www-data users to sudo execute these commands without being prompted for root password. I've accomplished this by adding the following lines to sudoers file:

postgres ALL=(root) NOPASSWD: /bin/ip
www-data ALL=(root) NOPASSWD: /bin/ip
postgres ALL=(root) NOPASSWD: /usr/bin/arping
www-data ALL=(root) NOPASSWD: /usr/bin/arping

You can add these lines at the end of the file. Of course, confirm that paths provided are correct, and that they are pointing to an existing ip and arping files.

6.2.2 Command Wrappers

The next thing we'll do is creating simple wrapper scripts for mentioned commands. I'll place each script in the same directory where the command it wraps is, and I'll name the scripts by the wrapped command with suffix _w. Let's start with the wrapper for ip command:

#!/bin/bash
# By Fat Dragon, 05/26/2016
# Wraps ip command
 
if [ $UID -eq 0 ]
then
        #echo "Executing: /bin/ip $@"
        /bin/ip $@
else
        #echo "Executing: sudo /bin/ip $@"
        sudo /bin/ip $@
fi
 
exit 0

Similarly, the script for arping command will be:

#!/bin/bash
# By Fat Dragon, 05/26/2016
# Wraps arping command
 
if [ $UID -eq 0 ]
then
        #echo "Executing: /usr/bin/arping $@"
        /usr/bin/arping $@
else
        #echo "Executing: sudo /usr/bin/arping $@"
        sudo /usr/bin/arping $@
fi
 
exit 0

Basically, what the scripts do is forcing sudo execution of wrapped command if currently executing user is not root.

After the scripts are saved you'll need to set the permissions:

$ chmod 0755 /bin/ip_w
$ chmod 0755 /usr/bin/arping_w

After that you can confirm that postgres user is able to sudo-execute the commands without being prompted for password:

root@IT-RDBMS01:~# sudo -i -u postgres
postgres@IT-RDBMS01:~$ ip_w a
Executing: sudo /bin/ip a
1: lo: <loopback,up,lower_up> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <broadcast,multicast,up,lower_up> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:05:05:20 brd ff:ff:ff:ff:ff:ff
    inet 10.1.10.150/16 brd 10.1.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe05:520/64 scope link
       valid_lft forever preferred_lft forever
postgres@IT-RDBMS01:~$ logout
root@IT-RDBMS01:~#

Notice the third line (Executing: sudo /bin/ip a) - obviously we've succeeded executing ip command with sudo, without being prompted for password. BTW. this line will not appear if you comment out tracing echo statements in the scripts, as I've already did above. Actually I've reactivated echo lines only for this test execution, and commented them out again (a good wrapper should return the exact output gotten form the original command, nothing more).

6.3 Finishing pgpool-II Configuration

Finally, we are ready to finish pgpool-II configuration. Set the following values in pgpool.conf file:

if_cmd_path = '/bin'
if_up_cmd = 'ip_w addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = 'ip_w addr del $_IP_$/24 dev eth0'
arping_path = '/usr/bin'
arping_cmd = 'arping_w -U $_IP_$ -w 1'

Notes:

  • Previous values are very similar to default ones. I've changed only paths (if_cmd_path and arping_path), and command names (ip_w instead of ip and arping_w instead of arping).
  • I want to bring your attention to $_IP_$/24 part: Actual IP address will be set automatically by pgpool (replacing $_IP_$ placeholder). But you should set the subnet (/24) part appropriately, depending on your network.

6.4 Authentication, Access Control and pool_hba.conf

When you think that the beast is defeated - pgpool-II strikes back with yet another catch. This time it is authentication, and this time is really hard to find any sane reason (besides we-want-you-suffer one) for them to implement authentication in a way they did. Let me explain what is all about.

pgpool-II introduces its own authentication mechanism, so that no client can connect to it if not authenticated properly. Sounds like reasonable decision, right? But wait, you need to know the whole story. Unreasonable thing is that pgpool's authentication does not replaces PostgreSQL authentication (which would also be wrong), but it simply adds another layer of authentication, so that a user first must to authenticate with pgpool, and then with PostgreSQL itself in order to execute a query. Having two authentications for a single query (single SELECT statement if you want) is already pointless and wrong. But actual problem is even bigger due to the fact that pgpool's authentication is poorly implemented, and does not uses existing PostgreSQL mechanism. It means that for every database user you'll have to:

  • Manage his password in two different systems. The password has to be the same in both systems, but you must manage it separately, meaning that if you want to change the password you'll have to do so in PostgreSQL and pgpool.
  • You'll have to manage two host-based access (hba) files - pg_hba.conf (well known PostgreSQL's hba file) and pool_hba.conf (pgpool's hba file). Again, the setting in these files must be equivalent. For example I initially planned to set trust authentication for all users in pool_hba.conf and then md5 in pg_hba.conf, to basically disable pgpool authentication and do the actual authentication at PostgreSQL. But it is not possible - if different authentication methods are used - authentication will fail.

It makes sense to have authentication for pgpool's administrative tasks, but introducing another authentication layer for query execution is pointless, and frankly speaking - stupid. When it comes to query execution pgpool should simply pass-through - it is not its responsibility to authenticate. It is not a special-ultra-security product, but failover / load-balancing product.

Sorry for criticism! Let's do this. There are few things we need to do, so let's start by setting the following in pgpool.conf file:

enable_pool_hba = on
pool_passwd = 'pool_passwd'

Next we'll create pool_hba.conf file, by copying from template (remember, while installing pgpool-II in PostgreSQL HA with pgpool-II - Part 4 we've prepared some templates):

$ cp /etc/pgpool2/3.5.2/pool_hba.conf.sample /etc/pgpool2/3.5.2/pool_hba.conf

pool_hba.conf is very similar to pg_hba.conf, except for few limitations (see pgpool manual for details). For the purpose of this tutorial I'll only add one line that allows all users to access all databases from my network (10.1.0.0/16) by using md5 authentication:

host all all 10.1.0.0/16 md5

In order to enable md5 authentication we have to create pool_passwd file. Path and the name of the file are specified in pgpool.conf file (see above). Another interesting pgpool team's decision is that path of the file is specified relative to pgpool.conf file itself, meaning that in our case pool_passwd file has to be placed in the same directory as pgpool.conf file. Content of the pool_passwd file is (in a way) very similar to content of pcp.conf file we've created in the previous part of this tutorial, but pool_passwd file cannot contain comments nor empty lines. Another difference is that md5 hash of the password cannot be created in the same way as for pcp.conf file (no doubt they want you suffer). Entries in pool_passwd file should be created in the following way:

$ pg_md5 -f /etc/pgpool2/3.5.2/pgpool.conf -m -u postgres postgrespassword

Here I've added user postgres with password postgrespassword to pool_passwd file. The command will execute without any command output - it adds user and password to pool_passwd file automatically (if the file doesn't exist it'll be created). As you can see it also requires path to pgpool.conf file as input argument (obviously to read from there where pool_passwd file is). After the execution you can check /etc/pgpool2/3.5.2/pool_passwd file content, and you'll find something like the following there:

postgres:md55cb5bf77d7027e6c4e50fa4112df4d63

If you have multiple users - you'll have multiple line in the file. You can also add lines manually, but if you do so you need to:

  • Ensure that newly added line ends with new-line character. If there's no new-line character at the next execution pg_md5 command will concatenate the next user and its password in the same line.
  • Find a way to create password hash. As I've already mentioned it is not the same hash as in pcp.conf file, and I don't have a clue how this one can be generated.

Finally, you need to ensure that md5 access for a particular user is also enabled in pg_hba.conf.

6.5 Starting pgpool-II

We have finally finished with pgpool-II configuration, so we can enable and start the service:

$ update-rc.d pgpool2 enable
$ service pgpool2 start

6.6 Testing pgpool-II

If you've done everything right now you should be able to see pgpool-II running. You can test the following:

  • service pgpool2 status should report that the service is running on both machines;
  • ifconfig -a should show that one machine has additional IP address (virtual IP) labeled with eth0:0;
  • You should be able to connect to pgpool-II from any other server by using virtual IP and port 5432. You can try with pgAdmin3 for example.

6.7 Installing pgpoolAdmin

It is another part of the procedure that isn't well documented, but luckily it is not too hard figure it out. About the official documentation I'll tell you just two things:

  • It dates back to 2006-2008, and its created for version 2.1 (the current version is 3.5.2);
  • It is written in almost not-understandable English (i.e. "To execute pgpool and the pcp tool set up from the Apach user, the right of access is set.")

6.7.1 Preparing .pcppass File

This file is needed by pgpoolAdmin for authentication for pcp commands. As you may remember that in the previous part we've created pcp.conf file that contains authentication info for executing pcp commands (i.e. pcp_node_count, pcp_node_info, etc.). The very same file and the same authentication info is used for authenticating to pgpoolAdmin portal. For example, we've created there user admin with password pa55w0rd, and now we'll use this combination to login to pgpoolAdmin portal. But it is not the end of authentication - even when you are logged in to pgpoolAdmin portal, different portal functions are trying to execute pcp commands in background, and every execution must be authenticated. Although you've submitted username/password while logging in, the portal does not store this info, and needs username/password for every pcp command execution. This is where .pcppass file becomes important - the portal will always read username/password from the file, without bothering you to enter it again and again.

.pcppass file is very similar to .pgpass file we've created in PostgreSQL HA with pgpool-II - Part 4, with only difference that .pcppass file misses database part. It means that format of .pcppass file entries is:

hostname:port:username:password

In our case we'll create .pcppass file as:

*:*:admin:pa55w0rd

You can read this as: all hosts, all ports, user admin, password pa55w0rd. There are several things to note about the file and its content:

  • Username / password combination must match to one used while creating pcp.conf file, with a difference that in pcp.conf file we need to use md5 hash of the password, while in .pcppass file we need to use password in plain text.
  • .pcppass file should be placed in user's home directory. Actually, it is possible to place the file wherever you want and specify its location in an environment variable (as explained here), but we'll use the first approach.
  • When the file is used by pgpoolAdmin portal (we are doing right now), it should be placed in home directory of the user account under which Apache runs, meaning in www-data user's home directory. Home directory of www-data user is /var/www, so we need to place .pcppass file there.
  • When used with pgpoolAdmin portal the file must contain the same username/password used when we've logged in to the portal.

6.7.1.1 .pcppass File Beyond Apache

Although it is not related to pgpoolAdmin portal we are dealing with here, it is worthwhile noting that the file can be useful even when you are executing pcp commands from command line. In this case it will save you from always being prompted for password. But in this case it has:

  • To be placed in home directory of the user who is executing the commands. For example, if you are executing commands as root user, the file should be placed in /root/.pcppass.
  • To contain appropriate username/password combination. When you are executing pcp commands from command line, you can specify the username by using -u flag (i.e. pcp_node_info -u admin ...). If you've specified username this way, the file must contain this username and its corresponding password. Of course, the same username/password must also be specified in pcp.conf file. If you don't specify username by using -u flag, then your UNIX username will be used, meaning that if you are executing command as root user, .pcppass file (and again pcp.conf file also) must contain root username and its corresponding password. In this case your UNIX password does not have to match to the password specified in pcp.conf and .pcppass files.

Still talking about command line usage, let me tell you how you can know if .pcppass file is set as needed: When you try to execute any pcp command, if you get password prompt it means that the file is not set in the appropriate way.

6.7.2 pgpool.conf and pcp.conf File Permissions

Among other things, pgpoolAdmin portal provides an interface for changing pgpool.conf file, and interface for changing password (meaning changing pcp.conf file). But to be able to save the changes, the portal has to have write permissions on mentioned files. For this reason we'll change file permissions of these files:

$ chown www-data /etc/pgpool2/3.5.2/pgpool.conf
$ chown www-data /etc/pgpool2/3.5.2/pcp.conf

This ownership change won't break anything in our previous setup: postgres user (pgpool2 service runs under) doesn't change these files anyway, only reads them, so it doesn't have to be the owner.

6.7.3 Installing the Portal

Finally we can install the portal itself. As I've mentioned, pgpoolAdmin installation is not too hard, and I'll simply provide a script that performs it, with explanations in comments:

# Navigate to temporary directory
cd /tmp
 
# If archive exists delete it
if [ -f pgpoolAdmin-3.5.2.tar.gz ]; then
    rm pgpoolAdmin-3.5.2.tar.gz
fi
 
# Download installation archive
wget http://www.pgpool.net/download.php?f=pgpoolAdmin-3.5.2.tar.gz -O pgpoolAdmin-3.5.2.tar.gz
 
# If extracted directory exists delete it
if [ -d pgpoolAdmin-3.5.2 ]; then
    rm -r pgpoolAdmin-3.5.2.tar.gz
fi
 
# Extract the archive
tar -xzf pgpoolAdmin-3.5.2.tar.gz
 
# Delete archive file
rm pgpoolAdmin-3.5.2.tar.gz
 
# If virtual directory exists delete it
if [ -e /var/www/html/pgpooladmin ]; then
    rm -r /var/www/html/pgpooladmin
fi
 
# Move extracted archive to the new location (under Apache root directory)
mv pgpoolAdmin-3.5.2 /var/www/html/pgpooladmin
 
# Change ownership of the directory
chown root:root -R /var/www/html/pgpooladmin
 
# Adjust file and folder permissions
chmod 0777 /var/www/html/pgpooladmin/templates_c
chown www-data /var/www/html/pgpooladmin/conf/pgmgt.conf.php
chmod 0644 /var/www/html/pgpooladmin/conf/pgmgt.conf.php

After executing the script you should be able to access the portal, so first check if Apache / PHP are working as expected by opening http://it-rdbms01/pgpooladmin/install/phpinfo.php (you should change hostname appropriately). At this location you should find standard phpinfo page starting with something like:

Image

Two parts are especially important: Multibyte Support should be enabled:

Image

And PostgreSQL Support should be enabled:

Image

If you don't see pgsql section at all, it's probably because Apache server was started before you've installed php5-pgsql package. Try with restarting Apache (service apache2 restart) and refreshing the page. If the section is still missing ensure that php5-pgsql package is actually installed.

6.7.3.1 Installation Wizard

Once you've ensured that everything is ok with PHP and Apache, start installation wizard by opening http://it-rdbms01/pgpooladmin/install/index.php (again change the host). You should get something like:

Image

Select your language and click "Next".

The second step of the wizard is "Directory Check". You should see two green checks. If not - you probably haven't set the appropriate file permissions (recheck the last three lines form the script above). Click "Next".

In the third step there are many fields with green checks and red X's. In order to finish the installation we need to ensure that there are no more red X's, but:

Probably the only cause for red X's is wrong path, and we'll fix it. Here's what I've had to change to make it green:

Field Value
pgpool.conf File /etc/pgpool2/3.5.2/pgpool.conf
pcp.conf File /etc/pgpool2/3.5.2/pcp.conf
pgpool Command /usr/sbin/pgpool
PCP directory /usr/sbin

If you've used different directories during pgpool-II installation process - you'll have to appropriately change the values here also.

After changing the values click "Check" button again and you should get all greens. Click "Next". You should get something like:

Image

As you can see, we should now delete /var/www/html/pgpooladmin/install directory, so do it. Click "top" button, and you'll end up on a login page. Enter username / password (you've defined while creating pcp.conf file in PostgreSQL HA with pgpool-II - Part 5), and you should be in. You should see something like:

Image

If you see some error messages like "failed to execute pcp_...", "e1002", etc., chances are that you haven't set .pcppass file appropriately. Look again above and recheck that the file is set as needed.

Note that since I've used admin user (which is not PostgreSQL user) to login to this session, there's "superuser: unknown (Connection error)" on the previous screenshot. Also majority of buttons are grayed out. But if you login by using postgres user, you'll get something like:

Image

6.8 All Problems Solved???

Well, no. Obviously pgpool-II is very immature (probably even too immature) product that won't ever stop causing some headache. At this moment I'm struggling with a new question: why two pgpool-II instances (which are targeting the same cluster) are showing different info??? I'll give you an example - screenshot of my IT-RDBMS01 pgpool instance (the same screen as the previous one):

Image

As you can see even a single instance provides contradictory info: node 1 is at same time "Down" and "Up". And there's button "Return" which is never and nowhere described. What the hack this button does? And what it means in the first place? Term "return" hasn't been used in PostgreSQL replication terminology, so obviously pgpool team has introduced a new one without any explanation. Guys, you can name a button "Pray it back" or "Normalize it up", but at least you should explain what actually this button does. Does this button initializes recovery (recovery_1st_stage, etc.) or not?

And here's the screenshot from my IT-RDBMS02 instance:

Image

The same cluster, the same servers, the same moment, two pgpool-II instances. Disappointing... Very disappointing.

6.9 Where to Go Next?

Well, go wherever you want - I'm not your mom!  

Just kidding, of course. Now you know how to implement PostgreSQL High Availability with pgpool-II so the world is yours! But if you insist on my advice - beer is always a good choice.

Image

Comments

Submitted byfatdragon on Tue, 05/31/2016 - 01:38

Congrats to myself! I've did it! So let's get this beer...

Submitted byAnselme Rabouan (not verified) on Wed, 06/08/2016 - 13:02

Congratulations!
What a very useful and well explained documentation!
Correct me if I am wrong but at point 6.4, you can also mention that client authentication can be simply disabled by setting up "enable_pool_hba" to false.
Doing so, authentication is only done by postgreSQL servers.

Submitted byfatdragon on Wed, 06/08/2016 - 13:12

In reply to by Anselme Rabouan (not verified)

Thanks for feedback Anselme!
Regarding enable_pool_hba - you might be right. I must admit that I haven't tried (I was always assuming for some reason that it should be 'on'...) Shame on me!
Have you tried doing so?
If you're right I'll have to apologize to pgpool crew... :)
Btw. can you tell me how you've reached this post? (I'm curious because it is still pretty new website...)
Thanks again!

Submitted byAnselme Rabouan (not verified) on Wed, 06/08/2016 - 14:42

In reply to by fatdragon

I tried yes, it works :-)
I landed on your post after having searched "pgpool high availability" in Google.
I then browsed the "http://dba.stackexchange.co..." result where you (fortunately) mention your blog article :-)

Submitted byTheBigNoob (not verified) on Fri, 07/29/2016 - 03:06

This has been a really useful tutorial series! I have a 4 node cluster setup now with PgPool2 thanks to you. I have a question when it concerns failover handling, are the slave nodes supposed to be pointing to the new master after failover, or is that something I have to initiate myself? As I currently have it, it only seems to failover to the next slave, and does not initate replication of the old slave unto the new (hopefully the beer money will suss out a response ;D ). Either way, much better then static single points of failures I use to have! Cheers!

Submitted byfatdragon on Fri, 07/29/2016 - 04:54

In reply to by TheBigNoob (not verified)

:) Thanks for the beer mate! I would answer anyway, but it's kinda easier with beer :) So here's the answer:

You can think of failover as a major event in your infrastructure after which everything should be rebuilt. As described in the second part of the tutorial, after failover you'll literally kill all previous slaves (expect for the one that has become the new primary, of course), and recreate them from scratch, as if they've never exist. Check this section again: /blog/2016/05/postgresql-ha-pgpool-ii-part-2#failover-vs-full-promotion

Rebuilding all the slaves may sound scary, but actually it's not. In the third part I've provided automation scripts that will allow you doing this easily.

Another question is should you do this automatically. It is possible to setup pgpool-II to automatically execute the script after the failover is performed, and you can get the whole infrastructure setup without any action on your side. But I don't like this approach because of reasons explained in the section linked above. Instead I would bring back only one slave immediately, and all the other slaves during the first low-load period (after office hours, night time, etc.) But if it is important for you to have all the slaves back as soon as possible (for example if number of slaves is important for load balancing) - you can do this.

Submitted byTheBigNoob (not verified) on Fri, 07/29/2016 - 16:58

In reply to by fatdragon

Many thanks for the clarification Aleksandar! I was under the impression this was the case, just wanted to be sure. I think that just means I have one follow up and I should be squared away here. Here's my show pool_nodes post cluster setup:

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt
---------+-------------------------------------------+------+--------+-----------+---------+------------
0 | ip-172-31-2-4.us-west-1.compute.internal | 5433 | 2 | 0.250000 | primary | 0
1 | ip-172-31-2-20.us-west-1.compute.internal | 5433 | 2 | 0.250000 | standby | 0
2 | ip-172-31-2-22.us-west-1.compute.internal | 5433 | 2 | 0.250000 | standby | 0
3 | ip-172-31-2-21.us-west-1.compute.internal | 5433 | 2 | 0.250000 | standby | 0

and here's what happens when I fail the the master either by turning the vm off or by killing postgres

postgres=# show pool_nodes;
node_id | hostname | port | status | lb_weight | role | select_cnt
---------+-------------------------------------------+------+--------+-----------+---------+------------
0 | ip-172-31-2-4.us-west-1.compute.internal | 5433 | 3 | 0.250000 | standby | 0
1 | ip-172-31-2-20.us-west-1.compute.internal | 5433 | 2 | 0.250000 | primary | 0
2 | ip-172-31-2-22.us-west-1.compute.internal | 5433 | 3 | 0.250000 | standby | 0
3 | ip-172-31-2-21.us-west-1.compute.internal | 5433 | 3 | 0.250000 | standby | 0

Postgres clearly fails over to node 1, and marks the rest of the nodes as unavailable as they are either no longer available (node 0) or no longer in sync with the new master (2, 3). Let's say now I have initiated replication on the old slaves with with the new master, how do I get pgpool to see that they are available again? You're correct though the pgpool documentation is quite a bit lacking

I've been putting a lot of time automating all this with saltstack in aws, and although I'm wary of the adversity of having the old slaves connect to new master through complete automation, I'm trying to get to the point where I can call a simple state and it sets it to the new master and readds the nodes back into pgpool.

Submitted byfatdragon on Fri, 07/29/2016 - 20:52

In reply to by TheBigNoob (not verified)

It is very important to actually completely understand the procedure and all the steps. For this reason let's first forget about automating everything, and try to finish everything manually (by manually executing the scripts). Also for the moment forget about pgpool-II - let's focus only on PG and replication. So after the failover the following steps should be done:

1) Perform "full promotion" of the new primary server. You need to understand the difference between failover and "full promotion", explained in the second part of this tutorial. You need to know how to perform "full promotion" and when "full promotion" takes place. If you are not 100% confident that you understand and know that - let me know and I'll try to help. If you know how to do that - do it manually.

2) After the new primary is "fully promoted" you should join old slaves to it (and old primary as a slave eventually). You need to know how to do this. If you are using replication with replication slots - it is very simple - you should simply repeat the same procedure you've done when initially creating them. Again, if you know how to do this - do it manually. Let me know if you need any further help.

3) If you done the previous two steps pgpool-II should automatically pickup and show status of the new primary and its slaves.

After you've done (and understand) everything as described here you can plan automating these steps. Automating these steps is actually very simple - you should simply "tell" pgpool-II which scripts to execute in which moment (in pgpool config).

Please let me know if you've done everything described here, and if you've had any problems.

Submitted byfatdragon on Tue, 08/02/2016 - 12:32

In reply to by TheBigNoob (not verified)

Just to check if everything is OK? Have you performed full promotion and other stuff? Do you need help with automating?

Submitted byTheBigNoob (not verified) on Wed, 08/03/2016 - 01:05

In reply to by fatdragon

Hey Aleksandar,

Thanks again for your follow ups, I really appreciate your help on this, and I apologize for being MIA the past 4 days. I really want to make sure I understand the process before cluttering up your comment section ;)

Since my salt states are at the point where I can rapidly bring up a new cluster without much effort I've been able to test a variety of scenarios. I _think_ I understand the difference between full promotion (removing recovery.conf, restarting service) and failover (touching a trigger file to exist).

As it stands it setups up a Postgres 9.1 async streaming replication along with wal archving with wal-e to s3. I've modified the scripts so they're not to create replication slots, and use the wal-e recovery (and is tested working correctly) Since this is being set up in AWS, I've also setup some IAM roles and appropriate security groups along with a wd_escalation_command to reallocate the private ip address to the machine performing the if_up command with the ec2 command line tools. This also seems to be working correctly since when i power off the master machine the virtual ip does get moved over to the next node with a wd_priority of 1.

Now to prevent extra clutter, here's a paste bin where I go through my sanity checks of replication correctly setting up, as well as starting pgpool and causing a failover scenario http://pastebin.com/PPjFjraZ

Let me know if you're confused by any of this, have any follow up questions, or let me know if I'm being a big dummy and I forgot to do something crucial :D

Submitted byfatdragon on Wed, 08/03/2016 - 03:53

In reply to by TheBigNoob (not verified)

Well, you've done a lot :) My comments:
1) I don't have any experience with WAL archiving, and I don't know how pgpool-II behaves in this scenario (how it detects if slaves are online or not), so I can't help you with your questions. You're more or less alone there.
2) As far as I can see there's no any crucial mistake in your approach.
3) You are obviously going "the right" way when it comes to understanding everything - there's no better way than countless try-and-learn cycles. This is the only way to get the full understanding what's happening behind, and once you have this understanding it's piece of cake to setup automation scripts and to administer the whole infrastructure.
4) The only thing I would do differently is that I would reduce number of pgpool instances. I believe that you are aware that you don't have to have the same number of pogpool instances as pg nodes? With postgresql itself number of nodes can be important if you've setup load balancing (read queries will be distributed among all the nodes), but no matter how many pgpool instances you have - only one works at any given moment - others are simply backup. But who knows - you've approach might be even simpler for administration because things are pretty much unified (every server is the same...)

If you stuck - feel free to contact me - although WAL is not "my cup of tea", it might happen that I can help. (But please be aware that I'm not a doctor on the subject. Actually I believe that you've already reached my level of knowledge, and that in the future I'll have to buy you a beer to learn something from you :))

Submitted byTheBigNoob (not verified) on Fri, 08/05/2016 - 05:25

In reply to by fatdragon

Here's what I understand so far, with no response from the usergroup email thread, and again long winded.

1/2) I'm using WAL-E for point in time recovery. If we bite the dust, it's pretty easy to rewind, especially storing everything to S3. I've recently read PostgreSQL Replication Second Edition by Hans-Jürgen Schönig and it also has some really good introductions into some behind the scenes stuff that happens in different PostgreSQL replication environments and also something else interesting (at least to me), you can run WAL archiving simultaneously with streaming replication running as well, so that's what I've done here. This could also be the reason that I'm experiencing this behavior, I'll go back once I get the first set of changes out the door, but it's an overall improvement from what was in place.

3/4 and beyond) This project was my first big foray into using multiple parts of the saltstack system, the idea here is that a server can be disposable without much fear since everything is also getting stored in s3, multiple slaves, everything getting setup through the same automation and checking the same source of truth for the same informatinon (who is master, salt has it's own way of storing and updating those variables on failover that make it very useful), etc. if pgpool is idling on other machines so be it, more machines to keep quorum. I'm almost to the point where any new postgres server fails, or one is added to the cluster the new server's information is updated to the rest of the servers in the cluster.

My problem might just be a potential design limitation with PostgreSQL 9.1, I'm by no means a certified DBA, but it might be directly related to replication slots and pgpool, or something to do with logical replication also in play after 9.4, this is merely speculation from what I've read. Right now I'm not having pgpool automatically promote a new master, however pgpool will move a VIP if necessary. I will have to promote for right now, which with everything laid out here, it is no big deal. The load balancing and connection pooling was certainly what I was looking for in a generally read heavy db setup I have (and I may even take advantage of pgpool's memcached settings). I've also read you can alleviate client disconnection in failover by putting pgbouncer in front of pgpool, I think that is due to pgbouncer's event driven nature. Part of me is also unsure if putting that much middleware in place is that good of an idea as well, only one way to find out really, but even that is further down the road.

This is seriously (for me) the best thing out there right now for an introduction to getting pgpool up and running for watchdog/vip/failover in my opinion, and thank you again for sharing all of this.

Cheers!

Submitted byNesh (not verified) on Mon, 08/29/2016 - 19:48

Hey Aleksandar/Fat Dragon:

I want to start off by saying that this tutorial was/is pure perfection, I cannot imagine have done all this without you. Impressive job and dedication (the pgpool team want us all to burn in hell).

I have gone through it all but the only thing that I didn't manage to get to work is the `superuser` part. I re-read your steps and tried to further research it but nothing, I still get: `superuser: unknown (Connection error)`.

It's the same pass as the postgres user, I put `trust` on both servers for the internal network in the hba and also tried to query the DB to see if it worked:

`sudo -u postgres psql -p 5433 -U postgres -W template1 -c "SELECT usesuper FROM pg_user WHERE usename = 'postgres'"`

usesuper
----------
t
(1 row)

What could be wrong?

Btw, FYI, I had to make some changes to get this tutorial to work on 16.04, here are the changes (mainly when setting up pgpooladmin and apache2):

```
`LC_ALL=C.UTF-8 add-apt-repository -y ppa:ondrej/php5-5.6`
sudo apt-get update
sudo apt-get install php5.6

apt-get install iputils-arping apache2 libapache2-mod-php5.6 php5.6-pgsql -y
```

This is because php5 is not available on Xenial and pgpoolAdmin is not compatible with php7.0 (it uses some php5-only stuff).

Submitted byfatdragon on Thu, 09/01/2016 - 17:12

In reply to by Nesh (not verified)

Hi Nesh

Thanks for you feedback! Sadly, since I've gone in another direction since writing this tutorial, I'm not able to help. Simply I can't remember...

I hope you'll find your way without my help.

Submitted byNesh (not verified) on Mon, 09/12/2016 - 21:11

In reply to by fatdragon

Hey Aleksandar,

Is there anything you remember about the superuser? Really, I have gone through your guide countless times and I just can't get it to work. The web is plain useless without the superuser and it would be a shame, since it would make the whole tutorial less complete.

Let me know!

Submitted byfatdragon on Tue, 09/13/2016 - 01:31

In reply to by Nesh (not verified)

I've just remembered (although I'm not 100% sure) that you should login to pgpool-II web console as postgres user. Have you tried that?

Submitted byNesh (not verified) on Tue, 09/13/2016 - 09:16

In reply to by fatdragon

Yeah, I did that and I get exactly the same thing but with the postgres user (instead of the admin). I also put the same password there as the postgres user but still nothing.

It's just so weird.

Submitted byfatdragon on Tue, 09/13/2016 - 09:39

In reply to by Nesh (not verified)

Hmmm, weird...
I'm pretty sure that I've wrote everything I've done, and as you can see on the screenshots - I've got superuser privileges. So only thing that I can suggest is going through the whole tutorial carefully and triple-checking every detail. Painful, I know...
BTW. I was so frustrated with everything that I gave up not only on PostgreSQL and pgpool-II, but on RDBMS in general. I've migrated all my projects to Cassandra and now I'm enjoying in it's native HA... Very different world, but incomparably more convenient for large scale systems.

Submitted byNesh (not verified) on Wed, 09/14/2016 - 08:05

In reply to by fatdragon

Oh, I see.
Well, both me and the guy above seemingly did all the steps, so it must be some update somewhere, I guess...
Or we both made the exact same mistake.

Submitted byfatdragon on Thu, 09/15/2016 - 01:58

In reply to by Nesh (not verified)

Don't get me wrong - it's not my intention to defend the tutorial (I don't have any reason to), but don't forget that although you two guys are experiencing the same issue there are quite a few people who confirmed that "everything works". So I believe that you should recheck the every detail.

Good luck anyway!

Submitted byMohamed Lotfi … (not verified) on Thu, 09/01/2016 - 00:00

Hey Mr Aleksandar thank you for this tutorial :)
please can you help me,
everything works but i can not login with "postgres" user in pgpoolAdmin and when I restarts pgpool service I have this message

* Restarting pgpool-II pgpool
FD - Starting pgpool-II by executing:
/usr/sbin/pgpool -f /etc/pgpool2/3.5.2/pgpool.conf -F /etc/pgpool2/3.5.2/pcp.conf -m fast stop
2016-08-31 22:51:13: pid 4611: WARNING: failed while loading hba configuration from file:"/usr/share/pgpool2/3.5.2/etc/pool_hba.conf"
2016-08-31 22:51:13: pid 4611: DETAIL: fopen failed with error: "No such file or directory"
2016-08-31 22:51:13: pid 4611: LOG: stop request sent to pgpool. waiting for termination..

I look forward to your response.

sorry for my bad english

Submitted byDirk (not verified) on Wed, 11/30/2016 - 11:04

In reply to by Mohamed Lotfi … (not verified)

Hey Mohamed, it now 3 months ago, how do solve the problem?

Submitted byMohamed Lotfi … (not verified) on Thu, 09/01/2016 - 16:42

update
everything works :)

Submitted byfatdragon on Thu, 09/01/2016 - 17:14

In reply to by Mohamed Lotfi … (not verified)

Hi Mohamed

Thanks for the feedback! I'm glad to hear that you've find the solution meanwhile. I won't be able to help anyway since I'm not in the subject anymore. Anyway, if you think that your solution can help others - please share it here.

Thanks again

Submitted byGheorghe Chirica (not verified) on Tue, 09/06/2016 - 12:03

Thx Aleksandar, great tutorial(enjoy a beer from me ;)). I manage to have smth working, but not fully. I just want to ask anyone here: how can we automatically manage online recovery? Currently as I see only recovery is triggered from the admin panel. Now, I want to make it so that: once a new master node is promoted, recovery is triggered automatically and the standby node is recovered(of course it can be down, but we can manage that). Any ideas how can I accomplish that?

Thx, George.

Submitted byfatdragon on Tue, 09/13/2016 - 01:29

In reply to by Gheorghe Chirica (not verified)

George, please sorry for my late answer! And thank you for the compliments and for the beer!

Sadly, I'm not able to help you with your question. I'm not actively in PostgreSQL / pgpool anymore, and I've forgot details...

Sorry, and thanks!

Submitted byKarol Małyszko (not verified) on Mon, 09/12/2016 - 13:26

Hey Alexandar
A masterpiece of a tutorial :) I've got one question, though. How in the name of all that's holy You managed to activate 'superuser' mode in pgpoolAdmin? I've been using 'postgres' user everywhere, with the same password and followed the tutorial slavishly, but with (and without) pool_hba i still got 'connection error'. And without superuser priviledges I can't promote slave server to master role via web GUI.
Any thoughts? Hints? Places to sacrifice a black goat in the black of the night?

Submitted byNesh (not verified) on Mon, 09/12/2016 - 21:09

In reply to by Karol Małyszko (not verified)

Welcome to my pain, Karol!

I have tried following Aleksandar's directions and failed. I have tried going to IRC and ask around and failed. I am quite hopeless now: phppgadmin without superuser is just plain useless.

Keep me updated, will ya? :D

Submitted byDirk (not verified) on Fri, 11/25/2016 - 08:49

In reply to by Nesh (not verified)

Yes your going right. I Have the same problem. How do you solve it?

Submitted byfatdragon on Tue, 09/13/2016 - 01:20

In reply to by Karol Małyszko (not verified)

Thanks for the compliments guyz!

Regarding the superuser: as far as I remember you should login to pgpool-II web console as postgres user. Have you tried that?

Submitted byKarol Małyszko (not verified) on Tue, 09/13/2016 - 08:48

In reply to by fatdragon

Yup. Logged in as postgres, with password md5-hashed in pcp.conf and pool_passwd, in .pcppass just plain-texted. Password is the same as in PostgreSQL.
On the other hand, i may have messed pg_hba some. Checking this right now.

Submitted byDirk (not verified) on Fri, 11/25/2016 - 08:48

In reply to by Karol Małyszko (not verified)

It's two moths ago. I have the same problem. do you solve the problem and how?

Submitted byKarol Małyszko (not verified) on Sat, 11/26/2016 - 17:21

In reply to by Dirk (not verified)

Nope, still in the same dark place. I had to drop the project though, I've got people at my back ranting about upgrading to Ubuntu 16, on which pgpool-admin doesn't work at all (tried it on my first attempt). Right now i'm trying to create a master-master replica via Bucardo and do loadbalancing and watchdog with HAproxy and keepalived, respectively. If it works, I'll let you guys know. Keep fingers crossed ;)

Submitted byKarol Małyszko (not verified) on Thu, 12/01/2016 - 15:17

In reply to by Dirk (not verified)

Hey
You might want to look very closely at postgres security settings in pg_hba.conf -> md5, trust and what-nots. Bumped into a problem with streaming replication, where I followed directives from part 2 of this tutorial. When I set

CREATE ROLE replication WITH REPLICATION PASSWORD 'reppassword' LOGIN;and then done .pgpass as i should AND set pg_hba.conf with
host replication replication 10.1.10.155/32 md5
i could not initiate pg_basebackup from slave.
When I did 'create role replication with encrypted password 'reppassword' login;" it went OK. But from then on we have a difference between tutorial and authentication methods required by pgpool. This may be our perp.

HTH

Cheers

Submitted bySudhakar Pandey (not verified) on Thu, 10/20/2016 - 10:51

Amazing tutorial.. it really help me.. Everything is explained regarding pgpool in this series..

Submitted byDirk (not verified) on Fri, 11/25/2016 - 09:02

Starting pgpool-II don't works. Welcome in the permisson hell!

host$ service pgpool2 start

Every user has the same Password!

I got a password query, enter it and got a legitimation error. :-(

pgp.conf and pool_passwd looks ok and are NOT the same how its explained!

Can someone help me?

Submitted byDirk (not verified) on Wed, 12/07/2016 - 09:20

The both failover files don't has the same number/values
of parameter.

old failover.sh

#!/bin/bash -x
FALLING_NODE=$1 # %d
OLDPRIMARY_NODE=$2 # %P
NEW_PRIMARY=$3 # %H
PGDATA=$4 # %R

new failover

FALLING_NODE=$1 # %d
OLDPRIMARY_NODE=$2 # %P
NEW_PRIMARY=$3 # %H
REPL_PASS=$4
TRIGGER_FILE=$5

Now the Question: How does it works with the
pgpoolAdmin together, I can't see the call :-/

Submitted byDavid Berardozzi (not verified) on Thu, 12/29/2016 - 16:40

THANKS A LOT FOR THIS EXCELLENT TUTORIAL !!
For those who encountered the issue with superuser "connection error" in pgpooladmin, I finally figured out that it was related to an incompatibility between pg_hba.cong and pool_hba.conf. See the references stated in this answer http://stackoverflow.com/a/41382767/6202361
Regards

Submitted byLeonard Ödalen Frank (not verified) on Tue, 03/07/2017 - 08:56

I encountered the same issue with "Connection Error" for superuser. Solved it by giving postgres user access to all databases in pg_hba.conf.

Example.

host all postgres x.x.x.101/32 md5
host all postgres x.x.x.102/32 md5

Another thing worth metioning is that you don't need to use pool_hba, just set enable_pool_hba = off in pgpool.conf and let PostgreSQL handle the authentication.

Submitted byRitesh Rajput (not verified) on Sat, 03/18/2017 - 22:20

really great tutorial. I managed to run failover and recovery 1st stage commands through command line as postgres user. However, once i try to run the same commands using pgpooladmin, it does not work. The main reason behind that is when we run the commands through pgpooladmin, it wuns through apache/www-data user and it throws the error as "host verificaition failed". Can anyone help me with this ?

Submitted byLudovic Darcos (not verified) on Thu, 04/20/2017 - 12:35

Hi! Thanks for this great tutorial.
I easily managed to adapt it and use it on CentOS 7.x with the last 9.6 release of PostgreSQL and its associated pgpool-II, available as RPM in the postgres yum repository, thanks of your detailed explanations of how it actually works.

Concerning the PgpoolAdmin portal and its contradictorious infos, I have a similar situation. But well, it works :)

For sure, as I'm with you thinking it's not pgpool2 role to manage (again) the postgresql authentication, I set the pool_hba.conf with a single host all all 0.0.0.0/0 trust, and set my pg_hba.conf as needed by my requirements. It works like a charm, and I don't have to update my authentication twice.

As you say... it's time for me having a beer!

Cheers!

Add new comment

Anonymous comments require solving captcha, and waiting for administrator's approval.