PostgreSQL HA with pgpool-II - Part 6

This is the sixth (and hopefully the last) part of the tutorial. Other parts are:

Start from the first part or give up!

6.1. Additional Packages

We need to install few more packages:

apt-get install iputils-arping apache2 php5 libapache2-mod-php5 php5-pgsql

First, I must admit that I'm not sure if iputils-arping is the right choice. Watchdog needs arping command, but there are two packages in Ubuntu apt store that are offering it: iputils-arping (I've used above), and arping. I'm not 100% sure which should be installed, and I selected the first one only because it understands flags that are used in pgpool.conf template file (arping_cmd = 'arping -U $_IP_$ -w 1'). If you install arping package, arping command will complain about -U flag.

Apache and PHP are needed for pgpoolAdmin we'll install later.

6.2. ip and arping

If you remember from the last part, we've left two configuration options from pgpool.conf file (if_cmd_path and arping_path) for the next (this) part. Let's explain what is the issue with these: In order to be able to manipulate virtual IP pgpool-II needs to be able to execute ip and arping commands. But there's a catch: these commands are requiring root access, and as you might remember pgpool-II service runs under postgres user which doesn't have root permissions. It is true that we could let the service run as root, but again it wouldn't solve the problem - since we'll install and use pgpoolAdmin (which runs under Apache), www-data user (it is Apache user on Ubuntu) also needs to be able to execute these commands.

There are several ways to accomplish this, and many times mentioned pgpool-II Tutorial [watchdog in master-slave mode] from pgpool.net uses copying command binaries to user's home and changing permissions appropriately. Nevertheless, the tutorial also says:

"Note that explained above should be used for tutorial purpose only. In the real world you'd better create setuid wrapper programs to execute ifconfig and arping. This is left for your exercise."

Well guyz, thanks for exercising me! But it would be much more helpful if you've actually showed how it should be done. Thanks for the tutorial that shows how it shouldn't be done!

Once again I'm left alone to find a way. I've already mentioned that there're a lot of ways to accomplish this, and I've select one, not necessarily the best or the easiest.

6.2.1. sudoers file

The first thing I'll do is to allow postgres and www-data users to sudo execute these commands without being prompted for root password. I've accomplished this by adding the following lines to sudoers file:

postgres ALL=(root) NOPASSWD: /bin/ip
www-data ALL=(root) NOPASSWD: /bin/ip
postgres ALL=(root) NOPASSWD: /usr/bin/arping
www-data ALL=(root) NOPASSWD: /usr/bin/arping

You can add these lines at the end of the file. Of course, confirm that paths provided are correct, and that they are pointing to an existing ip and arping files.

6.2.2. Command Wrappers

The next thing we'll do is creating simple wrapper scripts for mentioned commands. I'll place each script in the same directory where the command it wraps is, and I'll name the scripts by the wrapped command with suffix _w. Let's start with the wrapper for ip command:

#!/bin/bash
# By Fat Dragon, 05/26/2016
# Wraps ip command
 
if [ $UID -eq 0 ]
then
        #echo "Executing: /bin/ip $@"
        /bin/ip $@
else
        #echo "Executing: sudo /bin/ip $@"
        sudo /bin/ip $@
fi
 
exit 0

Similarly, the script for arping command will be:

#!/bin/bash
# By Fat Dragon, 05/26/2016
# Wraps arping command
 
if [ $UID -eq 0 ]
then
        #echo "Executing: /usr/bin/arping $@"
        /usr/bin/arping $@
else
        #echo "Executing: sudo /usr/bin/arping $@"
        sudo /usr/bin/arping $@
fi
 
exit 0

Basically, what the scripts do is forcing sudo execution of wrapped command if currently executing user is not root.

After the scripts are saved you'll need to set the permissions:

chmod 0755 /bin/ip_w
chmod 0755 /usr/bin/arping_w

After that you can confirm that postgres user is able to sudo-execute the commands without being prompted for password:

root@IT-RDBMS01:~# sudo -i -u postgres
postgres@IT-RDBMS01:~$ ip_w a
Executing: sudo /bin/ip a
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:15:5d:05:05:20 brd ff:ff:ff:ff:ff:ff
    inet 10.1.10.150/16 brd 10.1.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::215:5dff:fe05:520/64 scope link
       valid_lft forever preferred_lft forever
postgres@IT-RDBMS01:~$ logout
root@IT-RDBMS01:~#,multicast,up,lower_up>,up,lower_up>

Notice the third line (Executing: sudo /bin/ip a) - obviously we've succeeded executing ip command with sudo, without being prompted for password. BTW. this line will not appear if you comment out tracing echo statements in the scripts, as I've already did above. Actually I've reactivated echo lines only for this test execution, and commented them out again (a good wrapper should return the exact output gotten form the original command, nothing more).

6.3. Finishing pgpool-II Configuration

Finally, we are ready to finish pgpool-II configuration. Set the following values in pgpool.conf file:

if_cmd_path = '/bin'
if_up_cmd = 'ip_w addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = 'ip_w addr del $_IP_$/24 dev eth0'
arping_path = '/usr/bin'
arping_cmd = 'arping_w -U $_IP_$ -w 1'

Notes:

  • Previous values are very similar to default ones. I've changed only paths (if_cmd_path and arping_path), and command names (ip_w instead of ip and arping_w instead of arping).
  • I want to bring your attention to $_IP_$/24 part: Actual IP address will be set automatically by pgpool (replacing $_IP_$ placeholder). But you should set the subnet (/24) part appropriately, depending on your network.

6.4. Authentication, Access Control and pool_hba.conf

When you think that the beast is defeated - pgpool-II strikes back with yet another catch. This time it is authentication, and this time is really hard to find any sane reason (besides we-want-you-suffer one) for them to implement authentication in a way they did. Let me explain what is all about.

pgpool-II introduces its own authentication mechanism, so that no client can connect to it if not authenticated properly. Sounds like reasonable decision, right? But wait, you need to know the whole story. Unreasonable thing is that pgpool's authentication does not replaces PostgreSQL authentication (which would also be wrong), but it simply adds another layer of authentication, so that a user first must to authenticate with pgpool, and then with PostgreSQL itself in order to execute a query. Having two authentications for a single query (single SELECT statement if you want) is already pointless and wrong. But actual problem is even bigger due to the fact that pgpool's authentication is poorly implemented, and does not uses existing PostgreSQL mechanism. It means that for every database user you'll have to:

  • Manage his password in two different systems. The password has to be the same in both systems, but you must manage it separately, meaning that if you want to change the password you'll have to do so in PostgreSQL and pgpool.
  • You'll have to manage two host-based access (hba) files - pg_hba.conf (well known PostgreSQL's hba file) and pool_hba.conf (pgpool's hba file). Again, the setting in these files must be equivalent. For example I initially planned to set trust authentication for all users in pool_hba.conf and then md5 in pg_hba.conf, to basically disable pgpool authentication and do the actual authentication at PostgreSQL. But it is not possible - if different authentication methods are used - authentication will fail.

It makes sense to have authentication for pgpool's administrative tasks, but introducing another authentication layer for query execution is pointless, and frankly speaking - stupid. When it comes to query execution pgpool should simply pass-through - it is not its responsibility to authenticate. It is not a special-ultra-security product, but failover / load-balancing product.

Sorry for criticism! Let's do this. There are few things we need to do, so let's start by setting the following in pgpool.conf file:

enable_pool_hba = on
pool_passwd = 'pool_passwd'

Next we'll create pool_hba.conf file, by copying from template (remember, while installing pgpool-II in PostgreSQL HA with pgpool-II - Part 4 we've prepared some templates):

cp /etc/pgpool2/3.5.2/pool_hba.conf.sample /etc/pgpool2/3.5.2/pool_hba.conf

pool_hba.conf is very similar to pg_hba.conf, except for few limitations (see pgpool manual for details). For the purpose of this tutorial I'll only add one line that allows all users to access all databases from my network (10.1.0.0/16) by using md5 authentication:

host all all 10.1.0.0/16 md5

In order to enable md5 authentication we have to create pool_passwd file. Path and the name of the file are specified in pgpool.conf file (see above). Another interesting pgpool team's decision is that path of the file is specified relative to pgpool.conf file itself, meaning that in our case pool_passwd file has to be placed in the same directory as pgpool.conf file. Content of the pool_passwd file is (in a way) very similar to content of pcp.conf file we've created in the previous part of this tutorial, but pool_passwd file cannot contain comments nor empty lines. Another difference is that md5 hash of the password cannot be created in the same way as for pcp.conf file (no doubt they want you suffer). Entries in pool_passwd file should be created in the following way:

pg_md5 -f /etc/pgpool2/3.5.2/pgpool.conf -m -u postgres postgrespassword

Here I've added user postgres with password postgrespassword to pool_passwd file. The command will execute without any command output - it adds user and password to pool_passwd file automatically (if the file doesn't exist it'll be created). As you can see it also requires path to pgpool.conf file as input argument (obviously to read from there where pool_passwd file is). After the execution you can check /etc/pgpool2/3.5.2/pool_passwd file content, and you'll find something like the following there:

postgres:md55cb5bf77d7027e6c4e50fa4112df4d63

If you have multiple users - you'll have multiple line in the file. You can also add lines manually, but if you do so you need to:

  • Ensure that newly added line ends with new-line character. If there's no new-line character at the next execution pg_md5 command will concatenate the next user and its password in the same line.
  • Find a way to create password hash. As I've already mentioned it is not the same hash as in pcp.conf file, and I don't have a clue how this one can be generated.

Finally, you need to ensure that md5 access for a particular user is also enabled in pg_hba.conf.

6.5. Starting pgpool-II

We have finally finished with pgpool-II configuration, so we can enable and start the service:

update-rc.d pgpool2 enable
service pgpool2 start

6.6. Testing pgpool-II

If you've done everything right now you should be able to see pgpool-II running. You can test the following:

  • service pgpool2 status should report that the service is running on both machines;
  • ifconfig -a should show that one machine has additional IP address (virtual IP) labeled with eth0:0;
  • You should be able to connect to pgpool-II from any other server by using virtual IP and port 5432. You can try with pgAdmin3 for example.

6.7. Installing pgpoolAdmin

It is another part of the procedure that isn't well documented, but luckily it is not too hard figure it out. About the official documentation I'll tell you just two things:

  • It dates back to 2006-2008, and its created for version 2.1 (the current version is 3.5.2);
  • It is written in almost not-understandable English (i.e. "To execute pgpool and the pcp tool set up from the Apach user, the right of access is set.")

6.7.1. Preparing .pcppass File

This file is needed by pgpoolAdmin for authentication for pcp commands. As you may remember that in the previous part we've created pcp.conf file that contains authentication info for executing pcp commands (i.e. pcp_node_count, pcp_node_info, etc.). The very same file and the same authentication info is used for authenticating to pgpoolAdmin portal. For example, we've created there user admin with password pa55w0rd, and now we'll use this combination to login to pgpoolAdmin portal. But it is not the end of authentication - even when you are logged in to pgpoolAdmin portal, different portal functions are trying to execute pcp commands in background, and every execution must be authenticated. Although you've submitted username/password while logging in, the portal does not store this info, and needs username/password for every pcp command execution. This is where .pcppass file becomes important - the portal will always read username/password from the file, without bothering you to enter it again and again.

.pcppass file is very similar to .pgpass file we've created in PostgreSQL HA with pgpool-II - Part 4, with only difference that .pcppass file misses database part. It means that format of .pcppass file entries is:

hostname:port:username:password

In our case we'll create .pcppass file as:

*:*:admin:pa55w0rd

You can read this as: all hosts, all ports, user admin, password pa55w0rd. There are several things to note about the file and its content:

  • Username / password combination must match to one used while creating pcp.conf file, with a difference that in pcp.conf file we need to use md5 hash of the password, while in .pcppass file we need to use password in plain text.
  • .pcppass file should be placed in user's home directory. Actually, it is possible to place the file wherever you want and specify its location in an environment variable (as explained here), but we'll use the first approach.
  • When the file is used by pgpoolAdmin portal (we are doing right now), it should be placed in home directory of the user account under which Apache runs, meaning in www-data user's home directory. Home directory of www-data user is /var/www, so we need to place .pcppass file there.
  • When used with pgpoolAdmin portal the file must contain the same username/password used when we've logged in to the portal.

6.7.1.1. .pcppass File Beyond Apache

Although it is not related to pgpoolAdmin portal we are dealing with here, it is worthwhile noting that the file can be useful even when you are executing pcp commands from command line. In this case it will save you from always being prompted for password. But in this case it has:

  • To be placed in home directory of the user who is executing the commands. For example, if you are executing commands as root user, the file should be placed in /root/.pcppass.
  • To contain appropriate username/password combination. When you are executing pcp commands from command line, you can specify the username by using -u flag (i.e. pcp_node_info -u admin ...). If you've specified username this way, the file must contain this username and its corresponding password. Of course, the same username/password must also be specified in pcp.conf file. If you don't specify username by using -u flag, then your UNIX username will be used, meaning that if you are executing command as root user, .pcppass file (and again pcp.conf file also) must contain root username and its corresponding password. In this case your UNIX password does not have to match to the password specified in pcp.conf and .pcppass files.

Still talking about command line usage, let me tell you how you can know if .pcppass file is set as needed: When you try to execute any pcp command, if you get password prompt it means that the file is not set in the appropriate way.

6.7.2. pgpool.conf and pcp.conf File Permissions

Among other things, pgpoolAdmin portal provides an interface for changing pgpool.conf file, and interface for changing password (meaning changing pcp.conf file). But to be able to save the changes, the portal has to have write permissions on mentioned files. For this reason we'll change file permissions of these files:

chown www-data /etc/pgpool2/3.5.2/pgpool.conf
chown www-data /etc/pgpool2/3.5.2/pcp.conf

This ownership change won't break anything in our previous setup: postgres user (pgpool2 service runs under) doesn't change these files anyway, only reads them, so it doesn't have to be the owner.

6.7.3. Installing the Portal

Finally we can install the portal itself. As I've mentioned, pgpoolAdmin installation is not too hard, and I'll simply provide a script that performs it, with explanations in comments:

# Navigate to temporary directory
cd /tmp
 
# If archive exists delete it
if [ -f pgpoolAdmin-3.5.2.tar.gz ]; then
    rm pgpoolAdmin-3.5.2.tar.gz
fi
 
# Download installation archive
wget http://www.pgpool.net/download.php?f=pgpoolAdmin-3.5.2.tar.gz -O pgpoolAdmin-3.5.2.tar.gz
 
# If extracted directory exists delete it
if [ -d pgpoolAdmin-3.5.2 ]; then
    rm -r pgpoolAdmin-3.5.2.tar.gz
fi
 
# Extract the archive
tar -xzf pgpoolAdmin-3.5.2.tar.gz
 
# Delete archive file
rm pgpoolAdmin-3.5.2.tar.gz
 
# If virtual directory exists delete it
if [ -e /var/www/html/pgpooladmin ]; then
    rm -r /var/www/html/pgpooladmin
fi
 
# Move extracted archive to the new location (under Apache root directory)
mv pgpoolAdmin-3.5.2 /var/www/html/pgpooladmin
 
# Change ownership of the directory
chown root:root -R /var/www/html/pgpooladmin
 
# Adjust file and folder permissions
chmod 0777 /var/www/html/pgpooladmin/templates_c
chown www-data /var/www/html/pgpooladmin/conf/pgmgt.conf.php
chmod 0644 /var/www/html/pgpooladmin/conf/pgmgt.conf.php

After executing the script you should be able to access the portal, so first check if Apache / PHP are working as expected by opening http://it-rdbms01/pgpooladmin/install/phpinfo.php (you should change hostname appropriately). At this location you should find standard phpinfo page starting with something like:

Image

Two parts are especially important: Multibyte Support should be enabled:

Image

And PostgreSQL Support should be enabled:

Image

If you don't see pgsql section at all, it's probably because Apache server was started before you've installed php5-pgsql package. Try with restarting Apache (service apache2 restart) and refreshing the page. If the section is still missing ensure that php5-pgsql package is actually installed.

6.7.3.1. Installation Wizard

Once you've ensured that everything is ok with PHP and Apache, start installation wizard by opening http://it-rdbms01/pgpooladmin/install/index.php (again change the host). You should get something like:

Image

Select your language and click "Next".

The second step of the wizard is "Directory Check". You should see two green checks. If not - you probably haven't set the appropriate file permissions (recheck the last three lines form the script above). Click "Next".

In the third step there are many fields with green checks and red X's. In order to finish the installation we need to ensure that there are no more red X's, but:

Probably the only cause for red X's is wrong path, and we'll fix it. Here's what I've had to change to make it green:

Field Value
pgpool.conf File /etc/pgpool2/3.5.2/pgpool.conf
pcp.conf File /etc/pgpool2/3.5.2/pcp.conf
pgpool Command /usr/sbin/pgpool
PCP directory /usr/sbin

If you've used different directories during pgpool-II installation process - you'll have to appropriately change the values here also.

After changing the values click "Check" button again and you should get all greens. Click "Next". You should get something like:

Image

As you can see, we should now delete /var/www/html/pgpooladmin/install directory, so do it. Click "top" button, and you'll end up on a login page. Enter username / password (you've defined while creating pcp.conf file in PostgreSQL HA with pgpool-II - Part 5), and you should be in. You should see something like:

Image

If you see some error messages like "failed to execute pcp_...", "e1002", etc., chances are that you haven't set .pcppass file appropriately. Look again above and recheck that the file is set as needed.

Note that since I've used admin user (which is not PostgreSQL user) to login to this session, there's "superuser: unknown (Connection error)" on the previous screenshot. Also majority of buttons are grayed out. But if you login by using postgres user, you'll get something like:

Image

All Problems Solved???

Well, no. Obviously pgpool-II is very immature (probably even too immature) product that won't ever stop causing some headache. At this moment I'm struggling with a new question: why two pgpool-II instances (which are targeting the same cluster) are showing different info??? I'll give you an example - screenshot of my IT-RDBMS01 pgpool instance (the same screen as the previous one):

Image

As you can see even a single instance provides contradictory info: node 1 is at same time "Down" and "Up". And there's button "Return" which is never and nowhere described. What the hack this button does? And what it means in the first place? Term "return" hasn't been used in PostgreSQL replication terminology, so obviously pgpool team has introduced a new one without any explanation. Guys, you can name a button "Pray it back" or "Normalize it up", but at least you should explain what actually this button does. Does this button initializes recovery (recovery_1st_stage, etc.) or not?

And here's the screenshot from my IT-RDBMS02 instance:

Image

The same cluster, the same servers, the same moment, two pgpool-II instances. Disappointing... Very disappointing.

Where to Go Next?

Well, go wherever you want - I'm not your mom!  

Just kidding, of course. Now you know how to implement PostgreSQL High Availability with pgpool-II so the world is yours! But if you insist on my advice - beer is always a good choice.

Image