This is the third part of the tutorial. Other parts are:
Yep, it would be wise for you to start from PostgreSQL HA with pgpool-II - Part 1, and go through the whole tutorial, part by part. Especially because this part will deal with automating the procedure described in the previous part.
As mentioned, this part will deal with automating the procedure for creating replication described in PostgreSQL HA with pgpool-II - Part 2. Everything we'll build here will be very useful later when we'll implement pgpool-II.
postgresql
service, etc.)
3.1 Objective
In order to be perfectly clear about what I'm trying to accomplish here, I'll define my objective as: To prepare servers, configurations and script files so that replication configuration tasks (installing and configuring primary and standby server, promoting standby server to primary role, etc.) can be performed efficiently and easily.
3.1.1 Enabling Passwordless SSH for postgres
User
There are few cases when postgres
UNIX user on one host should be able to execute some command on another host through SSH. For this reason, after installing PostgreSQL package, we need to enable this. In case that you don't know how to accomplish this - I've created another post that will help you: Passwordless SSH in Linux.
3.1.2 PostgreSQL Configuration Files
Although it is true that PostgreSQL configuration files are different for primary and standby server, we can do the following:
-
Create
pg_hba.conf
file as described in PostgreSQL HA with pgpool-II - Part 2, since it is the same in both cases (for primary and standby server); -
Create two versions (templates) of
postgresql.conf
file: one for primary, and the other for standby role, and store them both on the target server. This way, when the actual role of the server is determined (changed) we can simply use the appropriate file.
So let's crate repltemplates
directory where template files will be stored. I'll place these directories in PostgreSQL configuration directory. Next, we'll copy configuration file templates there so that we get the following file structure:
-
/etc/postgresql/9.5/main/repltemplates
(directory)-
postgresql.conf.primary
- Created by following the procedure for primary server described in PostgreSQL HA with pgpool-II - Part 2. -
postgresql.conf.standby
- Created by following the procedure for standby server described in PostgreSQL HA with pgpool-II - Part 2.
-
Finally we'll ensure that postgres
user owns all these files/directories:
$ chown postgres:postgres /etc/postgresql/9.5/main/pg_hba.conf
$ chown postgres:postgres -R /etc/postgresql/9.5/main/repltemplates
3.2 Introducing Some Conventions
In order to have an easy way to always determine (either from code or by a glance) if the particular server is primary or standby, we'll introduce the following convention:
-
Primary server must contain the following trigger file:
/etc/postgresql/9.5/main/im_the_master
-
Standby server must contain the following standby file:
/etc/postgresql/9.5/main/im_slave
- No server can contain both trigger and standby file at the same time.
3.3 Automation Scripts
Before starting with the scripts I need to mention that within the script I'm often using Ubuntu-embedded service command to start / stop / restart postgresql
service. On the other side, you may notice that other documentation mostly uses pg_ctl
command for this purpose. There's no difference between the two. You can learn more about the differences in my other post: Managing PostgreSQL Process on Ubuntu - service, pg_ctl and pg_ctlcluster.
Finally we can prepare some scripts that will make our life easier latter. I'll place these scripts in /etc/postgresql/9.5/main/replscripts
directory. But I need to remind you:
Although the scripts are rather long, the biggest part is usually some boilerplate code, not related to replication we are dealing with here. For example, every script starts with giant while loop which does nothing more than gathering input arguments. Parts of the script that are related to replication will always be additionally explained.
3.3.1 disable_postgresql.sh
It's already mentioned that we should not allow presence of multiple primary servers in the same cluster at the same time. For this reason I'll create the script that will disable PostgreSQL and prevent it to run either as primary or standby. Besides the mentioned argument-gathering while loop, the script is simple - just go through comments and you'll understand what it is doing.
#!/bin/sh
# By Fat Dragon, 05/24/2016
# Stopping and disabling postgresql service if running
# NOTE: The script should be executed as postgres user
echo "disable_postgresql - Start"
# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"
while test $# -gt 0; do
case "$1" in
-h|--help)
echo "Disables PostgreSQL"
echo " "
echo "disable_postgresql [options]"
echo " "
echo "options:"
echo "-h, --help show brief help"
echo "-t, --trigger_file=FILE specify trigger file path"
echo " Optional, default: /etc/postgresql/9.5/main/im_the_master"
echo "-s, --standby_file=FILE specify standby file path"
echo " Optional, default: /etc/postgresql/9.5/main/im_slave"
echo " "
echo "Error Codes:"
echo " 1 - Wrong user. The script has to be executed as 'postgres' user."
echo " 2 - Argument error. Caused either by bad format of provided flags and"
echo " arguments or if a mandatory argument is missing."
exit 0
;;
-t)
shift
if test $# -gt 0; then
trigger_file=$1
else
echo "ERROR: -t flag requires trigger file to be specified."
exit 2
fi
shift
;;
--trigger-file=*)
trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-s)
shift
if test $# -gt 0; then
standby_file=$1
else
echo "ERROR: -s flag requires standby file to be specified."
exit 2
fi
shift
;;
--standby-file=*)
standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
*)
echo "ERROR: Unrecognized option $1"
exit 2
;;
esac
done
# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then
echo "ERROR: The script must be executed as 'postgres' user."
exit 1
fi
echo "INFO: Stopping postgresql service..."
service postgresql stop
# Moving postgresql.conf file in order to prevent service to be started
if [ -f /etc/postgresql/9.5/main/postgresql.conf ]; then
if [ -f /etc/postgresql/9.5/main/postgresql.conf.disabled ]; then
rm /etc/postgresql/9.5/main/postgresql.conf.disabled
fi
echo "INFO: Renaming postgresql.conf file to prevent future service start."
mv /etc/postgresql/9.5/main/postgresql.conf /etc/postgresql/9.5/main/postgresql.conf.disabled
fi
# Deleting recovery.conf file
echo "INFO: Checking if recovery.conf file exists..."
if [ -f /var/lib/postgresql/9.5/main/recovery.conf ]; then
echo "INFO: recovery.conf file found. Deleting..."
rm /etc/postgresql/9.5/main/recovery.conf
fi
# Deleting trigger file
echo "INFO: Checking if trigger file exists..."
if [ -f $trigger_file ]; then
echo "INFO: Trigger file found. Deleting..."
rm $trigger_file
fi
# Deleting standby file
echo "INFO: Checking if standby file exists..."
if [ -f $standby_file ]; then
echo "INFO: Standby file found. Deleting..."
rm $standby_file
fi
# Deleting primary info file
echo "INFO: Checking if primary info file exists..."
if [ -f /var/lib/postgresql/9.5/main/primary_info ]; then
echo "INFO: primary_info file found. Deleting..."
rm /var/lib/postgresql/9.5/main/primary_info
fi
echo "disable_postgresql - Done!"
exit 0
3.3.2 promote.sh
Script promote.sh
will promote a standby server to primary server role. What the script actually does (ignoring boilerplate code) is:
-
Checks if trigger / standby file is present. In general, the script will refuse to run if trigger file is missing or standby file is present, but this behavior can be changed by specifying flag
-f
. If-f
is specified, then the script will create new trigger file if it is missing, and will delete standby file if present. -
If
-d
flag (representing previous primary server that should be disabled) is specified (followed byhostname
), the script will try to executedisable_postgresql.sh
script at the previous primary server through SSH. -
Removes
recovery.conf
file if present, since it isn't needed on primary server. -
Checks if
postgresql.conf
file should be changed, and changes it (by copying from the prepared template) if needed, and restartspostgresql
service. -
Ensures that
replication
role exists, with the appropriate password.replication
user and its password can be set by using-u
and-p
flags, respectively. Note that here we are defining the password, not checking against existing. -
Finally it writes primary info file (
/var/lib/postgresql/9.5/main/primary_info
). This file will be used later byrecovery_1st_stage.sh
script (explained in PostgreSQL HA with pgpool-II - Part 5) for performing recovery of a standby server.
#!/bin/sh
# By Fat Dragon, 05/24/2016
# Promoting standby to primary node
# NOTE: The script should be executed as postgres user
echo "promote - Start"
# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"
demote_host=""
replication_user="replication"
replication_password=""
force=false
debug=true
while test $# -gt 0; do
case "$1" in
-h|--help)
echo "Promotes a standby server to primary role"
echo " "
echo "promote [options]"
echo " "
echo "options:"
echo "-h, --help show brief help"
echo "-t, --trigger_file=FILE specify trigger file path"
echo " Optional, default: /etc/postgresql/9.5/main/im_the_master"
echo "-s, --standby_file=FILE specify standby file path"
echo " Optional, default: /etc/postgresql/9.5/main/im_slave"
echo "-d, --demote=HOST specify old primary to demote"
echo " Optional, if not specified no demotion will be performed."
echo "-u, --user specify replication role"
echo " Optional, default: replication"
echo "-p, --password=PASSWORD specify password for --user (mandatory)"
echo "-f, --force Forces promotion regardless of existence"
echo " of trigger / standby files."
echo " Optional, default: N/A"
echo " Description: Without this flag the script will require"
echo " presence of trigger file."
echo " With the flag set the script will create"
echo " trigger file as needed."
echo " "
echo "Error Codes:"
echo " 1 - Wrong user. The script has to be executed as 'postgres' user."
echo " 2 - Argument error. Caused either by bad format of provided flags and"
echo " arguments or if a mandatory argument is missing."
echo " 3 - Inapropriate trigger / standby files. See -f flag for details."
echo " 4 - Error creating/deleting/copying configuration files"
echo " (postgresql.conf and recovery.conf)."
echo " Hint: ensure that templates exist and check permissions."
echo " 5 - Error creating / altering replication_user."
exit 0
;;
-t)
shift
if test $# -gt 0; then
trigger_file=$1
else
echo "ERROR: -t flag requires trigger file to be specified."
exit 2
fi
shift
;;
--trigger-file=*)
trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-s)
shift
if test $# -gt 0; then
standby_file=$1
else
echo "ERROR: -s flag requires standby file to be specified."
exit 2
fi
shift
;;
--standby-file=*)
standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-d)
shift
if test $# -gt 0; then
demote_host=$1
else
echo "ERROR: -d flag requires host that will be demoted to be specified."
exit 2
fi
shift
;;
--demote-host=*)
demote_host=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-u)
shift
if test $# -gt 0; then
replication_user=$1
else
echo "ERROR: -u flag requires replication user to be specified."
exit 2
fi
shift
;;
--user=*)
replication_user=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-p)
shift
if test $# -gt 0; then
replication_password=$1
else
echo "ERROR: -p flag requires replication password to be specified."
exit 2
fi
shift
;;
--password=*)
replication_password=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-f|--force)
force=true
shift
;;
*)
echo "ERROR: Unrecognized option $1"
exit 2
;;
esac
done
# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then
echo "ERROR: The script must be executed as 'postgres' user."
exit 1
fi
if [ "$replication_password" = "" ]; then
echo "ERROR: --password is mandatory. For help execute 'promote -h'"
exit 2
fi
if $debug; then
echo "DEBUG: The script will be executed with the following arguments:"
echo "DEBUG: --trigger-file=$trigger_file"
echo "DEBUG: --standby_file=$standby_file"
echo "DEBUG: --demote-host=$demote_host"
echo "DEBUG: --user=$replication_user"
echo "DEBUG: --password=$replication_password"
if $force; then
echo "DEBUG: --force"
fi
fi
echo "INFO: Checking if standby file exists..."
if [ -e $standby_file ]; then
if $force; then
echo "INFO: Standby file found. Deleting..."
rm $standby_file
else
echo "ERROR: Cannot promote server that contains standby file: ${standby_file}"
exit 3
fi
fi
echo "INFO: Checking if trigger file exists..."
if [ ! -e $trigger_file ]; then
if $force; then
echo "INFO: Trigger file not found. Creating a new one..."
echo "Promoted at: $(date)" >> $trigger_file
else
echo "ERROR: Cannot promote server that does not contain trigger file: ${trigger_file}"
exit 3
fi
fi
success=false
# Disabling postgresql on demote host (if specified):
if [ "$demote_host" != "" ]; then
echo "INFO: Trying to disable postgresql at ${demote_host}..."
ssh -T postgres@$demote_host /etc/postgresql/9.5/main/replscripts/disable_postgresql.sh -t $trigger_file -s $standby_file && success=true
if ! $success ; then
echo "WARNING: Failed to execute 'disable_postgresql.sh' at demoted host."
fi
fi
if [ -e /var/lib/postgresql/9.5/main/recovery.conf ]; then
echo "INFO: Deleting recovery.conf file..."
success=false
rm /var/lib/postgresql/9.5/main/recovery.conf && success=true
if ! $success ; then
echo "ERROR: Failed to delete '/var/lib/postgresql/9.5/main/recovery.conf' file."
exit 4
fi
fi
echo "INFO: Checking if postgresql.conf file exists..."
if [ -e /etc/postgresql/9.5/main/postgresql.conf ]; then
echo "INFO: postgresql.conf file found. Checking if it is for primary server..."
if diff /etc/postgresql/9.5/main/postgresql.conf /etc/postgresql/9.5/main/repltemplates/postgresql.conf.primary >/dev/null ; then
echo "INFO: postgresql.conf file corresponds to primary server file. Nothing to do."
else
echo "INFO: postgresql.conf file does not correspond to primary server file. Deleting..."
success=false
rm /etc/postgresql/9.5/main/postgresql.conf && success=true
if ! $success ; then
echo "ERROR: Failed to delete '/etc/postgresql/9.5/main/postgresql.conf' file."
exit 4
fi
echo "INFO: Copying new postgresql.conf file..."
success=false
cp /etc/postgresql/9.5/main/repltemplates/postgresql.conf.primary /etc/postgresql/9.5/main/postgresql.conf && success=true
if ! $success ; then
echo "ERROR: Failed to copy new postgresql.conf file."
exit 4
fi
if service postgresql status ; then
echo "INFO: Restarting postgresql service..."
service postgresql restart
fi
fi
else
echo "INFO: postgresql.conf file not found. Copying new one..."
success=false
cp /etc/postgresql/9.5/main/repltemplates/postgresql.conf.primary /etc/postgresql/9.5/main/postgresql.conf && success=true
if ! $success ; then
echo "ERROR: Failed to copy new postgresql.conf file."
exit 4
fi
if service postgresql status ; then
echo "INFO: Restarting postgresql service..."
service postgresql restart
fi
fi
if service postgresql status ; then
echo "INFO: postgresql already running."
else
echo "INFO: Starting postgresql service..."
service postgresql start
fi
echo "INFO: Ensuring replication role and password..."
success=false
rolecount=$(psql -Atc "SELECT count (*) FROM pg_roles WHERE rolname='${replication_user}';") && success=true
if ! $success ; then
echo "ERROR: Failed to check existence of '${replication_user}' role."
exit 5
fi
if [ "$rolecount" = "0" ]; then
echo "INFO: Replication role not found. Creating..."
success=false
psql -c "CREATE ROLE ${replication_user} WITH REPLICATION PASSWORD '${replication_password}' LOGIN;" && success=true
if ! $success ; then
echo "ERROR: Failed to create '${replication_user}' role."
exit 5
fi
else
echo "INFO: Replication role found. Ensuring password..."
success=false
psql -c "ALTER ROLE ${replication_user} WITH REPLICATION PASSWORD '${replication_password}' LOGIN;" && success=true
if ! $success ; then
echo "ERROR: Failed to set password for '${replication_user}' role."
exit 5
fi
fi
echo "INFO: Creating primary info file..."
if [ -e /var/lib/postgresql/9.5/main/primary_info ]; then
rm /var/lib/postgresql/9.5/main/primary_info
fi
echo "REPL_USER=${replication_user}\nREPL_PASSWORD=${replication_password}\nTRIGGER_FILE=${trigger_file}\nSTANDBY_FILE=${standby_file}\n" >> /var/lib/postgresql/9.5/main/primary_info
chown postgres:postgres /var/lib/postgresql/9.5/main/primary_info
chmod 0600 /var/lib/postgresql/9.5/main/primary_info
echo "promote - Done!"
exit 0
3.3.3 create_slot.sh
This script will (re)create replication slot with specified name. Again, if you ignore a boilerplate code, the actual script is short and simple:
- Checks if the trigger file exists, and refuses to run if not (replication slot can be created only on the primary server).
-
Checks if the slot exists, and (re)creates it as needed. If
-r
flag is specified, the script will first delete the slot (if exist) and create the new one. If the flag isn't specified, the script won't do anything if the slot with the specified name already exists.
This script is called from the next one (initiate_replication.sh
), so you can check there for usage example.
#!/bin/sh
# By Fat Dragon, 05/24/2016
# (Re)creates replication slot.
# NOTE: The script should be executed as postgres user
echo "create_slot - Start"
# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
slot_name=""
recreate=false
debug=true
while test $# -gt 0; do
case "$1" in
-h|--help)
echo "Creates replication slot"
echo " "
echo "create_slot [options]"
echo " "
echo "options:"
echo "-h, --help show brief help"
echo "-t, --trigger_file=FILE specify trigger file path"
echo " Optional, default: /etc/postgresql/9.5/main/im_the_master"
echo "-n, --name=NAME slot name (mandatory)"
echo " Slot name can be also specified without using"
echo " flags (i.e. 'create_slot myslot')"
echo "-r, --recreate Forces re-creation if the slot already exists"
echo " Optional, default: N/A"
echo " Description: Without this flag the script won't do anything if"
echo " the slot with defined name already exists."
echo " With the flag set, if the slot with defined name"
echo " already exists it will be deleted and re-created."
echo " "
echo "Error Codes:"
echo " 1 - Wrong user. The script has to be executed as 'postgres' user."
echo " 2 - Argument error. Caused either by bad format of provided flags and"
echo " arguments or if a mandatory argument is missing."
echo " 3 - Inapropriate trigger / standby files. This script REQUIRES trigger"
echo " file to be present."
echo " 4 - Error executing a slot-related operation (query/create/drop)."
exit 0
;;
-t)
shift
if test $# -gt 0; then
trigger_file=$1
else
echo "ERROR: -t flag requires trigger file to be specified."
exit 2
fi
shift
;;
--trigger-file=*)
trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-n)
if [ "$slot_name" != "" ]; then
echo "ERROR: Invalid command. For help execute 'create_slot -h'"
exit 2
fi
shift
if test $# -gt 0; then
slot_name=$1
else
echo "ERROR: -n flag requires slot name to be specified."
exit 2
fi
shift
;;
--name=*)
if [ "$slot_name" != "" ]; then
echo "ERROR: Invalid command. For help execute 'create_slot -h'"
exit 2
fi
slot_name=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-r|--recreate)
recreate=true
shift
;;
*)
if [ "$slot_name" != "" ]; then
echo "ERROR: Invalid command. For help execute 'create_slot -h'"
exit 2
fi
slot_name=$1
shift
;;
esac
done
# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then
echo "ERROR: The script must be executed as 'postgres' user."
exit 1
fi
if [ "$slot_name" = "" ]; then
echo "ERROR: Slot name is mandatory. For help execute 'create_slot -h'"
exit 2
fi
if $debug; then
echo "DEBUG: The script will be executed with the following arguments:"
echo "DEBUG: --trigger-file=${trigger_file}"
echo "DEBUG: --name=${slot_name}"
if $recreate; then
echo "DEBUG: --recreate"
fi
fi
echo "Checking if trigger file exists..."
if [ ! -e $trigger_file ]; then
echo "ERROR: Cannot create replication slot if the server does not contain trigger file: ${trigger_file}"
exit 3
fi
success=false
echo "INFO: Checking if slot '${slot_name}' exists..."
slotcount=$(psql -Atc "SELECT count (*) FROM pg_replication_slots WHERE slot_name='${slot_name}';") && success=true
if ! $success ; then
echo "ERROR: Cannot check for '${slot_name}' slot existence."
exit 4
fi
if [ "$slotcount" = "0" ]; then
echo "INFO: Slot not found. Creating..."
success=false
psql -c "SELECT pg_create_physical_replication_slot('${slot_name}');" && success=true
if ! $success ; then
echo "ERROR: Cannot create '${slot_name}' slot."
exit 4
fi
elif $recreate ; then
echo "INFO: Slot found. Removing..."
success=false
psql -c "SELECT pg_drop_replication_slot('${slot_name}');" && success=true
if ! $success ; then
echo "ERROR: Cannot drop existing '${slot_name}' slot."
exit 4
fi
echo "INFO: Re-creating the slot..."
success=false
psql -c "SELECT pg_create_physical_replication_slot('${slot_name}');" && success=true
if ! $success ; then
echo "ERROR: Cannot create '${slot_name}' slot."
exit 4
fi
fi
echo "create_slot - Done!"
exit 0
3.3.4 initiate_replication.sh
The last script we'll create here is the script that initiates replication (initiates standby server). Again, after ignoring boilerplate code we can say that the script:
-
Checks trigger / standby files. The script has the same behavior regarding this as
promote.sh
script explained earlier, with only difference that this script demands standby file and refuses trigger file. Flag-f
has the same meaning. -
Ensures that PostgreSQL password file (
.pgpass
, explained in PostgreSQL HA with pgpool-II - Part 2) contains replication user / password; - Tries to recreate replication slot at the specified primary server, and exits if this attempt fails;
-
Stops
postgresql
service and deletes PostgreSQL data directory; -
Executes
pg_basebackup
to get initial backup; -
Creates
recovery.conf
file and sets its permissions; -
Deletes
postgresql.conf
file, creates the new one from template, and sets its permissions; -
Starts
postgresql
service.
#!/bin/sh
# By Fat Dragon, 05/24/2016
# Promoting standby to primary node.
# NOTE: The script should be executed as postgres user
echo "initiate_replication - Start"
# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"
primary_host=""
primary_port="5432"
slot_name=$(echo "$HOSTNAME" | tr '[:upper:]' '[:lower:]')
slot_name=${slot_name/-/_}
replication_user="replication"
replication_password=""
force=false
debug=true
while test $# -gt 0; do
case "$1" in
-h|--help)
echo "Promotes a standby server to primary role"
echo " "
echo "promote [options]"
echo " "
echo "options:"
echo "-h, --help show brief help"
echo "-t, --trigger_file=FILE specify trigger file path"
echo " Optional, default: /etc/postgresql/9.5/main/im_the_master"
echo "-s, --standby_file=FILE specify standby file path"
echo " Optional, default: /etc/postgresql/9.5/main/im_slave"
echo "-H, --primary-host=HOST specify primary host (Mandatory)"
echo "-P, --primary-port=PORT specify primary port"
echo " Optional, default: 5432"
echo "-n, --slot_name=NAME specify slot name"
echo " Optional, defaults to lowercase hostname with dashes replaced"
echo " by underscores."
echo "-u, --user specify replication role"
echo " Optional, default: replication"
echo "-p, --password=PASSWORD specify password for --user"
echo " Optional, default: empty"
echo "-f, --force Forces promotion regardless to"
echo " trigger / standby files."
echo " Optional, default: N/A"
echo " Description: Without this flag the script will require"
echo " presence of standby file."
echo " With the flag set the script will create"
echo " standby file as needed."
echo " "
echo "Error Codes:"
echo " 1 - Wrong user. The script has to be executed as 'postgres' user."
echo " 2 - Argument error. Caused either by bad format of provided flags and"
echo " arguments or if a mandatory argument is missing."
echo " 3 - Inapropriate trigger / standby files. See -f flag for details."
echo " 4 - Error creating/deleting/copying configuration files"
echo " (postgresql.conf and recovery.conf)."
echo " Hint: ensure that templates exist and check permissions."
echo " 5 - Error in communicating with the primary server (to create the"
echo " slot or get the initial data)."
echo " 6 - Error deleting old data directory."
exit 0
;;
-t)
shift
if test $# -gt 0; then
trigger_file=$1
else
echo "ERROR: -t flag requires trigger file to be specified."
exit 2
fi
shift
;;
--trigger-file=*)
trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-s)
shift
if test $# -gt 0; then
standby_file=$1
else
echo "ERROR: -s flag requires standby file to be specified."
exit 2
fi
shift
;;
--standby-file=*)
standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-H)
shift
if test $# -gt 0; then
primary_host=$1
else
echo "ERROR: -H flag requires primary host to be specified."
exit 2
fi
shift
;;
--primary-host=*)
primary_host=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-P)
shift
if test $# -gt 0; then
primary_port=$1
else
echo "ERROR: -p flag requires port to be specified."
exit 2
fi
shift
;;
--primary-port=*)
primary_port=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-n)
shift
if test $# -gt 0; then
slot_name=$1
else
echo "ERROR: -n flag requires slot name to be specified."
exit 2
fi
shift
;;
--slot-name=*)
slot_name=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-u)
shift
if test $# -gt 0; then
replication_user=$1
else
echo "ERROR: -u flag requires replication user to be specified."
exit 2
fi
shift
;;
--user=*)
replication_user=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-p)
shift
if test $# -gt 0; then
replication_password=$1
else
echo "ERROR: -p flag requires replication password to be specified."
exit 2
fi
shift
;;
--password=*)
replication_password=`echo $1 | sed -e 's/^[^=]*=//g'`
shift
;;
-f|--force)
force=true
shift
;;
*)
echo "ERROR: Unrecognized option $1"
exit 2
;;
esac
done
# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then
echo "ERROR: The script must be executed as 'postgres' user."
exit 1
fi
if [ "$primary_host" = "" ]; then
echo "ERROR: Primary host is mandatory. For help execute 'initiate_replication -h'"
exit 2
fi
if [ "$replication_password" = "" ]; then
echo "ERROR: --password is mandatory. For help execute 'initiate_replication -h'"
exit 2
fi
if $debug; then
echo "DEBUG: The script will be executed with the following arguments:"
echo "DEBUG: --trigger-file=$trigger_file"
echo "DEBUG: --standby_file=$standby_file"
echo "DEBUG: --primary-host=$primary_host"
echo "DEBUG: --primary-port=$primary_port"
echo "DEBUG: --slot-name=$slot_name"
echo "DEBUG: --user=$replication_user"
echo "DEBUG: --password=$replication_password"
if $force; then
echo "DEBUG: --force"
fi
fi
echo "INFO: Checking if trigger file exists..."
if [ -e $trigger_file ]; then
if $force; then
echo "INFO: Trigger file found. Deleting..."
rm $trigger_file
else
echo "ERROR: Cannot initiate server as standby while it contains trigger file: ${trigger_file}"
exit 3
fi
fi
echo "INFO: Checking if standby file exists..."
if [ ! -e $standby_file ]; then
if $force; then
echo "INFO: Standby file not found. Creating new one..."
echo "Initiated at: $(date)" >> $standby_file
else
echo "ERROR: Cannot initiate server as standby if it does not contain standby file: ${standby_file}"
exit 3
fi
fi
echo "INFO: Ensuring replication user and password in password file (.pgpass)..."
password_line="*:*:*:${replication_user}:${replication_password}"
if [ ! -f /var/lib/postgresql/.pgpass ]; then
echo $password_line >> /var/lib/postgresql/.pgpass
elif ! grep -q "$password_line" /var/lib/postgresql/.pgpass ; then
sed -i -e '$a\' /var/lib/postgresql/.pgpass
echo $password_line >> /var/lib/postgresql/.pgpass
sed -i -e '$a\' /var/lib/postgresql/.pgpass
fi
chown postgres:postgres /var/lib/postgresql/.pgpass
chmod 0600 /var/lib/postgresql/.pgpass
success=false
echo "INFO: Creating replication slot at the primary server..."
ssh -T postgres@$primary_host /etc/postgresql/9.5/main/replscripts/create_slot.sh -r $slot_name && success=true
if ! $success ; then
echo "ERROR: Creating replication slot at the primary server failed."
exit 5
fi
service postgresql stop
if [ -d /var/lib/postgresql/9.5/main ]; then
echo "INFO: Deleting old data..."
success=false
rm -rf /var/lib/postgresql/9.5/main && success=true
if ! $success ; then
echo "ERROR: Deleting data directory failed."
exit 6
fi
fi
echo "INFO: Getting the initial backup..."
success=false
pg_basebackup -D /var/lib/postgresql/9.5/main -h $primary_host -p $primary_port -U $replication_user && success=true
if ! $success; then
echo "ERROR: Initial backup failed."
exit 5
fi
if [ -e /var/lib/postgresql/9.5/main/recovery.conf ]; then
echo "INFO: Removing old recovery.conf file..."
success=false
rm /var/lib/postgresql/9.5/main/recovery.conf && success=true
if ! $success; then
echo "ERROR: Removing old recovery.conf failed."
exit 4
fi
fi
echo "INFO: Creating recovery.conf file..."
cat >/var/lib/postgresql/9.5/main/recovery.conf <<EOL
standby_mode = 'on'
primary_slot_name = '${slot_name}'
primary_conninfo = 'host=${primary_host} port=${primary_port} user=${replication_user} password=${replication_password}'
trigger_file = '${trigger_file}'
EOL
chown postgres:postgres /var/lib/postgresql/9.5/main/recovery.conf
chmod 0644 /var/lib/postgresql/9.5/main/recovery.conf
if [ -e /etc/postgresql/9.5/main/postgresql.conf ]; then
echo "INFO: Removing old postgresql.conf file..."
success=false
rm /etc/postgresql/9.5/main/postgresql.conf && success=true
if ! $success; then
echo "ERROR: Removing old postgresql.conf failed."
exit 4
fi
fi
echo "INFO: Copying new postgresql.conf file..."
success=false
cp /etc/postgresql/9.5/main/repltemplates/postgresql.conf.standby /etc/postgresql/9.5/main/postgresql.conf && success=true
if ! $success; then
echo "ERROR: Copying new postgresql.conf failed."
exit 4
fi
chown postgres:postgres /etc/postgresql/9.5/main/postgresql.conf
chmod 0644 /etc/postgresql/9.5/main/postgresql.conf
echo "INFO: Starting postgresql service..."
service postgresql start
echo "initiate_replication - Done!"
exit 0
3.3.5 Permissions
Finally, we need to set appropriate permissions for these script files:
$ chown postgres:postgres -R /etc/postgresql/9.5/main/replscripts
$ chmod 0744 -R /etc/postgresql/9.5/main/replscripts
3.4 Result
Once everything is prepared as described in this page, the new primary server can be promoted by executing single line of code at the server that will be the primary (in my case on IT-RDBMS01
):
$ sudo -u postgres /etc/postgresql/9.5/main/replscripts/promote.sh -f -p replicationpassword
postgres
user.
In this call I've mostly relied on default values for arguments. You can see more details about arguments and their defaults if you execute the script with -h
(help) flag.
Similarly you can initiate standby server by executing single line of code at the server that will be standby (in my case IT-RDBMS02
):
$ sudo -u postgres /etc/postgresql/9.5/main/replscripts/initiate_replication.sh -f -H IT-RDBMS01 -P 5433 -p replicationpassword
This time I've had to specify few more arguments (primary host and port), but anyway the replication is up and running. You can test it in the same way we've did in PostgreSQL HA with pgpool-II - Part 2.
3.5 Where to Go Next?
Continue with PostgreSQL HA with pgpool-II - Part 4 where we'll finally install pgpool-II.
Comments
In initiate_replication script, command 'slot_name=${slot_name/-/_}' only replaces the first occurrence of '-' by '_'
Use 'slot_name=${slot_name//-/_}' instead to replace all of them.
;-)
Hey, thanks mate!
Sorry 'cause it took me so long to respond - at the moment I'm occupied with some other things (you can check my newest posts).
As you can see I haven't changed pgpool2 authorization thing you've told me the last time. Hopefully these days I will...
And thanks for correcting my scripts!
Keep in touch..
Enjoy!
Hi Aleksandar,
Glad to see your post on /blog/2016/05/postgresql-ha-pgpool-ii-part-3
However, this website seems to be down. Could you kindly bring this back up. I have more questions as follow up but could not open this website to ask here.
you could change #!/bin/sh to #!/bin/bash
solved following problem for me
initiate_replication - Start
./initiate_replication.sh: 14: ./initiate_replication.sh: Bad substitution
thank you ;)
Sorry for my way too late response Askar, and thanks for sharing the problem / solution here!
Hi,
I'm using postgresql 9.3 and installing it in CentOS.
1. is /etc/postgresql/9.5/main/im_the_master a directory?
1.a. What is placed inside of im_the_master and im_slave?
2. do you have a shell command to be used in replscripts for using WAL? version 9.3 doesn't support Slots.
Again great tutorial,
I faced the following problem, when primary server has gone down, the failover script fired up normally but nothing hapened due to the commands "service postgresql stat/restart" asking for interactive authentification. Of course this causes a BIG mess (with pgpool recognizing both servers as primaries and second server staying in read only).
To avoid that I edited sudoer file "sudo visudo" on both primary and standby with "%postgres ALL=(ALL) NOPASSWD: /usr/sbin/service postgresql *"
And i changed the scripts invoking "service postgresql start/restart" with "sudo service postgresql start/restart"
I see that /blog/2016/05/postgresql-ha-pgpool-ii-part-3
website is down. can some one share the automation scripts which author has mentioned at this page?
Up again. Sorry!
Awesome. Thanks a ton Aleksandar. I have some questions. Will post it in seperate thread.
At /blog/2016/05/postgresql-ha-pgpool-ii-part-3, in section 3.3. Introducing Some Conventions, I am not clear about what should be the contents of the trigger files :/etc/postgresql/9.5/main/im_the_master and /etc/postgresql/9.5/main/im_slave. Could you kindly share those details Aleksandar.
Content isn't important. Important thing is if the file exists or not. You can put any content in it (I believe that I'm putting some info like date of creation / change).
Ah ok. So this can be empty as well?
yep
Another question is I am using postgres 9.5 on centos 7. Are there equivalent scripts for those you have shared here for supporting this on centos apart from ubuntu ?
It shouldn't be too hard to adopt scripts for CentOS.
In your post you have mentioned to place the 4 automation scripts under /etc/postgresql/9.5/main/replscripts. I am not sure if these scripts should be place on both primary and standy. Could you confirm?
In section 3.5 at /blog/2016/05/postgresql-ha-pgpool-ii-part-3, I was able to promote to primary using the promote.sh script that I have run on primary server but I am facing issue when trying to run the next script that involves initiate_replication.sh on the secondary server. This is the error I get:
-bash-4.2$ /var/lib/pgsql/9.5/data/replscripts/initiate_replication.sh -f -H 10.142.82.232 -P 5433
initiate_replication - Start
DEBUG: The script will be executed with the following arguments:
DEBUG: --trigger-file=/var/lib/pgsql/9.5/data/im_the_master
DEBUG: --standby_file=/var/lib/pgsql/9.5/data/im_slave
DEBUG: --primary-host=10.142.82.232
DEBUG: --primary-port=5433
DEBUG: --slot-name=server2
DEBUG: --user=replication
DEBUG: --password=ca
DEBUG: --force
INFO: Checking if trigger file exists...
INFO: Checking if standby file exists...
INFO: Ensuring replication user and password in password file (.pgpass)...
INFO: Creating replication slot at the primary server...
create_slot - Start
DEBUG: The script will be executed with the following arguments:
DEBUG: --trigger-file=/var/lib/pgsql/9.5/data/im_the_master
DEBUG: --name=server2
DEBUG: --recreate
Checking if trigger file exists...
INFO: Checking if slot 'server2' exists...
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
ERROR: Cannot check for 'server2' slot existence.
ERROR: Creating replication slot at the primary server failed.
--------------------------
I have created slot by name 'server2' already on primary server. when I directly connect with psql command, I need to export PGPORT with 5433 every time,not sure why there is a problem here in connecting to the psql on primary server.
I tried debugging my best and I could not figure out why its complaining about connections on port 5432 when I have PGPORT set with 5433. Any help or thoughts on this would be greatly appreciated.
Add new comment
Anonymous comments require solving captcha, and waiting for administrator's approval.