PostgreSQL HA with pgpool-II - Part 3

This is the third part of  the tutorial. Other parts are:

Yep, it would be wise for you to start from PostgreSQL HA with pgpool-II - Part 1, and go through the whole tutorial, part by part. Especially because this part will deal with automating the procedure described in the previous part.

As mentioned, this part will deal with automating the procedure for creating replication described in PostgreSQL HA with pgpool-II - Part 2. Everything we'll build here will be very useful later when we'll implement pgpool-II.

3.1 Objective

In order to be perfectly clear about what I'm trying to accomplish here, I'll define my objective as: To prepare servers, configurations and script files so that replication configuration tasks (installing and configuring primary and standby server, promoting standby server to primary role, etc.) can be performed efficiently and easily.

3.1.1 Enabling Passwordless SSH for postgres User

There are few cases when postgres UNIX user on one host should be able to execute some command on another host through SSH. For this reason, after installing PostgreSQL package, we need to enable this. In case that you don't know how to accomplish this - I've created another post that will help you: Passwordless SSH in Linux.

3.1.2 PostgreSQL Configuration Files

Although it is true that PostgreSQL configuration files are different for primary and standby server, we can do the following:

  • Create pg_hba.conf file as described in PostgreSQL HA with pgpool-II - Part 2, since it is the same in both cases (for primary and standby server);
  • Create two versions (templates) of postgresql.conf file: one for primary, and the other for standby role, and store them both on the target server. This way, when the actual role of the server is determined (changed) we can simply use the appropriate file.

So let's crate repltemplates directory where template files will be stored. I'll place these directories in PostgreSQL configuration directory. Next, we'll copy configuration file templates there so that we get the following file structure:

Finally we'll ensure that postgres user owns all these files/directories:

$ chown postgres:postgres /etc/postgresql/9.5/main/pg_hba.conf
$ chown postgres:postgres -R /etc/postgresql/9.5/main/repltemplates

3.2 Introducing Some Conventions

In order to have an easy way to always determine (either from code or by a glance) if the particular server is primary or standby, we'll introduce the following convention:

  • Primary server must contain the following trigger file: /etc/postgresql/9.5/main/im_the_master
  • Standby server must contain the following standby file: /etc/postgresql/9.5/main/im_slave
  • No server can contain both trigger and standby file at the same time.

3.3 Automation Scripts

Before starting with the scripts I need to mention that within the script I'm often using Ubuntu-embedded service command to start / stop / restart postgresql service. On the other side, you may notice that other documentation mostly uses pg_ctl command for this purpose. There's no difference between the two. You can learn more about the differences in my other post: Managing PostgreSQL Process on Ubuntu - service, pg_ctl and pg_ctlcluster.

Finally we can prepare some scripts that will make our life easier latter. I'll place these scripts in /etc/postgresql/9.5/main/replscripts directory. But I need to remind you:

Although the scripts are rather long, the biggest part is usually some boilerplate code, not related to replication we are dealing with here. For example, every script starts with giant while loop which does nothing more than gathering input arguments. Parts of the script that are related to replication will always be additionally explained.

3.3.1 disable_postgresql.sh

It's already mentioned that we should not allow presence of multiple primary servers in the same cluster at the same time. For this reason I'll create the script that will disable PostgreSQL and prevent it to run either as primary or standby. Besides the mentioned argument-gathering while loop, the script is simple - just go through comments and you'll understand what it is doing.

#!/bin/sh
# By Fat Dragon, 05/24/2016
# Stopping and disabling postgresql service if running
# NOTE: The script should be executed as postgres user

echo "disable_postgresql - Start"

# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"

while test $# -gt 0; do

    case "$1" in

        -h|--help)

            echo "Disables PostgreSQL"
            echo " "
            echo "disable_postgresql [options]"
            echo " "
            echo "options:"
            echo "-h, --help                show brief help"
            echo "-t, --trigger_file=FILE   specify trigger file path"
            echo "    Optional, default: /etc/postgresql/9.5/main/im_the_master"
            echo "-s, --standby_file=FILE   specify standby file path"
            echo "    Optional, default: /etc/postgresql/9.5/main/im_slave"
            echo " "
            echo "Error Codes:"
            echo "  1 - Wrong user. The script has to be executed as 'postgres' user."
            echo "  2 - Argument error. Caused either by bad format of provided flags and"
            echo "      arguments or if a mandatory argument is missing."
            exit 0
            ;;

        -t)

            shift

            if test $# -gt 0; then

                trigger_file=$1
            
            else
            
                echo "ERROR: -t flag requires trigger file to be specified."
                exit 2
            
            fi
            
            shift
            ;;
        
        --trigger-file=*)
        
            trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
        
            shift
            ;;
        
        -s)
        
            shift
        
            if test $# -gt 0; then
        
                standby_file=$1
        
            else
        
                echo "ERROR: -s flag requires standby file to be specified."
                exit 2
        
            fi
        
            shift
            ;;
        
        --standby-file=*)
        
            standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`
        
            shift
            ;;
        
        *)
        
            echo "ERROR: Unrecognized option $1"
            exit 2
            ;;

    esac
    
done

# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then

    echo "ERROR: The script must be executed as 'postgres' user."
    exit 1

fi

echo "INFO: Stopping postgresql service..."
service postgresql stop

# Moving postgresql.conf file in order to prevent service to be started
if [ -f /etc/postgresql/9.5/main/postgresql.conf ]; then

    if [ -f /etc/postgresql/9.5/main/postgresql.conf.disabled ]; then
        rm /etc/postgresql/9.5/main/postgresql.conf.disabled
    fi

    echo "INFO: Renaming postgresql.conf file to prevent future service start."
    mv /etc/postgresql/9.5/main/postgresql.conf /etc/postgresql/9.5/main/postgresql.conf.disabled

fi

# Deleting recovery.conf file
echo "INFO: Checking if recovery.conf file exists..."
if [ -f /var/lib/postgresql/9.5/main/recovery.conf ]; then

	echo "INFO: recovery.conf file found. Deleting..."
	
    rm /etc/postgresql/9.5/main/recovery.conf
fi

# Deleting trigger file
echo "INFO: Checking if trigger file exists..."
if [ -f $trigger_file ]; then

	echo "INFO: Trigger file found. Deleting..."
	rm $trigger_file

fi

# Deleting standby file
echo "INFO: Checking if standby file exists..."
if [ -f $standby_file ]; then

	echo "INFO: Standby file found. Deleting..."
	rm $standby_file

fi

# Deleting primary info file
echo "INFO: Checking if primary info file exists..."
if [ -f /var/lib/postgresql/9.5/main/primary_info ]; then

	echo "INFO: primary_info file found. Deleting..."
	rm /var/lib/postgresql/9.5/main/primary_info

fi

echo "disable_postgresql - Done!"
exit 0

3.3.2 promote.sh

Script promote.sh will promote a standby server to primary server role. What the script actually does (ignoring boilerplate code) is:

  • Checks if trigger / standby file is present. In general, the script will refuse to run if trigger file is missing or standby file is present, but this behavior can be changed by specifying flag -f. If -f is specified, then the script will create new trigger file if it is missing, and will delete standby file if present.
  • If -d flag (representing previous primary server that should be disabled) is specified (followed by hostname), the script will try to execute disable_postgresql.sh script at the previous primary server through SSH.
  • Removes recovery.conf file if present, since it isn't needed on primary server.
  • Checks if postgresql.conf file should be changed, and changes it (by copying from the prepared template) if needed, and restarts postgresql service.
  • Ensures that replication role exists, with the appropriate password. replication user and its password can be set by using -u and -p flags, respectively. Note that here we are defining the password, not checking against existing.
  • Finally it writes primary info file (/var/lib/postgresql/9.5/main/primary_info). This file will be used later by recovery_1st_stage.sh script (explained in PostgreSQL HA with pgpool-II - Part 5) for performing recovery of a standby server.
#!/bin/sh
# By Fat Dragon, 05/24/2016
# Promoting standby to primary node
# NOTE: The script should be executed as postgres user

echo "promote - Start"

# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"
demote_host=""
replication_user="replication"
replication_password=""
force=false

debug=true

while test $# -gt 0; do

    case "$1" in
    
        -h|--help)
    
            echo "Promotes a standby server to primary role"
            echo " "
            echo "promote [options]"
            echo " "
            echo "options:"
            echo "-h, --help                show brief help"
            echo "-t, --trigger_file=FILE   specify trigger file path"
			echo "    Optional, default: /etc/postgresql/9.5/main/im_the_master"
            echo "-s, --standby_file=FILE   specify standby file path"
			echo "    Optional, default: /etc/postgresql/9.5/main/im_slave"
            echo "-d, --demote=HOST         specify old primary to demote"
			echo "    Optional, if not specified no demotion will be performed."
			echo "-u, --user                specify replication role"
			echo "    Optional, default: replication"
			echo "-p, --password=PASSWORD   specify password for --user (mandatory)"
			echo "-f, --force               Forces promotion regardless of existence"
			echo "                          of trigger / standby files."
			echo "    Optional, default: N/A"
			echo "    Description:       Without this flag the script will require"
			echo "                       presence of trigger file."
			echo "                       With the flag set the script will create"
			echo "                       trigger file as needed."
			echo " "
			echo "Error Codes:"
			echo "  1 - Wrong user. The script has to be executed as 'postgres' user."
			echo "  2 - Argument error. Caused either by bad format of provided flags and"
			echo "      arguments or if a mandatory argument is missing."
			echo "  3 - Inapropriate trigger / standby files. See -f flag for details."
			echo "  4 - Error creating/deleting/copying configuration files"
			echo "      (postgresql.conf and recovery.conf)."
			echo "      Hint: ensure that templates exist and check permissions."
			echo "  5 - Error creating / altering replication_user."
            exit 0
            ;;

		-t)

			shift

            if test $# -gt 0; then

                trigger_file=$1

            else

                echo "ERROR: -t flag requires trigger file to be specified."
                exit 2

            fi

            shift
            ;;

        --trigger-file=*)

            trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
            shift
            ;;

		-s)

			shift

            if test $# -gt 0; then

                standby_file=$1

            else

                echo "ERROR: -s flag requires standby file to be specified."
                exit 2

            fi
            shift
            ;;

        --standby-file=*)

            standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;

		-d)

			shift

            if test $# -gt 0; then

                demote_host=$1

            else

                echo "ERROR: -d flag requires host that will be demoted to be specified."
                exit 2

            fi

            shift
            ;;
        --demote-host=*)

            demote_host=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;

		-u)

			shift

            if test $# -gt 0; then

                replication_user=$1

            else

                echo "ERROR: -u flag requires replication user to be specified."
                exit 2

            fi

            shift
            ;;
        --user=*)

            replication_user=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;
		-p)

			shift

            if test $# -gt 0; then

                replication_password=$1

            else

                echo "ERROR: -p flag requires replication password to be specified."
                exit 2

            fi

            shift
            ;;

        --password=*)

            replication_password=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;

        -f|--force)

            force=true
            shift
            ;;

        *)

			echo "ERROR: Unrecognized option $1"
			exit 2
            ;;

    esac

done

# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then

    echo "ERROR: The script must be executed as 'postgres' user."
    exit 1

fi

if [ "$replication_password" = "" ]; then

    echo "ERROR: --password is mandatory. For help execute 'promote -h'"
    exit 2

fi

if $debug; then

	echo "DEBUG: The script will be executed with the following arguments:"
	echo "DEBUG: --trigger-file=$trigger_file"
	echo "DEBUG: --standby_file=$standby_file"
	echo "DEBUG: --demote-host=$demote_host"
	echo "DEBUG: --user=$replication_user"
	echo "DEBUG: --password=$replication_password"

	if $force; then
		echo "DEBUG: --force"
	fi

fi

echo "INFO: Checking if standby file exists..."
if [ -e $standby_file ]; then

	if $force; then

		echo "INFO: Standby file found. Deleting..."
		rm $standby_file

	else

		echo "ERROR: Cannot promote server that contains standby file: ${standby_file}"
		exit 3

	fi
	
fi

echo "INFO: Checking if trigger file exists..."
if [ ! -e $trigger_file ]; then

	if $force; then

		echo "INFO: Trigger file not found. Creating a new one..."
		echo "Promoted at: $(date)" >> $trigger_file

	else

		echo "ERROR: Cannot promote server that does not contain trigger file: ${trigger_file}"
		exit 3

	fi
	
fi

success=false

# Disabling postgresql on demote host (if specified):
if [ "$demote_host" != "" ]; then

	echo "INFO: Trying to disable postgresql at ${demote_host}..."
	ssh -T postgres@$demote_host /etc/postgresql/9.5/main/replscripts/disable_postgresql.sh -t $trigger_file -s $standby_file && success=true

	if ! $success ; then
		echo "WARNING: Failed to execute 'disable_postgresql.sh' at demoted host."
	fi
	
fi

if [ -e /var/lib/postgresql/9.5/main/recovery.conf ]; then
	
	echo "INFO: Deleting recovery.conf file..."

	success=false
	rm /var/lib/postgresql/9.5/main/recovery.conf && success=true
	
	if ! $success ; then

		echo "ERROR: Failed to delete '/var/lib/postgresql/9.5/main/recovery.conf' file."
		exit 4

	fi
	
fi

echo "INFO: Checking if postgresql.conf file exists..."
if [ -e /etc/postgresql/9.5/main/postgresql.conf ]; then

	echo "INFO: postgresql.conf file found. Checking if it is for primary server..."
	if diff /etc/postgresql/9.5/main/postgresql.conf /etc/postgresql/9.5/main/repltemplates/postgresql.conf.primary >/dev/null ; then
	
		echo "INFO: postgresql.conf file corresponds to primary server file. Nothing to do."
		
	else
	
		echo "INFO: postgresql.conf file does not correspond to primary server file. Deleting..."

		success=false
		rm /etc/postgresql/9.5/main/postgresql.conf && success=true
			
		if ! $success ; then

			echo "ERROR: Failed to delete '/etc/postgresql/9.5/main/postgresql.conf' file."
			exit 4

		fi
		
		echo "INFO: Copying new postgresql.conf file..."

		success=false
		cp /etc/postgresql/9.5/main/repltemplates/postgresql.conf.primary /etc/postgresql/9.5/main/postgresql.conf && success=true
		
		if ! $success ; then

			echo "ERROR: Failed to copy new postgresql.conf file."
			exit 4

		fi
		
		if service postgresql status ; then
			
			echo "INFO: Restarting postgresql service..."
			service postgresql restart

		fi
		
	fi
	
else

	echo "INFO: postgresql.conf file not found. Copying new one..."

	success=false
	cp /etc/postgresql/9.5/main/repltemplates/postgresql.conf.primary /etc/postgresql/9.5/main/postgresql.conf && success=true
	
	if ! $success ; then

		echo "ERROR: Failed to copy new postgresql.conf file."
		exit 4

	fi

	if service postgresql status ; then
		
		echo "INFO: Restarting postgresql service..."
		service postgresql restart

	fi
	
fi

if service postgresql status ; then

	echo "INFO: postgresql already running."
	
else

	echo "INFO: Starting postgresql service..."
	service postgresql start

fi

echo "INFO: Ensuring replication role and password..."

success=false
rolecount=$(psql -Atc "SELECT count (*) FROM pg_roles WHERE rolname='${replication_user}';") && success=true

if ! $success ; then

	echo "ERROR: Failed to check existence of '${replication_user}' role."
	exit 5

fi

if [ "$rolecount" = "0" ]; then

	echo "INFO: Replication role not found. Creating..."

	success=false
	psql -c "CREATE ROLE ${replication_user} WITH REPLICATION PASSWORD '${replication_password}' LOGIN;" && success=true

	if ! $success ; then

		echo "ERROR: Failed to create '${replication_user}' role."
		exit 5

	fi

else

	echo "INFO: Replication role found. Ensuring password..."

	success=false
	psql -c "ALTER ROLE ${replication_user} WITH REPLICATION PASSWORD '${replication_password}' LOGIN;" && success=true

	if ! $success ; then

		echo "ERROR: Failed to set password for '${replication_user}' role."
		exit 5

	fi

fi

echo "INFO: Creating primary info file..."
if [ -e /var/lib/postgresql/9.5/main/primary_info ]; then
	rm /var/lib/postgresql/9.5/main/primary_info
fi

echo "REPL_USER=${replication_user}\nREPL_PASSWORD=${replication_password}\nTRIGGER_FILE=${trigger_file}\nSTANDBY_FILE=${standby_file}\n" >> /var/lib/postgresql/9.5/main/primary_info

chown postgres:postgres /var/lib/postgresql/9.5/main/primary_info
chmod 0600 /var/lib/postgresql/9.5/main/primary_info

echo "promote - Done!"
exit 0

3.3.3 create_slot.sh

This script will (re)create replication slot with specified name. Again, if you ignore a boilerplate code, the actual script is short and simple:

  • Checks if the trigger file exists, and refuses to run if not (replication slot can be created only on the primary server).
  • Checks if the slot exists, and (re)creates it as needed. If -r flag is specified, the script will first delete the slot (if exist) and create the new one. If the flag isn't specified, the script won't do anything if the slot with the specified name already exists.

This script is called from the next one (initiate_replication.sh), so you can check there for usage example.

#!/bin/sh
# By Fat Dragon, 05/24/2016
# (Re)creates replication slot.
# NOTE: The script should be executed as postgres user

echo "create_slot - Start"

# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
slot_name=""
recreate=false

debug=true

while test $# -gt 0; do

    case "$1" in

        -h|--help)

            echo "Creates replication slot"
            echo " "
            echo "create_slot [options]"
            echo " "
            echo "options:"
            echo "-h, --help                show brief help"
            echo "-t, --trigger_file=FILE   specify trigger file path"
			echo "    Optional, default: /etc/postgresql/9.5/main/im_the_master"
            echo "-n, --name=NAME           slot name (mandatory)"
			echo "                          Slot name can be also specified without using"
			echo "                          flags (i.e. 'create_slot myslot')"
			echo "-r, --recreate            Forces re-creation if the slot already exists"
			echo "    Optional, default: N/A"
			echo "    Description:       Without this flag the script won't do anything if"
			echo "                       the slot with defined name already exists."
			echo "                       With the flag set, if the slot with defined name"
			echo "                       already exists it will be deleted and re-created."
			echo " "
			echo "Error Codes:"
			echo "  1 - Wrong user. The script has to be executed as 'postgres' user."
			echo "  2 - Argument error. Caused either by bad format of provided flags and"
			echo "      arguments or if a mandatory argument is missing."
			echo "  3 - Inapropriate trigger / standby files. This script REQUIRES trigger"
			echo "      file to be present."
			echo "  4 - Error executing a slot-related operation (query/create/drop)."
            exit 0
            ;;

		-t)
			
			shift
            
            if test $# -gt 0; then
            
                trigger_file=$1
            
            else
            
                echo "ERROR: -t flag requires trigger file to be specified."
                exit 2
            
            fi
            
            shift
            ;;
        
        --trigger-file=*)
        
            trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
        
            shift
            ;;
		
		-n)
		
			if [ "$slot_name" != "" ]; then
		
				echo "ERROR: Invalid command. For help execute 'create_slot -h'"
				exit 2
		
			fi
			
			shift
        
            if test $# -gt 0; then
        
                slot_name=$1
            
            else
            
                echo "ERROR: -n flag requires slot name to be specified."
                exit 2
            
            fi
            
            shift
            ;;

        --name=*)
		    
		    if [ "$slot_name" != "" ]; then
			
				echo "ERROR: Invalid command. For help execute 'create_slot -h'"
				exit 2
			
			fi
			
            slot_name=`echo $1 | sed -e 's/^[^=]*=//g'`
            
            shift
            ;;

        -r|--recreate)

            recreate=true

            shift
            ;;

        *)

			if [ "$slot_name" != "" ]; then
				
				echo "ERROR: Invalid command. For help execute 'create_slot -h'"
				exit 2
			
			fi
			
			slot_name=$1
			
			shift
            ;;

    esac

done

# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then

    echo "ERROR: The script must be executed as 'postgres' user."
    exit 1

fi

if [ "$slot_name" = "" ]; then

	echo "ERROR: Slot name is mandatory. For help execute 'create_slot -h'"
	exit 2

fi

if $debug; then

	echo "DEBUG: The script will be executed with the following arguments:"
	echo "DEBUG: --trigger-file=${trigger_file}"
	echo "DEBUG: --name=${slot_name}"
	
	if $recreate; then
		echo "DEBUG: --recreate"
	fi
    
fi

echo "Checking if trigger file exists..."
if [ ! -e $trigger_file ]; then

	echo "ERROR: Cannot create replication slot if the server does not contain trigger file: ${trigger_file}"
	exit 3
	
fi

success=false

echo "INFO: Checking if slot '${slot_name}' exists..."
slotcount=$(psql -Atc "SELECT count (*) FROM pg_replication_slots WHERE slot_name='${slot_name}';") && success=true

if ! $success ; then

	echo "ERROR: Cannot check for '${slot_name}' slot existence."
	exit 4

fi

if [ "$slotcount" = "0" ]; then

	echo "INFO: Slot not found. Creating..."

	success=false
	psql -c "SELECT pg_create_physical_replication_slot('${slot_name}');" && success=true
		
	if ! $success ; then

		echo "ERROR: Cannot create '${slot_name}' slot."
		exit 4

	fi

elif $recreate ; then

	echo "INFO: Slot found. Removing..."

	success=false
	psql -c "SELECT pg_drop_replication_slot('${slot_name}');" && success=true
	
	if ! $success ; then

		echo "ERROR: Cannot drop existing '${slot_name}' slot."
		exit 4

	fi

	echo "INFO: Re-creating the slot..."

	success=false
	psql -c "SELECT pg_create_physical_replication_slot('${slot_name}');" && success=true
	
	if ! $success ; then

		echo "ERROR: Cannot create '${slot_name}' slot."
		exit 4

	fi

fi

echo "create_slot - Done!"
exit 0

3.3.4 initiate_replication.sh

The last script we'll create here is the script that initiates replication (initiates standby server). Again, after ignoring boilerplate code we can say that the script:

  • Checks trigger / standby files. The script has the same behavior regarding this as promote.sh script explained earlier, with only difference that this script demands standby file and refuses trigger file. Flag -f has the same meaning.
  • Ensures that PostgreSQL password file (.pgpass, explained in PostgreSQL HA with pgpool-II - Part 2) contains replication user / password;
  • Tries to recreate replication slot at the specified primary server, and exits if this attempt fails;
  • Stops postgresql service and deletes PostgreSQL data directory;
  • Executes pg_basebackup to get initial backup;
  • Creates recovery.conf file and sets its permissions;
  • Deletes postgresql.conf file, creates the new one from template, and sets its permissions;
  • Starts postgresql service.
#!/bin/sh
# By Fat Dragon, 05/24/2016
# Promoting standby to primary node.
# NOTE: The script should be executed as postgres user

echo "initiate_replication - Start"

# Defining default values
trigger_file="/etc/postgresql/9.5/main/im_the_master"
standby_file="/etc/postgresql/9.5/main/im_slave"
primary_host=""
primary_port="5432"
slot_name=$(echo "$HOSTNAME" | tr '[:upper:]' '[:lower:]')
slot_name=${slot_name/-/_}
replication_user="replication"
replication_password=""
force=false

debug=true

while test $# -gt 0; do

    case "$1" in

        -h|--help)
            
            echo "Promotes a standby server to primary role"
            echo " "
            echo "promote [options]"
            echo " "
            echo "options:"
            echo "-h, --help                show brief help"
            echo "-t, --trigger_file=FILE   specify trigger file path"
            echo "    Optional, default: /etc/postgresql/9.5/main/im_the_master"
            echo "-s, --standby_file=FILE   specify standby file path"
            echo "    Optional, default: /etc/postgresql/9.5/main/im_slave"
            echo "-H, --primary-host=HOST   specify primary host (Mandatory)"
            echo "-P, --primary-port=PORT   specify primary port"
            echo "    Optional, default: 5432"
            echo "-n, --slot_name=NAME      specify slot name"
            echo "    Optional, defaults to lowercase hostname with dashes replaced"
            echo "                          by underscores."
            echo "-u, --user                specify replication role"
            echo "    Optional, default: replication"
            echo "-p, --password=PASSWORD   specify password for --user"
            echo "    Optional, default: empty"
            echo "-f, --force               Forces promotion regardless to"
            echo "                          trigger / standby files."
            echo "    Optional, default: N/A"
            echo "    Description:       Without this flag the script will require"
            echo "                       presence of standby file."
            echo "                       With the flag set the script will create"
            echo "                       standby file as needed."
            echo " "
            echo "Error Codes:"
            echo "  1 - Wrong user. The script has to be executed as 'postgres' user."
            echo "  2 - Argument error. Caused either by bad format of provided flags and"
            echo "      arguments or if a mandatory argument is missing."
            echo "  3 - Inapropriate trigger / standby files. See -f flag for details."
            echo "  4 - Error creating/deleting/copying configuration files"
            echo "      (postgresql.conf and recovery.conf)."
            echo "      Hint: ensure that templates exist and check permissions."
            echo "  5 - Error in communicating with the primary server (to create the"
            echo "      slot or get the initial data)."
            echo "  6 - Error deleting old data directory."
            exit 0
            ;;

        -t)
            
            shift

            if test $# -gt 0; then

                trigger_file=$1

            else

                echo "ERROR: -t flag requires trigger file to be specified."
                exit 2

            fi

            shift
            ;;

        --trigger-file=*)
            
            trigger_file=`echo $1 | sed -e 's/^[^=]*=//g'`
            
            shift
            ;;

        -s)
            
            shift
            
            if test $# -gt 0; then

                standby_file=$1

            else

                echo "ERROR: -s flag requires standby file to be specified."
                exit 2

            fi

            shift
            ;;

        --standby-file=*)

            standby_file=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;

        -H)

            shift

            if test $# -gt 0; then

                primary_host=$1

            else

                echo "ERROR: -H flag requires primary host to be specified."
                exit 2

            fi

            shift
            ;;

        --primary-host=*)

            primary_host=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;

        -P)

            shift

            if test $# -gt 0; then

                primary_port=$1

            else

                echo "ERROR: -p flag requires port to be specified."
                exit 2

            fi

            shift
            ;;

        --primary-port=*)

            primary_port=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;

        -n)

            shift

            if test $# -gt 0; then

                slot_name=$1

            else

                echo "ERROR: -n flag requires slot name to be specified."
                exit 2

            fi

            shift
            ;;

        --slot-name=*)

            slot_name=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;

        -u)

            shift

            if test $# -gt 0; then

                replication_user=$1

            else

                echo "ERROR: -u flag requires replication user to be specified."
                exit 2

            fi

            shift
            ;;

        --user=*)

            replication_user=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;

        -p)

            shift

            if test $# -gt 0; then

                replication_password=$1

            else

                echo "ERROR: -p flag requires replication password to be specified."
                exit 2

            fi

            shift
            ;;

        --password=*)

            replication_password=`echo $1 | sed -e 's/^[^=]*=//g'`

            shift
            ;;
        
        -f|--force)

            force=true

            shift
            ;;

        *)

            echo "ERROR: Unrecognized option $1"
            exit 2
            ;;

    esac

done

# Ensuring that 'postgres' runs the script
if [ "$(id -u)" -ne "$(id -u postgres)" ]; then

    echo "ERROR: The script must be executed as 'postgres' user."
    exit 1

fi

if [ "$primary_host" = "" ]; then

    echo "ERROR: Primary host is mandatory. For help execute 'initiate_replication -h'"
    exit 2

fi

if [ "$replication_password" = "" ]; then

    echo "ERROR: --password is mandatory. For help execute 'initiate_replication -h'"
    exit 2

fi

if $debug; then

	echo "DEBUG: The script will be executed with the following arguments:"
	echo "DEBUG: --trigger-file=$trigger_file"
	echo "DEBUG: --standby_file=$standby_file"
	echo "DEBUG: --primary-host=$primary_host"
	echo "DEBUG: --primary-port=$primary_port"
	echo "DEBUG: --slot-name=$slot_name"
	echo "DEBUG: --user=$replication_user"
	echo "DEBUG: --password=$replication_password"

	if $force; then
		echo "DEBUG: --force"
	fi

fi

echo "INFO: Checking if trigger file exists..."
if [ -e $trigger_file ]; then

	if $force; then

		echo "INFO: Trigger file found. Deleting..."
		rm $trigger_file

	else

		echo "ERROR: Cannot initiate server as standby while it contains trigger file: ${trigger_file}"
		exit 3

	fi
	
fi

echo "INFO: Checking if standby file exists..."
if [ ! -e $standby_file ]; then

	if $force; then

		echo "INFO: Standby file not found. Creating new one..."
		echo "Initiated at: $(date)" >> $standby_file

	else

		echo "ERROR: Cannot initiate server as standby if it does not contain standby file: ${standby_file}"
		exit 3

	fi
	
fi

echo "INFO: Ensuring replication user and password in password file (.pgpass)..."
password_line="*:*:*:${replication_user}:${replication_password}"

if [ ! -f /var/lib/postgresql/.pgpass ]; then

	echo $password_line >> /var/lib/postgresql/.pgpass

elif ! grep -q "$password_line" /var/lib/postgresql/.pgpass ; then

	sed -i -e '$a\' /var/lib/postgresql/.pgpass
	echo $password_line >> /var/lib/postgresql/.pgpass
	sed -i -e '$a\' /var/lib/postgresql/.pgpass

fi

chown postgres:postgres /var/lib/postgresql/.pgpass
chmod 0600 /var/lib/postgresql/.pgpass

success=false

echo "INFO: Creating replication slot at the primary server..."
ssh -T postgres@$primary_host /etc/postgresql/9.5/main/replscripts/create_slot.sh -r $slot_name && success=true

if ! $success ; then

	echo "ERROR: Creating replication slot at the primary server failed."
	exit 5

fi

service postgresql stop

if [ -d /var/lib/postgresql/9.5/main ]; then

    echo "INFO: Deleting old data..."

    success=false       
    rm -rf /var/lib/postgresql/9.5/main && success=true

    if ! $success ; then

        echo "ERROR: Deleting data directory failed."
        exit 6

    fi

fi

echo "INFO: Getting the initial backup..."

success=false
pg_basebackup -D /var/lib/postgresql/9.5/main -h $primary_host -p $primary_port -U $replication_user && success=true

if ! $success; then

	echo "ERROR: Initial backup failed."
	exit 5

fi

if [ -e /var/lib/postgresql/9.5/main/recovery.conf ]; then

	echo "INFO: Removing old recovery.conf file..."

    success=false
    rm /var/lib/postgresql/9.5/main/recovery.conf && success=true

    if ! $success; then

        echo "ERROR: Removing old recovery.conf failed."
        exit 4

    fi

fi

echo "INFO: Creating recovery.conf file..."
cat >/var/lib/postgresql/9.5/main/recovery.conf <<EOL
standby_mode       = 'on'
primary_slot_name  = '${slot_name}'
primary_conninfo   = 'host=${primary_host} port=${primary_port} user=${replication_user} password=${replication_password}'
trigger_file       = '${trigger_file}'
EOL

chown postgres:postgres /var/lib/postgresql/9.5/main/recovery.conf
chmod 0644 /var/lib/postgresql/9.5/main/recovery.conf

if [ -e /etc/postgresql/9.5/main/postgresql.conf ]; then

	echo "INFO: Removing old postgresql.conf file..."

    success=false
    rm /etc/postgresql/9.5/main/postgresql.conf && success=true

    if ! $success; then

        echo "ERROR: Removing old postgresql.conf failed."
        exit 4

    fi

fi

echo "INFO: Copying new postgresql.conf file..."

success=false
cp /etc/postgresql/9.5/main/repltemplates/postgresql.conf.standby /etc/postgresql/9.5/main/postgresql.conf && success=true

if ! $success; then

    echo "ERROR: Copying new postgresql.conf failed."
    exit 4

fi

chown postgres:postgres /etc/postgresql/9.5/main/postgresql.conf
chmod 0644 /etc/postgresql/9.5/main/postgresql.conf

echo "INFO: Starting postgresql service..."
service postgresql start

echo "initiate_replication - Done!"
exit 0

3.3.5 Permissions

Finally, we need to set appropriate permissions for these script files:

$ chown postgres:postgres -R /etc/postgresql/9.5/main/replscripts
$ chmod 0744 -R /etc/postgresql/9.5/main/replscripts

3.4 Result

Once everything is prepared as described in this page, the new primary server can be promoted by executing single line of code at the server that will be the primary (in my case on IT-RDBMS01):

$ sudo -u postgres /etc/postgresql/9.5/main/replscripts/promote.sh -f -p replicationpassword

In this call I've mostly relied on default values for arguments. You can see more details about arguments and their defaults if you execute the script with -h (help) flag.

Similarly you can initiate standby server by executing single line of code at the server that will be standby (in my case IT-RDBMS02):

$ sudo -u postgres /etc/postgresql/9.5/main/replscripts/initiate_replication.sh -f -H IT-RDBMS01 -P 5433 -p replicationpassword

This time I've had to specify few more arguments (primary host and port), but anyway the replication is up and running. You can test it in the same way we've did in PostgreSQL HA with pgpool-II - Part 2.

3.5 Where to Go Next?

Continue with PostgreSQL HA with pgpool-II - Part 4 where we'll finally install pgpool-II.

Comments

Submitted byAnselme Rabouan (not verified) on Thu, 06/09/2016 - 15:25

In initiate_replication script, command 'slot_name=${slot_name/-/_}' only replaces the first occurrence of '-' by '_'
Use 'slot_name=${slot_name//-/_}' instead to replace all of them.
;-)

Submitted byfatdragon on Sun, 06/12/2016 - 15:00

In reply to by Anselme Rabouan (not verified)

Hey, thanks mate!
Sorry 'cause it took me so long to respond - at the moment I'm occupied with some other things (you can check my newest posts).
As you can see I haven't changed pgpool2 authorization thing you've told me the last time. Hopefully these days I will...
And thanks for correcting my scripts!
Keep in touch..
Enjoy!

Submitted byjusty (not verified) on Tue, 05/23/2017 - 23:43

In reply to by fatdragon

Hi Aleksandar,
Glad to see your post on /blog/2016/05/postgresql-ha-pgpool-ii-part-3
However, this website seems to be down. Could you kindly bring this back up. I have more questions as follow up but could not open this website to ask here.

Submitted byAskar Akhmerov (not verified) on Thu, 08/18/2016 - 00:14

you could change #!/bin/sh to #!/bin/bash

solved following problem for me

initiate_replication - Start
./initiate_replication.sh: 14: ./initiate_replication.sh: Bad substitution

Submitted byMohamed Lotfi … (not verified) on Wed, 08/31/2016 - 01:17

In reply to by Askar Akhmerov (not verified)

thank you ;)

Submitted byfatdragon on Tue, 09/13/2016 - 01:35

In reply to by Askar Akhmerov (not verified)

Sorry for my way too late response Askar, and thanks for sharing the problem / solution here!

Submitted byDan (not verified) on Thu, 12/01/2016 - 05:13

Hi,
I'm using postgresql 9.3 and installing it in CentOS.

1. is /etc/postgresql/9.5/main/im_the_master a directory?
1.a. What is placed inside of im_the_master and im_slave?
2. do you have a shell command to be used in replscripts for using WAL? version 9.3 doesn't support Slots.

Submitted byDavid Berardozzi (not verified) on Wed, 01/04/2017 - 11:50

Again great tutorial,
I faced the following problem, when primary server has gone down, the failover script fired up normally but nothing hapened due to the commands "service postgresql stat/restart" asking for interactive authentification. Of course this causes a BIG mess (with pgpool recognizing both servers as primaries and second server staying in read only).

To avoid that I edited sudoer file "sudo visudo" on both primary and standby with "%postgres ALL=(ALL) NOPASSWD: /usr/sbin/service postgresql *"
And i changed the scripts invoking "service postgresql start/restart" with "sudo service postgresql start/restart"

Submitted byfatdragon on Wed, 05/24/2017 - 00:03

In reply to by justy (not verified)

Up again. Sorry!

Submitted byjusty (not verified) on Wed, 05/24/2017 - 00:09

In reply to by fatdragon

Awesome. Thanks a ton Aleksandar. I have some questions. Will post it in seperate thread.

Submitted byjusty (not verified) on Wed, 05/24/2017 - 00:13

At /blog/2016/05/postgresql-ha-pgpool-ii-part-3, in section 3.3. Introducing Some Conventions, I am not clear about what should be the contents of the trigger files :/etc/postgresql/9.5/main/im_the_master and /etc/postgresql/9.5/main/im_slave. Could you kindly share those details Aleksandar.

Submitted byfatdragon on Wed, 05/24/2017 - 00:18

In reply to by justy (not verified)

Content isn't important. Important thing is if the file exists or not. You can put any content in it (I believe that I'm putting some info like date of creation / change).

Submitted byjusty (not verified) on Wed, 05/24/2017 - 00:22

In reply to by fatdragon

Ah ok. So this can be empty as well?

Submitted byfatdragon on Wed, 05/24/2017 - 00:24

In reply to by justy (not verified)

yep

Submitted byjusty (not verified) on Wed, 05/24/2017 - 00:14

Another question is I am using postgres 9.5 on centos 7. Are there equivalent scripts for those you have shared here for supporting this on centos apart from ubuntu ?

Submitted byfatdragon on Wed, 05/24/2017 - 00:19

In reply to by justy (not verified)

It shouldn't be too hard to adopt scripts for CentOS.

Submitted byjusty (not verified) on Wed, 05/24/2017 - 00:44

In reply to by fatdragon

In your post you have mentioned to place the 4 automation scripts under /etc/postgresql/9.5/main/replscripts. I am not sure if these scripts should be place on both primary and standy. Could you confirm?

Submitted byjusty (not verified) on Wed, 05/24/2017 - 05:26

In section 3.5 at /blog/2016/05/postgresql-ha-pgpool-ii-part-3, I was able to promote to primary using the promote.sh script that I have run on primary server but I am facing issue when trying to run the next script that involves initiate_replication.sh on the secondary server. This is the error I get:
-bash-4.2$ /var/lib/pgsql/9.5/data/replscripts/initiate_replication.sh -f -H 10.142.82.232 -P 5433
initiate_replication - Start
DEBUG: The script will be executed with the following arguments:
DEBUG: --trigger-file=/var/lib/pgsql/9.5/data/im_the_master
DEBUG: --standby_file=/var/lib/pgsql/9.5/data/im_slave
DEBUG: --primary-host=10.142.82.232
DEBUG: --primary-port=5433
DEBUG: --slot-name=server2
DEBUG: --user=replication
DEBUG: --password=ca
DEBUG: --force
INFO: Checking if trigger file exists...
INFO: Checking if standby file exists...
INFO: Ensuring replication user and password in password file (.pgpass)...
INFO: Creating replication slot at the primary server...
create_slot - Start
DEBUG: The script will be executed with the following arguments:
DEBUG: --trigger-file=/var/lib/pgsql/9.5/data/im_the_master
DEBUG: --name=server2
DEBUG: --recreate
Checking if trigger file exists...
INFO: Checking if slot 'server2' exists...
psql: could not connect to server: No such file or directory
Is the server running locally and accepting
connections on Unix domain socket "/var/run/postgresql/.s.PGSQL.5432"?
ERROR: Cannot check for 'server2' slot existence.
ERROR: Creating replication slot at the primary server failed.
--------------------------
I have created slot by name 'server2' already on primary server. when I directly connect with psql command, I need to export PGPORT with 5433 every time,not sure why there is a problem here in connecting to the psql on primary server.

Submitted byjusty (not verified) on Thu, 05/25/2017 - 05:30

In reply to by justy (not verified)

I tried debugging my best and I could not figure out why its complaining about connections on port 5432 when I have PGPORT set with 5433. Any help or thoughts on this would be greatly appreciated.

Add new comment

Anonymous comments require solving captcha, and waiting for administrator's approval.