
PostgreSQL, or Postgres, is an open source object-relational database management system (ORDBMS) with more than 15 years of active development. It is a powerful database server that can handle high workloads, and it runs on Linux, Unix, BSD and Windows servers.

Set Up Master-Slave Replication on PostgreSQL

PostgreSQL 9.5 on Ubuntu 16.04

Database replication is the term used for the technology that maintains a copy of a set of data on a remote system. Keeping a reliable copy of a running system is one of the main concerns of redundancy: we want a maintainable, easy-to-use and stable copy of our data.

Typically, individual database servers are referred to as nodes. The whole group of database servers involved in replication is known as a cluster. A database server that allows a user to make changes is known as a master or primary or may be described as a source of changes. A database server that only allows read-only access is known as a Hot Standby.

The key aspect of replication is that data changes are captured on a master, and then transferred to other nodes. In some cases, a node may send data changes to other nodes, which is a process known as cascading or relay. Thus, the master is a sending node but not all sending nodes need to be masters. Replication is often categorized by whether more than one master node is allowed, in which case it will be known as multi-master replication.

Master/slave database replication is the process of copying (syncing) data from a database on one server (the master) to a database on one or more other servers (the slaves). The main benefit of this process is that it distributes the database across multiple machines, so when the master server has a problem, a backup machine with the same data is available to handle requests without interruption.

PostgreSQL provides several ways to replicate a database. Replication can be used for backup purposes and to provide a high-availability database server. In this post, we will show how to install and configure PostgreSQL replication using hot standby mode.

Hot standby mode requires two database servers; we will use Ubuntu 16.04 as the operating system on both.

  1. Master Server – accepts connections from the client with read and write permissions.
  2. Slave Server – the standby server runs a copy of the data from the master server with read-only permission.

Configuration:

Master Server (10.10.3.25): the primary server, which runs the active database. This database accepts connections from clients and permits read-write operations.

Slave Server (10.10.4.25): the standby server, which runs a copy of the active database. This database is configured to accept connections from clients and permits read-only operations. If the primary database server fails, the system can fail over to the standby server, which then becomes the active primary server.

How Hot Standby works:

  1. Postgres uses write-ahead logging (WAL) to continuously archive database transactions. For each change made to the data files, WAL writes an entry in a log file, and the log records describing the change are flushed to permanent storage before the change is considered committed. The system uses these log entries to perform point-in-time restoration from archives and to keep the standby server up to date.
  2. The process of updating the standby server with WAL entries is called streaming replication. This process operates asynchronously, which means it can take some time for the standby server to receive an update from the primary server. Though this delay can be very short, synchronization between the servers is not instantaneous (a query to measure this lag is shown after this list).
  3. Postgres doesn’t provide functionality to automatically fail over when the primary server fails. This is a manual operation unless we use a third-party solution to manage failover.
  4. Load balancing is not automatic with Hot Standby. If load balancing is a requirement we must provide a load-balancing solution that uses the primary server for read-write operations and the standby server for read-only operations.
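Once replication is running, this lag can be measured directly on the standby. The query below is a minimal sketch (it assumes the standby is up and accepting read-only connections); pg_last_xact_replay_timestamp() returns the commit time of the last transaction replayed from the master:

psql -c "SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;"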

Step 1: We will configure ssh access between our servers to make transferring files easier.

We will need to set a password for the Postgres user so that we can transfer the key files initially. If you desire, you can remove the password at a later time:

sudo passwd postgres

Switch over to the Postgres user like this:

sudo su - postgres

Generate an ssh key for the Postgres user:

ssh-keygen

Press “ENTER” to all of the prompts that follow.

Transfer the keys to the other server by typing:

ssh-copy-id IP_address_of_opposite_server

We are now able to SSH freely between the two servers as the Postgres user.
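To verify that key-based access works (an optional check, not part of the original steps), run a trivial command on the opposite server; it should complete without asking for a password:

ssh IP_address_of_opposite_server hostname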

Step 2: Configure the Master Server

We will begin by configuring the master server. All of these commands should be executed as the Postgres user.

First, we will create a user called “rep” that can be used solely for replication:

psql -c "CREATE USER rep REPLICATION LOGIN CONNECTION LIMIT 1 ENCRYPTED PASSWORD 'yourpassword';"

Next, we will move to the Postgres configuration directory:

cd /etc/postgresql/9.5/main

We will modify the access file with the user we just created:

nano pg_hba.conf

Add a line to let the new user get access to this server:

host replication rep 10.10.4.25/32 md5

Save and close the file.

Next, we will open the main postgres configuration file:

nano postgresql.conf

Find these parameters. Uncomment them if they are commented, and modify the values according to what we have listed below:

listen_addresses = '*'

wal_level = hot_standby

max_wal_size = 1GB    # checkpoint_segments was removed in PostgreSQL 9.5; max_wal_size replaces it

archive_mode = on

archive_command = 'cp -i %p /var/lib/postgresql/9.5/main/archive/%f'

max_wal_senders = 3

wal_keep_segments = 8

wal_log_hints = on

Save and close the file.

Now create a new directory inside of the ‘main’ data directory for the WAL archive – run the command below as the Postgres user:

mkdir -p /var/lib/postgresql/9.5/main/archive/

Restart the master server to implement your changes:

service postgresql restart
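After the restart, you can verify as the Postgres user that the new settings have been picked up (an optional sanity check; the values should match what was configured above):

psql -c "SHOW wal_level;"
psql -c "SHOW archive_mode;"
psql -c "SHOW max_wal_senders;"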

Step 3: Configure the Slave Server

Begin on the slave server by shutting down the Postgres database software:

service postgresql stop

We will be making some similar configuration changes to Postgres files, so change to the configuration directory:

cd /etc/postgresql/9.5/main

Adjust the access file to allow the other server to connect to this one. This is in case we need to turn the slave into the master later on.

nano pg_hba.conf

Again, add this line for the replication user (entries in pg_hba.conf are matched in order, so place it above any more general rules):

host replication rep 10.10.3.25/32 md5

Save and close the file.

Next, open the postgres configuration file:

nano postgresql.conf

We can use the same configuration options we set for the master server, and additionally enable hot_standby = on:

listen_addresses = '*'

wal_level = hot_standby

max_wal_size = 1GB    # checkpoint_segments was removed in PostgreSQL 9.5; max_wal_size replaces it

archive_mode = on

archive_command = 'cp -i %p /var/lib/postgresql/9.5/main/archive/%f'

max_wal_senders = 3

wal_keep_segments = 8

wal_log_hints = on

hot_standby = on

Save and close the file.

Step 4 – Synchronize Data from the Master Server to the Slave Server

In this step, we will move the PostgreSQL data directory ‘/var/lib/postgresql/9.5/main’ to a backup folder and then replace it with the latest master data using the ‘pg_basebackup’ command.

Before the slave can replicate the master, we need to give it the initial database to build off of. This is because it reads the logs off of the master server and applies the changes to its own database. We need that database to match the master database.

Now log in as the Postgres user and rename the ‘main’ directory to ‘main_original’ as a backup (the relative paths below work because the Postgres user’s home directory is /var/lib/postgresql):

su - postgres
mv 9.5/main 9.5/main_original

Run the command below to copy data from the master server to slave server:

pg_basebackup -h 10.10.3.25 -D /var/lib/postgresql/9.5/main -U rep -v -P

Note:

  • 10.10.3.25 is the master server’s IP address.
  • You will be prompted to enter the password of the replication user ‘rep’.
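As a variation (not part of the original steps), pg_basebackup can also stream the WAL needed for a consistent copy over a second replication connection by adding -X stream. Note that this opens two connections as the ‘rep’ user, so the CONNECTION LIMIT 1 used when creating the role would need to be raised first:

pg_basebackup -h 10.10.3.25 -D /var/lib/postgresql/9.5/main -U rep -v -P -X stream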

Go to the new ‘main’ directory and create the new recovery file ‘recovery.conf‘ with vim:

cd /var/lib/postgresql/9.5/main/
vim recovery.conf
standby_mode = 'on'

primary_conninfo = 'host=10.10.3.25 port=5432 user=rep password=yourpassword'

restore_command = 'cp /var/lib/postgresql/9.5/main/archive/%f %p'

trigger_file = '/tmp/postgresql.trigger.5432'

The last line in the file, trigger_file, is one of the key parts of the entire configuration. If we create a file at that location on the slave machine, the slave will reconfigure itself to act as a master.

Now switch back to the root user with exit and start PostgreSQL:

sudo service postgresql start

Check the logs to see if there are any problems. They are located on both machines here:

less /var/log/postgresql/postgresql-9.5-main.log

You should see in the log that the standby is successfully connecting to the master server.

Step 5 – Testing

Go to the master server, log in as the Postgres user, then run the command below to see the replication info.

su - postgres

psql -x -c "select * from pg_stat_replication;"

Next, test replication by creating a new database on the master server and then checking that it exists on the slave server, as sketched below.
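For example (the database name ‘replica_test’ is just an illustration), create a database on the master:

psql -c "CREATE DATABASE replica_test;"

Then list the databases on the slave and confirm that ‘replica_test’ appears:

psql -c "\l"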

Step 6 – Primary Method for Promoting the Slave Server to Master

sudo touch /tmp/postgresql.trigger.5432

This will break the current replication, especially if the master server is still running, but it is what we would need to do if the master server goes down. It allows the slave to begin accepting writes. We can then fix the old master server and turn it into the slave.

This will cause Postgres to:

  • Notice the trigger file
  • Finish off the slave process and recover any log files transferred from the main server that still need recovery
  • Rename the ‘recovery.conf’ file to ‘recovery.done’
  • Begin acting as the primary server
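To confirm that the promotion has taken effect (an optional check, not in the original steps), ask the former slave whether it is still in recovery; once it is acting as a primary the query returns ‘f’:

psql -c "SELECT pg_is_in_recovery();"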

Alternate Method using pg_ctl

Instead of creating the trigger file, we can use pg_ctl to promote the server:

su - postgres
enter the postgres password
/usr/lib/postgresql/9.5/bin/pg_ctl -D /var/lib/postgresql/9.5/main promote

Now we have a master and slave PostgreSQL server configured to communicate effectively. If we have an application that will be writing to and querying the databases, we could set up a load balancing scheme to always write to the master, but split the reads between the master and slave. This could increase the performance of the database interactions.

Recovery after failover by using pg_rewind

pg_rewind is a tool for synchronizing a PostgreSQL cluster with another copy of the same cluster after the clusters’ timelines have diverged. A typical scenario is to bring an old master server back online after failover, as a standby that follows the new master.

How pg_rewind works

pg_rewind scans the “old” master’s PGDATA directory, identifies the data blocks changed during the switch to the new timeline, and then copies only those blocks from the promoted slave. As a side effect, the configuration files are also copied from the promoted standby. This avoids having to re-sync PGDATA completely.

To do this, all of the WAL produced by the old master in the final moments before the switchover must be available. Changes are identified by comparing the state of the data blocks present in the PGDATA with the changes logged in the WAL. Once the changed blocks are identified, the WAL is replayed, performing a sort of ‘rewind’ of the timelines.

The result is equivalent to replacing the target data directory with the source one. All files are copied, including configuration files. The advantage of pg_rewind over taking a new base backup, or tools like rsync, is that pg_rewind does not require reading through all unchanged files in the cluster. That makes it a lot faster when the database is large and only a small portion of it differs between the clusters.

pg_rewind examines the timeline histories of the source and target clusters to determine the point where they diverged and expects to find WAL in the target cluster’s pg_xlog directory reaching all the way back to the point of divergence. In the typical failover scenario where the target cluster was shut down soon after the divergence, that is not a problem, but if the target cluster had run for a long time after the divergence, the old WAL files might not be present anymore. In that case, they can be manually copied from the WAL archive to the pg_xlog directory.

When the target server is started up for the first time after running pg_rewind, it will go into recovery mode and replay all WAL generated in the source server after the point of divergence. If some of the WAL was no longer available in the source server when pg_rewind was run, and therefore could not be copied by the pg_rewind session, it needs to be made available when the target server is started up. That can be done by creating a recovery.conf file in the target data directory with a suitable restore_command.

Moreover, for pg_rewind to work, the instances have to be initialised with the “-k” (or --data-checksums) option of initdb, or the wal_log_hints parameter has to be enabled (we already set wal_log_hints = on in the configuration above).

sudo service postgresql stop
su - postgres
/usr/lib/postgresql/9.5/bin/pg_rewind --target-pgdata=/var/lib/postgresql/9.5/main --source-server='host=10.10.4.25 port=5432 user=postgres password=password'

Keep in mind that only a few blocks were copied (those changed during the split-brain), even if your database is hundreds of GBs in size! Remember that the configuration files are also copied, and any recovery.conf already present on the “downgraded” master may have been overwritten. So, remember to:

  • change the port where the instance listens in the postgresql.conf
  • change the primary_conninfo in the recovery.conf in order to make the downgraded master connect to the promoted master
cd /etc/postgresql/9.5/main/
nano postgresql.conf

Make sure hot_standby = on is set. Then recreate the recovery file so the downgraded master follows the promoted standby:

cd /var/lib/postgresql/9.5/main/
vim recovery.conf

standby_mode = 'on'

primary_conninfo = 'host=10.10.4.25 port=5432 user=rep password=yourpassword'

restore_command = 'cp /var/lib/postgresql/9.5/main/archive/%f %p'

trigger_file = '/tmp/postgresql.trigger.5432'

Finally, start PostgreSQL:

sudo service postgresql start

Once this has been done, the downgraded master will start to follow the promoted standby and will then run as its standby in turn.
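As a final check (optional, not in the original text), you can confirm on the promoted master that the rewound server has reconnected as a standby by querying the replication status again:

psql -x -c "select * from pg_stat_replication;"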