PostgreSQL streaming replication (hot_standby)

This is an inbuilt replication system in PostgreSQL that uses wal files to apply changes to the secondary server. Since this replication uses wal files any changes on the primary server(schema or data)are automatically applied to the secondary server.

Points to note:

*One thing to keep into consideration is that primary and secondary database servers should have the same major release installed in order to have this replication working. For eg if you have PostgreSQL 9.3.3 installed on your secondary server you can have 9.3.x on your master or vice a versa. However, any major release 9.1.x will not work with 9.3.x.

*Another point to take into cosideration is that whatever data objects you have on your primary server instance, all of that will be replicated to the secondary instance. You cannot have selective tables database or schemas replicate to your secondary database as it applies wal logs to the secondary server. Exceptional objects that are not replicated to your secondary database are the unlogged tables for eg temp tables.

Configuration:

Assumptions:
*I assume you have postgreSQL 9.0 or higher installed on both primary and secondary server.
*I am using rsync to copy the wal files over to the standby. Make sure rsync works from the primary to the secondary server without using a password (public private key sharing).

On the Primary server:

1. Open the postgresql.conf file and make the following changes:

wal_level = hot_standby
archive_mode = on
archive_command = 'rsync %p hostname_or_ipaddress_secondary_server:path_to_the_archive_dir_on_secondary_server/%f' (create this directory on the secondary server)
max_wal_senders = 1 #This is the number of secondary servers you want to send the wal files to. If you add another server to replication increment this value to 2.
wal_keep_segments = 1200 (this is the number of wal files to keep in case the replication lags behind. Make sure there is enough space in the pg_xlog mount point to accomodate the wal files in the pg_xlog dir. 1200*16/1024=18.75GB)

2. Edit the pg_hba.conf file and grant the secondary server access for replication

host replication postgres ip_address_of_secondary_server/22 trust

3. Restart the PostgreSQL database

4. Make sure that wal files are being pushed to the archive directory on the secondary server (if not check the error in postgreql log file and rectify any permission or file transfer related issues)

5. Now take online-backup of the primary database

$ psql -c "SELECT pg_start_backup('label', true)"

$ rsync -av --exclude pg_xlog --exclude postgresql.conf --exclude postgresql.pid path_of data_dir_on_primary/* ip_address_of_secondary_server:path_of data_dir_on_secondary

$ psql -c "SELECT pg_stop_backup()"

On the Secondary server:

1. Edit the postgresql.conf file:

hot_standby = on
archive_mode = off

2. Create recovery.conf and add following entries:

# Note that recovery.conf must be in $PGDATA directory.

standby_mode = 'on'

primary_conninfo = 'host=ip_address_of_primary port=5432 user=postgres'

trigger_file = '/path_to/trigger.tg' (this is the file that will be created when we want to make the secondary server as primary)

restore_command = 'rsync /path_to/archive/%f "%p"'

3. Start the standby database

Monitoring:

You can calculate the replication lag by comparing the current WAL write location on the primary with the last WAL location received/replayed by the standby. They can be retrieved using pg_current_xlog_location on the primary and the pg_last_xlog_receive_location/pg_last_xlog_replay_location on the standby, respectively.

$ psql -c "SELECT pg_current_xlog_location()" -h ip_address_of_primary (primary host)
pg_current_xlog_location
--------------------------
0/5000000
(1 row)

$ psql -c "select pg_last_xlog_receive_location()" -h ip_address_of_secondary (standby host)
pg_last_xlog_receive_location
-------------------------------
0/5000000
(1 row)

$ psql -c "select pg_last_xlog_replay_location()" -h ip_address_of_secondary (standby host)
pg_last_xlog_replay_location
------------------------------
0/5000000
(1 row)

Streaming replication process can also be checked by using ps -ef

Primary

$ ps -ef | grep sender
postgres 4950 4925 0 12:32 ? 00:00:00 postgres: wal sender process postgres 127.0.0.1(4925) streaming 0/5000000

Secondary/standby

$ ps -ef | grep receiver
postgres 5955 5941 1 12:33 ? 00:00:01 postgres: wal receiver process streaming 0/5000000
Advertisements

  1. #1 by Obernhard on April 16, 2014 - 7:36 am

    Hi Vikram,
    good post but it looks like to me the main feature of streaming replication is not mentionned :
    you mention “Streaming replication uses wal files to apply changes to the secondary server”. This is actually an option, but it usually comes in addition to what streaming replication is really : Streaming transactions from source to target without using wal files.
    You can perfectly setup streaming replication without a recovery file on the slave node. The downside is that in case of downtime you would have to retrieve wal files from master node in order to resync the slave node. Without recovery file, transactions are transferred from master to slave node using wal sender and wal receiver processes.
    Another advantage of streaming mechanism is that you do not have to wait for an entire wal file to be generated in order to have changes applied on the slave node.
    Streaming replication is really about “streaming”, and recovering from wal files is only used as an extra security in case there’s a network downtime so that with streaming replication you cannot catch up the gap. When network is up, archive wals are never used, only streaming mechanisms are used.
    Kind regards,
    Olivier

    • #2 by Vikram Rai on April 16, 2014 - 3:47 pm

      Thanks Olivier! I should have mentioned more on streaming replication. This is good information for a lot of people out there that are looking for help around streaming replication. Am glad you put in an effort to write this down. Appreciate it !! Stay tuned for more to come on this blog.

  2. #3 by Obernhard on April 16, 2014 - 6:17 pm

    Hey, by the way i did a mistake by saying you do not need any recovery file. Actually you need it, otherwise there won’t be any recovery as far as i know (Recovering xlogs sent on the fly by wal senders). But there’s no mandatory need to retrieve archived wals (strictly considering the recovery of xlogs). But of course i would never work without retrieving archived log, just in case we need to solve a gap because of long duration network failure between master and slave šŸ™‚

    Have to mention i appreciate to find a good blog about postgresql. Have been working on postgresql since more than one year (and in the meantime I’ve been in the Oracle world since 15 years). Finding as much detailed information as what exists in Oracle publications is not an easy task with postgresql. Yes the documentation has lots of information, but 1) deserves a better organization 2) much more insights about core internals. I guess this will come over time when postgresql becomes more and more popular.

    Will keep reading your blog šŸ™‚

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: