PGREPLAY (replicating production load in a test environment)

Ever wondered why all the testing passes OK, but when it comes to real production load, things start to look bad on the database? If only we knew how to replicate the database load in real time in a testing environment, we would have some data to analyze. I was given exactly that task, and pgreplay came to the rescue.
I find this tool very helpful when:

–> I want to test a new version of the PostgreSQL database.
–> I need statistics on how the database would behave on new hardware.
–> I make a configuration change to the database and want to see how it behaves under a real production load.

All of the above scenarios are worth spending time on, and you get a clear idea of how your database will behave once it's live in production.

About the module:

pgreplay reads the SQL statements in a PostgreSQL log file, extracts them and executes them in the same order and relative time against a PostgreSQL database cluster.
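
At a high level the workflow is: turn on statement logging on the source database, copy the resulting log file to the test box, parse it, and replay it. A rough sketch (file names here are placeholders; the detailed steps follow below):

pgreplay -f -o prod.replay postgresql.log   # parse the log into a replay file
pgreplay -r prod.replay                     # replay it against the test database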

Installation:

Assuming you have a test box with the same data as production and are using PostgreSQL as your database, download and install pgreplay as shown below (I would install it on the test box, not on production):

su - postgres
wget http://pgfoundry.org/frs/download.php/3345/pgreplay-1.2.0.tar.gz
tar -xvzf pgreplay-1.2.0.tar.gz
cd pgreplay-1.2.0
./configure --with-postgres=/opt/PostgreSQL/9.2.4/bin
make
sudo make install
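
If you keep the default configure prefix, the binary ends up in /usr/local/bin (an assumption; adjust if you passed --prefix). A quick check that the install worked:

ls -l /usr/local/bin/pgreplay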

Configuration:

pgreplay needs to read a postgres log file, and that log has to be in a specific format. The change requires a restart of your postgres database. Make a copy of the postgresql.conf file before you go ahead with this change.

log_destination = 'stderr'
logging_collector = on
log_directory = '/somemountpoint/pg_log' (this is the location where your postgres log files will be generated; make sure the postgres user has write privileges)
log_min_duration_statement = 100 (this value is in milliseconds; set it to 0 if you want to log all statements)
log_min_messages = error  (or more)
   (if you know that you have no cancel requests, 'log' will do)
log_min_error_statement = log  (or more)
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|'  (if you don't use CSV logging)
log_statement = 'all'
lc_messages must be set to English (the encoding does not matter)
bytea_output = escape  (from version 9.0 on, only if you want to replay
                        the log on 8.4 or earlier)

That's it! Restart postgres and logging will start in a format that pgreplay can parse.
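
A minimal sketch of the restart and a quick check of the new log format, assuming the data directory matches the install path used above (adjust both paths to your setup):

/opt/PostgreSQL/9.2.4/bin/pg_ctl -D /opt/PostgreSQL/9.2.4/data restart
tail -n 5 /somemountpoint/pg_log/postgresql-*.log   # lines should now start with timestamp|user|database|session id|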

Usage:

There are two ways to replay the postgres logs against another postgres database:

1. pgreplay [<parse options>] [<replay options>] [<infile>] 

This parses a PostgreSQL log file and replays the statements against a database.

2. pgreplay -f [<parse options>] [-o <outfile>] [<infile>]

This parses a PostgreSQL log file and writes the contents to a "replay file" that can later be replayed with -r.

I like the second option better because I can just copy the parsed file to any database and do whatever testing is necessary.
Now that you have a couple of postgres log files in the log location you specified in postgresql.conf, all you have to do is copy the logs over to a test box that has pgreplay installed and run the procedure below:
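
Copying the logs over can be as simple as scp (the host name testbox and the target directory are placeholders):

scp /somemountpoint/pg_log/postgresql-*.log postgres@testbox:/tmp/pg_log/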

I first create a parse file:

pgreplay -f -o test.replay /somemountpoint/pg_log/postgresql.log
Notice that you will find the test.replay file in the current working directory.
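
A quick sanity check that the replay file was written:

ls -lh test.replay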

Now just replay the test.replay file and watch the test database replay all the statements from production:

pgreplay -r -j test.replay
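
If the test database listens on a different host or port, or you want to change the replay speed, pgreplay takes connection and speed options at replay time. A sketch (testdbhost is a placeholder; check pgreplay's documentation for the exact options in your version):

pgreplay -r -h testdbhost -p 5432 -s 1 test.replay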

There are a lot more options that you can try out with pgreplay. I will not explain them all here, but feel free to ask any questions about the module. More information on pgreplay is available here.
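
One option worth calling out: if your server logs in CSV format (log_destination = 'csvlog') instead of stderr, pass -c when parsing so pgreplay expects CSV input. A minimal sketch (the .csv file name is a placeholder):

pgreplay -f -c -o test.replay /somemountpoint/pg_log/postgresql.csv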

  1. #1 by Pius Chan on June 13, 2014 - 9:55 pm

    Help. I am testing pgreplay and cannot make it work. I tested it with a simple postgres database log in CSV format containing only 4 lines:

    2014-05-30 06:00:00.007 PDT,,,29428,"",538880d0.72f4,1,"",2014-05-30 06:00:00 PDT,,0,LOG,00000,"connection received: host=192.168.1.101 port=36131",,,,,,,,,""
    2014-05-30 06:00:00.009 PDT,"postgres","TESTDB",29428,"192.168.1.101:36131",538880d0.72f4,2,"authentication",2014-05-30 06:00:00 PDT,59/5693130,0,LOG,00000,"connection authorized: user=postgres database=MONSOON",,,,,,,,,""
    2014-05-30 06:00:00.953 PDT,"postgres","TESTDB",29428,"192.168.1.101:36131",538880d0.72f4,3,"SELECT",2014-05-30 06:00:00 PDT,59/0,0,LOG,00000,"duration: 941.939 ms statement: select count(id) from customer",,,,,,,,,""
    2014-05-30 06:00:01.389 PDT,"postgres","TESTDB",29428,"192.168.1.101:36131",538880d0.72f4,4,"idle",2014-05-30 06:00:00 PDT,,0,LOG,00000,"disconnection: session time: 0:00:01.381 user=postgres database=MONSOON host=192.168.1.101 port=36131",,,,,,,,,""

    So I run pgreplay to generate the replay file but the statistics always reflect no SQL statement processed:

    $ pgreplay -f -cq -o test1.replay test1.log

    Parse statistics
    ================

    Log lines read: 4
    Total SQL statements processed: 0
    Simple SQL statements processed: 0
    Parametrized SQL statements processed: 0
    Named prepared SQL statements executions processed: 0
    Cancel requests processed: 0
    Fast-path function calls ignored: 0

    When replaying the generated file, the database log shows that the "select count(id) from customer" statement is missing:

    2014-06-13 14:52:12.979 PDT,,,18491,"",539b728c.483b,1,"",2014-06-13 14:52:12 PDT,,0,LOG,00000,"connection received: host=192.168.6.151 port=37853",,,,,,,,,""
    2014-06-13 14:52:12.979 PDT,"postgres","TESTDB",18491,"192.168.6.151:37853",539b728c.483b,2,"authentication",2014-06-13 14:52:12 PDT,28/207566,0,LOG,00000,"connection authorized: user=postgres database=TESTDB",,,,,,,,,""
    2014-06-13 14:52:14.359 PDT,"postgres","TESTDB",18491,"192.168.6.151:37853",539b728c.483b,3,"idle",2014-06-13 14:52:12 PDT,,0,LOG,00000,"disconnection: session time: 0:00:01.380 user=postgres database=TESTDB host=192.168.6.151 port=37853",,,,,,,,,""

    Any idea?

    Thanks a lot,

    Pius

    • #2 by Vikram Rai on June 17, 2014 - 6:33 pm

      Hi Pius,

      It looks like pgreplay is ignoring the log input. Could you please let me know the log_line_prefix that you have set for your postgres logging? I am trying to figure out why this might be happening with a simple select operation.

      Thanks,
      Vikram

  2. #3 by Pius Chan on June 19, 2014 - 5:03 pm

    Hi Vikram,

    The log_line_prefix setting is:

    log_line_prefix = '%t [%d] [%u] [%p]: [%l-1] %h '

    However, I find that once log_destination is set to 'csvlog', the database log format is fixed; log_line_prefix does not affect it.

    Thanks a lot,

    Pius
