Ever wondered why all your tests pass, yet under real production load things start to look bad on the database? If only we could replicate the production database load in real time on a testing environment, we would have some data to analyze. I was given exactly that task, and pgreplay came to the rescue.
I find this tool very helpful when:
–> I want to test a new version of PostgreSQL.
–> I need statistics on how the database would behave on new hardware.
–> I make a configuration change to the database and want to see how it behaves under a real production load.
All of the above scenarios are worth spending time on, and each gives you a clear idea of how your database will behave once it is live in production.
About the module:
pgreplay reads the SQL statements in a PostgreSQL log file, extracts them, and executes them in the same order and with the same relative timing against a PostgreSQL database cluster.
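To make the timing model concrete, here is a minimal Python sketch of the idea (an illustration only, not pgreplay's actual implementation, which is written in C): each statement is executed at the same offset from the start of the run that it had in the original log.

```python
import time

def replay(statements, execute, clock=time.monotonic, sleep=time.sleep):
    """Replay (offset_seconds, sql) pairs, preserving relative timing.

    statements: list of (offset, sql) in log order.
    execute: callback that runs one SQL statement.
    """
    start = clock()
    for offset, sql in statements:
        # If we are ahead of schedule, wait until the original offset;
        # if we are behind, run the statement immediately.
        delay = offset - (clock() - start)
        if delay > 0:
            sleep(delay)
        execute(sql)

# Example with a made-up workload: three statements 100 ms apart
executed = []
replay([(0.0, "SELECT 1"), (0.1, "SELECT 2"), (0.2, "SELECT 3")],
       executed.append)
```

In the real tool the callback would send the statement over a libpq connection; the point here is only the scheduling loop.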
This assumes you have a test box with the same data as production and that postgres is your database user.
Download and install pgreplay from the link below (I would install it on a test box, not on production):
su - postgres
wget http://pgfoundry.org/frs/download.php/3345/pgreplay-1.2.0.tar.gz
tar -xvf pgreplay-1.2.0.tar.gz
cd pgreplay-1.2.0
./configure --with-postgres=/opt/PostgreSQL/9.2.4/bin
make
sudo make install
pgreplay needs to read a postgres log file, and that log has to be in a specific format. The change requires a restart of your postgres database, so make a copy of postgresql.conf before you go ahead with it.
log_destination = 'stderr'
logging_collector = on
log_directory = '/somemountpoint/pg_log' (the location where your postgres log files will be generated; make sure the postgres user has write privileges)
log_min_duration_statement = 100 (this value is in milliseconds; set it to 0 to log all statements)
log_min_messages = error (or more) (if you know that you have no cancel requests, 'log' will do)
log_min_error_statement = log (or more)
log_connections = on
log_disconnections = on
log_line_prefix = '%m|%u|%d|%c|' (if you don't use CSV logging)
log_statement = 'all'
lc_messages must be set to English (the encoding does not matter)
bytea_output = escape (from version 9.0 on, only if you want to replay the log on 8.4 or earlier)
That's it! Restart postgres and logging will start in a format pgreplay can parse.
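With log_line_prefix = '%m|%u|%d|%c|', every log line starts with four pipe-separated fields (timestamp, user, database, session ID) followed by the message, which is what makes the log machine-readable. Here is a small Python sketch of that split (this is not pgreplay's parser, and the log line is made up for illustration):

```python
def parse_log_line(line):
    """Split a line produced with log_line_prefix = '%m|%u|%d|%c|'.

    The first four pipe-separated fields are the prefix; everything
    after the fourth pipe is the log message itself.
    """
    timestamp, user, database, session_id, message = line.split("|", 4)
    return {"timestamp": timestamp, "user": user,
            "database": database, "session": session_id,
            "message": message.strip()}

# Hypothetical line as the settings above would produce it
line = ("2024-01-15 10:23:45.123 UTC|appuser|salesdb|65a4f1c2.7b3d|"
        "LOG:  statement: SELECT count(*) FROM orders")
entry = parse_log_line(line)
```

The session ID (%c) is what lets a replay tool attribute each statement to the right connection, so concurrent sessions can be reproduced faithfully.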
There are two ways to replay the postgres logs against another postgres database:
1. pgreplay [<parse options>] [<replay options>] [<infile>]
This parses a PostgreSQL log file and replays the statements against a database.
2. pgreplay -f [<parse options>] [-o <outfile>] [<infile>]
This parses a PostgreSQL log file and writes the contents to a "replay file" that can later be replayed with -r.
I prefer the second option, because I can copy the parsed file to any database box and do whatever testing is necessary.
Now that you have a couple of postgres log files in the log location you specified in postgresql.conf, all you have to do is copy them over to a test box that has pgreplay installed and run the procedure below.
First, create a replay file:
pgreplay -f -o test.replay /somemountpoint/pg_log/postgresql.log
Note that you will find the test.replay file in the current working directory.
Now just replay the test.replay file and watch the test database replay all the statements from production:
pgreplay -r -j test.replay
The -j option makes pgreplay "jump" over periods where all connections are idle, so long pauses in the original log don't stretch out the test; drop it if you want the replay to honor the original timing exactly.
There are many more options you can try out with pgreplay. I will not explain them all here, but feel free to ask any questions about the module. More information on pgreplay is available here.