WAL-E (Incremental backups with S3 support)

It has been a while since I wrote on my blog, and I thought this would be a good addition to the knowledge base. I came across an online backup tool that I believe is worth writing about. WAL-E takes care of incremental (WAL) backups, and base backups are compressed and sent to S3. No more writing shell scripts to ship backups to S3, which I thought was pretty neat. I did struggle a bit to get it installed on a CentOS box and did not find much help online, so hopefully this post will help someone get this module up and running in no time.

Dependencies:

python (>= 2.6)
lzop
pv
And of course, since we are talking about a PostgreSQL database here, any PostgreSQL version >= 8.4 should work with it.
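Before building anything, it can help to check which of these dependencies are already on the box. The snippet below is a minimal sketch of such a check, assuming the requirements are exactly the ones listed above (Python >= 2.6, plus lzop and pv on the PATH). The helper names `python_is_new_enough`, `find_executable` and `missing_tools` are hypothetical, not part of WAL-E.

```python
import os
import sys


def python_is_new_enough(version_info=None, minimum=(2, 6)):
    """Return True if the given (or running) Python meets the minimum version."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:2]) >= tuple(minimum)


def find_executable(name, search_path=None):
    """Return the full path of `name` if it is an executable file on the
    search path (defaults to $PATH), else None."""
    if search_path is None:
        search_path = os.environ.get("PATH", "")
    for directory in search_path.split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def missing_tools(names=("lzop", "pv"), search_path=None):
    """List the command-line dependencies that could not be found."""
    return [name for name in names if find_executable(name, search_path) is None]
```

Once lzop and pv are installed, `missing_tools()` should return an empty list.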

First we need Python 2.6 or greater, which for some reason I could not get from the CentOS repo, even after a yum update.

1. Download the Python source from the link below:

wget http://python.org/ftp/python/2.7.6/Python-2.7.6.tar.xz

Extract the file, change into the source directory, and compile it with:
./configure
make
sudo make install

NOTE: if you are having trouble extracting the tar.xz file, build and install xz first:
wget http://tukaani.org/xz/xz-5.0.5.tar.gz
./configure
make
sudo make install

2. We will also need setuptools to build and install Python modules:

wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py

Then install it for the Python 2.7 you installed above:
sudo /usr/local/bin/python2.7 ez_setup.py

NOTE: I was set back again here because the ez_setup.py script downloads setuptools with a certificate check, and that check failed on my box. All I did was add --no-check-certificate to the wget command in the script, like this:

def download_file_wget(url, target):
    # '--no-check-certificate' added by hand so wget skips certificate verification
    cmd = ['wget', url, '--no-check-certificate', '--quiet', '--output-document', target]
    _clean_check(cmd, target)
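The edit above just inserts one more element into the argument list that ez_setup.py hands to wget. A minimal sketch that makes the flag a switch instead of a permanent edit could look like this. `build_wget_command` and the `check_certificate` parameter are hypothetical, and unlike the real script's `_clean_check`, this version does not delete a partially downloaded file when wget fails.

```python
import subprocess


def build_wget_command(url, target, check_certificate=True):
    """Build the wget argv used to download `url` into `target`."""
    cmd = ["wget", url]
    if not check_certificate:
        # Same effect as the manual ez_setup.py edit: skip certificate checks.
        cmd.append("--no-check-certificate")
    cmd += ["--quiet", "--output-document", target]
    return cmd


def download_file_wget(url, target, check_certificate=True):
    # check_call raises CalledProcessError if wget exits with a non-zero status
    subprocess.check_call(build_wget_command(url, target, check_certificate))
```

Keeping the default at `check_certificate=True` means the insecure path is only taken when you ask for it explicitly.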

3. Install pip using the newly installed setuptools:

sudo easy_install-2.7 pip

4. Install virtualenv for Python 2.7

pip2.7 install virtualenv

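5. Finally, install wal-e itself:

sudo pip install wal-e

The commands below also use envdir. If daemontools is not installed on your box, the envdir Python package provides the same command (sudo pip install envdir).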

Now that we have wal-e installed, we are going to make a couple of configuration changes that let wal-e work with S3.

1. Create an environment directory for wal-e and store your S3 credentials and destination in it. Run the chown last so the postgres user, which runs the archive command, owns the files you just created:

mkdir -p /etc/wal-e.d/env
echo "secret_key_goes_here" > /etc/wal-e.d/env/AWS_SECRET_ACCESS_KEY
echo "access_id_for_s3_goes_here" > /etc/wal-e.d/env/AWS_ACCESS_KEY_ID
echo 's3://your_bucket_name/optional_directory_in_the_bucket' > /etc/wal-e.d/env/WALE_S3_PREFIX
chown -R postgres:postgres /etc/wal-e.d
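envdir turns each file in that directory into an environment variable for the command it runs: the file name becomes the variable name and the first line of the file becomes the value, which is why the trailing newline from echo does not end up in your key. The sketch below approximates the daemontools envdir rules (trailing spaces and tabs stripped, NUL bytes turned into newlines, an empty file unsets the variable). `load_envdir` is a hypothetical helper, not part of wal-e or envdir.

```python
import os


def load_envdir(directory, base_env=None):
    """Approximate daemontools envdir: one variable per file in `directory`."""
    env = dict(os.environ if base_env is None else base_env)
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            continue
        with open(path, "rb") as handle:
            data = handle.read()
        if not data:
            # A zero-byte file removes the variable instead of setting it.
            env.pop(name, None)
            continue
        # Only the first line counts; trailing spaces and tabs are dropped.
        first_line = data.split(b"\n", 1)[0].rstrip(b" \t")
        # envdir converts NUL bytes in the value into newlines.
        env[name] = first_line.replace(b"\0", b"\n").decode("utf-8")
    return env
```

For example, `subprocess.call(["wal-e", "backup-list"], env=load_envdir("/etc/wal-e.d/env"))` would behave roughly like prefixing the command with `envdir /etc/wal-e.d/env`.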

2. Since this is going to be an incremental backup setup, we have to turn on WAL archiving in postgresql.conf (wal_level only exists in PostgreSQL 9.0 and later, so leave that line out on 8.4):

wal_level = archive
archive_mode = on
archive_command = 'envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-push %p'
archive_timeout = 60

NOTE: you have to restart PostgreSQL for these changes to take effect; wal_level and archive_mode cannot be changed with a reload.
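archive_timeout = 60 forces PostgreSQL to switch to a new WAL segment at least once a minute whenever there has been any activity, so you lose at most roughly a minute of committed work if the server dies. The cost is that each forced switch still archives a full-size segment file. A quick back-of-the-envelope sketch, assuming the default 16 MB segment size (the function names are hypothetical):

```python
SECONDS_PER_DAY = 24 * 60 * 60
DEFAULT_SEGMENT_MIB = 16  # default WAL segment size


def forced_segments_per_day(archive_timeout_s):
    """Segments archived per day on a lightly loaded server because of
    archive_timeout alone (busy servers fill segments faster than this)."""
    return SECONDS_PER_DAY // archive_timeout_s


def forced_mib_per_day(archive_timeout_s, segment_mib=DEFAULT_SEGMENT_MIB):
    # A segment closed early by a forced switch is still a full-size file.
    return forced_segments_per_day(archive_timeout_s) * segment_mib


print(forced_segments_per_day(60))  # 1440
print(forced_mib_per_day(60))       # 23040
```

So a mostly idle server with this setting can push up to about 22.5 GiB of WAL per day before lzop compression; raising archive_timeout trades a larger loss window for less data shipped to S3.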

That's it! Now you can take a base backup and forget about the incremental part, because wal-e automatically ships the WAL files to S3 :). That happens through the archive_command we set up in postgresql.conf.
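Each time a WAL segment is complete, PostgreSQL expands the placeholders in archive_command and runs the result through the shell: %p becomes the segment's path relative to the data directory, %f just its file name, and %% a literal percent sign. The sketch below mimics that expansion; `expand_archive_command` is a hypothetical helper, and the pg_xlog directory is what 8.4 to 9.x use (it became pg_wal in PostgreSQL 10).

```python
import os


def expand_archive_command(template, wal_path):
    """Expand %p, %f and %% the way PostgreSQL does for archive_command.
    Any other %-sequence is copied through unchanged."""
    out = []
    i = 0
    while i < len(template):
        if template[i] == "%" and i + 1 < len(template):
            code = template[i + 1]
            if code == "p":
                out.append(wal_path)  # path relative to the data directory
                i += 2
                continue
            if code == "f":
                out.append(os.path.basename(wal_path))  # file name only
                i += 2
                continue
            if code == "%":
                out.append("%")
                i += 2
                continue
        out.append(template[i])
        i += 1
    return "".join(out)
```

PostgreSQL treats an exit status of zero from the expanded command as success; on any other status it keeps the segment and retries later, so a temporarily unreachable S3 does not lose WAL.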

1. To take a base backup, switch to the postgres user and push your data directory:

su postgres
envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-push /path/to/your/datadir

You can list the backups you have on S3 with:
envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-list

name last_modified expanded_size_bytes wal_segment_backup_start wal_segment_offset_backup_start wal_segment_backup_stop wal_segment_offset_backup_stop
base_00000001000000AD000000C7_00000040 2014-03-06T17:51:26.000Z 00000001000000AD000000C7 00000040
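The backup name encodes where in the WAL stream the backup starts: a 24-hex-digit segment name (timeline, log and segment number, 8 hex digits each) followed by a byte offset. The sketch below splits a name into those parts. It assumes the layout seen in the names in this post, and treats the offset as decimal, since 03626144 read as hex would be larger than a 16 MB segment. `parse_base_backup_name` is a hypothetical helper, not a wal-e API.

```python
def parse_base_backup_name(name):
    """Split a base backup name into its WAL position parts.

    Assumes the layout base_<24-hex-digit WAL segment>_<8-digit offset>.
    """
    prefix, segment, offset = name.split("_")
    if prefix != "base" or len(segment) != 24 or len(offset) != 8:
        raise ValueError("not a base backup name: %r" % name)
    return {
        "timeline": int(segment[0:8], 16),
        "log": int(segment[8:16], 16),
        "segment": int(segment[16:24], 16),
        "offset": int(offset, 10),  # assumed decimal byte offset
    }
```

For the backup listed above this gives timeline 1, log 0xAD, segment 0xC7 and offset 40, which matches the wal_segment_backup_start and offset columns.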

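2. Deleting old backups, or keeping only a fixed number of them, is easy as well. Prefix these commands with envdir /etc/wal-e.d/env like the others so wal-e can find your S3 credentials, and pass --confirm to actually delete; without it, wal-e only reports what it would remove.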

If you want to delete a specific backup
wal-e delete [--confirm] base_00000004000002DF000000A6_03626144

Or you can delete all backups older than a given base backup with before:
wal-e delete [--confirm] before base_00000004000002DF000000A6_03626144

To keep only the five most recent base backups, use retain:
wal-e delete [--confirm] retain 5
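Conceptually, both before and retain pick base backups by age. Because the names are fixed-width hex (timeline first), sorting them as strings puts them in WAL order, oldest first. The sketch below shows which names each policy would select; the function names are hypothetical, this is not wal-e's implementation, and it does not model the cleanup of WAL segments.

```python
def backups_to_delete_retain(names, keep):
    """Names a 'retain <keep>' policy would remove, oldest first."""
    if keep < 1:
        raise ValueError("keep must be at least 1")
    ordered = sorted(names)  # fixed-width hex names sort in WAL order
    return ordered[:-keep]


def backups_to_delete_before(names, cutoff):
    """Names strictly older than the `cutoff` base backup, oldest first."""
    return [name for name in sorted(names) if name < cutoff]
```

With `retain 5` and only three backups on S3, nothing would be selected, which is the behaviour you want from a retention policy.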

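3. Restoring using backup-fetch

To restore the complete database on a separate box, fetch the latest base backup into an empty data directory:
envdir /etc/wal-e.d/env wal-e backup-fetch /path/to/your/datadir LATEST

WAL segments are fetched with wal-e as well. PostgreSQL calls wal-fetch itself through restore_command in recovery.conf, filling in the segment name (%f) and the destination path (%p):
envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"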
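On PostgreSQL versions before 12, that restore_command goes into a recovery.conf file in the restored data directory; when the server starts, it replays the archived WAL through wal-fetch. The sketch below builds such a file. `render_recovery_conf` is a hypothetical helper, the paths are the ones assumed throughout this post, and the optional recovery_target_time is only needed for point-in-time recovery.

```python
def render_recovery_conf(envdir_path="/etc/wal-e.d/env",
                         wal_e_path="/usr/local/bin/wal-e",
                         target_time=None):
    """Return recovery.conf text that fetches archived WAL with wal-e."""
    # PostgreSQL substitutes %f (segment name) and %p (destination path);
    # str.format leaves the % signs alone.
    lines = [
        "restore_command = 'envdir {0} {1} wal-fetch \"%f\" \"%p\"'".format(
            envdir_path, wal_e_path),
    ]
    if target_time is not None:
        # Optional point-in-time target, e.g. '2014-03-06 18:00:00'
        lines.append("recovery_target_time = '{0}'".format(target_time))
    return "\n".join(lines) + "\n"
```

Write the result to recovery.conf in the data directory after backup-fetch finishes, then start PostgreSQL as the postgres user.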

There are a few more things wal-e can do, such as encrypting backups, managing tablespace backups (this is important if you have user-defined tablespaces in your database), throttling the I/O of base backups, and increasing wal-push throughput. You might want to look into those options before putting this into production, as base backups can put a decent amount of I/O and CPU load on the server if not configured properly. Here is the link that will help with further information on this module.

Feel free to ask questions, and I hope this helped.

  1. #1 by Amit on April 20, 2014 - 4:37 pm

    Good one.
