1. Environment Variables
3. Sound / Vibrate Alert
5. Backup / Restore
In order to connect to a database, user would first need to specify connection details in this section.
–> PostgreSQL < 9.2
Check this option to monitor a PostgreSQL database older than 9.2.x. If this is not setup properly the monitor will not display all checks.
–> Use SSL
Check this only if postgres database has been configured with open ssl, so that all data which is being sent / received to and from the database is encrypted. Private key authentication is currently being worked on and will be part of future release.
–> Hostname / IP
Enter the hostname (FQDN) or IP address of the database server.
–> Database name
This is the database name that you would like to monitor and maintain.
–> DB port
The port number that your database runs on. User can get this information from the postgresql.conf file or by looking at the show port psql output.
–> DB username
The postgres database username.
–> DB password
Password of the database user in the DB username field
After specifying the above it is a good practice to test connection using the “Test Connection” button. If there is an error you will see a pop up window with the error description. The timeout to check the connection is set to 20 seconds in case there are connectivity issues. You will see “Connection Sucessful” if the test connection passes.
This screen is used to configure threshold limits for alert notifications. This is important if a user wants to receive real time alerts on the database. A mobile device will alert with sound and vibrate(if device is capable) incase a threshold is crossed.
Current thresholds include:
–> Connections: Monitors connection count and the threshold can be set from 0 to 1000. If you like to disable alerting on this particular check, slide the connection count to max (1000) and the alert for connections will be disabled.
–> Locks: Monitors number of locks in the database and threshold can be set from 0 to 10. If you like to disable alerting on this particular check, slide the locks count to max (10) and the alert for locks will be disabled.
–> Hung sessions: Monitors number of hung queries and the threshold can be set from 0 to 10. If you like to disable alerting on this particular check, slide the hung sessions count to max (10) and the alert for hung sessions will be disabled.
–> Hit %: Hit% of the entire database can be monitored using this threshold and the threshold can be set from 0 to 100. If you like to disable alerting on this particular check, slide the hit% count to mim (0) and the alert for hit% will be disabled. Lower the hit% the more seq scans are happening in the database.
–> Bloats: Number of bloated table threshold can be set from 0 to 10. If you like to disable alerting on this particular check, slide the bloats count to max (10) and the alert for bloats will be disabled.
–> Long_run: Number of long running queries threshold can be set from 0 to 10. If you like to disable alerting on this particular check, slide the long_run count to max (10) and the alert for long_running queries will be disabled.
–> Standby Lag: Standby replication lag threshold can be set from 0 to 1024. Currently this threshold is for all the standby servers using postgres inbuilt streaming replication. If you like to disable alerting on this particular check, slide the standby_lag mB to max (1024) and the alert for standby_lag will be disabled.
Sound / Vibrate Alert:
This button can be used to turn on / off all sound/vibrate alerts generated by dBVigil.
The monitor screen is where you can see stats of your database at a glance. This screen shows the following information:
–> DB Conn’s : Total active database connections.
–> BG_Max : Total number of times background writer max value was exceeded.
–> Standby: Shows max lag on any of the standby servers.
–> Locks: Total number of active locks in the database.
–> Hung: Total number of hung queries (queries in idle in transaction mode for over a minute).
–> Long_run: Total number of long running queries over a minute.
–> Hit%: Hit% of the entire database.
–> Bloats: Total number of bloated tables.
–> DB_Size: Size of the current connected database.
If any of the checks in the monitor screen passes a threshold value set in the application, the application will alert the user via sound and vibrate (if the sound and vibrate button is on) and the application will also notify the user which threshold has been crossed by blinking the corresponding check. If the sound and vibrate is set to off. The application will silently blink the check.
Every check in the Monitor screen shows more information about what the issue is. For eg: if you tap on the connections check, it will open another screen that will give a user information on which user is connected to the database, what ip address the user is connected from, how many active connections does the user have and whether or not the connection is in waiting state.
A user can tap on all the other checks as well in the monitor screen and will get relevant information about the check. This information can be helpful to further investigate the issue with the alert.
Monitor Activity NEEDS TO BE OPEN if a user wants monitoring alert. A user can perform other activities on the device without interrupting active monitoring. Even if the device is in sleep mode dBVigil will actively monitor the database and send alerts. Since the application is for a mobile device I have made an effort to make it seamlessly work with network interruptions. This means that if network is lost / switched from mobile network to wifi, dBVigil will attempt to reconnect to the database to gather the current stats from the database as soon as the device gets a viable network to connect to.
Backup / Restore:
Backup and restore options are available using dBvigil:
–> Backup table to a backup table:
Specify the schema.tablename of the table to backup(in source table) and schema.table of the backup table(in backup to table) to backup the table and dBVigil will backup the source table for the user and notify when the backup completes.
–> Backup table to a file on the server:
Specify the schema.tablename of the table to backup(in source table) and full path and filename of backup file(in backup to file) to backup the table and dBVigil will backup the source table for the user and notify when the backup completes.
–> Restore backup of table from file:
Make sure there is a backup file that was used using COPY command in postgres. Specify the schema.tablename of the table to restore(table name) and full path and filename of backup file(in restore from file) to restore the table and dBVigil will restore the table for the user and notify when the restore completes.
Vacuum/Reindex and manage postgres processes:
The monitor screen will show and alert a user when there are issues with the database but in order to address some of the issues a user can take actions using the maintenance screen.
Every activity in this screen will prompt for the database user password that should match the password supplied in the environment variables screen. If the password does not match the task will fail.
–> Kill blocking queries: This will kill all the blocking queries that are responsible for the locks locks shown in the monitor screen.
–> Kill hung sessions: Will kill all hung sessions that are in idle in transaction state for over a minute as shown in the monitor screen.
–> Kill by pid: Provide pid and click on the kill by pid button and this will terminate the process by pid.
Use the appropriate button and provide the database user password to execute vacuum and reindex related jobs to run on the entire database. VacuumDB runs vacuum analyze on the entire database VS to vacuum full
–> Database Vacuum: Specify a database name in the cluster to run vacuum analyze, this database name will be used for table vacuum and table reindex as well. Make sure the database name is specified.
–> Table Vacuum: Schema.table_name to run vaccum analyze.
–> Table Reindex Schema.table_name to run reindex.
–> Does dBVigil work on cloud based postgres instances?
I have tested dBVigil on physical servers using VPN access, ec2 and RDS instances on amazon. dBVigil should work on other cloud instances as well as long as the jdbc driver can connect to a postgres instance.
–> How frequent is the data fetched from the database?
The checks on the monitor screen are divided into two blocks. Frequent checks and non frequent checks.
Frequent checks execute every 10 seconds and are listed below:
Non frequent checks run every 30 minutes (starting the time user opts to launch monitor screen)
–> Is dBVigil secure?
dBVigil uses pg_catalog tables to gather statistics of the database. This is generic information of the database and does NOT extract data from any user created tables/functions/views etc. Moreover, using ssl connection a user can secure a database connection over mobile network. Future releases of dBVigil will have support for private key authentication over ssl encryption.
–> Is there proper connection management when using dBVigil?
Yes, I have made sure that every connection and every transaction that is called using dBVigil is closed before another request for connection is made. User will NOT see connections opened by dBVigil which are not being closed properly.
–> Limited visibility of query definition in details about various checks.
This is a limitation of android kitkat regarding nested horizontal scroll for rows. I wanted to add support for kitkat as a lot of users out there are still using kitkat. This issue will be addressed in upcoming release.
–> How many standby servers can I monitor?
The check on standby server in monitor queries for all added servers that the primary is aware of and reports the lag on the worst performing candidate.
–> Would there be multi server support
This concept will be an added feature in future releases.
–> Would dBVigil support different kinds of databases?
With the demand and requirement, implementation of mysql / other databases will be added in future releases.
–> How do I get support?
Send an email with questions/concerns to firstname.lastname@example.org
This is an inbuilt replication system in PostgreSQL that uses wal files to apply changes to the secondary server. Since this replication uses wal files any changes on the primary server(schema or data)are automatically applied to the secondary server.
Points to note:
*One thing to keep into consideration is that primary and secondary database servers should have the same major release installed in order to have this replication working. For eg if you have PostgreSQL 9.3.3 installed on your secondary server you can have 9.3.x on your master or vice a versa. However, any major release 9.1.x will not work with 9.3.x.
*Another point to take into cosideration is that whatever data objects you have on your primary server instance, all of that will be replicated to the secondary instance. You cannot have selective tables database or schemas replicate to your secondary database as it applies wal logs to the secondary server. Exceptional objects that are not replicated to your secondary database are the unlogged tables for eg temp tables.
*I assume you have postgreSQL 9.0 or higher installed on both primary and secondary server.
*I am using rsync to copy the wal files over to the standby. Make sure rsync works from the primary to the secondary server without using a password (public private key sharing).
On the Primary server:
1. Open the postgresql.conf file and make the following changes:
wal_level = hot_standby archive_mode = on archive_command = 'rsync %p hostname_or_ipaddress_secondary_server:path_to_the_archive_dir_on_secondary_server/%f' (create this directory on the secondary server) max_wal_senders = 1 #This is the number of secondary servers you want to send the wal files to. If you add another server to replication increment this value to 2. wal_keep_segments = 1200 (this is the number of wal files to keep in case the replication lags behind. Make sure there is enough space in the pg_xlog mount point to accomodate the wal files in the pg_xlog dir. 1200*16/1024=18.75GB)
2. Edit the pg_hba.conf file and grant the secondary server access for replication
host replication postgres ip_address_of_secondary_server/22 trust
3. Restart the PostgreSQL database
4. Make sure that wal files are being pushed to the archive directory on the secondary server (if not check the error in postgreql log file and rectify any permission or file transfer related issues)
5. Now take online-backup of the primary database
$ psql -c "SELECT pg_start_backup('label', true)" $ rsync -av --exclude pg_xlog --exclude postgresql.conf --exclude postgresql.pid path_of data_dir_on_primary/* ip_address_of_secondary_server:path_of data_dir_on_secondary $ psql -c "SELECT pg_stop_backup()"
On the Secondary server:
1. Edit the postgresql.conf file:
hot_standby = on archive_mode = off
2. Create recovery.conf and add following entries:
# Note that recovery.conf must be in $PGDATA directory. standby_mode = 'on' primary_conninfo = 'host=ip_address_of_primary port=5432 user=postgres' trigger_file = '/path_to/trigger.tg' (this is the file that will be created when we want to make the secondary server as primary) restore_command = 'rsync /path_to/archive/%f "%p"'
3. Start the standby database
You can calculate the replication lag by comparing the current WAL write location on the primary with the last WAL location received/replayed by the standby. They can be retrieved using pg_current_xlog_location on the primary and the pg_last_xlog_receive_location/pg_last_xlog_replay_location on the standby, respectively.
$ psql -c "SELECT pg_current_xlog_location()" -h ip_address_of_primary (primary host) pg_current_xlog_location -------------------------- 0/5000000 (1 row) $ psql -c "select pg_last_xlog_receive_location()" -h ip_address_of_secondary (standby host) pg_last_xlog_receive_location ------------------------------- 0/5000000 (1 row) $ psql -c "select pg_last_xlog_replay_location()" -h ip_address_of_secondary (standby host) pg_last_xlog_replay_location ------------------------------ 0/5000000 (1 row)
Streaming replication process can also be checked by using ps -ef
$ ps -ef | grep sender postgres 4950 4925 0 12:32 ? 00:00:00 postgres: wal sender process postgres 127.0.0.1(4925) streaming 0/5000000
$ ps -ef | grep receiver postgres 5955 5941 1 12:33 ? 00:00:01 postgres: wal receiver process streaming 0/5000000
This is probably a slony bug that I will log with the slony community but for now I have figured out a workaround that would keep slony running after the below procedure:
OS: CentOS 6
PostgreSQL version: 9.3.3
Slony version: 2.2.2 (I have seen this issue in earlier versions also since the origin of 2.1.3 or maybe prior)
NOTE: This fix is specifically for the error invalid input for txid_snapshot. It has worked for me numurous times and I thought this might be useful for someone. Please be aware that manually updating slony catalog tables is not recommended and may cause further issues in slony replication. Use the below procedure at your own risk.
ERROR: invalid input for txid_snapshot: "233477269:233478374:233477269,233477846,233477978,233478062,233478210,233478287,233478308,233478314,233478360,233478360,233478371,233478372"
The above error although is not clear as to what the invalid string is in the slony event logs but a little more searching for it through the slony schema suggested that there is a duplication entry for a transaction id in the event log that slony is not happy about. Looking at the above error we see:
The duplicate entry for the transaction id is 233478360,233478360. It looks like it has two entries of the same transaction id in the event log and cannot process the request.
NOTE: we are only looking at the transaction id’s that are listed after the colon entries for the transaction list. For eg the entries below in bold is what we are concerned about.
Procedure to fix the above error:
1. Connect to the master replication node with psql
2. Search the transaction id in the sl_event table to find the ev_seqno like below
testdb=# select * from _slonyschema.sl_event where ev_snapshot::text like '233477269:233478374:233477269,233477846,233477978,233478062,233478210,233478287,233478308,233478314,233478360,233478360,233478371,233478372'; ev_origin | ev_seqno | ev_timestamp | ev_snapshot | ev_type | ev_data1 | ev_data2 | ev_data3 | ev_ data4 | ev_data5 | ev_data6 | ev_data7 | ev_data8 -----------+------------+-------------------------------+-----------------------------------------------------------------------------------------------------+---------+----------+----------+----------+---- ------+----------+----------+----------+---------- 2 | 5002155051 | 2014-03-11 16:27:20.155651+00 | 233477269:233478374:233477269,233477846,233477978,233478062,233478210,233478287,233478308,233478314,233478360,233478360,233478371,233478372 | SYNC | | | | (1 row)
NOTE: make sure you have one row listed in there so that you can update the row by removing the duplicate value of the transaction id. In this case it is 233478360,233478360
3. Update the row in sl_event table
testdb=# UPDATE _slonyschema.sl_event set ev_snapshot='233477269:233478374:233477269,233477846,233477978,233478062,233478210,233478287,233478308,233478314,233478360,233478371,233478372' where ev_origin=2 and ev_seqno=5002155051;
NOTE: I updated the sl_event table and took out the duplicate value from ev_snapshot.
4. Restart slony process, restarting slony will clear out the errors as slony retries with the updated information. There could be more duplicates in the sl_event table that the slony log file may show up and you would need to follow the above procedure for all of them. But in most cases slony starts up fine and replication is back online.
Remember this is a temperory solution(be careful while you make the above change) and updating the wrong row might mess up slony but I guess if you have the above error in replication it just means you have to rebuild it anyway. So might as well give the above procedure a try and let me know if it worked for you.
Ever wondered why all the testing passes ok but when it comes to real production load things start to look bad on the database. Only if we knew how to replicate the database load in real time on a testing environment we would have some data to analyze. I was given the same task and pgreplay came to rescue.
I find this tool very helpul when:
–> I want to test a new version of PostgreSQL database
–> Or even when I need to have some statistics around how the database would behave on a new hardware.
–> Also, if I make a configuration change to the database how would it behave in a real time load scenario.
All of the above scenarios are worth spending time and you can have a clear idea how your database will behave when its live in production.
About the module:
pgreplay reads the SQL statements in a PostgreSQL log file, extracts them and executes them in the same order and relative time against a PostgreSQL database cluster.
Assuming you have a test box with the same data as production and are using postgres as your database.
Download and install pgreplay from the below link(I would install this on a test box and not production)
su - postgres wget http://pgfoundry.org/frs/download.php/3345/pgreplay-1.2.0.tar.gz tar -xvf pgreplay-1.2.0.tar.gz ./configure --with-postgres=/opt/PostgreSQL/9.2.4/bin make sudo make install
pgreplay needs to read a postgres log file and that log has to be in a specific format. The change would need a restart of you postgres database. Make a copy of the postgresql.conf file before you go ahead with this change.
log_destination = 'stderr' logging_collector = on log_directory = '/somemountpoint/pg_log' (this is the location where your postgres log file will be generated. Make sure postgres user has write privs) log_min_duration_statement = 100 (this value is set to milliseconds. If you want to log all statements put 0) log_min_messages = error (or more) (if you know that you have no cancel requests, 'log' will do) log_min_error_statement = log (or more) log_connections = on log_disconnections = on log_line_prefix = '%m|%u|%d|%c|' (if you don't use CSV logging) log_statement = 'all' lc_messages must be set to English (the encoding does not matter) bytea_output = escape (from version 9.0 on, only if you want to replay the log on 8.4 or earlier)
Thats it! restart postgres and logging will start the way pgreplay can parse the statements.
There are two ways to replay the postgres logs to another postgres database
1. pgreplay [<parse options>] [<replay options>] [<infile>]
This will parse a PostgreSQL log file and replays the
statements against a database.
2. pgreplay -f [<parse options>] [-o <outfile>] [<infile>]
This will parse a PostgreSQL log file and writes the
contents to a “replay file” that can be replayed with -r
I like the second option better because then I can just copy the parsed file to any database and do whatever testing is necessary
Now that you have a couple of postgres log files in the log location you specified in postgresql.conf file all you have to do is copy the logs over to a test box that has pgreplay installed and run the below procedure:
I first create a parse file:
pgreplay -f -o test.replay /somemountpoint/pg_log/postgresql.log notice you will find the test.replay log file in the current working directory
Now just replay the test.replay parse file and watch the postgres database replay all the statements from production to a test box:
pgreplay -r -j test.replay
There are a lot more options that you can try out with pgreplay I will not be explaining them all but feel free to ask any questions about the module. More information on pgreplay is available here.
It has been a while since I wrote on my blog. Thought this would be a good addition to the knowledgebase. I came accross this online backup tool that I believe is worth writing about. Incremental backups are taken care of and base backups are compressed and sent across to S3. No more writting shell scripts when it comes to shipping them to S3 anymore which I thought was pretty neat. I did struggle a bit to get it installed on a CentOS box and did not find much online help for it. Hopefully this will help someone get this module up and going in no time.
python (>= 2.6)
And ofcourse we are talking about postgres database here so any postgres version >=8.4 should work with it
First we have to get python26 or greater which for some reason I could not get it from the CentOS repo even after a yum update.
1. Download python version from the below link
wget http://python.org/ftp/python/2.7.6/Python-2.7.6.tar.xz Extract the file and compile it with ./configure make sudo make install NOTE: if you are having trouble xtracting the tar.xz file install xz wget http://tukaani.org/xz/xz-5.0.5.tar.gz ./configure make sudo make install
2. We will need Setuptools as well to have our module compiled
wget https://bitbucket.org/pypa/setuptools/raw/bootstrap/ez_setup.py Then install it for Python 2.7 that you installed above. sudo /usr/local/bin/python2.7 ez_setup.py
NOTE: Here I was set back again because the ez_setup.py script was trying to download setuptools with a certificate check. All I did was added –no-check-certificate in the script where it was doing a wget like:
def download_file_wget(url, target): cmd = ['wget', url, '--no-check-certificate', '--quiet', '--output-document', target] _clean_check(cmd, target)
3. Install pip using the newly installed setuptools:
sudo easy_install-2.7 pip
4. Install virtualenv for Python 2.7
pip2.7 install virtualenv sudo pip install wal-e
Now that we have wal-e installed we are going to make a couple of configuration changes that will enable wal-e to work with S3.
1. Create an environment directory to use wal-e
mkdir -p /etc/wal-e.d/env chown -R postgres:postgres /etc/wal-e.d echo "secret_key_goes_here"> /etc/wal-e.d/env/AWS_SECRET_ACCESS_KEY echo "access_id_for_s3_goes_here"> /etc/wal-e.d/env/AWS_ACCESS_KEY_ID echo 's3://specify_bucket_name/directory_if_you_have_created_on_in_the_bucket'> /etc/wal-e.d/env/WALE_S3_PREFIX
2. Since this is going to be an incremental backup setup we would have to turn archiving on.
wal_level = archive archive_mode = yes archive_command = 'envdir /etc/wal-e.d/env /usr/local/bin/wal-e wal-push %p' archive_timeout = 60
NOTE: you would have to restart your postgres database so that these changes can be read by postgres
Thats it! Now you can start making a base backup and forget about the incremental as wal-e automatically ships those wal files to S3🙂. Reason is the archive command we have setup in the postgresql.conf file.
1. To take a base backup:
su postgres envdir /etc/wal-e.d/env /usr/local/bin/wal-e backup-push /path to your datadir You can always list the backups that you have on S3 by: envdir /etc/wal-e.d/env /usr/bin/wal-e backup-list name last_modified expanded_size_bytes wal_segment_backup_start wal_segment_offset_backup_start wal_segment_backup_stop wal_segment_offset_backup_stop base_00000001000000AD000000C7_00000040 2014-03-06T17:51:26.000Z 00000001000000AD000000C7 00000040
2. Deleting or retaining number of backups is easy as well.
If you want to delete a specific backup wal-e delete [--confirm] base_00000004000002DF000000A6_03626144 Or you can just delete backups older than a base backup by using the before clause: wal-e delete [--confirm] before base_00000004000002DF000000A6_03626144 Retaining number of backups as: wal-e delete [--confirm] retain 5
3. Restoring using backup-fetch
To restore the complete database on a seperate box: envdir /etc/wal-e.d/env wal-e backup-fetch Wal fetch can also be accomplished with wal-e: envdir /etc/wal-e.d/env wal-e wal-fetch
There are a couple of more things that can be done with wal-e like using encryption on backups, managing tablespace backups(this is imp if you have user defined tablespaces in your database) controlling I/O of base backup, increasing throughput of wal-push etc. You might want to check into those options before putting this in production as base back I/O’s can take a decent amount of CPU overhead if not configured properly. Here is the link that will help with further information on this module.
Feel free to ask questions and hope this helped.
I get to do this all the time and I thought it would be a good addition to this blog. My case scenario here is that every time there is a db release going out that modifies a user defined type in my database I get nervous because I need to have a quick rollback plan if the release is called for a rollback. There may be a better way to achieve this feel free to share. It can come in handy if you really need to know what CREATE TYPE syntax is being used.
We do know that just using \dT+ might give some information but it does not give the create statement / complete definition of the type:
\dT+ accounts.billing_info List of data types Schema | Name | Internal name | Size | Elements | Description -----------+------------------------------+--------------------+-------+----------+------------- accounts | accounts.billing_info | billing_info | tuple | | (1 row)
To get around this I use the following query that gives me the complete definition of the user defined type. Replace schema and table name in the below to match your requirement.
SELECT 'CREATE TYPE accounts.billing_info as ('||array_to_string(ARRAY(select a.attname||' '|| pg_catalog.format_type(a.atttypid, a.atttypmod) FROM pg_catalog.pg_attribute a WHERE a.attrelid = (select oid from pg_class where relname='billing_info' and relnamespace=(select oid from pg_namespace where nspname='accounts')) AND a.attnum > 0 AND NOT a.attisdropped ORDER BY a.attnum),',')||');'; ?column? -------------------------------------------------------------------------------------------------------------------------------------------------------------------- CREATE TYPE accounts.billing_info as (billme character varying(20),serial integer,bill_name character varying(100)); (1 row)
And here I have my complete type definition and can use it as required. Hope this helps!!!
I got a very interesting question today from a developer. What is the size of a tuple in a table. Well you kinda pause for a while and think ummm I thought I knew this one. Well postgresql has a contrib module that can help you out with this. Lets take a look at pgstattuple, this contrib module will give you information like table length, number of live tuples in a table, total length of live tuples, number of dead tuples, free space in a table and much more.
All this information is extremely helpful when you want to check the stats on a specific table. Lets go ahead and get this conrib module started in our current database:
Assuming you have source install for postgresql available
$ cd postgresql-8.4.9/contrib/pgstattuple $ make $ make install
Now we need to the pgstattuple functions to be available in the database we are trying to check the tuples on. For that run the sql:
$ psql dbname -f pgstattuple.sql
That’s it we are all set to check the tuples. Connect to your database and call the function as:
test=# SELECT * FROM pgstattuple('test.account_services'); -[ RECORD 1 ]------+--------- table_len | 50683904 tuple_count | 597431 tuple_len | 47914138 tuple_percent | 94.54 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 207320 free_percent | 0.41
I got the approximate/average length of a tuple by tuple_len/tuple_count (not the smartest way to get that info but oh well it worked for me)
You can also use pgstatindex to get information on the indexes.
hdap=# SELECT * FROM pgstatindex('pg_cast_oid_index'); -[ RECORD 1 ]------+------ version | 2 tree_level | 0 index_size | 8192 root_block_no | 1 internal_pages | 0 leaf_pages | 1 empty_pages | 0 deleted_pages | 0 avg_leaf_density | 37.54 leaf_fragmentation | 0