Graphical schema representation

Once in a while we get into a situation where we are handed a new database to work on. Without knowing the design of the database it is hard to write and explain queries and gather the desired results. There is one open-source tool that can come in very handy before you dive into understanding the database design.

So I figured I would write something on SchemaSpy, which works very well on a majority of databases. We will only talk about PostgreSQL for now; configuration and installation are pretty much the same for all other database types.

OK, so let's get started with the install process. This install is on Ubuntu 12.04 but will be pretty much the same for any other Linux-based OS.

Install the following dependencies:

1. vik@debian$ sudo apt-get install graphviz
2. vik@debian$ wget http://jdbc.postgresql.org/download/postgresql-9.1-902.jdbc4.jar (place the jar where you can access it with schemaspy)
3. vik@debian$ sudo apt-get install openjdk-7-jdk openjdk-7-jre
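
Before moving on, a quick sanity check that the dependencies are in place (the exact version output will vary):

vik@debian$ java -version
vik@debian$ dot -V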

Get Schemaspy:

vik@debian$ wget http://sourceforge.net/projects/schemaspy/files/schemaspy/SchemaSpy%205.0.0/schemaSpy_5.0.0.jar/download -O schemaSpy_5.0.0.jar

Let's fire it up:

vik@debian$ java -jar schemaSpy_5.0.0.jar -t pgsql -host localhost:2525 -db test -s testschema -u postgres -p password -dp /home/vik/plugins/postgres/postgresql-9.1-902.jdbc4.jar -o /home/vik/dbgui

What do the options above mean?
-t pgsql (selects pgsql as the database type to connect to)
-host (where your PostgreSQL database runs; append :port# if needed)
-db (database to connect to)
-s (schema name)
-u (database user)
-p (password for the database user)
-dp (location of the PostgreSQL JDBC driver jar)
-o (directory where SchemaSpy will write all of the graphical representation files)

If you want to reach the graphical representation in a browser from any box, install apache2 and create a symlink like so:

vik@debian$ sudo apt-get install apache2
vik@debian$ cd /var/www
vik@debian$ sudo ln -s /home/vik/dbgui test (test here is the db name; you can use your own)

That's it. Access SchemaSpy at:

http://localhost/test

Here is a sample schema output from the SchemaSpy website:

http://schemaspy.sourceforge.net/sample/



Maintaining indexes

During the course of database and application maturity there are times when we try hard to improve the performance of the database to meet customer expectations in accordance with the agreed standards. These are the times when we think, let's add an index on that old table to make it work faster.

There are a couple of things you should keep in mind when you are looking at adding an index to your table. Ask yourself the question: will this really help performance? Consider a scenario where you have a table that is busy with inserts all day. You need a report from that table during the night and add an index to speed up that report. That works perfectly for your little report running during off hours. But what about the inserts happening all day on that heavily used table? You will notice a drop in performance there, because the new index also has to be updated on every insert and update, increasing the time each write takes.

Make sure you run the query below once a month to get a report on the unused indexes in your database. This can be very useful when trying to boost your database's performance, and you will often find that you have freed up quite a bit of space that was not being used by the database.

I would recommend running this off of crontab and sending the output via email every month.
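
As a rough sketch, the crontab entry could look like the following (the script path, database name, and mail recipient are assumptions; adjust them for your environment):

# Run the unused index report at 06:00 on the 1st of every month and mail the output
0 6 1 * * psql -d test -f /home/vik/unused_indexes.sql | mail -s "Unused index report" dba@example.com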

Unused indexes:

select
    indexrelid::regclass as index, relid::regclass as "table"
from
    pg_stat_user_indexes
    JOIN pg_index USING (indexrelid)
where
    idx_scan = 0 and indisunique is false;

NOTE: don't just blindly drop the indexes listed here; make sure they really are useless first. Also, running this right after a fresh restore or after calling pg_stat_reset() is pointless. If your stats have been reset, let at least one full business cycle run against the database before trusting these queries.
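
If you are not sure how fresh the statistics are, you can check when they were last reset before trusting the report, and once you have confirmed an index really is useless you can drop it. A minimal sketch (the index name is hypothetical; DROP INDEX CONCURRENTLY needs 9.2 or later, on older versions use a plain DROP INDEX):

-- when were the statistics for this database last reset? (pg_stat_database.stats_reset, 9.1+)
select datname, stats_reset from pg_stat_database where datname = current_database();

-- drop a confirmed-useless index without blocking writes
drop index concurrently some_unused_idx;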

Index maintenance also covers removing duplicates. A lot of times I have seen a unique index on a column that already carries a primary key constraint. Primary keys are already unique. Run the SQL below to identify duplicate indexes that can be removed from the database. This will also help improve performance and reclaim lost space.
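
As a contrived example of the situation described above (table and index names are made up), the extra unique index below duplicates what the primary key already enforces and only adds write overhead:

create table customers (customer_id integer primary key, customer_name text);
-- redundant: the primary key already enforces uniqueness on customer_id
create unique index customers_customer_id_idx on customers (customer_id);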

Duplicate Indexes:

SELECT pg_size_pretty(sum(pg_relation_size(idx))::bigint) AS size,
       (array_agg(idx))[1] AS idx1, (array_agg(idx))[2] AS idx2,
       (array_agg(idx))[3] AS idx3, (array_agg(idx))[4] AS idx4
FROM (
    SELECT indexrelid::regclass AS idx, (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'||
                                         coalesce(indexprs::text,'')||E'\n' || coalesce(indpred::text,'')) AS KEY
    FROM pg_index) sub
GROUP BY KEY HAVING count(*)>1
ORDER BY sum(pg_relation_size(idx)) DESC;

If you have ever wondered which foreign key columns are missing indexes, the query below might help. Again, check whether you really need to add the index; don't just assume based on the query output:

Missing indexes:

select src_table, dst_table, fk_name, pg_size_pretty(s_size) as s_size, pg_size_pretty(d_size) as d_size, d
    from (
        select
                distinct on (1,2,3,4,5)
                textin(regclassout(c.conrelid))  as src_table,
                textin(regclassout(c.confrelid)) as dst_table,
                c.conname                        as fk_name,
                pg_relation_size(c.conrelid)     as s_size,
                pg_relation_size(c.confrelid)    as d_size,
                array_upper(di.indkey::int[], 1) + 1 - array_upper(c.conkey::int[], 1) as d
        from pg_constraint c
        left join pg_index di on di.indrelid = c.conrelid
                and array_to_string(di.indkey, ' ') ~ ('^' || array_to_string(c.conkey, ' ') || '( |$)')
        join pg_stat_user_tables st on st.relid = c.conrelid
        where c.contype = 'f'
        order by 1,2,3,4,5,6 asc
    ) mfk
    where mfk.d is distinct from 0 and mfk.s_size > 1000000
    order by mfk.s_size desc, mfk.d desc;
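
If you decide an index really is needed on one of the flagged foreign key columns, it can be added without blocking writes (table and column names here are hypothetical):

create index concurrently orders_customer_id_idx on orders (customer_id);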



pg_top

pg_top is a utility just like the UNIX top command, but it has much more to offer and I find it very useful for monitoring a PostgreSQL database.

It is tedious to run top, grep for the postgres user, and look at the commands running, while keeping another session open on pg_stat_activity to figure out what the queries are doing. Instead it is easier to just run pg_top and get all of the below with one command:

View currently running SQL statement of a process.

View query plan of a currently running SQL statement.

View locks held by a process.

View user table statistics.

View user index statistics.

Let's first install the tool:

--> wget http://pgfoundry.org/frs/download.php/1780/pg_top-3.6.2.tar.bz2

--> tar -xjvf pg_top-3.6.2.tar.bz2

--> cd pg_top-3.6.2

--> ./configure --prefix='path to where you want it installed'

--> make

--> make install

If you have the environment variables configured for the postgres user, pg_top should pick up the port and binaries on its own.

If not, you can run pg_top as:

pg_top -p 5432 -U postgres -d database -W

Where ‘-d’ specifies the database to connect to, ‘-U’ specifies the database user, and ‘-W’ will prompt for the password if a password is required to connect.
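
If you would rather rely on environment variables, something like the following in the postgres user's profile should let pg_top find the port, database, and binaries on its own (the values and path are only examples):

export PATH=/usr/local/pgsql/bin:$PATH
export PGPORT=5432
export PGUSER=postgres
export PGDATABASE=database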

Current Query

Use the 'Q' key and enter a PID once pg_top is started.

Query Plan

To see the query plan, use the 'E' key and enter a PID once pg_top is started. The 'E' command runs EXPLAIN; EXPLAIN ANALYZE can be run by pressing the 'A' key followed by a PID.

Locks Acquired

Use the ‘L’ key and enter a PID once pg_top is started.

Table Statistics

Use the ‘R’ key to display user table statistics. Use ‘t’ to toggle between displaying cumulative and differential statistics.

Index Statistics

Use the ‘X’ key to display user index statistics. Use ‘t’ to toggle between displaying cumulative and differential statistics.

Update for Postgres 9.2: I tried to use pg_top with Postgres 9.2 and ran into an issue where pg_top does not work well with this version (mainly because some of the columns in pg_stat_activity have changed, e.g. procpid became pid, current_query became query, and a state column was added). I took the latest version and still had no luck, so I searched for it on Google and found a patch that worked for me. Here is what I did to make it work:

Download the latest source of pg_top: http://pgfoundry.org/frs/download.php/1780/pg_top-3.6.2.tar.bz2

Untar it and apply the following patch before compiling:

curl -L http://ln-s.net/+6lb|gzip -dc|patch -p1 -d pg_top-3.6.2

Then run the normal configure, make, and make install.



Warm_standby

For users on Postgres 8.x there is a warm standby option available. If there is an issue with the primary server, the standby server can be activated and used as the primary, so data loss is prevented. To set up a warm standby, follow the steps below:

Steps to set up warm standby

–> Configure the Standalone Db server
–> Start the Back Up Mode
–> Take The Base Back Up
–> Stop The Back Up Mode
–> Configure the Stand By Db server
–> Start the Stand By Server
–> View The Logs
–> Switching to The Stand by server

 

Configure the Stand Alone DB server
1) Edit the postgresql.conf file to activate the archive mode of the server and define
where to archive the transactional logs

postgres@px:~> vi data/postgresql.conf
archive_mode = on
archive_command = 'cp -i %p /d0/data/archive_backups/%f'
archive_timeout = 300s

2) Start the DB server

postgres@px:~> pg_ctl -D data start
server starting

3) Create some objects

postgres@px:~> psql
Welcome to psql 8.3.4, the PostgreSQL interactive terminal.
[local]:5432:postgres# \l
        List of databases
   Name    |  Owner   | Encoding
-----------+----------+----------
 postgres  | postgres | UTF8
 template0 | postgres | UTF8
 template1 | postgres | UTF8
(3 rows)
[local]:5432:postgres# create database newdb ;
CREATE DATABASE
[local]:5432:postgres# \c newdb
You are now connected to database "newdb".
[local]:5432:newdb# CREATE TABLE sample as select * from
pg_class,pg_namespace ;
SELECT
[local]:5432:newdb# \dt
List of relations
Schema | Name | Type | Owner
--------+--------+-------+----------
public | sample | table | postgres
(1 row)
postgres@px:~/data> psql
[local]:5432:postgres# CREATE TABLE classproc as select * from pg_class natural join pg_proc ;
SELECT

 

Start the Backup Mode

[local]:5432:postgres# select pg_start_backup('L1') ;
pg_start_backup
-----------------
0/C621394
(1 row)

Take the Base Backup

postgres@px:~> cd basebkp
postgres@px:~/basebkp> cp -r /home/postgres/data .
postgres@px:~/basebkp> ls -l
total 12
drwxr-xr-x  3 postgres users 4096 2008-12-29 15:28 ./
drwxr-xr-x  5 postgres users 4096 2008-12-29 15:21 ../
drwx------ 11 postgres users 4096 2008-12-29 15:28 data/
postgres@px:~/basebkp> du -sh data
369M    data
postgres@px:~/basebkp> mv data/ standby
postgres@px:~/basebkp> ls
standby

 

Stop the Backup Mode

[local]:5432:postgres# CREATE TABLE chktable as select *
from pg_tables ;
SELECT
[local]:5432:postgres# CREATE TABLE chk12 as select * from
pg_tables natural join pg_class ;
SELECT
[local]:5432:postgres# select pg_stop_backup() ;
pg_stop_backup
----------------
0/1547E284
(1 row)
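
At this point you can also confirm that WAL segments are landing in the archive directory; pg_switch_xlog() simply forces the current segment to be archived (paths follow the example above):

[local]:5432:postgres# select pg_switch_xlog() ;
postgres@px:~> ls /d0/data/archive_backups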

 

Configuring the Stand By Server

postgres@px:~/basebkp> vi standby/recovery.conf
postgres@px:~/basebkp> cat basebkp/standby/recovery.conf
restore_command = 'pg_standby -l -d -s 2 -t
/tmp/pgsql.trigger.5432 /d0/data/archive_backups %f %p %r'
postgres@px:~/basebkp> vi standby/postgresql.conf
port = 5433 ##change port
postgres@px:~/basebkp/standby> rm -f postmaster.pid
Start the Stand By Server

postgres@px:~/basebkp> pg_ctl -D standby start
server starting
postgres@px:~/basebkp> psql -p 5433 -l
psql: FATAL: the database system is starting up

 

View LOG files of the Stand By Server

postgres@px:~> less ~/basebkp/standby/pg_log/postgresql-2008-12-29_162601.log
[3263 4958acc1.cbf 1]LOG: database system was interrupted; last known up at 2008-12-29 15:27:28 IST
[3263 4958acc1.cbf 2]LOG: starting archive recovery
[3263 4958acc1.cbf 3]LOG: restore_command = 'pg_standby -l -d -s 2 -t /tmp/pgsql.trigger.5432 /d0/data/archive_backups %f %p %r'
Trigger file         : /tmp/pgsql.trigger.5432
Waiting for WAL file : 00000001.history
WAL file path        : /d0/data/archive_backups/00000001.history
Restoring to...      : pg_xlog/RECOVERYHISTORY
Sleep interval       : 2 seconds
Max wait interval    : 0 forever
Command for restore  : ln -s -f "/d0/data/archive_backups/00000001.history" "pg_xlog/RECOVERYHISTORY"
Keep archive history : 000000000000000000000000 and later
running restore      : OK

 

Stop Stand Alone Server and Use the server as a Live DB server

postgres@px:~> touch /tmp/pgsql.trigger.5432
postgres@px:~/basebkp/standby/pg_log> less postgresql-2008-12-29_163930.log
WAL file not present yet. Checking for trigger file...
WAL file not present yet. Checking for trigger file...trigger file found
[12107 4958afea.2f4b 22]LOG: could not open file "pg_xlog/00000001000000000000001A" (log file 0, segment 26): No such file or directory
[12107 4958afea.2f4b 23]LOG: redo done at 0/1960BAB8
[12107 4958afea.2f4b 24]LOG: last completed transaction was at log time 2008-12-29 16:32:41.142195+05:30
Trigger file         : /tmp/pgsql.trigger.5432
Waiting for WAL file : 000000010000000000000019
WAL file path        : /d0/data/archive_backups/000000010000000000000019
Restoring to...      : pg_xlog/RECOVERYXLOG

Now let's check that everything is in place and you are all set.
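
A quick way to do that, using the port and the objects created earlier in this example:

postgres@px:~> psql -p 5433 -l
postgres@px:~> psql -p 5433 -d newdb -c "select count(*) from sample"
postgres@px:~> psql -p 5433 -d postgres -c "select count(*) from chktable"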



Monitoring_PostgreSQL

Let's talk about the most widely used monitoring system for PostgreSQL. There are other tools available, some paid, that do a really good job, but I would like to use something that is free and reliable: Nagios.
Nagios is a widely used open-source network monitoring system that watches the hardware and services you specify and alerts you when their behavior is not what you expect.

Configuring the monitoring server:

Requirements

Web Server (This manual assumes that you use Apache)

PHP (for the web interface)
Gcc
Libgd (a graphics library needed to display the status map)
Perl
Mail Server

Installation:

Create a user nagios

useradd -m nagios
passwd nagios

Install nagios

wget http://sourceforge.net/projects/nagios/files/nagios-3.x/nagios-3.4.1/nagios-3.4.1.tar.gz
tar -xzvf nagios-3.4.1.tar.gz
cd nagios-3.4.1
./configure --with-command-group=nagios
make all
make install
make install-init
make install-config
make install-commandmode

Configuration:

Edit the file /usr/local/nagios/etc/objects/contacts.cfg.
This is where you set the email address that alert notifications are sent to.
Replace "nagios@localhost" with the administrator's email address.

Configure the web interface:

make install-webconf
htpasswd -c /usr/local/nagios/etc/htpasswd.users nagiosadmin
chown nagios.nagios /usr/local/nagios/etc/htpasswd.users
chmod 664 /usr/local/nagios/etc/htpasswd.users
service httpd restart

Install Nagios Plugins:

wget http://sourceforge.net/projects/nagiosplug/files/nagiosplug/1.4.15/nagios-plugins-1.4.15.tar.gz/download
tar -xzvf nagios-plugins-1.4.15.tar.gz
cd nagios-plugins-1.4.15
./configure --with-nagios-user=nagios --with-nagios-group=nagios --without-pgsql
make
make install

Start Nagios:

cd /etc/init.d
chkconfig --add nagios
chkconfig nagios on

To check that all is well:
/usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg

Finally:

service nagios start

At this point you can access Nagios by opening a browser at http://localhost/nagios.
Log in as the nagiosadmin user with the password you set when configuring the web interface.
If you are installing on the same machine that will be monitored, jump ahead to the PostgreSQL plugin installation below; otherwise continue with the next section.

Set the machine to be monitored:

Requirements

PostgreSQL (The instructions in this manual were tested using version 8.3.7)
Xinetd
Gcc
Perl

Create a user Nagios
useradd -m nagios
passwd nagios

Install Nagios Plugins:
We will install the PostgreSQL plugin manually afterwards to make sure we use the latest version.

wget http://sourceforge.net/projects/nagiosplug/files/nagiosplug/1.4.15/nagios-plugins-1.4.15.tar.gz/download
tar -xzvf nagios-plugins-1.4.15.tar.gz
cd nagios-plugins-1.4.15
./configure --with-nagios-user=nagios --with-nagios-group=nagios --without-pgsql
make
make install

chown -R nagios.nagios /usr/local/nagios

Install the module Nagios nrpe:

wget http://sourceforge.net/projects/nagios/files/nrpe-2.x/nrpe-2.13/nrpe-2.13.tar.gz
tar -xzvf nrpe-2.13.tar.gz
cd nrpe-2.13
./configure
make all
make install-plugin
make install-daemon
make install-daemon-config
make install-xinetd

Configure the module nrpe:

Edit the file /etc/xinetd.d/nrpe and add the IP address of the monitoring server to the only_from field.
Edit the file /etc/services and add the following line: "nrpe 5666/tcp # NRPE"

Configure the monitoring server to check services on the remote machine:

Test the module nrpe:

/usr/local/nagios/libexec/check_nrpe -H <ip of the machine to be monitored>

Edit the file /usr/local/nagios/etc/objects/commands.cfg

Add the following lines at the end of the file:

define command {
    command_name check_nrpe
    command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -c $ARG1$
}
We must tell Nagios that we are going to monitor another machine by editing the file /usr/local/nagios/etc/nagios.cfg. Add the following line in the "OBJECT CONFIGURATION FILE(S)" section:

cfg_file=/usr/local/nagios/etc/objects/dbhost.cfg

Create the host definition file dbhost.cfg:

define host {
    use        linux-server
    host_name  dbserver
    alias      pgsql 8.3
    address    <IP of the PostgreSQL server>
}

service nagios restart

Install the PostgreSQL plugin (on the server to be monitored):

Download check_postgres.tar.gz from http://bucardo.org/check_postgres
gunzip -dc check_postgres.tar.gz | tar xvf -
cd check_postgres
cp check_postgres.pl /usr/local/nagios/libexec/.
perl /usr/local/nagios/libexec/check_postgres.pl --symlinks

To install the checks, create an entry in /usr/local/nagios/etc/nrpe.cfg for each service you want to monitor, similar to the following (these must be added in the "COMMAND DEFINITIONS" section):

command[check_postgres_locks]=/usr/local/nagios/libexec/check_postgres_locks -w 2 -c 3

The full list of available commands can be found at http://bucardo.org/check_postgres/check_postgres.pl.html
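
To verify the whole chain, you can call the new check by hand from the monitoring server through NRPE (the address is a placeholder for the IP of the machine being monitored):

/usr/local/nagios/libexec/check_nrpe -H <db server IP> -c check_postgres_locks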

Then specify the services to monitor by creating entries like the following in /usr/local/nagios/etc/objects/dbhost.cfg on the monitoring server:

define service {
    use                  generic-service
    host_name            dbserver
    service_description  PGSQL locks
    check_command        check_nrpe!check_postgres_locks
}
define service {
    use                  generic-service
    host_name            dbserver
    service_description  CPU Load
    check_command        check_nrpe!check_load
}
define service {
    use                  generic-service
    host_name            dbserver
    service_description  Current Users
    check_command        check_nrpe!check_users
}
define service {
    use                  generic-service
    host_name            dbserver
    service_description  Total Processes
    check_command        check_nrpe!check_total_procs
}
define service {
    use                  generic-service
    host_name            dbserver
    service_description  Zombie Processes
    check_command        check_nrpe!check_zombie_procs
}



Bloats

Bloat can have a serious performance impact on queries, and you will see server load rise from where it was at an earlier point in time. Bloat in a table or index can be caused by various things: updates, rollbacks, failed transactions, and so on. What is important here is to identify this bloat and remove it before things get too bad.

I have simplified the process of identifying bloat using the script below. The result set you will see looks like this:

             relation             | reltuples | relpages | otta | bloat | wastedpages | wastedbytes | pwastedbytes |  relbytes  | prelbytes
----------------------------------+-----------+----------+------+-------+-------------+-------------+--------------+------------+-----------
 account.activation_return_policy |    973111 |   164186 | 5240 |  31.3 |      158946 |  1302085632 | 1242 MB      | 1345011712 | 1283 MB
 billing.search_bill_info         |    688832 |     6754 | 3372 |   2.0 |        3382 |    27705344 | 26 MB        |   55328768 | 53 MB
(2 rows)

Here you see two tables with bloat factors of 31.3 and 2.0 respectively (the bloat column is the ratio of actual pages to the estimated minimum needed). It is a good idea to run pg_reorg or the CLUSTER command, depending on your database availability requirements, to remove this bloat.

NOTE: the CLUSTER command locks the table for the time it takes to remove the bloat and is slower than pg_reorg. Make a practical choice here by looking at the options you have. If you need more information on pg_reorg, look through my post about pg_reorg.
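
For example, to rewrite the first table from the report above with CLUSTER during a maintenance window (the index name is an assumption; CLUSTER orders the table by an existing index, and any index on the table will do):

-- rewrite the table ordered by its primary key index, then refresh statistics
CLUSTER account.activation_return_policy USING activation_return_policy_pkey;
ANALYZE account.activation_return_policy;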

The SQL I have used to get this information is given below. It is a good idea to put it in a crontab and send the result set out via email, so that you are aware of the bloated tables and can work on reindexing, clustering, or running pg_reorg at a later time.

SELECT
schemaname||'.'||tablename as relation, reltuples::bigint, relpages::bigint, otta,
ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS bloat,
CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages,
CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
pg_size_pretty((CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END)::bigint) AS pwastedbytes,
bs*relpages::bigint as relbytes,
pg_size_pretty((bs*relpages::bigint)::bigint) as prelbytes
FROM (
SELECT
schemaname, tablename, cc.reltuples, cc.relpages, bs,
CEIL((cc.reltuples*((datahdr+ma-
(CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta
FROM (
SELECT
ma,bs,schemaname,tablename,
(datawidth+(hdr+ma-(case when hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
(maxfracsum*(nullhdr+ma-(case when nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
FROM (
SELECT
schemaname, tablename, hdr, ma, bs,
SUM((1-null_frac)*avg_width) AS datawidth,
MAX(null_frac) AS maxfracsum,
hdr+(
SELECT 1+count(*)/8
FROM pg_stats s2
WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
) AS nullhdr
FROM pg_stats s, (
SELECT
(SELECT current_setting('block_size')::numeric) AS bs,
CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
FROM (SELECT version() AS v) AS foo
) AS constants
GROUP BY 1,2,3,4,5
) AS foo
) AS rs
JOIN pg_class cc ON cc.relname = rs.tablename and cc.relkind = 'r'
JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
) AS sml
WHERE sml.relpages - otta > 128
AND ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) > 1.2
AND CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END > 1024 * 100
ORDER BY wastedbytes DESC;



Table partition with foreign keys

When to partition:

Many times I have seen people say, "oh, the table is not that big in size and does not need partitioning." Let me clarify a little bit here. It is never just about the size of a table; the number of rows in the table matters when deciding whether to partition. The main reason we partition a table is to improve the performance of queries that are taking a long time.

First things first: let's not rush to conclusions. Step back and think about the queries themselves. Can the query be tuned to perform better? That is the question you should ask yourself. Check the explain plan: can you add an index, increase work_mem, rearrange the joins, or even remove some joins that are not necessary? (I will write another post about how to tune queries soon.) Once you are convinced that the query cannot be improved further, it is time to decide whether to partition the table. The way I do this is to pick up every query that references the table and check the best way forward. For example, if your application queries filter on the id column and you partition on the date column, you are in for bigger problems and performance will degrade. Find the column the partitions should be based on and then move ahead with testing and implementation. Personally, I would create the partitions on a test box and run some serious tests using pgbench; that will tell you whether partitioning the table is worth it.
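
As a rough sketch of such a test, you can replay one of the real application queries against a test copy of the database with pgbench, before and after partitioning (the script name and database name are assumptions):

# 10 concurrent clients for 5 minutes, running the query in typical_query.sql
pgbench -n -c 10 -T 300 -f typical_query.sql testdb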

The process: 

I started working on table partitioning in Postgres and thought this might come in handy when dealing with foreign keys, which are not recognized across partitions. The standard partitioning method tells us how to create and manage partitions, but when it comes to foreign keys we have to go the extra mile to make sure they are maintained.

I have an example below with two tables that I want to partition (accounts and billing). The billing table has a foreign key on the billing_id column referencing accounts(account_id), as shown below:

billing:

Table "public.billing"
Column | Type | Modifiers
--------------+---------+-----------
billing_id | integer | not null
billing_type | boolean |
dt_billing | date |
Indexes:
"billing_pkey" PRIMARY KEY, btree (billing_id)
Foreign-key constraints:
"bill_id_fk" FOREIGN KEY (billing_id) REFERENCES accounts(account_id) ON UPDATE RESTRICT ON DELETE RESTRICT

accounts:

Table "public.accounts"
Column | Type | Modifiers
--------------+------------------------+-----------
account_id | integer | not null
account_name | character varying(100) |
dt_account | date |
Indexes:
"accounts_pkey" PRIMARY KEY, btree (account_id)
Referenced by:
TABLE "billing" CONSTRAINT "bill_id_fk" FOREIGN KEY (billing_id) REFERENCES accounts(account_id) ON UPDATE RESTRICT ON DELETE RESTRICT

Let's start by setting up partitioning on both of these tables using the date field.

NOTE: I have created a shadow table for accounts to keep track of the foreign keys. The reason for this is that foreign keys can be created from one partitioned table to another table, or to a specific partition, but not to the top level of a partitioned table, because the parent table has no data. In many cases you are not going to partition using that foreign key column, and then table 2 (billing) can't see the respective column in table 1 (accounts). For partitions to work with foreign keys, I simply created a shadow table with the account_id column and reference that table from all the child billing tables. Also, in order to maintain this shadow table, I am creating a rule that inserts into the shadow table whenever a row is inserted into the primary accounts table. You can call it a hack, but it works well. If you don't care about the foreign keys then just drop them and follow the steps without creating the shadow table and rule.

-- Creating accounts parent table

CREATE TABLE accounts
(
account_id integer NOT NULL,
account_name character varying(100),
dt_account date,
CONSTRAINT accounts_pkey PRIMARY KEY (account_id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE accounts OWNER TO postgres;

--Creating accounts child tables

CREATE TABLE accounts_2012
(
account_id integer NOT NULL,
account_name character varying(100),
dt_account date,
CONSTRAINT accounts_2012_pkey PRIMARY KEY (account_id ),
CHECK ( EXTRACT(YEAR FROM dt_account) = 2012))
INHERITS (accounts);
ALTER TABLE accounts_2012 OWNER TO postgres;

CREATE TABLE accounts_2011
(
account_id integer NOT NULL,
account_name character varying(100),
dt_account date,
CONSTRAINT accounts_2011_pkey PRIMARY KEY (account_id ),
CHECK ( EXTRACT(YEAR FROM dt_account) = 2011))
INHERITS (accounts);
ALTER TABLE accounts_2011 OWNER TO postgres;

CREATE TABLE accounts_2010
(
account_id integer NOT NULL,
account_name character varying(100),
dt_account date,
CONSTRAINT accounts_2010_pkey PRIMARY KEY (account_id ),
CHECK ( EXTRACT(YEAR FROM dt_account) = 2010))
INHERITS (accounts);
ALTER TABLE accounts_2010 OWNER TO postgres;

--Creating shadow table for primary and foreign keys

CREATE TABLE accounts_shadow
(
account_id integer NOT NULL,
CONSTRAINT accounts_shadow_pkey PRIMARY KEY (account_id )
)
WITH (
OIDS=FALSE
);
ALTER TABLE accounts_shadow OWNER TO postgres;

--Creating parent billing table

CREATE TABLE billing
(
billing_id integer NOT NULL,
billing_type boolean,
dt_billing date,
CONSTRAINT billing_pkey PRIMARY KEY (billing_id ),
CONSTRAINT bill_id_fk FOREIGN KEY (billing_id)
REFERENCES accounts_shadow (account_id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT
)
WITH (
OIDS=FALSE
);
ALTER TABLE billing
OWNER TO postgres;

--Creating child billing tables

CREATE TABLE billing_2012
(
billing_id integer NOT NULL,
billing_type boolean,
dt_billing date,
CONSTRAINT billing_2012_pkey PRIMARY KEY (billing_id ),
CONSTRAINT bill_id_2012_fk FOREIGN KEY (billing_id)
REFERENCES accounts_shadow (account_id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT,
CHECK ( EXTRACT(YEAR FROM dt_billing) = 2012))
INHERITS (billing);
ALTER TABLE billing OWNER TO postgres;

CREATE TABLE billing_2011
(
billing_id integer NOT NULL,
billing_type boolean,
dt_billing date,
CONSTRAINT billing_2011_pkey PRIMARY KEY (billing_id ),
CONSTRAINT bill_id_2011_fk FOREIGN KEY (billing_id)
REFERENCES accounts_shadow (account_id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT,
CHECK ( EXTRACT(YEAR FROM dt_billing) = 2011))
INHERITS (billing);
ALTER TABLE billing OWNER TO postgres;

CREATE TABLE billing_2010
(
billing_id integer NOT NULL,
billing_type boolean,
dt_billing date,
CONSTRAINT billing_2010_pkey PRIMARY KEY (billing_id ),
CONSTRAINT bill_id_2010_fk FOREIGN KEY (billing_id)
REFERENCES accounts_shadow (account_id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT,
CHECK ( EXTRACT(YEAR FROM dt_billing) = 2010))
INHERITS (billing);
ALTER TABLE billing OWNER TO postgres;

After creating the tables it's time to create the functions and triggers, like so:

--Function for billing table

CREATE OR REPLACE FUNCTION billing_trig() RETURNS TRIGGER
AS
$$
BEGIN
IF TG_OP = 'INSERT' THEN
IF EXTRACT (YEAR FROM NEW.dt_billing) = 2012 THEN
INSERT INTO billing_2012 VALUES(NEW.*);
ELSIF EXTRACT (YEAR FROM NEW.dt_billing) = 2011 THEN
INSERT INTO billing_2011 VALUES(NEW.*);
ELSIF EXTRACT (YEAR FROM NEW.dt_billing) = 2010 THEN
INSERT INTO billing_2010 VALUES(NEW.*);
ELSE
RAISE NOTICE 'Wrong Year in NEW.dt_billing Values%', NEW.dt_billing;
END IF;
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_billing_tg
BEFORE INSERT ON billing
FOR EACH ROW
EXECUTE PROCEDURE billing_trig();

--Function for accounts table

CREATE OR REPLACE FUNCTION accounts_trig() RETURNS TRIGGER
AS
$$
BEGIN
IF TG_OP = 'INSERT' THEN
IF EXTRACT (YEAR FROM NEW.dt_account) = 2012 THEN
INSERT INTO accounts_2012 VALUES(NEW.*);
ELSIF EXTRACT (YEAR FROM NEW.dt_account) = 2011 THEN
INSERT INTO accounts_2011 VALUES(NEW.*);
ELSIF EXTRACT (YEAR FROM NEW.dt_account) = 2010 THEN
INSERT INTO accounts_2010 VALUES(NEW.*);
ELSE
RAISE NOTICE 'Wrong Year in NEW.dt_account Values%', NEW.dt_account;
END IF;
END IF;
RETURN NULL;
END;
$$
LANGUAGE plpgsql;
CREATE TRIGGER insert_account_tg
BEFORE INSERT ON accounts
FOR EACH ROW
EXECUTE PROCEDURE accounts_trig();

Finally, let's create a rule on the accounts table to insert new values into the shadow table:

CREATE RULE accounts_id_rule AS ON INSERT
TO accounts
DO INSERT INTO accounts_shadow VALUES
(
NEW.account_id
);
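
As a quick sanity check with some throwaway values: the account row should land in accounts_2012, the key should show up in accounts_shadow via the rule, and the billing insert should then pass the foreign key check on its child table.

INSERT INTO accounts VALUES (1, 'test account', DATE '2012-05-01');
SELECT * FROM ONLY accounts;   -- empty, the parent table holds no rows
SELECT * FROM accounts_2012;   -- contains the new row
SELECT * FROM accounts_shadow; -- contains account_id 1, added by the rule
INSERT INTO billing VALUES (1, true, DATE '2012-05-02');
SELECT * FROM billing_2012;    -- contains the new billing row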


