
Maintaining indexes

As a database and its applications mature, there are times when we work hard to improve database performance to meet customer expectations and the agreed service standards. These are the times when we think, "let's add an index on that old table to make it faster."

There are a couple of things you should keep in mind when adding an index to a table. Ask yourself: "will this really help performance?" Consider a table that is busy with inserts all day. You need a nightly report from that table, so you add an index to speed up that report. It works perfectly for the report running during off hours, but what about the inserts hammering the table all day? You will notice performance drop there, because the new index must also be updated on every insert/update, increasing the time each write takes.
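Before adding the index, it can help to check how write-heavy the table actually is. A minimal sketch using the standard statistics views (the table name is a hypothetical example):

```sql
-- n_tup_ins / n_tup_upd / n_tup_del show cumulative write traffic
-- since the last statistics reset; a large number here means every
-- additional index carries a real write cost.
SELECT relname, n_tup_ins, n_tup_upd, n_tup_del
FROM pg_stat_user_tables
WHERE relname = 'busy_table';
```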

Make sure you run the query below once a month to get a report on the unused indexes in your database. This can be very useful when trying to boost your database's performance, and you will often find that you free up quite a bit of space that was sitting around unused.

I would recommend running this off of crontab and sending the output via email every month.
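As a sketch, a crontab entry along these lines would do it (the script path, database name, and recipient address are assumptions; adjust for your environment):

```
# At 02:00 on the 1st of every month, mail the unused-index report
0 2 1 * *  psql -d mydb -f /usr/local/scripts/unused_indexes.sql | mail -s "Unused index report" dba@example.com
```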

Unused indexes:

    SELECT indexrelid::regclass AS index, relid::regclass AS table
    FROM pg_stat_user_indexes
    JOIN pg_index USING (indexrelid)
    WHERE idx_scan = 0 AND indisunique IS FALSE;

NOTE: don't just blindly drop the indexes listed here; confirm they are truly useless first. Also, running this is pointless right after a fresh restore or a call to pg_stat_reset(). If your stats have been reset, let at least one full business cycle run against the database before using these queries.
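Once you have confirmed an index really is unused, a minimal sketch of dropping it without blocking writers on PostgreSQL 9.2+ (the index name is a hypothetical example):

```sql
-- DROP INDEX CONCURRENTLY avoids an exclusive lock on the table,
-- at the cost of not being allowed inside a transaction block.
DROP INDEX CONCURRENTLY my_unused_idx;
```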

Index maintenance also covers removing duplicates. I have often seen a unique index on a column that already carries a primary key constraint, and primary keys are unique by definition. Run the SQL below to identify duplicate indexes that can be removed from the database. This also helps performance and reclaims lost space.

Duplicate Indexes:

SELECT pg_size_pretty(sum(pg_relation_size(idx))::bigint) AS size,
       (array_agg(idx))[1] AS idx1, (array_agg(idx))[2] AS idx2,
       (array_agg(idx))[3] AS idx3, (array_agg(idx))[4] AS idx4
FROM (
    SELECT indexrelid::regclass AS idx, (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'||
                                         coalesce(indexprs::text,'')||E'\n' || coalesce(indpred::text,'')) AS key
    FROM pg_index) sub
GROUP BY key HAVING count(*) > 1
ORDER BY sum(pg_relation_size(idx)) DESC;

If you have ever wondered which foreign key columns are missing indexes, the query below might help. Again, check whether you really need the index; don't just add one based on the query output alone:

Missing indexes:

select src_table, dst_table, fk_name, pg_size_pretty(s_size) as s_size, pg_size_pretty(d_size) as d_size, d
    from (
select distinct on (1,2,3,4,5)
                textin(regclassout(c.conrelid))  as src_table,
                textin(regclassout(c.confrelid)) as dst_table,
                c.conname                        as fk_name,
                pg_relation_size(c.conrelid)     as s_size,
                pg_relation_size(c.confrelid)    as d_size,
                array_upper(di.indkey::int[], 1) + 1 - array_upper(c.conkey::int[], 1) as d
        from pg_constraint c
        left join pg_index di on di.indrelid = c.conrelid
                and array_to_string(di.indkey, ' ') ~ ('^' || array_to_string(c.conkey, ' ') || '( |$)')
        join pg_stat_user_tables st on st.relid = c.conrelid
        where c.contype = 'f'
        order by 1,2,3,4,5,6 asc
    ) mfk
    where mfk.d is distinct from 0 and mfk.s_size > 1000000
    order by mfk.s_size desc, mfk.d desc;
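If the report surfaces a foreign key worth covering, a minimal sketch of adding the index without blocking writes (table, column, and index names are hypothetical):

```sql
-- CREATE INDEX CONCURRENTLY builds the index without an exclusive lock,
-- so inserts and updates can continue while it builds.
CREATE INDEX CONCURRENTLY orders_customer_id_idx ON orders (customer_id);
```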




Bloat can cause serious performance impact on queries, and you will see server load rise from where it was at an earlier point in time. Bloat in a table or index can build up for various reasons: updates, rollbacks, failed transactions, and so on. What is important is to identify this bloat and remove it before things get too bad.

I have simplified the process of identifying bloat with the script below. The result set you will see looks like:

             relation             | reltuples | relpages | otta | bloat | wastedpages | wastedbytes | pwastedbytes |  relbytes  | prelbytes
----------------------------------+-----------+----------+------+-------+-------------+-------------+--------------+------------+-----------
 account.activation_return_policy |    973111 |   164186 | 5240 |  31.3 |      158946 |  1302085632 | 1242 MB      | 1345011712 | 1283 MB
 billing.search_bill_info         |    688832 |     6754 | 3372 |   2.0 |        3382 |    27705344 | 26 MB        |   55328768 | 53 MB
(2 rows)

Here you see two tables with bloat factors of 31.3 and 2.0 respectively (the bloat column is the ratio of actual pages to the estimated optimal page count). It is a good idea to run pg_reorg or the CLUSTER command, depending on your availability requirements, to remove this bloat.

NOTE: the CLUSTER command locks the table for the time it takes to remove the bloat and is slower than pg_reorg. Make a practical choice here by looking at the options you have. If you need more information on pg_reorg, look through my post about pg_reorg.
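For the CLUSTER route, a minimal sketch using one of the tables from the report above (the index name is a hypothetical example; the table is locked exclusively for the duration, so use a maintenance window):

```sql
-- Rewrites the table in index order, discarding the dead space.
CLUSTER account.activation_return_policy USING activation_return_policy_pkey;
-- Refresh planner statistics after the rewrite.
ANALYZE account.activation_return_policy;
```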

The SQL that I have used to get this information is given below. It is a good idea to put it in a crontab and send the result set out via email, so that you are aware of bloated tables and can schedule a REINDEX, CLUSTER, or pg_reorg run at a later time.

SELECT
  schemaname||'.'||tablename AS relation, reltuples::bigint, relpages::bigint, otta,
  ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) AS bloat,
  CASE WHEN relpages < otta THEN 0 ELSE relpages::bigint - otta END AS wastedpages,
  CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END AS wastedbytes,
  pg_size_pretty((CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END)::bigint) AS pwastedbytes,
  bs*relpages::bigint AS relbytes,
  pg_size_pretty((bs*relpages::bigint)::bigint) AS prelbytes
FROM (
  SELECT
    schemaname, tablename, cc.reltuples, cc.relpages, bs,
    CEIL((cc.reltuples*((datahdr+ma-
      (CASE WHEN datahdr%ma=0 THEN ma ELSE datahdr%ma END))+nullhdr2+4))/(bs-20::float)) AS otta
  FROM (
    SELECT
      schemaname, tablename, hdr, ma, bs,
      (datawidth+(hdr+ma-(CASE WHEN hdr%ma=0 THEN ma ELSE hdr%ma END)))::numeric AS datahdr,
      (maxfracsum*(nullhdr+ma-(CASE WHEN nullhdr%ma=0 THEN ma ELSE nullhdr%ma END))) AS nullhdr2
    FROM (
      SELECT
        schemaname, tablename, hdr, ma, bs,
        SUM((1-null_frac)*avg_width) AS datawidth,
        MAX(null_frac) AS maxfracsum,
        hdr+(
          SELECT 1+count(*)/8
          FROM pg_stats s2
          WHERE null_frac<>0 AND s2.schemaname = s.schemaname AND s2.tablename = s.tablename
        ) AS nullhdr
      FROM pg_stats s, (
        SELECT
          (SELECT current_setting('block_size')::numeric) AS bs,
          CASE WHEN substring(v,12,3) IN ('8.0','8.1','8.2') THEN 27 ELSE 23 END AS hdr,
          CASE WHEN v ~ 'mingw32' THEN 8 ELSE 4 END AS ma
        FROM (SELECT version() AS v) AS foo
      ) AS constants
      GROUP BY 1,2,3,4,5
    ) AS foo
  ) AS rs
  JOIN pg_class cc ON cc.relname = rs.tablename AND cc.relkind = 'r'
  JOIN pg_namespace nn ON cc.relnamespace = nn.oid AND nn.nspname = rs.schemaname AND nn.nspname <> 'information_schema'
) AS sml
WHERE sml.relpages - otta > 128
  AND ROUND(CASE WHEN otta=0 THEN 0.0 ELSE sml.relpages/otta::numeric END,1) > 1.2
  AND CASE WHEN relpages < otta THEN 0 ELSE bs*(sml.relpages-otta)::bigint END > 1024 * 100
ORDER BY wastedbytes DESC;



Maintaining system tables

This is probably something you learn as you spend more time running a PostgreSQL server, rather than from any user guide out there.

System catalog tables in Postgres hold the metadata and statistics Postgres uses to devise an execution plan. These tables also need maintenance from time to time, as their indexes accumulate bloat too and performance degrades. We routinely run VACUUM and REINDEX on all our user tables, but it is important to keep the system tables in good shape as well.

Below is a script that can be put in crontab; it is advisable to run it every other month (it takes only a couple of seconds to process, but still run it during off-production hours).

for db in `psql -l -t|awk '{print $1}'|grep -v -e ":" -e "template0"|sed '/^$/d'`; do reindexdb -s -d $db; done
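An equivalent sketch that pulls the database list straight from pg_database instead of parsing psql -l output (it assumes the invoking user can connect to every database, and needs a running server):

```shell
# List connectable databases, skipping template0,
# then reindex the system catalogs (-s) in each one.
for db in $(psql -At -c "SELECT datname FROM pg_database WHERE datname <> 'template0' AND datallowconn"); do
    reindexdb -s -d "$db"
done
```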




In any 24x7 OLTP database there is a constant need for maintenance to meet performance metrics. As the database and its tables grow in size, it gets difficult to maintain the tables while production is live. As DBAs we script VACUUM ANALYZE on tables during off-peak hours, and that does update pg_stats and help performance. But how do we get rid of all the bloat from updates and rollbacks? And how do we rebuild our indexes when downtime is not possible at all?

Well, the answer is pg_reorg. pg_reorg re-organizes tables in a Postgres database without holding long exclusive locks, so you can keep reading and updating rows in the tables being reorganized. It copies the table's data into a temporary table and then recreates the indexes on it, leaving the primary production table available for production use. This is a very useful tool that reorganizes tables quickly without impacting production. Make sure you have twice the table's size in free space before performing a pg_reorg on it.
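To sanity-check that space requirement first, a quick sketch (the table name is a hypothetical example):

```sql
-- pg_total_relation_size includes indexes and TOAST; you want at least
-- this much free disk before reorganizing the table.
SELECT pg_size_pretty(pg_total_relation_size('test1'));
```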

pg_reorg can be downloaded from here. They maintain RPM and source installers.

Once you have pg_reorg downloaded follow the steps below to install it:

$ cd pg_reorg
$ make
$ su
$ make install

Next, run the following pg_reorg script to register it with your database:

$ psql -f $PGSHARE/contrib/pg_reorg.sql -d your_database

Now you are all set and let the games begin:

$ pg_reorg --no-order --table test1 -d mydb

This will reorg the test1 table in the database mydb.
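pg_reorg can also rewrite the table in a chosen physical order, similar to an online CLUSTER. A hedged sketch (the column name is a hypothetical example; check the flags against your pg_reorg version):

```shell
# Rewrite test1 ordered by created_at instead of skipping ordering
pg_reorg --order-by=created_at --table test1 -d mydb
```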


