Archive for June, 2012
I got a very interesting question today from a developer. What is the size of a tuple in a table. Well you kinda pause for a while and think ummm I thought I knew this one. Well postgresql has a contrib module that can help you out with this. Lets take a look at pgstattuple, this contrib module will give you information like table length, number of live tuples in a table, total length of live tuples, number of dead tuples, free space in a table and much more.
All this information is extremely helpful when you want to check the stats on a specific table. Lets go ahead and get this conrib module started in our current database:
Assuming you have source install for postgresql available
$ cd postgresql-8.4.9/contrib/pgstattuple $ make $ make install
Now we need to the pgstattuple functions to be available in the database we are trying to check the tuples on. For that run the sql:
$ psql dbname -f pgstattuple.sql
That’s it we are all set to check the tuples. Connect to your database and call the function as:
test=# SELECT * FROM pgstattuple('test.account_services'); -[ RECORD 1 ]------+--------- table_len | 50683904 tuple_count | 597431 tuple_len | 47914138 tuple_percent | 94.54 dead_tuple_count | 0 dead_tuple_len | 0 dead_tuple_percent | 0 free_space | 207320 free_percent | 0.41
I got the approximate/average length of a tuple by tuple_len/tuple_count (not the smartest way to get that info but oh well it worked for me)
You can also use pgstatindex to get information on the indexes.
hdap=# SELECT * FROM pgstatindex('pg_cast_oid_index'); -[ RECORD 1 ]------+------ version | 2 tree_level | 0 index_size | 8192 root_block_no | 1 internal_pages | 0 leaf_pages | 1 empty_pages | 0 deleted_pages | 0 avg_leaf_density | 37.54 leaf_fragmentation | 0
Once in a while we get into the situation where we are given the task to work on a new database. Without knowing the design of the database it is hard to write and explain queries and gather the desired results. There is one opensource tool that can come in very handy before you dive into understanding the database design.
So I figured I will write something on Schemaspy which works very well on a majority of databases. But we will only talk about postgresql for now. Configuration and installation is pretty much the same for all other database types.
Ok so lets get started with the install process. This install is on Ubuntu 12.04 but will be pretty much the same for any other linux base OS.
Install the following dependencies:
1. vik@debian$ sudo apt-get install graphviz 2. vik@debian$ wget http://jdbc.postgresql.org/download/postgresql-9.1-902.jdbc4.jar (place the jar where you can access it with schemaspy) 3. vik@debian$ sudo apt-get install openjdk-7-jdk openjdk-7-jre
vik@debian$ wget http://sourceforge.net/projects/schemaspy/files/schemaspy/SchemaSpy%205.0.0/schemaSpy_5.0.0.jar/download -O schemaSpy_5.0.0.jar
Lets fire it up:
vik@debian$ java -jar schemaSpy_5.0.0.jar -t pgsql -host localhost:2525 -db test -s testschema -u postgres -p password -dp /home/vik/plugins/postgres/postgresql-9.1-902.jdbc4.jar -o /home/vik/dbgui What does the above mean? -t pgsql (selects pgsql as dbtype to connect) -host (obviously where your postgresql database is use :port#) -db (database to connect to) -s (schema name) -dp (this is the location where the jdbc plugin is) -o (this is the location where schemaspy will put all the gui representation files in)
If you want your gui representation to work on your browser from any box. Get apache2 and create a symlink like so:
vik@debian$ sudo apt-get install apache2 vik@debian$ cd /var/www vik@debian$ sudo ln -s /home/vik/dbgui test (test is the dbname you can have your own there)
Thats it access schemaspy:
Here is a schema sample output from the schemspy website:
During the course of database and application maturity there are times when we try hard to improve the performance of the database to meet customer expectations in accordance to the agreed standards. These are the times when we think lets add an index on that old table to make it work faster.
There are a couple of things you should keep in mind when you are looking at adding an index to your table. Ask yourself the question “will this really help in performance”? Consider a scenario where you have a table that is pretty busy with inserts all day. You have a requirement for a report from that table during the night and add an index to speed up that report. Well works perfect for your little report running during off hours. How about the inserts happening during the day on the table that is being used excessively. You will notice there is a fall in performance there. Reason being you added another index and that index also needs to be updated with every insert/update. Increasing the time it takes and reducing the performance.
Make sure you run the below query once a month to get a report on the unused indexes on your database. This can be pretty useful when trying to boost your database’s performance. Also you will find that you have freed up quite some space that was lying around not being used by the database.
I would recommend running this off of crontab and sending the output via email every month.
select indexrelid::regclass as index, relid::regclass as table from pg_stat_user_indexes JOIN pg_index USING (indexrelid) where idx_scan = 0 and indisunique is false;
NOTE: don’t just blindly drop the indexes listed in here. Make sure they are useless first. Also, running this is useless after a fresh restore or after calling pg_stat_reset(). If your stats have reset run at least one business cycle on the database before running the sql queries.
Index maintenance also covers removing duplicates. A lot of times I have seen indexes on a table with a unique key and then the same column has a primary key constraint. Primary keys are unique. Run the below SQL and identify the duplicate indexes that can be removed from the database. This will also help in improving performance and reclaiming lost space.
SELECT pg_size_pretty(sum(pg_relation_size(idx))::bigint) AS size, (array_agg(idx)) AS idx1, (array_agg(idx)) AS idx2, (array_agg(idx)) AS idx3, (array_agg(idx)) AS idx4 FROM ( SELECT indexrelid::regclass AS idx, (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'|| coalesce(indexprs::text,'')||E'\n' || coalesce(indpred::text,'')) AS KEY FROM pg_index) sub GROUP BY KEY HAVING count(*)>1 ORDER BY sum(pg_relation_size(idx)) DESC;
If you ever thought about finding the foreign key columns that are missing indexes. Below is something that might help. Again check if you really need to add the index don’t just assume based on the query below:
select src_table, dst_table, fk_name, pg_size_pretty(s_size) as s_size, pg_size_pretty(d_size) as d_size, d from ( select distinct on (1,2,3,4,5) textin(regclassout(c.conrelid)) as src_table, textin(regclassout(c.confrelid)) as dst_table, c.conname as fk_name, pg_relation_size(c.conrelid) as s_size, pg_relation_size(c.confrelid) as d_size, array_upper(di.indkey::int, 1) + 1 - array_upper(c.conkey::int, 1) as d from pg_constraint c left join pg_index di on di.indrelid = c.conrelid and array_to_string(di.indkey, ' ') ~ ('^' || array_to_string(c.conkey, ' ') || '( |$)') join pg_stat_user_tables st on st.relid = c.conrelid where c.contype = 'f' order by 1,2,3,4,5,6 asc ) mfk where mfk.d is distinct from 0 and mfk.s_size > 1000000 order by mfk.s_size desc, mfk.d desc;