Monday, April 2, 2012

Adding new columns in pg_stat_*_tables view for vacuum/analyze count in PostgreSQL 9.1

Up to PostgreSQL 9.0, we had to gather vacuum statistics, such as the number of times autovacuum or a manual VACUUM ran on each table in a database, from the database server log files. This information is really helpful for tuning autovacuum settings at the cluster level.

Now, PostgreSQL 9.1 has added new columns, i.e. vacuum_count, autovacuum_count, analyze_count and autoanalyze_count, to the pg_stat_*_tables views. These columns show the number of times each table in the database has been vacuumed and analyzed.

vacuum_count -- Number of times it has been vacuumed manually

autovacuum_count -- Number of times it has been vacuumed by the autovacuum daemon

analyze_count -- Number of times it has been analyzed manually

autoanalyze_count -- Number of times it has been analyzed by the autovacuum daemon.

postgres=# \x
postgres=# select * from pg_stat_all_tables where relname = 'pg_operator';

relid | 2617
schemaname | pg_catalog
relname | pg_operator
seq_scan | 0
seq_tup_read | 0
idx_scan | 87
idx_tup_fetch | 228
n_tup_ins | 2
n_tup_upd | 2
n_tup_del | 0
n_tup_hot_upd | 2
n_live_tup | 708
n_dead_tup | 0
last_vacuum | 2012-03-07 17:46:00.899226+05:30
last_autovacuum |
last_analyze | 2012-03-07 17:46:00.903273+05:30
last_autoanalyze |
vacuum_count | 5
autovacuum_count | 0
analyze_count | 5
autoanalyze_count | 0
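These counters make it easy to spot tables that autovacuum visits frequently (or never); as a sketch, a query listing user tables ordered by autovacuum activity:

```sql
postgres=# SELECT relname, vacuum_count, autovacuum_count,
                  analyze_count, autoanalyze_count
             FROM pg_stat_user_tables
            ORDER BY autovacuum_count DESC;
```

Tables with heavy write traffic but a zero autoanalyze_count are candidates for lowering their per-table autovacuum thresholds.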

Saturday, March 31, 2012

PGPool-II configuration with Load Balancing Mode in PostgreSQL


pgpool is a connection pooling server for PostgreSQL. pgpool runs between PostgreSQL's clients (front ends) and servers (back ends). A PostgreSQL client can connect to pgpool as if it were a standard PostgreSQL server. pgpool caches connections to the PostgreSQL server to reduce the overhead of establishing new connections.


About load balancing mode :

If replication is enabled, you can enjoy the load balancing capability of pgpool by enabling load_balance_mode in pgpool.conf. If the conditions below are all met, SELECT queries are distributed between the master and the slave servers in a random manner, which will boost performance:

1) Protocol version is V3. This means the backend must be PostgreSQL 7.4 or later.

2) The query begins with "SELECT" or "select" (case is ignored). No space is allowed before "SELECT".

3) SELECT query is not in a transaction block.

Please note that a SELECT may modify the database by calling an updatable function. In this case you should NOT use load balancing; otherwise pgpool will fail due to content differences between the master and the secondary database. You can avoid load balancing by putting spaces or comments at the beginning of the query.
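As a sketch, assuming a hypothetical function log_access() that inserts a row, the comment trick keeps its SELECT on the master (note that with ignore_leading_white_space = true in pgpool.conf, a leading space alone is ignored, so the comment form is the safer one):

```sql
-- Load balanced as usual: a plain read-only SELECT
SELECT * FROM pg_stat_database;

-- NOT load balanced: the leading comment stops pgpool from
-- treating the query as a plain SELECT
/*WRITER*/ SELECT log_access(42);
```

In pgpool-II 3.0 and later you can alternatively list such writer functions in black_function_list (shown in the configuration below), so that every SELECT calling them goes to the master.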


Advantages of pgpool :

There are some connection pool servers other than pgpool. This section explains why you should use pgpool:

1) You do not need to modify your applications

There are some connection pool servers which require a special API (Application Programming Interface) to work with them. Since pgpool looks like a PostgreSQL server from the client's point of view, existing PostgreSQL applications can be used without any modification.

2) Any programming languages can be used

Since pgpool is not an API, applications written in any language, including PHP, Perl and Java, can be used.

3) Employing prefork architecture

pgpool employs a prefork architecture, meaning there is no need to start up a process for each connection from its clients, which gives better performance.

4) Resource usage control

pgpool can limit the number of connections to the PostgreSQL server. Users can avoid overloading PostgreSQL by using pgpool, especially in web application environments.

5) fail over

pgpool has a functionality called "fail over". If the first server goes down, pgpool will automatically switch to the secondary server.

6) replication

pgpool can be used as a replication server.

7) load balancing

SELECT statements can be distributed among servers to gain more performance.

Disadvantage of pgpool :

1) overhead

Any access to PostgreSQL must go through pgpool, which means some overhead is added to each database access.

2) Not all libpq protocols are supported

Currently the following protocols are not supported:

o any authentication methods except "trust", "clear text password",
"pam" (replication or master/slave mode)

o any authentication methods except "trust", "clear text password",
"crypt", "md5", "pam" (non replication mode)

3) No access control to pgpool using pg_hba.conf

Any client can connect to pgpool. If this is a concern, you can limit access by using other software such as iptables.


Installation of Pgpool-II :

A. Download the pgpool-II software from the link below


B. Install the pgpool-II

# tar -zxvf pgpool-II-3.0.4.tar.gz

# cd pgpool-II-3.0.4

# ./configure --prefix=/usr/local/pgsql84 --with-pgsql-libdir=/usr/local/pgsql84/lib --with-pgsql-includedir=/usr/local/pgsql84/include

# make

# make install


C. Verify the pgpool-II binaries on /usr/local/pgsql84/bin

After the installation, make sure pgpool-II binaries are created under the /usr/local/pgsql84/bin directory.

pgpool-II installation directories are:

/usr/local/pgsql84/bin/pgpool pgpool executable
/usr/local/pgsql84/etc/pgpool.conf pgpool-II configuration file
/usr/local/pgsql84/etc/pool_hba.conf pgpool-HBA configuration file


Configuration :

A. After installation, change the ownership of all files installed under the /usr/local/pgsql84/bin and /usr/local/pgsql84/etc directories to the 'pgsql' user



# chown -R pgsql:pgsql /usr/local/pgsql84/bin
# chown -R pgsql:pgsql /usr/local/pgsql84/etc


B. Copy the pgpool sample configuration files to pgpool configuration files.

$ cd /usr/local/pgsql84/etc

$ cp pgpool.conf-sample pgpool.conf

$ cp pool_hba.conf-sample pool_hba.conf

C. Edit the pgpool.conf file and make the necessary alterations as mentioned below


[pgsql@bng-params-db /usr/local/pgsql84/etc]$ more pgpool.conf
#
# pgpool-II configuration file sample
# $Header: /cvsroot/pgpool/pgpool-II/pgpool.conf.sample,v 1.39.2.2 2011/02/22 06:36:49 kitagawa Exp $

# Host name or IP address to listen on: '*' for all, '' for no TCP/IP
# connections
listen_addresses = '*'

# Port number for pgpool
port = 9999

# Port number for pgpool communication manager
pcp_port = 9898

# Unix domain socket path. (The Debian package defaults to
# /var/run/postgresql.)
socket_dir = '/tmp'

# Unix domain socket path for pgpool communication manager.
# (Debian package defaults to /var/run/postgresql)
pcp_socket_dir = '/usr/local/pgsql/pgpool'

# Unix domain socket path for the backend. Debian package defaults to /var/run/postgresql!
backend_socket_dir = '/usr/local/pgsql/pgpool'

# pgpool communication manager timeout. 0 means no timeout. This parameter is ignored now.
pcp_timeout = 10

# number of pre-forked child process
num_init_children = 32

# Number of connection pools allowed for a child process
max_pool = 4

# If idle for this many seconds, child exits. 0 means no timeout.
child_life_time = 300

# If idle for this many seconds, connection to PostgreSQL closes.
# 0 means no timeout.
connection_life_time = 0

# If child_max_connections connections were received, child exits.
# 0 means no exit.
child_max_connections = 0

# If client_idle_limit is n (n > 0), the client is forced to be
# disconnected whenever after n seconds idle (even inside an explicit
# transactions!)
# 0 means no disconnect.
client_idle_limit = 0

# Maximum time in seconds to complete client authentication.
# 0 means no timeout.
authentication_timeout = 60

# Logging directory
logdir = '/usr/local/pgsql/pgpool'

# pid file name
pid_file_name = '/usr/local/pgsql/pgpool/pgpool.pid'

# Replication mode
replication_mode = false

# Load balancing mode, i.e., all SELECTs are load balanced.
load_balance_mode = true

# If there's a disagreement with the packet kind sent from backend,
# then degenrate the node which is most likely "minority". If false,
# just force to exit this session.
replication_stop_on_mismatch = false

# If there's a disagreement with the number of affected tuples in
# UPDATE/DELETE, then degenerate the node which is most likely
# "minority".
# If false, just abort the transaction to keep the consistency.
failover_if_affected_tuples_mismatch = false

# If true, replicate SELECT statement when replication_mode or parallel_mode is enabled.
# A priority of replicate_select is higher than load_balance_mode.
replicate_select = false

# Semicolon separated list of queries to be issued at the end of a
# session
reset_query_list = 'ABORT; DISCARD ALL'
# for 8.2 or older this should be as follows.
#reset_query_list = 'ABORT; RESET ALL; SET SESSION AUTHORIZATION DEFAULT'

# white_function_list is a comma separated list of function names
# those do not write to database. Any functions not listed here
# are regarded to write to database and SELECTs including such
# writer-functions will be executed on master(primary) in master/slave
# mode, or executed on all DB nodes in replication mode.
#
# black_function_list is a comma separated list of function names
# those write to database. Any functions not listed here
# are regarded not to write to database and SELECTs including such
# read-only-functions will be executed on any DB nodes.
#
# You cannot make full both white_function_list and
# black_function_list at the same time. If you specify something in
# one of them, you should make empty other.
#
# Pre 3.0 pgpool-II recognizes nextval and setval in hard coded
# way. Following setting will do the same as the previous version.
# white_function_list = ''
# black_function_list = 'nextval,setval'
white_function_list = ''
black_function_list = 'nextval,setval'

# If true print timestamp on each log line.
print_timestamp = true

# If true, operate in master/slave mode.
master_slave_mode = true

# Master/slave sub mode. either 'slony' or 'stream'. Default is 'slony'.
master_slave_sub_mode = 'slony'

# If the standby server delays more than delay_threshold,
# any query goes to the primary only. The unit is in bytes.
# 0 disables the check. Default is 0.
# Note that health_check_period required to be greater than 0
# to enable the functionality.
delay_threshold = 0

# 'always' logs the standby delay whenever health check runs.
# 'if_over_threshold' logs only if the delay exceeds delay_threshold.
# 'none' disables the delay log.
log_standby_delay = 'none'

# If true, cache connection pool.
connection_cache = true

# Health check timeout. 0 means no timeout.
health_check_timeout = 20

# Health check period. 0 means no health check.
health_check_period = 0

# Health check user
health_check_user = 'pgsql'

# Execute command by failover.
# special values: %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %% = '%' character
#
failover_command = '%d%h%p'

# Execute command by failback.
# special values: %d = node id
# %h = host name
# %p = port number
# %D = database cluster path
# %m = new master node id
# %H = hostname of the new master node
# %M = old master node id
# %P = old primary node id
# %% = '%' character
#
failback_command = '%d%h%p'

# If true, trigger fail over when writing to the backend communication
# socket fails. This is the same behavior of pgpool-II 2.2.x or
# earlier. If set to false, pgpool will report an error and disconnect
# the session.
fail_over_on_backend_error = true

# If true, automatically locks a table with INSERT statements to keep
# SERIAL data consistency. If the data does not have SERIAL data
# type, no lock will be issued. An /*INSERT LOCK*/ comment has the
# same effect. A /*NO INSERT LOCK*/ comment disables the effect.
insert_lock = true

# If true, ignore leading white spaces of each query while pgpool judges
# whether the query is a SELECT so that it can be load balanced. This
# is useful for certain APIs such as DBI/DBD which are known to add an
# extra leading white space.
ignore_leading_white_space = true

# If true, print all statements to the log. Like the log_statement option
# to PostgreSQL, this allows for observing queries without engaging in full
# debugging.
log_statement = true

# If true, print all statements to the log. Similar to log_statement except
# that prints DB node id and backend process id info.
log_per_node_statement = true

# If true, incoming connections will be printed to the log.
log_connections = true

# If true, hostname will be shown in ps status. Also shown in
# connection log if log_connections = true.
# Be warned that this feature will add overhead to look up hostname.
log_hostname = true

# if non 0, run in parallel query mode
parallel_mode = false

# if non 0, use query cache
enable_query_cache = false

#set pgpool2 hostname
pgpool2_hostname = ''

# system DB info
system_db_hostname = 'localhost'
system_db_port = 5432
system_db_dbname = 'pgpool'
system_db_schema = 'pgpool_catalog'
system_db_user = 'pgsql'
system_db_password = ''

# backend_hostname, backend_port, backend_weight
# here are examples
backend_hostname0 = '192.168.186.51'
backend_port0 = 5432
backend_weight0 = 0
backend_data_directory0 = '/export/raid/pgsql/data84'
backend_hostname1 = '10.209.152.225'
backend_port1 = 5432
backend_weight1 = 1
backend_data_directory1 = '/usr/local/pgsql/data84'

# - HBA -

# If true, use pool_hba.conf for client authentication.
enable_pool_hba = true

# - online recovery -
# online recovery user
recovery_user = 'pgsql'

# online recovery password
recovery_password = ''

# execute a command in first stage.
recovery_1st_stage_command = ''

# execute a command in second stage.
recovery_2nd_stage_command = ''

# maximum time in seconds to wait for the recovering node's postmaster
# start-up. 0 means no wait.
# this is also used as a timer waiting for clients disconnected before
# starting 2nd stage
recovery_timeout = 90

# If client_idle_limit_in_recovery is n (n > 0), the client is forced
# to be disconnected whenever after n seconds idle (even inside an
# explicit transactions!) in the second stage of online recovery.
# n = -1 forces clients to be disconnected immediately.
# 0 disables this functionality(wait forever).
# This parameter only takes effect in recovery 2nd stage.
client_idle_limit_in_recovery = 0

# Specify table name to lock. This is used when rewriting lo_creat
# command in replication mode. The table must exist and has writable
# permission to public. If the table name is '', no rewriting occurs.
lobj_lock_table = ''

# If true, enable SSL support for both frontend and backend connections.
# note that you must also set ssl_key and ssl_cert for SSL to work in
# the frontend connections.
ssl = false
# path to the SSL private key file
#ssl_key = './server.key'
# path to the SSL public certificate file
#ssl_cert = './server.cert'

# If either ssl_ca_cert or ssl_ca_cert_dir is set, then certificate
# verification will be performed to establish the authenticity of the
# certificate. If neither is set to a nonempty string then no such
# verification takes place. ssl_ca_cert should be a path to a single
# PEM format file containing CA root certificate(s), whereas ssl_ca_cert_dir
# should be a directory containing such files. These are analogous to the
# -CAfile and -CApath options to openssl verify(1), respectively.
#ssl_ca_cert = ''
#ssl_ca_cert_dir = ''

# Debug message verbosity level. 0: no message, 1 <= : more verbose
debug_level = 1



D. Edit the pool_hba.conf file

# Put your actual configuration here
# ----------------------------------
#
# If you want to allow non-local connections, you need to add more
# "host" records. In that case you will also need to make pgpool listen
# on a non-local interface via the listen_addresses configuration parameter.
#

# TYPE       DATABASE   USER   CIDR-ADDRESS         METHOD

# "local" is for Unix domain socket connections only
local        all        all                         trust
# IPv4 local connections:
hostnossl    all        all    192.168.186.51/32    trust
hostnossl    all        all    10.209.152.225/32    trust
hostnossl    all        all    0.0.0.0/0            trust
host         all        all    127.0.0.1/32         trust


E. Create the pgpool database, the pgpool user, and the pgpool_catalog schema owned by pgpool on the slave server.

$ createuser -p 5432 pgpool

$ createdb -p 5432 -O pgpool pgpool

$ ./psql -p 5432 -d pgpool -U pgpool
pgpool=# create schema pgpool_catalog authorization pgpool;

Now execute the pgpool catalog script.

pgpool=# \i $PGPATH/share/pgpool-II/system_db.sql

This will create a new schema called "pgpool_catalog" in the "pgpool" database.


Configure Management Access To PGPOOL-II

You can manage pgpool-II from the command line, and to do that you need to set up a username and password. In the install directory's etc subdirectory there is a pcp.conf.sample file. You need to add a username and password to this file. The username can be anything you want; the password has to be an md5 hash, which you can create with the supplied pg_md5 command. Here's how:

cd /usr/local/pgsql84/etc

cp pcp.conf.sample pcp.conf

/usr/local/pgsql84/bin/pg_md5 -p
password: <enter your password>

vi pcp.conf and add your username & password like this:

admin:e8a48653851e28c69d0506508fb27fc5
Save the file.
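Once pgpool is running (started as shown in the next step), the pcp_* utilities installed alongside pgpool use this account; for example, pcp_node_count reports the number of configured backend nodes. The arguments are timeout, host, pcp_port, username and the clear-text password matching the md5 hash ("admin" and "secretpassword" below stand in for whatever you put in pcp.conf):

```shell
$ /usr/local/pgsql84/bin/pcp_node_count 10 localhost 9898 admin secretpassword
```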

Start the pgpool-II:

nohup /usr/local/pgsql84/bin/pgpool -f /usr/local/pgsql84/etc/pgpool.conf -a /usr/local/pgsql84/etc/pool_hba.conf -n -d >> /usr/local/pgsql/pgpool/pgpool.log 2>&1 &

-f path

the path to the configuration file.

-a path

the path to the pool_hba configuration file.

-n

do not start as a daemon. Error messages go to stdout or stderr, so you can use utilities such as logger and rotatelogs. You need to run pgpool in the background explicitly if you use this option.

-d

print lots of debugging messages
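With pgpool up, a quick sanity check is to connect through pgpool's port (9999 in this configuration) exactly as you would to PostgreSQL itself:

```shell
$ psql -p 9999 -h localhost -U pgsql -d postgres -c 'SELECT 1'
```

If the HBA rules in pool_hba.conf admit the client, the query is answered by one of the backends.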

Stopping pgpool:

You can stop pgpool by using the "stop" option:

$ pgpool [-f config_file] -m {s[mart]|f[ast]|i[mmediate]} stop

If there are any live connections from frontends, it will wait until those connections terminate.

To force it to be stopped, try:

$ pgpool -m f[ast] stop

or

$ pgpool -m i[mmediate] stop


Getting internal status of pgpool:

You can use psql or any other client to obtain the internal status of pgpool by issuing a special SQL command:

$ psql -p 9999 -c 'show pool_status' pgpool
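pgpool-II 3.0 and later also understands "show pool_nodes", which summarizes each backend's host, port, status and load-balance weight; a sketch (the exact columns vary by version):

```shell
$ psql -p 9999 -c 'show pool_nodes' pgpool
```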

::PostgreSQL 9.1 :: Resetting statistics at the cluster level


Currently we don't have a single feature to reset statistics at the cluster level. You have to use two PostgreSQL functions: pg_stat_reset_shared(text) and pg_stat_reset().

There are a few statistics that are kept only on a per-cluster basis, such as those reported in pg_stat_bgwriter, and pg_stat_reset only resets per-database statistics; pg_stat_reset_shared(text) handles the cluster-wide ones.



psql=# select pg_stat_reset(); -- Resets all statistics counters for the current database to zero (requires superuser privileges)

psql=# select pg_stat_reset_shared('bgwriter'); -- Resets the statistics shown by pg_stat_bgwriter.

In PostgreSQL 9.1, you can now find out when statistics were last reset. For a database, for instance:

psql=# SELECT datname, stats_reset FROM pg_stat_database;

datname | stats_reset
-----------+-------------------------------
template1 |
template0 |
postgres | 2012-03-29 19:22:05.946641+02
test | 2012-03-29 19:22:09.133483+02
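The cluster-wide bgwriter statistics got the same treatment in 9.1: pg_stat_bgwriter now has its own stats_reset column, so you can confirm that a pg_stat_reset_shared('bgwriter') call took effect:

```sql
psql=# SELECT stats_reset FROM pg_stat_bgwriter;
```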