Bric::DBA - Bricolage Database Administration Guide.
$Revision: 1.9.2.5 $
$Date: 2003/09/29 17:00:39 $
This guide is intended for the database administrator in charge of maintaining the Bricolage PostgreSQL database.
By default, PostgreSQL has pretty lax authentication settings, at least locally. Client authentication is controlled by the file pg_hba.conf in the $PGDATA directory, e.g., /usr/local/pgsql/data/pg_hba.conf. The default settings allow completely trusted connections to the server by any local user using either Unix domain sockets or TCP/IP connections (although by default, PostgreSQL does not bind to a TCP socket). These settings are as follows:
local all trust
host all 127.0.0.1 255.255.255.255 trust
These settings allow anyone connecting via local Unix domain sockets to connect to all databases in a completely trusted manor. They won't even be prompted for a password! Similarly, hosts connecting from the local IP address, 127.0.0.1 with Mask 255.255.255.255, are completely trusted to connect to all databases.
While these settings are fine for limited local use (provided you trust all users on the local system!), they are not good for a production environment. We recommend that clients be required to provide a password -- and preferably that it be encrypted. If the Bricolage web server runs on the local box and no other box will need to connect, we recommend the following settings:
local all md5
# host all 127.0.0.1 255.255.255.255 md5
These are similar to the defaults, except that they require a password, but the password is sent over the wire encrypted using a simple challenge-response protocol. We have commented out the host entry because Bric is running locally and can therefore rely solely on Unix domain sockets. (In this case, you may also wish to remove the -i switch from the PostgreSQL startup script.) Enable it if you find a need to connect locally via TCP/IP.
If you're running PostgreSQL on its own box, and Bricolage (or any subset of the Bricolage application, such as the distribution server) runs on another box, you will need to enable secure connections to that box. To limit the connections to come from that single box, use this configuration:
local all md5
host all 127.0.0.1 255.255.255.255 md5
host bric_dbname 192.168.1.2 255.255.255.255 md5
Where the IP address 192.168.1.2 is the IP address of the box running Bricolage, and 255.255.255.255 is its mask. If you feel comfortable allowing more broad access to the server -- say you have a number of database applications running on multiple servers in the same subnet to connect -- try the following configuration:
local all md5
host all 127.0.0.1 255.255.255.255 md5
host bric_dbname 192.168.1.0 255.255.255.255 md5
Here the 192.168.1.0 IP address represents the 192.168.1 subnet on which the multiple clients live.
Many other combinations of authentication settings are of course possible via the flexible pg_hba.con configuration syntax, but the above are our recommended settings for ensuring the security and integrity of your data. Refer to the PostgreSQL documentation at http://www.postgresql.org/ for more details, including other levels of encryption.
PostgreSQL supports backups and restores via its pg_dump, pg_dumpall, and psql utilities. File system backups are also a good idea. We recommend that you use the pg_dumpall utility on a regular basis to backup the Bricolage database:
pg_dumpall > outfile
This program outputs a series of SQL statements that can then be reloaded into the database using the psql utility:
psql < infile
We recommend the use of pg_dumpall in order to preserve the the entire database system. Read the pg_dumpall man page for more details. However, if you use your PostgreSQL server for other databases, you may wish to place them on separate backup schedules. In that case, use pg_dump to backup each of the databases, including the Bricolage database:
pg_dump bric_dbname > outfile
The restoration is the same as for the pg_dumpall file, except that you must be sure to create the database and users, first.
psql bric_dbname < infile
Read the pg_dump man page for more details. Note that neither pg_dump nor pg_dumpall prevent database access by other processes (e.g., Bricolage), but they will affect performance. It's a good idea to dump the file to a different partition or disk or even server (since pg_dump and pg_dumpall can be used on any server with access to the PostgreSQL server). Changes made to the database during the process of the backup will not be backed up.
In either case, if your Bricolage database is getting big, you may wish to compress the backupfile, or break it into smaller chunks. This can be done using *nix system utilities. For example, to compress a backupfile, use gzip:
pg_dumpall | gzip > filename.gz
gunzip -c filename.gz | psql
You can also use split to break the backup file into smaller chunks. This example breaks the file up into 1 MB chunks:
pg_dumpall | split -b 1m - filename
cat filename.* | psql
New to PostgreSQL is the custom dump format. You can use it to compress dumpfiles on the fly. Consult the pg_dump and pg_dumpall man pages for more information.
File system backups are another option, and may in fact be useful as a backup backup methodology. Use whatever methodology for file system backups that you prefer. Here is an example using tar:
tar -cf backup.tar /usr/local/pgsql/data
There are two restrictions to file system backups, however. First, the PostgreSQL server must be shut down during the backup or during restoration. Second, you can't do selective backups of databases or tables, only the whole database cluster.
For more detail on PostgreSQL backup and restore, please consult the PostgreSQL documentation at http://www.postgresql.org/.
Write Ahead Logging (WAL) is a standard approach to transaction logging. It is automatically enabled in the PostgreSQL server. WAL ensures that the log is written before database records are altered, thereby minimizing the number of disk writes; and in the event of a crash, the database can be recovered using using the log.
Please consult the PostgreSQL documentation for more information on configuring WAL and recovering databases with WAL.
VACUUM is a PostgreSQL SQL command that cleans and analyzes a Postgres database. Its purpose is to reclaim storage and to collect information for the optimizer. PostgreSQL uses a non-overwriting storage manager, which means that DELETE commands don't actually remove data, they just mark it invalid; similarly, UPDATE commands and rolled-back transactions will create more "old versions" of rows. These should be cleaned up on a periodic basis by running VACUUM. Running VACUUM ANALYZE also collects statistics representing the dispersion of the data in each column. The statistics maintained include the number of tuples and number of pages stored in all transactions. This information is valuable when several query execution paths are possible. Running VACUUM ANALYZE periodically will increase the speed of the database in processing user queries.
We recommend that you run VACUUM ANALYZE on your Bricolage database nightly, in order to remove expired rows and keep the database running efficiently.
There are two ways to VACUUM ANALYZE a database. The first is through an interactive shell such as psql. We suggest, however, that you instead schedule a nightly cron job to do the task, and use the vacuumdb utility. Here's an example:
set PGPASSWORD=password; vacuumdb -z -d bric_dbname -U username
If the vacuum is successful, vacuumdb will output "VACUUM". In the event of a failure, it will output vacuumdb: Vacuum failed.
Consult the PostgreSQL documentation for more information.
PostgreSQL requires a certain amount of tuning to operate at optimum efficiency. The default settings for a new PostgreSQL installation are very conservative, and are intended to make sure that the postmaster will be able to start up so that you can get started setting up your databases. It is not, however, a very useful setup for a production database environment, since it severely limits the amount of shared memory that the postmaster can allocate for itself.
To tune PostgreSQL, first determine the amount of memory you want to give to your running PostgreSQL server. In a production environment where PostgreSQL is running on a dedicated machine, this will be the entire memory minus the amount needed to run kernel services. In other situations it will pay to observe the machine while running Bricolage and whatever other services are necessary, and make a guesstimate of the amount of memory you are willing to give to the database server.
After you have decided how much memory to allocate, there are a couple of settings to make. First is the shared memory setting, which in PostgreSQL can be set via the shared_buffers setting in the postgresql.conf file in your PostgreSQL data directory. This setting controls the number of shared buffers that the database server can use. Each buffer is 8Kb. It's important to remember that this is only a holding area, and not the total memory available to the server. As such, resist the urge to set this number to a large portion of your RAM, as doing so will actually degrade performance on many OSs. Members of the pgsql-performance mailing list have found useful values in the range of 1000-6000, depending on available RAM, database size, and number of concurrent queries. No one has yet reported positive results for any number over 6000.
It's important that your shared memory settings not exceed the actual amount of physical RAM minus the amount of memory the kernel needs, since this would involve using swap, thereby erase the performance gains of granting the shared memory to PostgreSQL!
Since Bricolage is a large database and we recommend the use of a server with lots of RAM, you will likely want to set shared_buffers to a value between 2048 and 4096. (Bricolage typically doesn't have too many concurrent connections.) In the following example, we'll demonstrate how to set the value to 4096 (32 MB).
First you may need to set your kernel to allow for the shared memory allocation. On a Linux 2.2.x or 2.4.x systems shared memory is controlled through settings in the /proc filesystem. So, for our example, where we've decided to give PostgreSQL up to 32MB of shared memory, here's how you would make the kernel setting (note that the kernel setting is in bytes, so our setting is 32 * 1024 * 1024 = 33554432):
$ echo 33554432 >/proc/sys/kernel/shmall
$ echo 33554432 >/proc/sys/kernel/shmmax
Now you can update your postgresql.conf file's shared memory buffers setting to take advantage of this shared memory:
shared_buffers = 4096
Next, it's probably also a good idea to boost your sort memory to get a better response time to the very large queries used to look up Bricolage objects. The default setting is 1024 (again these are 8Kb buffers). You'll probably want this set to at least 8192 (32 MB).
sort_mem = 8192
And finally, since Bricolage is a web application with persistent database connections, you can likely decrease the value of the max_connections setting in postgresql.conf from its default value of 32 to between 6 and 16 or so. Watch your Bricolage server to see how many connections it holds to the database in order to determine the optimum setting. Since the amount of shared memory that PostgreSQL will request from the OS kernel is the value of max_connections * shared_buffers, lowering this number can also help you to find a middle ground between a healthy value for shared_buffers and needing to reconfigure the kernel's SHMMAX.
max_connections = 8
See Bruce Momjian's article "PostgreSQL Hardware Performance Tuning" at http://www.ca.postgresql.org/docs/momjian/hw_performance/ for more on tuning your PostgreSQL server. Lyris has also posted a useful article on this topic at http://www.lyris.com/lm_help/6.0/tuning_postgresql.html.
Note: Do this at your own risk!!!
Another common cause of poor performance in a PostgreSQL database is the existence of very very large tables. Bricolage can sometimes have this sort of problem since by design it is very conservative about eliminating old data. As a result, certain tables tend to grow quite large.
For instance, if you notice that the Bricolage database has become quite large you might decide that it makes sense to actually delete some of the old data. In particular the job table tends to grow quite large, with most of the data actually not being used for much of anything, so from time to time we do this:
DELETE FROM job
WHERE pending = 0
AND comp_time IS NOT NULL
This can result in significant boosts in preview and publish performance, since, for each job-related query, PostgreSQL will no longer have to load a very very large index into memory, sometimes paging it several times. Of course it also saves disk space.
Of course if you have some reason to generate reports on complete jobs you won't be able to do this.
David Wheeler <david@wheeler.net>
Mark Jaroski <jaroskim@who.int>