[Pkg-postgresql-public] Proposed structure for coexisting major versions

Oliver Elphick olly@lfix.co.uk
Wed, 22 Oct 2003 00:29:46 +0100


The attached proposal is written primarily for Debian.  Its
motivation is that the current package upgrade process is pretty flaky
and also that the current packaging does not really provide for multiple
simultaneous postmasters, which are wanted by people hosting several
database clusters.

I assume these proposals may also be of interest to Red Hat packagers.

I am copying it to the PostgreSQL list in case there are any
obvious problems that will make it impracticable or harmful and in the
hope of suggestions for its improvement.  If the general idea is
acceptable to the core team, I will rework it to be incorporated into
7.5.  The major changes would be in the directory names, which currently
assume Debian's file structure.




PostgreSQL client wrapper proposal
==================================


Current situation
-----------------

When a new major version of PostgreSQL is released, it is necessary to
dump and reload the database.  The old software must be used for the dump,
and the new software for the reload.

This is a major problem for Red Hat and Debian, because a dump and reload is
not required by every upgrade and by the time the need for a dump is
realised, the old software may be deleted.  Debian has certain rather
unreliable procedures to save the old software and use it to do a dump, but
these procedures often go wrong.  Red Hat's installation environment is so
rigid that it is not practicable for the Red Hat packages to attempt an
automatic upgrade.  At the moment, Debian offers a debconf choice for
whether to attempt automatic upgrading; if it fails or is not allowed, a 
manual upgrade must be done, either from a pre-existing dump or by
manual invocation of the postgresql-dump script.

There was once an upstream program called pg_upgrade which could be used
for in-place upgrading.  This does not currently work and does not seem to
be a high priority with upstream developers.

It is possible to run different versions of PostgreSQL simultaneously, and
indeed to run the same version on separate database clusters simultaneously.
To do so, each postmaster must listen on a different port, so each client
must specify the correct port.  If it were possible to have two separate
versions of the PostgreSQL packages installed simultaneously, it would be
simple to do database upgrades by dumping from the old version and
uploading to the new.  However, the current structure of the packaging does
not permit this.



Proposed changes
----------------

I propose to change the Debian packaging to create a new package for each
major version.  The criterion for creating a new package is that initdb is
required when upgrading from the previous version. Thus, the existing
postgresql packages will become postgresql-7.3 and the new packages which
are currently in experimental will become postgresql-7.4 (and similarly
for all the binary packages).  

I propose to backport this to the current woody version (7.2.1) so that
upgrading to sarge will be made much easier.  When a new set of
versioned packages is put in the archive, the previous set should depend
on the new set, so as to force its installation.  This will enable old
packages to be removed.  All versioned packages should provide postgresql,
but there should also be a dummy postgresql package which will depend on
postgresql-7.2.  When sarge becomes the new stable release, that package
can be dropped from the archive.

Each versioned package will install into /usr/*/postgresql/{version}. 
In order to allow users easily to select the right package when working,
I propose to modify the Debian package's pg_wrapper program to read the
current version set by/for any user and to fork the correct executable
with the correct library versions according to those preferences.
/usr/bin will provide executables as soft-links to pg_wrapper.

This procedure will also allow separate database clusters to be maintained
for the use of different groups of users; these clusters need not all be
of the same major version.  This will allow much greater flexibility for
those people who need to make application software changes consequent on
a PostgreSQL upgrade.


Not really for developers
-------------------------

Since this proposal deals only with major versions oft te software, it is
probably not a suitable structure for PostgreSQL developers who may want
to be running minor versions in parallel for debugging.  However, it is
important that this structure should not interfere with their ability to
do that.


Detailed structure
------------------

The system will be managed by two simple datafiles, one of which will tabulate
clusters against major versions and a unique port number.  The other will
tabulate user against cluster and default database.  The default database
need not be specified, and will default to the user's own login name, in
accordance with current upstream behaviour.

Configuration:
--------------

[In accordance with Debian policy, all configuration files are in /etc;
they are currently soft-linked to the files in $PGDATA; this is controlled
by an additional flag to initdb: --debian-conffile]

/etc/default/postgresql:		Default values for all versions
/etc/postgresql/cluster_ports:		maps clusters to versions and ports
/etc/postgresql/user_clusters:		maps users against clusters and dbs
/etc/postgresql/config-{version}/:	Default per-version configuration
					postgresql.conf, etc.
/etc/postgresql/{port}/			Configuration for a specific cluster
					containing cluster-specific files
					soft-linked to PGDATA/ (esp. pg_hba.conf)

/etc/default/postgresql will replace the current
/etc/postgresql/postmaster.conf and /etc/postgresql/postgresql.env.
Any per-cluster information currently in those files must be stored in
/etc/postgresql/cluster_ports or under /etc/postgresql/{port}/

Package files and programs:

/usr/lib/postgresql/{version}/:		files for a specific version
/usr/share/postgresql/{version}:
/usr/share/doc/postgresql/{version}:
/usr/bin/{program}-{version}:           scripts calling "pg_wrapper -v {version}"
/usr/lib/postgresql/bin/{program}-{version}

Common programs:

/usr/bin/pg_wrapper:			environment chooser and program
					selector
/usr/bin/{program}                      soft-links to pg_wrapper
/usr/lib/postgresql/bin/{program}       soft-links to pg_wrapper for
					administrator programs


cluster_ports file
------------------

# This file maps PostgreSQL database clusters against their database
# versions, the ports on which their postmasters are to listen, the Unix
# logins of their administrators and the directories where their data are
# to be found.
#
# Cluster:  the name of the cluster
# Active:   (yes/no) whether a postmaster should be started at boot time
# Version:  Which major version of the software is needed
# User:     The Unix login of the cluster owner and administrator
# Port:     The port on which this cluster's postmaster should listen
# Pgdata:   The pathname of the data directory for this cluster
#
# This file will be altered by initdb, which will add or amend lines when
# new clusters are created.
#
# CLUSTER     ACTIVE  VERSION   USER       PORT  PGDATA

postgresql    yes     7.3       postgres   5432  /var/lib/postgres/7.3/data
developers    yes     7.4       postgres   5433  /var/lib/postgres/7.4/data
mycustomer01  yes     7.3       cust01     5436  /usr3/cust/cust01/data


user_clusters file
------------------

# This file maps users against the database clusters to which they will
# connect by default.  Any user may use "pg-wrapper -c" to set a new
# default to be stored in ~/.postgresqlrc and the existence of that file
# will supersede anything stored here, unless FORCED is set.  If a
# database is specified, that will be the one connected to by client tools
# if none is specified on the command line.  If the database specified
# here is "*", this is interpreted as the database whose name is the same
# as the user's login.  (Setting the database to "*" will provide the
# current default upstream behaviour for command line tools.)
#
# When scanning this file, the first matching line will be used.  It is a
# good idea to provide a default explicitly, with a final line with both
# user and group set to "*".  If there is no default, the implicit default
# is to connect to the cluster whose postmaster is listening on port 5432
# and to the database matching the user's login name.
#
# If FORCED is set, matching users will not be able to override the
# choices preselected for them.  In order to enforce this, pg_wrapper will
# have SGID mode set and will be owned by postgres.postgres; all the
# /usr/lib/postgresql/{version}/ directories will have mode 750 set and be 
# owned by postgres.postgres. [NOTE: is this feature beneficial? access
# can be restricted in other ways.  On the other hand, its use helps
# prevent ignorant users' causing problems by private enterprise.]
#
# This file may be altered by pg_wrapper -u when run by root.
#
# In the context of this file, user and group refer to the Unix login or
# group, not to PostgreSQL users and groups
#
# USER       GROUP        CLUSTER            FORCED         DATABASE

cust01       *            mycustomer01       no             template1
*            cust01       mycustomer01       yes            apps
*            *            postgresql         no             *


Changes to existing programs:
-----------------------------

initdb
------

Add options to specify parameters to be written into cluster_ports:

-C {cluster}
--cluster={cluster}

-p {port}
--port={port}

The administrator login recorded will be that of the owner of the current
login session.  The version will be that of the initdb instance, and the
database path will be the specified PGDATA.

When the --debian-conffile option is specified, the configuration files
will be written in /etc/postgresql/config-{version} (if they do not exist
already).  Any options that apply to a specific cluster only will be
written to /etc/postgresql/{port}.  This will always include pg_hba.conf
and pg_ident.conf.


postmaster
----------

Read the config files in order: /etc/postgresql/[port}/*, then
/etc/postgresql/config-{version}/* for any missing options and finally
/etc/default/postgresql for any more that are not yet set.

The presence of inappropriate options in /etc/default/postgresql will not
be an error, since appropriate options change from one version to another.

pg_hba.conf and pg_indent.conf are only ever read from
/etc/postgresql/[port}/*


psql
----

Abandon the current non-standard error abort if a connection database
is not specified; psql is not expected to be run directly and all
connection parameters should be provided by pg_wrapper as specified above.


other command-line programs
---------------------------

*All* command-line programs will be accessed through pg_wrapper, including
those which are currently in /usr/lib/postgresql/bin/

It will be possible for the postgres Unix user to execute programs directly
from /usr/lib/postgresql/{version}/bin, but he should be advised against
doing this in normal circumstances.


pg_wrapper
----------

[pg_wrapper is a program written for Debian in C.  It is soft-linked to
the various command-line clients and its job is to set the environment
correctly for the linked program (because Debian does not allow a program
to depend on the correct setting of environment variables).  It is not
currently allowed to be called under its own name.]

Allow the program to be called directly.  When called directly, it will
allow a user to display his own connection choices, or to change them for
the current session, or for all sessions (by writing ~/.postgresqlrc).
When called by root, it will allow user_clusters to be changed.

Add options:

-u {user}      root only - change user_clusters entry for the specified user.
-g {group}     root only - change user_clusters entry for the specified group.
-f             root only, with -u or -g - set the FORCE flag, so that the user
               cannot override the choice set by -c and -d
-c {cluster}   specify the cluster for the user
-d {database}  specify the connection database for the user in that cluster
-s             (if -u and -g are not given) save the settings in
               ~/.postgresqlrc

When called as a wrapper with PGHOST not set or through the loopback
interface or from the machine's own IP address, pg_wrapper will read
user_clusters first to find the default (or forced) behaviour.  If FORCE
is not specified, it will then read ~/.postgresqlrc, if it is present
and if PGCLUSTER is not set in the environment, and will set PGCLUSTER
according to what it finds.  It will then set PGPORT to the appropriate
value (from cluster_ports) for the cluster in $PGCLUSTER.

Version specific links of pg_wrapper will be provided, to enable the
use of the appropriate software version when connecting to a remote machine
(which may have a different version or multiple versions of PostgreSQL
on it).  The same behaviour may be achieved with another new command-line
option:

pg_wrapper -v {major version} [-h {remotehost}] -c {command} {command options}

If -h is given or PGHOST is set and the specified host is not the local
machine, the connection is made to the remote machine accoding to whatever
is set in the current environment.

pg_wrapper -l   will list the current settings and show whether or not
                they are forced and whether they are temporary or saved.

Errors: 	If PGCLUSTER is set to a value which is not found in
		cluster_ports, pg_wrapper will exit with return code 16.

		If an attempt is made to connect to the local system with
		disallowed options (trying to get round FORCED), exit
		with return code 20



/etc/init.d/postgresql
----------------------

This [boot-time] script must now start a postmaster for each active
instance in clusters_ports, setting PGPORT and PGDATA and the effective
user id appropriately for each instance and starting the correct version
of the software for each cluster.


pg_ctl
------

This script must become cluster-aware and be able to start the correct
version of the postmaster for each cluster.

Add options

-c {cluster} 		read the appropriate settings from cluster_ports
			(and abort if the EUID is wrong).

-A			operate on all clusters for the current EUID

-a			(with start) operate on active clusters for the
                        current EUID

-r			operate only on running clusters - those for which
			postmaster is running - for the current EUID



-- 
Oliver Elphick                                Oliver.Elphick@lfix.co.uk
Isle of Wight, UK                             http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
                 ========================================
     "If ye then be risen with Christ, seek those things 
      which are above, where Christ sitteth on the right 
      hand of God. Set your affection on things above, not 
      on things on the earth."              Colossians 3:1,2