Skip to main content

Doing in-place, single-node Postgres upgrades with almost no downtime

·2560 words·13 mins

On my Debian servers, I use PostgreSQL from the PostgreSQL Global Development Group (PGDG) APT repository. This allows me to install any version of PostgreSQL, including the latest one, even on an older Debian release. For example, on Debian 11, the packaged Postgres is 13, which was released 5 years ago.

Thanks to the PGDG APT repo, I can install version 17, which was released a few months ago. This allows me to keep up with the latest features and performance improvements of Postgres.

root@server ~# lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 11 (bullseye)
Release:        11
Codename:       bullseye

root@server ~# dpkg -l | grep postgres
ii  postgresql-16                  16.6-1.pgdg110+1                   amd64        The World's Most Advanced Open Source Relational Database
ii  postgresql-16-repack           1.5.2-1.pgdg110+1                  amd64        reorganize tables in PostgreSQL databases with minimal locks
ii  postgresql-client-16           16.6-1.pgdg110+1                   amd64        front-end programs for PostgreSQL 16
ii  postgresql-client-common       267.pgdg110+1                      all          manager for multiple PostgreSQL client versions
ii  postgresql-common              267.pgdg110+1                      all          PostgreSQL database-cluster manager

I use my Ansible role for Postgres to install and manage the repo and Postgres.

root@server ~# cat /etc/apt/sources.list.d/apt_postgresql_org_pub_repos_apt.list
deb http://apt.postgresql.org/pub/repos/apt/ bullseye-pgdg main

To keep up with recent versions, I need to do major upgrades. For multi-instance setups, it’s possible to do in-place rolling upgrades with no downtime.

For a single instance, downtime is needed. The most obvious way of doing a major Postgres update is to dump the databases from the old version, install the new version, and reimport the data. However, this can mean a lot of downtime and space needed for large databases.

Fortunately, Postgres provides a tool, pg_upgrade, to do in-place upgrades without the need for reimporting the data:

pg_upgrade (formerly called pg_migrator) allows data stored in PostgreSQL data files to be upgraded to a later PostgreSQL major version without the data dump/restore typically required for major version upgrades, e.g., from 12.14 to 13.10 or from 14.9 to 15.5. It is not required for minor version upgrades, e.g., from 12.7 to 12.8 or from 14.1 to 14.5.

Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files.

To make things even simpler, the APT-packaged Postgres provides a Perl wrapper for that, called pg_upgradecluster.

A “cluster” in this context is a PostgreSQL instance:

root@server ~# pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
16  main    5432 online postgres /var/lib/postgresql/16/main /var/log/postgresql/postgresql-16-main.log

pg_upgradecluster supports upgrading with pg_dump and pg_restore.

But there is a better way: by using hard links, we don’t need to copy the data and can upgrade with just a few seconds of downtime.

I successfully used this technique multiple times to upgrade from version 12 all the way to 17.

Let’s see how I upgrade an 88 GB Postgres instance from Postgres 16 to 17 on Debian 11.

root@server ~# du -hs /var/lib/postgresql/16/main/* | sort -hr
88G     /var/lib/postgresql/16/main/base
129M    /var/lib/postgresql/16/main/pg_wal
45M     /var/lib/postgresql/16/main/pg_xact
6.5M    /var/lib/postgresql/16/main/pg_multixact
656K    /var/lib/postgresql/16/main/global
232K    /var/lib/postgresql/16/main/pg_subtrans
16K     /var/lib/postgresql/16/main/pg_logical
4.0K    /var/lib/postgresql/16/main/postmaster.pid
4.0K    /var/lib/postgresql/16/main/postmaster.opts
4.0K    /var/lib/postgresql/16/main/postgresql.auto.conf
4.0K    /var/lib/postgresql/16/main/PG_VERSION
4.0K    /var/lib/postgresql/16/main/pg_twophase
4.0K    /var/lib/postgresql/16/main/pg_tblspc
4.0K    /var/lib/postgresql/16/main/pg_stat_tmp
4.0K    /var/lib/postgresql/16/main/pg_stat
4.0K    /var/lib/postgresql/16/main/pg_snapshots
4.0K    /var/lib/postgresql/16/main/pg_serial
4.0K    /var/lib/postgresql/16/main/pg_replslot
4.0K    /var/lib/postgresql/16/main/pg_notify
4.0K    /var/lib/postgresql/16/main/pg_dynshmem
4.0K    /var/lib/postgresql/16/main/pg_commit_ts

Preparing for the upgrade #

Although this process is supposed to be safe, you can still do a backup just in case.

sudo -Hiu postgres pg_dump -Fc my_database | pigz > backup.tar.gz

For minimal downtime, logical backups + WAL archiving with Barman is a good approach.

Install the new version #

Since I’m using PGDG, I have the latest version available, here 17.4:

root@server ~ apt search postgresql-17 | head
Sorting...
Full Text Search...
postgresql-17/bullseye-pgdg 17.4-1.pgdg110+2 amd64
  The World's Most Advanced Open Source Relational Database

postgresql-17-age/bullseye-pgdg 1.5.0~rc0-3.pgdg110+2 amd64
  Graph database optimized for fast analysis and real-time data processing

postgresql-17-age-dbgsym/bullseye-pgdg 1.5.0~rc0-3.pgdg110+2 amd64
  debug symbols for postgresql-17-age

Install the new version. In my case, I’m using the pg_repack extension, so I also install it for the new version.

root@server ~# apt install postgresql-17 postgresql-17-repack
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
  postgresql-17 postgresql-17-repack
0 upgraded, 2 newly installed, 0 to remove and 5 not upgraded.
Need to get 18.4 MB of archives.
After this operation, 61.5 MB of additional disk space will be used.
Get:1 http://apt.postgresql.org/pub/repos/apt bullseye-pgdg/main amd64 postgresql-17 amd64 17.4-1.pgdg110+2 [18.3 MB]
Get:2 http://apt.postgresql.org/pub/repos/apt bullseye-pgdg/main amd64 postgresql-17-repack amd64 1.5.2-1.pgdg110+1 [111 kB]
Fetched 18.4 MB in 0s (38.2 MB/s)
Preconfiguring packages ...
Selecting previously unselected package postgresql-17.
(Reading database ... 157429 files and directories currently installed.)
Preparing to unpack .../postgresql-17_17.4-1.pgdg110+2_amd64.deb ...
Unpacking postgresql-17 (17.4-1.pgdg110+2) ...
Selecting previously unselected package postgresql-17-repack.
Preparing to unpack .../postgresql-17-repack_1.5.2-1.pgdg110+1_amd64.deb ...
Unpacking postgresql-17-repack (1.5.2-1.pgdg110+1) ...
Setting up postgresql-17 (17.4-1.pgdg110+2) ...
Creating new PostgreSQL cluster 17/main ...
/usr/lib/postgresql/17/bin/initdb -D /var/lib/postgresql/17/main --auth-local peer --auth-host scram-sha-256 --no-instructions
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/17/main ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... Europe/Paris
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok
update-alternatives: warning: forcing reinstallation of alternative /usr/share/postgresql/17/man/man1/psql.1.gz because link group psql.1.gz is broken
Setting up postgresql-17-repack (1.5.2-1.pgdg110+1) ...
update-alternatives: using /usr/lib/postgresql/17/bin/pg_repack to provide /usr/bin/pg_repack (pg_repack) in auto mode
Processing triggers for postgresql-common (267.pgdg110+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:

By default, a new cluster will be created and run on another port (here 5433 instead of 5432). They use a different data directory, so there is no conflict.

root@server ~# pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
16  main    5432 online postgres /var/lib/postgresql/16/main /var/log/postgresql/postgresql-16-main.log
17  main    5433 online postgres /var/lib/postgresql/17/main /var/log/postgresql/postgresql-17-main.log

We want to use the upgrade cluster feature to upgrade our existing cluster to 17.

Cluster 16 will become cluster 17, so we need to drop the existing empty cluster 17:

root@server ~# pg_dropcluster 17 main --stop
root@server ~# pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
16  main    5432 online postgres /var/lib/postgresql/16/main /var/log/postgresql/postgresql-16-main.log

Doing the in-place upgrade #

We’re now going to use pg_upgradecluster to upgrade the cluster. The key argument to use is --method=link or --method=upgrade --link. That means we’ll use hard links instead of dumping the data, which will be much faster.

-m, --method=dump|upgrade|link|clone
Specify the upgrade method. dump uses pg_dump(1) and pg_restore(1), upgrade uses pg_upgrade(1). The default is dump.
link and clone are shorthands for -m upgrade --link and -m upgrade --clone, respectively.
-k, --link
In pg_upgrade mode, use hard links instead of copying files to the new cluster. This option is merely passed on to pg_upgrade. See pg_upgrade(1) for details.

Let’s do it!

pg_upgradecluster 16 main -m link

The number here is the cluster we want to upgrade from. In my case, from 16 to 17.

⚠️ There will be downtime during this command, between when the old cluster is stopped and the new cluster is started.

Here’s my run:

root@server ~# time pg_upgradecluster 16 main -m link
Stopping old cluster...
Creating new PostgreSQL cluster 17/main ...
/usr/lib/postgresql/17/bin/initdb -D /var/lib/postgresql/17/main --auth-local peer --auth-host scram-sha-256 --no-instructions --encoding UTF8 --lc-collate en_US.UTF-8 --lc-ctype en_US.UTF-8 --locale-provider libc
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.UTF-8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

fixing permissions on existing directory /var/lib/postgresql/17/main ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default "max_connections" ... 100
selecting default "shared_buffers" ... 128MB
selecting default time zone ... Europe/Paris
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Copying old configuration files...
Copying old start.conf...
Copying old pg_ctl.conf...
Running init phase upgrade hook scripts ...

/usr/lib/postgresql/17/bin/pg_upgrade -b /usr/lib/postgresql/16/bin -B /usr/lib/postgresql/17/bin -p 5432 -P 5433 -d /etc/postgresql/16/main -D /etc/postgresql/17/main --link
Finding the real data directory for the source cluster        ok
Finding the real data directory for the target cluster        ok
Performing Consistency Checks
-----------------------------
Checking cluster versions                                     ok
Checking database user is the install user                    ok
Checking database connection settings                         ok
Checking for prepared transactions                            ok
Checking for contrib/isn with bigint-passing mismatch         ok
Checking data type usage                                      ok
Creating dump of global objects                               ok
Creating dump of database schemas
                                                              ok
Checking for presence of required libraries                   ok
Checking database user is the install user                    ok
Checking for prepared transactions                            ok
Checking for new cluster tablespace directories               ok

If pg_upgrade fails after this point, you must re-initdb the
new cluster before continuing.

Performing Upgrade
------------------
Setting locale and encoding for new cluster                   ok
Analyzing all rows in the new cluster                         ok
Freezing all rows in the new cluster                          ok
Deleting files from new pg_xact                               ok
Copying old pg_xact to new server                             ok
Setting oldest XID for new cluster                            ok
Setting next transaction ID and epoch for new cluster         ok
Deleting files from new pg_multixact/offsets                  ok
Copying old pg_multixact/offsets to new server                ok
Deleting files from new pg_multixact/members                  ok
Copying old pg_multixact/members to new server                ok
Setting next multixact ID and offset for new cluster          ok
Resetting WAL archives                                        ok
Setting frozenxid and minmxid counters in new cluster         ok
Restoring global objects in the new cluster                   ok
Restoring database schemas in the new cluster
                                                              ok
Adding ".old" suffix to old global/pg_control                 ok

If you want to start the old cluster, you will need to remove
the ".old" suffix from /var/lib/postgresql/16/main/global/pg_control.old.
Because "link" mode was used, the old cluster cannot be safely
started once the new cluster has been started.
Linking user relation files
                                                              ok
Setting next OID for new cluster                              ok
Sync data directory to disk                                   ok
Creating script to delete old cluster                         ok
Checking for extension updates                                ok

Upgrade Complete
----------------
Optimizer statistics are not transferred by pg_upgrade.
Once you start the new server, consider running:
    /usr/lib/postgresql/17/bin/vacuumdb --all --analyze-in-stages
Running this script will delete the old cluster's data files:
    ./delete_old_cluster.sh
pg_upgradecluster: pg_upgrade output scripts are in /var/log/postgresql/pg_upgradecluster-16-17-main.FYsJ
Disabling automatic startup of old cluster...
Starting upgraded cluster on port 5432...
Running finish phase upgrade hook scripts ...
vacuumdb: processing database "mstdn": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "postgres": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "template1": Generating minimal optimizer statistics (1 target)
vacuumdb: processing database "mstdn": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "postgres": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "template1": Generating medium optimizer statistics (10 targets)
vacuumdb: processing database "mstdn": Generating default (full) optimizer statistics
vacuumdb: processing database "postgres": Generating default (full) optimizer statistics
vacuumdb: processing database "template1": Generating default (full) optimizer statistics

Success. Please check that the upgraded cluster works. If it does,
you can remove the old cluster with
    pg_dropcluster 16 main

Ver Cluster Port Status Owner    Data directory              Log file
16  main    5433 down   postgres /var/lib/postgresql/16/main /var/log/postgresql/postgresql-16-main.log
Ver Cluster Port Status Owner    Data directory              Log file
17  main    5432 online postgres /var/lib/postgresql/17/main /var/log/postgresql/postgresql-17-main.log

________________________________________________________
Executed in   48.32 secs   fish           external
   usr time    2.56 secs    0.00 micros    2.56 secs
   sys time    2.21 secs  1779.00 micros    2.21 secs

The upgrade took 48 seconds! Which is considerably faster than dumping and importing. For a database of my size (88 GB - hundreds of millions of rows), this would take more than an hour.

At this point, the Postgres instance is up and running with the new version, on the same port (5432). There is no need to change anything on the database clients, including pgbouncer.

root@server ~# pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
16  main    5433 down   postgres /var/lib/postgresql/16/main /var/log/postgresql/postgresql-16-main.log
17  main    5432 online postgres /var/lib/postgresql/17/main /var/log/postgresql/postgresql-17-main.log

We can check that we’re indeed running the new version, 17:

root@server ~# ps faux | grep postgres
postgres     846  0.0  0.0  19788  7640 ?        Ssl  21:59   0:01 /usr/sbin/pgbouncer /etc/pgbouncer/pgbouncer.ini
postgres     847  0.1  0.1 714664 25220 ?        Ssl  21:59   0:01 /usr/local/bin/postgres_exporter --web.listen-address=0.0.0.0:9187
root       10551  0.0  0.0   6316   700 pts/0    S+   22:23   0:00          \_ grep --color=auto postgres
postgres   10397  0.1  0.7 4416876 125384 ?      Ss   22:21   0:00 /usr/lib/postgresql/17/bin/postgres -D /var/lib/postgresql/17/main -c config_file=/etc/postgresql/17/main/postgresql.conf
postgres   10398  0.0  0.0 4417008 5804 ?        Ss   22:21   0:00  \_ postgres: 17/main: checkpointer
postgres   10399  0.0  0.2 4417032 38236 ?       Ss   22:21   0:00  \_ postgres: 17/main: background writer
postgres   10401  0.0  0.1 4416876 22680 ?       Ss   22:21   0:00  \_ postgres: 17/main: walwriter
postgres   10402  0.0  0.0 4418452 8656 ?        Ss   22:21   0:00  \_ postgres: 17/main: autovacuum launcher
postgres   10403  0.0  0.0 4418456 6668 ?        Ss   22:21   0:00  \_ postgres: 17/main: logical replication launcher
postgres   10428  0.0  0.1 4419004 22140 ?       Ss   22:21   0:00  \_ postgres: 17/main: postgres postgres [local] idle
postgres   10429  0.0  0.1 4420480 28016 ?       Ss   22:21   0:00  \_ postgres: 17/main: postgres postgres [local] idle
root@server ~# sudo -Hiu postgres psql
psql (17.4 (Debian 17.4-1.pgdg110+2))
Type "help" for help.

postgres=# SHOW server_version;
         server_version
--------------------------------
 17.4 (Debian 17.4-1.pgdg110+2)
(1 row)


postgres=# \l
                                                     List of databases
   Name    |  Owner   | Encoding | Locale Provider |   Collate   |    Ctype    | Locale | ICU Rules |   Access privileges
-----------+----------+----------+-----------------+-------------+-------------+--------+-----------+-----------------------
 mstdn     | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |        |           |
 postgres  | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |        |           |
 template0 | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |        |           | =c/postgres          +
           |          |          |                 |             |             |        |           | postgres=CTc/postgres
 template1 | postgres | UTF8     | libc            | en_US.UTF-8 | en_US.UTF-8 |        |           | postgres=CTc/postgres+
           |          |          |                 |             |             |        |           | =c/postgres
(4 rows)

postgres=#

Cleaning up #

We still have the data for the previous version:

root@server ~# du -hs /var/lib/postgresql/16/main/base/*
7.7M    /var/lib/postgresql/16/main/base/1
7.4M    /var/lib/postgresql/16/main/base/4
7.7M    /var/lib/postgresql/16/main/base/14101
88G     /var/lib/postgresql/16/main/base/272795
24M     /var/lib/postgresql/16/main/base/pgsql_tmp
root@server ~# du -hs /var/lib/postgresql/17/main/base/*
7.7M    /var/lib/postgresql/17/main/base/1
7.5M    /var/lib/postgresql/17/main/base/4
7.6M    /var/lib/postgresql/17/main/base/14101
88G     /var/lib/postgresql/17/main/base/272795

Fortunately, and this is a big advantage of this solution, the hard links mean that we don’t need double the space.

Which is handy for me 🙈

root@server ~# df -h /
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda1       151G  113G   32G  79% /

If everything is behaving as expected, we can remove the old cluster:

root@server ~# pg_dropcluster 16 main
root@server ~# pg_lsclusters
Ver Cluster Port Status Owner    Data directory              Log file
17  main    5432 online postgres /var/lib/postgresql/17/main /var/log/postgresql/postgresql-17-main.log

And remove the old version:

root@server ~ [SIGINT]# apt-get autoremove postgresql-16
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following packages will be REMOVED:
  postgresql-16 postgresql-16-repack postgresql-client-16
0 upgraded, 0 newly installed, 3 to remove and 24 not upgraded.
After this operation, 69.0 MB disk space will be freed.
Do you want to continue? [Y/n]
(Reading database ... 159145 files and directories currently installed.)
Removing postgresql-16-repack (1.5.2-1.pgdg110+1) ...
Removing postgresql-16 (16.6-1.pgdg110+1) ...
Removing postgresql-client-16 (16.6-1.pgdg110+1) ...
Processing triggers for postgresql-common (267.pgdg110+1) ...
Building PostgreSQL dictionaries from installed myspell/hunspell packages...
Removing obsolete dictionary files:

All done ✨ pg_upgrade makes it very easy to keep up with the Postgres releases, with minimal downtime for non-replicated clusters. Thanks Postgres ❤️