JRehkemper.de

WAL Log Archiving and Point in Time Recovery on PostgreSQL

WAL Logs are the transaction logs of PostgreSQL. They contain every query the is executed. That way Postgres is able to verify that the database is in a consistent state after a crash.
If you decide to use WAL Archiving as a Backup-Strategy you also have the possibility to restore the database to any point in time (if you have the neede WAL-Logs), hence the name Point in Time Recovery.

Enabling WAL Archiving

First we need to configure postgres to backup the wal-log-files. The simplest solution is to write them to a NFS-Share.

Therefor we need to edit the postgres-config.

vim /var/lib/pgsql/data/postgresql.conf
# Enable WAL-Log Creation
wal_level = replica

# Enable WAL Archiving
archive_mode = on

This will create Archive-Logs, but we still need a way to back them up. Postgres does not care how you do it and you are free to do what ever works for you.

vim /var/lib/pgsql/data/postgresql.conf
archive_command = 'test ! -f /mnt/postgres-backup/%f && cp %p /mnt/postgres-backup/%f'

Create BaseBackup

Next thing is create a BaseBackup. This is simply a tar of the /var/lib/pgsql/data-directory.
When recovering this will generate a base to replay the WAL-Logs.

# -D: Directory to put the BaseBackup
# -R: Create Recovery.conf
# -F: Format the Backup as tar
# -z: use gzip compression for tar files
# -P: Show Progress
# -c: Create a Checkpoint now
pg_basebackup -D /mnt/postgres-backup/basebackup/$(date -Iseconds) -R -F tar -z -P -c fast

Restart Postgres

Now restart your Instance to activate the configuration changes.

systemctl restart postgresql

Checking WAL Archiving

WAL Logs will be archived every time Postgres switches to a new logfile. Depending on your configuration and the load on your instance this could take quite a long time.
If you don’t want to wait you can trigger it buy running the following SQL-Command.

SELECT pg_switch_wal();

Now you should see, that the WAL-Logs are copied to the NFS-Share.

If you don’t see any files, have a look in /var/lib/pgsql/data/log/.
If your error message is something along the lines of “permissions denied”, it could be an SELinux issue.

You can also monitor the archiving process using the “pg_stat_archiver” view.

postgres=# SELECT * FROM "pg_stat_archiver";

archived_count |    last_archived_wal     |      last_archived_time       | failed_count | last_failed_wal | last_failed_time |          stats_reset
----------------+--------------------------+-------------------------------+--------------+-----------------+------------------+-------------------------------
             17 | 000000010000000000000087 | 2024-08-10 06:00:51.242255+02 |            0 |                 |                  | 2023-08-04 09:25:25.872096+02
(1 row)

Additional BaseBackups

Right now in case of recovery you need to replay all the logs since the last basebackup. If you have a lot of WAL-Logs this can take some time.
You can create additional basebackups to speed things up.

pg_basebackup -D /mnt/postgres-backup/basebackup/$(date -Iseconds) -R -F tar -z -P -c fast

If you look at your WAL directory again, you will see a .backup file in there. This file does not hold database-data but information about the basebackup like a timestamp and the start and end WAL-Log file. If you want to recover the last basebackup you can delete all WAL Files that are older than the .backup file, but don’t delete them right away! There is a tool for that.

Clean up old WAL-Logs

If you want to delete wal-logs older than your last basebackup, you can do that with pg_archivecleanup.
Simply specify the name of the .backup-file in your wal-archive and Postgres will take care of the rest. That way you will not accidentally delete logs that are still needed.
But keep in mind. As soon as you delete the WAL-Logs you can only go back to the basebackup and not to any timestamp.

pg_archivecleanup 00000001000000000000007B.00000028.backup

Disaster Strikes!

Now we can try to recover our database from our backups.

First we need to stop the database.

systemctl stop postgresql

Next we will delete our Database-Directory. You could also use another server and copy the backup files.

sudo rm -rf /var/lib/pgsql/data

Starting the Recovery

First we will restore our basebackup.

sudo cp -r /mnt/postgres/basebackup/<basebackup> /var/lib/pgsql/data
sudo tar -xzf /var/lib/pgsql/data/base.tar.gz
sudo mv /var/lib/pgsql/data/pg_wal.tar.gz /var/lib/pgsql/data/pg_wal/
sudo tar -xzf /var/lib/data/pg_wal/pg_wal.tar.gz
sudo rm -r /var/lib/data/base.tar.gz
sudo rm -r /var/lib/data/pg_wal/pg_wal.tar.gz
sudo chown postgres:postgres -R /var/lib/pgsql
sudo chown 075 /var/lib/pgsql/data

The next step is to modify the new postgresql.conf and setup the recovery options.

vim /var/lib/pgsql/data/postgresql.conf
restore_command = 'cp /mnt/postgres-backup/WAL/%f %p'

# optional
# If you only want to recovery up to a certain time, enter a timestamp here
recovery_target_time = '2024-08-10 20:15:00'

In addition to that create a recovery.signal in the data-directory as will. This will tell Postgres to use the recovery settings and not start as normal.

touch /var/lib/pgsql/data/recovery.signal

After that we can start the database and it should recover from it’s backups

systemctl start postgresql

If you have a look in /var/lib/pgsql/data/log/ you will see the recovery in progress.
As long as revovery is in progress you can only do read-only queries.

If you are missing WAL-Logs because they were not backed up when the database nuked (or in this case go deleted) postgres will be stuck in recovery since it is not successfuly finished.
You can check this with this query.

SELECT  pg_is_in_recovery();

It will return true if it is in recovery and otherwise false.

The logs will tell you why.
If you are missing WAL-Logs, you have to options. Find the missing logs and place them in the backup-directory or quit the recovery and accept the inherent dataloss.
If you decied to exit the recovery mode because you know that the missing logs are gone and not recoverable, you can promote the instance to a read-write instance.

SELECT pg_promote();

Now your postgres instance is back up and running as usual.

profile picture of the author

Jannik Rehkemper

I'm an professional Linux Administrator and Hobby Programmer. My training as an IT-Professional started in 2019 and ended in 2022. Since 2023 I'm working as an Linux Administrator.