Disaster Recovery in a Postgres Database

When dealing with PostgreSQL, a well-defined disaster recovery strategy is not just a necessity but a foundational element that ensures the longevity and reliability of your database services. Here, we explore the process of setting up a disaster recovery plan for PostgreSQL databases, complete with code configurations and execution strategies.

Building a Solid Backup Foundation

SQL Dump Backup:

The pg_dump utility can be used to create a full backup of your database. Here’s how you can perform an SQL dump:

pg_dump -U your_username -W -F t your_database > your_backup.tar

This command will prompt for a password due to the -W flag and create a tar-formatted backup file.

File System-Level Backup:

For a file system-level backup, ensure that the database is not being written to or use the pg_start_backup() and pg_stop_backup() functions to put the database in backup mode.

SELECT pg_start_backup('my_backup_label');

After copying the data directory, end the backup mode:

SELECT pg_stop_backup();

Continuous Archiving:

To set up continuous archiving, modify the postgresql.conf file:

wal_level = replica
archive_mode = on
archive_command = 'cp %p /path_to_wal_archive/%f'

These settings ensure that QAL records are written to a specified archive location.

Using Replication on your Database

Streaming Replication:

Adjust your postgreswl.conf and pg_hba.conf files on the primary server:

# postgresql.conf
wal_level = replica
max_wal_senders = 3
max_replication_slots = 3
hot_standby = on

# pg_hba.conf
host replication all your_standby_ip/32 md5

On the standby server, use the recovery.conf file to configure the connection:

standby_mode = 'on'
primary_conninfo = 'host=primary_ip port=5432 user=replication_user password=rep_user_pass'
trigger_file = '/path_to_trigger/trigger_file'

Logical Replication

Create a publication on your primary server for the tables you want to replicate:

CREATE PUBLICATION my_publication FOR TABLE my_table;

Subscribe to the publication from the standby server:

CREATE SUBSCRIPTION my_subscription CONNECTION 'dbname=my_database host=primary_host user=my_user password=my_pass' PUBLICATION my_publication;

Failover and Switchover Mechanisms

Failover:

To promote a standby server to primary, you can create a trigger file as specified in your recovery.conf:

touch /path_to_trigger/trigger_file

The standby will then take over as the primary server.

Switchover:

A switchover can be managed with the pg_ctl utility:

pg_ctl promote -D /path_to_data_directory

Testing and Validating the Plan:

To ensure your disaster recovery plan works, regularly schedule mock disaster scenarios and run through the recovery process:

# Simulate a disaster recovery scenario
pg_ctl stop -D /path_to_data_directory -m immediate
# Begin the recovery process as per your documented plan

Documenting and Routine Drills

Ensure that you maintain a detailed recovery plan and conduct routine drills:

# Semi-annual disaster recovery drill


echo "Starting disaster recovery drill..." | mail -s "DR Drill Initiated" [email protected]

When running PostgreSQL on AMAzon RDS, the process for disaster recovery is different because Amazon handles much of the lower-level database management for you.

Automated Backups: RDS provides automated backups that capture the entire DB instance and transaction logs. This allows you to restore to any point within the retention period, up to the last five minutes of database use.
Manual Snapshots: You can also take DB snapshots manually in RDS. These snapshots are user-initiated and are retained until you delete them.

To manage backups and recoveries in RDS, you would use the AWS Management Console, AWS CLI, or RDS API. For example, to create a DB snapshot using the AWS CLI, you would use:

aws rds create-db-snapshot --db-instance-identifier mydbinstance --db-snapshot-identifier mydbsnapshot

For disaster recovery, you can restore from these snapshots or rely on the point-in-time recovery feature:

aws rds restore-db-instance-to-point-in-time --source-db-instance-identifier mydbinstance --target-db-instance-identifier mydbinstance-restored --restore-time "2023-11-09T12:00:00.000Z"

Remember that when using a managed service like RDS, the AWS infrastructure handles much of the replication and failover capabilities. For instance, if you're using a Multi-AZ deployment, RDS automatically provisions and maintains a synchronous standby replica in a different Availability Zone. This replica will be promoted to primary if the original primary DB instance becomes unavailable.

DigitalOcean Managed Databases for PostgreSQL:

Automated Backups:

DigitalOcean Managed Databases automatically create daily backups of your data.
These backups are retained for seven days.

Manual Snapshots:

You can create manual snapshots (called on-demand backups in DigitalOcean) of your database at any time.
These snapshots are retained until you choose to delete them.

Managing Backups and Recovery:

Use the DigitalOcean Control Panel or DigitalOcean API to manage backups and recovery.
To create an on-demand backup, navigate to the Databases section in your DigitalOcean Control Panel, select your database, and then use the "Backups" tab to create a new backup.
For recovery, you can restore from any of the automated backups or manual snapshots. This will create a new database cluster that you can then switch over to.

Google Cloud SQL for PostgreSQL:

Automated Backups:

Google Cloud SQL performs automated daily backups of your PostgreSQL database, and these are retained for a configurable amount of time.
You can also configure the backup start time and set up binary logging for more granular point-in-time recovery.

Manual Snapshots:

In Google Cloud SQL, you can create manual backups (snapshots) of your database instance.
These can be done from the Google Cloud Console or using the gcloud command-line tool.

Managing Backups and Recovery:

Access the Google Cloud Console, navigate to your SQL instances, and select your PostgreSQL instance to manage backups.
To create a manual backup, click on the instance and then select the Backups tab to start a new backup.
For recovery, you can restore from any of the existing backups. This process will replace the current instance with the backup or you can choose to restore the backup to a new instance.

Example gcloud Command for Creating a Manual Backup:

gcloud sql backups create --async --instance=my-instance

Example gcloud Command for Restoring a Backup:

gcloud sql instances restore-backup --backup-id=BACKUP_ID --instance=INSTANCE_NAME --restore-instance=NEW_INSTANCE_NAME

In both DigitalOcean and GCP, the disaster recovery plan should be regularly tested by restoring backups to a test instance to ensure that your data can be effectively recovered in the event of an actual disaster.

Conclusion

Embedding disaster recovery into your PostgreSQL database management ensures not just continuity, but also your database’s integrity and availability to become resilient to disruptions.