When using Postgres SASS we are not care about how cloud provider managed them but one a day, we need to run a
postgres cluster on premise, we need to care about that. We setup 2,3
postgres server, once act as
primary server, the other one act as
standby server. When
primary server is down, we login into
standby server then triggering reload them, move them to
primary instance. But there is a tool help us do it automaticaly.
repmgr is an open-source tool suite for managing replication and failover in a cluster of PostgreSQL servers. It enhances PostgreSQL's built-in hot-standby capabilities with tools to set up standby servers, monitor replication, and perform administrative tasks such as failover or manual switchover operations.
In this scope of this post, i will deployed two
postgres server using
docker compose. Then put
haproxy in the front-of them,
haproxy will detect the
primary and proxy traffic to this server. The architecture look like this.
For Demo purpose i will deployed two postgres server using
postgres will share network namespace, so they can communicate by service name.
haproxy will expose port
5432 to the host machine.
I will use image from
bitnami, they packaged all the need tool for us.
docker-compose.yml above, i defined two services,
pg-0 is the initial primary server. Let deploy it first using
docker compose up pg-0
It's up. Then we will deploy
docker compose up pg-1
From the log, you will see that
pg-1 joined cluster at
Go to the
pg-0 service and check the log.
pg-0 detected new standby server.
Now we already have two server.
haproxy service to
docker-compose.yaml file, also preprare
haproxy. Service look like this.
configs: haproxy: file: ./haproxy.cfg services: haproxy: image: haproxy mem_limit: 8192m ports: - 3000:3000 - 5432:5432 privileged: true configs: - source: haproxy target: /usr/local/etc/haproxy/haproxy.cfg
The content of
global maxconn 4096 ulimit-n 30000 external-check insecure-fork-wanted log stdout format raw daemon notice resolvers docker nameserver dns1 127.0.0.11:53 hold valid 10s defaults log global mode tcp option tcplog option dontlognull option tcp-check option srvtcpka http-reuse aggressive timeout client 180m timeout server 180m timeout connect 2s1 listen stats bind *:3000 mode http stats enable stats uri / stats refresh 2s listen primary bind *:5432 maxconn 2048 option external-check external-check command /build-psql/primary-check.sh default-server resolvers docker init-addr none check inter 2s fall 1 rise 2 slowstart 4000 server pg-0 pg-0:5432 server pg-1 pg-1:5432 listen standy bind *:5433 maxconn 2048 option external-check balance roundrobin external-check command /build-psql/standby-check.sh default-server resolvers docker init-addr none check inter 2s fall 1 rise 2 slowstart 4000 server pg-0 pg-0:5432 server pg-1 pg-1:5432
haproxy.cfg config, i defined
external health-check command so i can detect which service is
primary and also
standby. I used function
pg_is_in_recovery(), you can check it here.
primary-check.sh is look like this.
#!/bin/bash VIP=$1 VPT=$2 RIP=$3 PG_MONITOR_USER=postgres PG_MONITOR_PASS=postgres PG_MONITOR_DB=postgres if [ "$4" == "" ]; then RPT=$VPT else RPT=$4 fi STATUS=$(PGPASSWORD="$PG_MONITOR_PASS" /usr/local/pgsql/bin/psql -qtAX -c "select pg_is_in_recovery()" -h "$RIP" -p "$RPT" --dbname="$PG_MONITOR_DB" --username="$PG_MONITOR_USER") #echo "$@ status=$STATUS" if [[ "$STATUS" == "f" ]] then exit 0 else exit 1 fi
standby-check.sh is look like
#!/bin/bash VIP=$1 VPT=$2 RIP=$3 PG_MONITOR_USER=postgres PG_MONITOR_PASS=postgres PG_MONITOR_DB=postgres if [ "$4" == "" ]; then RPT=$VPT else RPT=$4 fi STATUS=$(PGPASSWORD="$PG_MONITOR_PASS" /usr/local/pgsql/bin/psql -qtAX -c "select pg_is_in_recovery()" -h "$RIP" -p "$RPT" --dbname="$PG_MONITOR_DB" --username="$PG_MONITOR_USER") if [[ "$STATUS" == "t" ]] then exit 0 else exit 1 fi
docker compose up haproxy
http://localhost:3000 to watch stats.
At this moment,
haproxy detected the right service for us.
pg-0 is primary and
standby. Let use
psql to send a query to database.
Time to test if
pg-1 will up when our
pg-0 dead. Let stop
pg-1 service and see the log. You will see it detected the failure of
pg-0 and promote itself to be a primary server.
Go to the
At this moment,
haproxy detect that there is no
standby server due to
pg-0 dead and
pg-1 is primary instace. Go to
psql try to execute a query. You will be able to do
Next time, up
pg-0 service, i will expect
haproxy will update
standby listener status.
Check the log of
pg-1, you will see
pg-0 joined at
haproxy dashboard, it deteted new standby via check script file.
That is, thank for
repmgr, we reduced a lot of effort to manage them. Thank you for reading this post.