While designing a database system most of the time you have to pick a compromise between availability, durability, performance and cost. Most of the time system design prefers being fast by default, thereby leaving a question about its durability need. Postgres community has a different approach to this problem. Postgres does have several options that can be used for performance gains while maintaining the durability aspect.
A combination of Postgres features like synchronous replication and EDB’s Failover Manager is one of the best available options to ensure that there is no compromise on durability, availability and performance. Further in this blog I am going to discuss how PostgreSQL features and EnterpriseDB tools can be combined to achieve Durability and Availability for your database platform.
PostgreSQL’s Synchronous replication feature ensures data reliability for business critical applications. When this feature (Synchronous replication) is combined with EFM tool for continuous monitoring of Postgres database service availability on master server, the combination gives you tremendous versatility. This combination is extensively used in business critical applications where you cannot afford to lose any single transaction. Such applications also require high availability and disaster recovery with RTO 0.
In PostgreSQL’s Synchronous replication, the commit of a transaction will wait until confirmation is received that the commit has been written to the transaction log on both master and standby servers. The application will not receive any explicit acknowledgement of the transaction until the WAL data is ensured to be received by the standby.
Transaction locks are held when a transaction is writing on the database. In sync replication, these transaction locks continue to be held until the transaction is committed on both the master and standby servers resulting in a lower performance for database applications due to the increased response time. As sync replication is dependent on data being sent to standby server, there is an impact on performance due to network latency and limited bandwidth of network links between two databases. To overcome this challenge you need an infrastructure that is highly reliable with multi-path network and lower latency.
You must be very careful when using synchronous replication. Write-availability of your master will be impacted if the standby server goes down. If the standby is not available while master server is writing to it, it will force the transaction on the master to wait. To overcome this, you can configure multiple standby names with sync-preference order in the parameter synchronous_standby_names. This parameter defines the priority of the database servers by the sequence of entry. For example, when “synchronous_standby_names =node1, node2”, node1 has the priority as the master server. From v9.6 onward you can also specify a count e.g. ‘2(node1, node2, node3)’ meaning 2 of these 3 servers will be synchronous replica with a priority order same as the order of the list.
When using sync replication, data on the standby will be consistent with the master to ensure no data loss for any transaction by maintaining 2 copies of the data. However, waiting for acknowledgement from standby server in sync replication setup will have a performance overhead for write transactions on master. The impact of this depends on several factors including but not limited to network bandwidth, network latency between the sites hosting database servers and infrastructure.
Standby server’s availability can have an impact on your master database’s write-availability. Hence it would be wise to consider at least one additional standby in addition to the number of synchronous standby servers that you need. This also helps in ensuring better availability and fault tolerance. For instance in case any one of the node crashes/fails, in a 2-node setup, it ends up with a single instance making it incapable for high availability. This can be overcome by having additional standby-node setup with the parameter configuration “synchronous_standby_names=standby1, standby2, standby3”, where standby1 acts as preferred synchronous replica and standby2 potential synchronous standby if standby 1 is unavailable. For example in case of node 1 fails, node 2 is promoted as master & node 3 as current standby keeping up the high availability. Once node 1 is manually brought up, it behaves as a potential standby. The DB servers running on PPAS version 9.5AS and earlier support Sync replication configuration for 2 servers.
While synchronous replication is helpful in boosting the durability of data and transactions, streaming replication by itself cannot ensure a highly available database platform which involves zero configuration changes on application and near-zero downtime without manual intervention. This is where EnterpriseDB adds value.
The automatic failover between the DB nodes is achieved with EDB Failover Manager Tool.When master node crashes, standby node is promoted to accept writes, the VIP attached to the crashed master is moved to the newly promoted node. You can control failover time by setting the timeout parameters in the configuration file.
Sync replication is one of the best solutions where applications require high durability. The effectiveness of this feature is more enhanced when combined with EFM. The combination of the two features – durability and high availability allows the environment to be more effective with appropriate utilization of infra, reliable network specifications, efficient sizing for transaction logs, archive logs and data directory.
– Prathima Mulpuri I Database Solution Consultant, Ashnik