In continuation of my previous post, I will talk about HADR for SQL Azure databases. As we know, we can’t take traditional backups for Azure DBs therefore all solution revolving around it will also change. As an organization, we will still want to be always available and resilient to any outages or accidental issues.
If you are not aware of what HADR means , please check this post.
Currently, if we talk about HADR for on-premise SQL servers, the solutions we think about are:
1. AlwaysON availability groups
2. SQL server Failover cluster
3. Mirroring with high safety and witness
5. Log Shipping
All of the above are somewhere dependent on SQL DB and T-log backups. Therefore, any of the above configuration is not feasible to be configured manually with SQL Azure databases. However, logically similar solutions are available.
If you relate well, we are paying for Database as a Service and Microsoft takes whole responsibility to keep it up and running based on their committed SLA. But still, we need to manage HADR at some level to manage our customized requirement and above SLA.
Business Continuity and Disaster Recovery (BCDR Instead of HADR) is another term which is more in trend for SQL Azure databases. lets discuss on the various features available for SQL Azure DBs and how to manage them.
For SQL Azure DBs, there is an uptime SLA committed by Microsoft. Based on the latest documentation, the uptime SLA for:
1. Web and Business tier databases is 99.9%
2. Basic , Standard and Business Tiers is 99.99%.
Now, you may be thinking in numbers on how many hours/minutes of downtime per month with 99.99% or 99.9%. The calculation which goes into it is:
Monthly Uptime %=(Maximum Available Minutes-Downtime)/(Maximum Available Minutes) .
For more information, please check this link
Moreover, by default there are two additional replicas created for any SQL database hosted on azure. It helps to maintain the uptime SLA. If the primary replica fails the secondary replica will take a charge and all the new connections will be routed to the new primary replica by the gateway. The changes are replicated on quorum basis i.e. if changes are committed on primary and one of the replicas, the commit is completed and the commit on second replica will follow.
Even 99.99% is not enough for 24X7 application and it may cause huge financial loss if the database is down for a single minute.
There are three major types of failures, where we need resilience to support such application:
1. Recovery from Accidental deletion of data (Point in time restore – Oops recovery)
2. Machine outage where our DB is hosted in Azure
3. Datacenter/regional outages
- For the first option , please refer my blog post.
- For second option , the secondary copy from the local replicas(as discussed above) is brought online.
- For the third option, we need to manage with the below features of SQL Azure DB.
1. Geo restore – It’s discussed in the post
2. Geo Replication (Active and Standard)
When it comes to HADR, we often talk about RTO and RPO. However, ERT is newly introduced term for Azure which is subset of RTO:
Estimated Recovery Time (ERT) – The estimated duration for the database to be fully functional after a restore/failover request. It may happen that database has come up but application components are still being recovered. Therefore, for database ERT applies and for application RTO.
Recovery Time Objective (RTO) – The time interval that defines the maximum acceptable duration of unavailability before the application fully recovers from a disruptive event.
Recovery Point Objective (RPO) – The amount of most recent data changes (time interval) the application could lose after recovery.
|Point In Time Restore||Any restore point within 7 days||Any restore point within 14 days||Any restore point within 35 days|
|Geo-Restore||ERT* < 12h
RPO† < 1h
|ERT* < 12h
RPO† < 1h
|ERT* < 12h
RPO† < 1h
|Standard Geo-Replication||Not included||ERT* < 30s
RPO† < 5s
|ERT* < 30s
RPO† < 5s
|Active Geo-Replication||Not included||Not included||ERT* < 30s
RPO† < 5s
We will discuss specifically about Standard and Active Geo replication , further.
Standard Geo Replication – This feature is only available for Standard and Premium tiers. It creates a standby copy of the database in the DR paired region but is not readable. Once the failover is done, secondary becomes primary and then application can connect it for read/write purpose. This is best fit for applications where you need aggressive RPO and need just a standby copy of the primary database.
Just choose a server and your secondary database will be hosted there with the same tier, which in our case is standard. As the Secondary copy is just for standby i.e. we can’t read data from it, only 75% of the actual tier cost is charged.
For detailed information on Standard geo replication please check the post
Active Geo Replication : This replication is available only for Premier tier databases. With this replication , you can create four replicas in any datacenter and will be read only. This can be used for load balancing the workload of the application.
For more information , please check this link
Here is the comparison of Standard and Active Geo-replication:
|Online application upgrade||No||Yes|
|Online application relocation||No||Yes|
|Read load balancing||No||Yes|
I will write about Database Sharding in the next post! Stay tuned !
The views expressed on this website/blog are mine alone and do not reflect the views of my company. All postings on this blog are provided “AS IS” with no warranties, and confers no rights.