Azure Disaster Recovery as a Service (DRaaS): Architecture, RPO/RTO, and What It Actually Costs
Country Manager, India
AI, Manufacturing, DevOps, and Managed Services. 17+ years across Manufacturing, E-commerce, Retail, NBFC & Banking

Azure Disaster Recovery as a Service (DRaaS) is a managed offering layered on Azure Site Recovery (ASR), Azure Backup, and a tested operational runbook. The provider replicates production workloads to a secondary Azure region (or from on-prem to Azure), keeps the replication healthy, owns the failover procedure, and runs scheduled failover drills. Typical RPO targets land between 30 seconds and 5 minutes for ASR-replicated VMs; RTO depends almost entirely on application complexity, not on the DR product.
Key Takeaways
- DRaaS on Azure is mostly a managed wrapper around Azure Site Recovery — the technology is mature; the operational rigour is the differentiator.
- Realistic ASR RPO is 30 seconds for most VM workloads; RTO is 30–120 minutes for a fully orchestrated multi-tier failover.
- Self-managed ASR is a viable cheaper path if you have a dedicated Azure operations team that runs failover drills quarterly. DRaaS is the right call when DR sits on a single overworked sysadmin.
- The cost model is asymmetric: replication storage and outbound bandwidth are cheap; the real cost is the secondary-region compute that runs only during failover, plus the ASR licensing per protected instance.
- Without a tested failover runbook, replication is theatre — every regulated framework (NIS2, ISO 22301, SOC 2) now expects evidence of executed DR drills, not just configured replication.
What "Azure DRaaS" Actually Means
The market uses "Azure DRaaS" for two different things and you should pin down which you're buying.
1. ASR-as-a-managed-service. A partner runs Azure Site Recovery for you: configures replication, monitors health, owns the runbook, and executes failover when needed. Most reputable Azure DRaaS offerings sit here. The customer's secondary region is provisioned in their own Azure subscription; the partner has access to operate it.
2. Provider-hosted DR. A partner replicates your workloads into their Azure tenant — you don't run a secondary subscription, they do. This is rarer at the enterprise level because data-residency and chain-of-custody requirements (GDPR Article 28, NIS2 supply-chain controls) usually favour DR in the customer's own subscription. It's more common for SMB customers without an in-house Azure footprint.
Either way, the underlying technology is Azure Site Recovery — Microsoft's first-party DR replication service — combined with Azure Backup for point-in-time recovery and Azure Monitor for replication-health alerts. The DRaaS provider's value is operational, not technological: continuous monitoring, drift detection, scheduled drills, RTO/RPO reporting, and the runbook that survives the person who originally wrote it leaving the organisation.
Azure Site Recovery: The Native Building Block
ASR is the engine. It replicates VMs (Azure-to-Azure, on-prem-to-Azure, or AWS-to-Azure), tracks replication health, and orchestrates failover via a recovery plan. Three deployment patterns dominate:
- Azure-to-Azure (A2A): replicates from one Azure region to another. Most common for cloud-native workloads. RPO <1 min; RTO 30–60 min for most multi-tier apps with a recovery plan.
- VMware/Hyper-V to Azure: uses an on-prem ASR agent. Useful as a stepping stone in DR-led migrations: replicate first, fail over to Azure, then make Azure the primary.
- Physical-server-to-Azure: for workloads that resist virtualization. Same agent model as VMware/Hyper-V; the cutover ergonomics are tougher.
ASR replicates at the disk level using continuous replication. Changes are staged in a cache storage account (in the source region for Azure-to-Azure replication) and then written to replica managed disks in the target region. No target VM exists while replication is running, so there is no compute charge in the secondary region. Network configuration (VNet, NSG, public IP) is mapped via a recovery plan; on failover, ASR creates the target VM, attaches the replicated disks, applies the mapped networking, and runs any pre/post scripts (Azure Automation runbooks) defined in the plan.
ASR does not replicate Azure PaaS services — App Service, Azure SQL, Cosmos DB, Storage Accounts, Key Vault. Those have their own DR mechanisms: geo-redundant storage (RA-GRS), Azure SQL active geo-replication or auto-failover groups, Cosmos multi-region writes, and Key Vault soft-delete with cross-region replication. A complete Azure DR design always combines ASR (for VMs) with the right native DR feature for each PaaS dependency.
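A minimal sketch of the pairing described above: given an application's dependencies, look up the DR mechanism the paragraph names for each and flag anything left uncovered. The dictionary keys and the `dr_plan` helper are illustrative labels invented for this example, not Azure resource-type identifiers or SDK calls.

```python
# Mapping taken from the paragraph above. Keys are illustrative labels only.
DR_MECHANISM = {
    "vm": "Azure Site Recovery",
    "storage_account": "Geo-redundant storage (RA-GRS)",
    "azure_sql": "Active geo-replication or auto-failover groups",
    "cosmos_db": "Multi-region writes",
    "key_vault": "Soft-delete with cross-region replication",
}


def dr_plan(dependencies):
    """Split dependencies into those with a named DR mechanism and gaps.

    Returns (plan, gaps): plan maps each known dependency type to its
    mechanism; gaps lists types the mapping has no answer for.
    """
    plan, gaps = {}, []
    for dep in dependencies:
        if dep in DR_MECHANISM:
            plan[dep] = DR_MECHANISM[dep]
        else:
            gaps.append(dep)
    return plan, gaps


if __name__ == "__main__":
    plan, gaps = dr_plan(["vm", "azure_sql", "app_service", "key_vault"])
    for dep, mechanism in plan.items():
        print(f"{dep}: {mechanism}")
    print("No DR mechanism listed for:", gaps)
```

Running a check like this against a real dependency inventory before the first drill is one way to catch a PaaS component that nobody assigned a DR mechanism to.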
RPO and RTO: What's Realistic
The ASR documentation quotes RPO of 30 seconds for most VM workloads and RTO under 2 hours. Those numbers are achievable, but only if the design respects the constraints.
Realistic RPO ranges:
- 30 seconds — Linux/Windows VMs without high disk write throughput.
- 1–5 minutes — VMs with sustained >100 MB/s disk writes (databases under load, transaction-heavy applications). The cache storage account becomes the bottleneck.
- 15+ minutes — when network bandwidth between source and Azure region is constrained (under-provisioned ExpressRoute or Site-to-Site VPN). Often discovered during the first real drill.
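The drift from seconds to minutes can be reasoned about with a back-of-envelope model: while the write rate exceeds what the replication path can carry, a backlog builds up, and the lag is that backlog expressed as seconds of writes. The sketch below assumes a constant write rate and a constant link rate; the function name and the sample figures are hypothetical, and this is not how ASR itself computes the RPO it reports.

```python
def replication_lag_seconds(write_mb_per_s, link_mb_per_s, burst_seconds):
    """Estimate how far replication falls behind after a sustained write burst.

    Simplified model: the backlog grows at (write rate - link rate) for the
    duration of the burst; the lag is the backlog divided by the write rate.
    """
    if write_mb_per_s <= 0 or link_mb_per_s <= 0 or burst_seconds < 0:
        raise ValueError("rates must be positive and duration non-negative")
    surplus = write_mb_per_s - link_mb_per_s
    if surplus <= 0:
        return 0.0  # the link keeps up, no backlog accumulates
    backlog_mb = surplus * burst_seconds
    return backlog_mb / write_mb_per_s


if __name__ == "__main__":
    # Hypothetical database VM: 120 MB/s of writes over a 100 MB/s path
    # for ten minutes.
    print(replication_lag_seconds(120, 100, 600))
```

With these sample numbers the lag comes out at 100 seconds, which is how a workload that normally sits at a 30-second RPO can slip into the minutes range under load.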
Realistic RTO ranges:
- 15 minutes — single-VM web tier with no application orchestration. Trivial.
- 30–60 minutes — typical 3-tier application (web, app, database) with a well-written recovery plan that boots tiers in dependency order.
- 2–4 hours — applications with manual cutover steps (DNS changes that don't propagate, license keys bound to source VMs, hardcoded IPs in config files). The fix is in the application, not in ASR.
- Days — applications no one fully owns. ASR replicates the disk; nobody knows how to validate the application after failover.
The honest framing: ASR delivers the infrastructure RTO. The end-to-end service RTO is a function of how clean the application's startup sequence is. DR drills surface this gap; pre-drill design reviews almost never do.
The Architecture: Replication, Failover, Failback
Replication
Replication is anchored by a Recovery Services vault, which for Azure-to-Azure replication must be created outside the source region and usually sits in the target region. Each protected VM gets the ASR Mobility Service (agent) installed on the source. The agent captures disk writes and ships them to a cache storage account, which for Azure-to-Azure replication lives in the source region. ASR then applies those writes to replica managed disks in the target region's resource group. The target VM itself is not created until failover.
Failover
Failover is initiated either manually (running the recovery plan from the Azure portal or PowerShell) or automatically (rare in production; most teams keep failover human-in-the-loop). The recovery plan works through the replica VMs in dependency order: for each group it runs pre-action Azure Automation runbooks (e.g. update DNS), starts the VMs, then runs post-action runbooks (e.g. validate database, run smoke tests). DNS update is the hidden long pole: the public DNS TTL needs to be lowered to 60–300 seconds well before a failover, because a TTL changed during the incident does not help clients that have already cached the old record.
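The dependency-ordered boot can be sketched as a topological sort that groups tiers into waves: everything in a wave can start in parallel, and the next wave waits for the slowest member. This uses Python's standard `graphlib` module (Python 3.9+); the tier names and boot times are hypothetical, and the sketch models the ordering logic only, not the ASR recovery plan API.

```python
from graphlib import TopologicalSorter


def boot_groups(depends_on):
    """Group tiers into boot waves that respect their dependencies.

    depends_on maps each tier to the set of tiers that must be up first.
    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()
    groups = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        groups.append(ready)
        sorter.done(*ready)
    return groups


def estimated_boot_minutes(groups, boot_minutes):
    """Sum the slowest tier of each wave; waves run one after another."""
    return sum(max(boot_minutes[tier] for tier in group) for group in groups)


if __name__ == "__main__":
    # Hypothetical 3-tier application with a cache next to the database.
    deps = {"web": {"app"}, "app": {"db", "cache"}, "db": set(), "cache": set()}
    minutes = {"db": 10, "cache": 3, "app": 8, "web": 4}
    groups = boot_groups(deps)
    print(groups)
    print(estimated_boot_minutes(groups, minutes))
```

The estimate covers VM start-up only. Pre- and post-action runbooks, DNS propagation, and manual validation add to it, which is why the measured RTO in a drill usually exceeds the number on paper.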
Failback
Failback reverses replication after the source region recovers. The replica VMs in the target region become the source; the original source becomes the target. Disks then re-replicate. If the original disks survived, re-protection compares both sides and copies only the differences; if they did not, the initial sync is a full-disk copy. This is the often-skipped phase in DR drills. A true drill should fail back, not just fail over, because the failback validates that you can return to steady state. Many programs only test failover and never fail back, then discover during a real incident that failback takes 36 hours.
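A rough way to see where a figure like 36 hours comes from is to divide the data that must be copied back by the sustained throughput of the replication path. The sketch below assumes decimal units and a constant throughput; the function name and the sample values are hypothetical.

```python
def resync_hours(data_tb, throughput_mb_per_s):
    """Hours needed to copy data_tb terabytes at a sustained throughput.

    Uses decimal units (1 TB = 1,000,000 MB) and ignores protocol overhead.
    """
    if data_tb < 0 or throughput_mb_per_s <= 0:
        raise ValueError("data must be non-negative and throughput positive")
    megabytes = data_tb * 1_000_000
    return megabytes / throughput_mb_per_s / 3600


if __name__ == "__main__":
    # Hypothetical estate: 13 TB to copy back at a sustained 100 MB/s.
    print(round(resync_hours(13, 100), 1))
```

Thirteen terabytes at 100 MB/s works out to roughly 36 hours, so the failback window is worth estimating before the drill rather than during it.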
DRaaS vs Self-Managed ASR: When Each Wins
| Decision factor | Self-managed ASR | Managed Azure DRaaS |
|---|---|---|
| Azure operations team size | 3+ engineers, including a designated DR owner | 0–2 engineers, DR is one of many duties |
| Drill cadence | Quarterly minimum, on a calendar with named owners | Provider runs drills on contract |
| Compliance evidence requirements | You document and retain | Provider supplies drill reports, RPO/RTO attestations |
| Run-rate cost | ASR licensing + storage + secondary compute (drills only) | Above + 20–40% management premium |
| Recovery during a real incident | Internal team executes runbook | Provider's NOC executes; customer signs off |
| Best fit | Mature Azure platform team with strong runbook hygiene | Regulated workloads, lean ops teams, multi-tenant DR estates |
The wrong reason to buy DRaaS is to avoid learning Azure. The right reason is to buy operational rigour — drill cadence, runbook discipline, audit-grade evidence — that an internal team would deprioritize the moment a higher-priority incident lands. We see plenty of self-managed ASR setups that work; we also see plenty where the last drill ran 22 months ago.
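Drill staleness is easy to check mechanically. The sketch below counts calendar months since the last drill and compares the result with a cadence; the quarterly default follows the table above, while the function names and dates are hypothetical.

```python
from datetime import date


def months_since(last_drill, today):
    """Whole calendar months between two dates, ignoring the day of month."""
    return (today.year - last_drill.year) * 12 + (today.month - last_drill.month)


def drill_overdue(last_drill, today, max_months=3):
    """True when the last drill is older than the agreed cadence."""
    return months_since(last_drill, today) > max_months


if __name__ == "__main__":
    last = date(2023, 1, 15)
    now = date(2024, 11, 20)
    print(months_since(last, now), drill_overdue(last, now))
```

A check like this could sit in a scheduled job that alerts the DR owner, so that a skipped drill is noticed before the next audit.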
Cost Model — Where the Numbers Hide
Azure DR cost has three layers, and only one is well understood:
1. ASR licensing. A flat per-instance monthly fee billed by Microsoft (free for the first 31 days per protected instance). Predictable; usually ~$25/instance/month. Easy to forecast.
2. Replication storage. Cache storage plus replica managed disks in the target region. For a 1 TB protected VM you're paying for ~1 TB of replica disk storage continuously. Standard HDD disks in the target are common because the disks are inactive until failover. Inexpensive but cumulative; a 100-VM estate adds up.
3. Failover compute. The secondary-region VMs cost $0/hr until failover, because ASR does not create them until then. During a drill or real failover, they run at full price. A 100-VM failover at typical D-series SKUs costs roughly $400–800 per hour during the drill window. Drills are usually 4–8 hours; an annual budget of $30–50K just for drill compute is normal for mid-size estates.
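The three layers can be put into a small model to see how they combine over a year. This is a sketch under stated assumptions: the per-instance fee and the hourly drill rate are taken from the figures above, the storage price per TB is a placeholder that should be replaced with current Azure disk pricing, the 31-day free period is ignored, and the class and field names are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class DrCostModel:
    protected_instances: int
    asr_fee_per_instance_month: float   # article quotes ~$25
    replica_storage_tb: float
    storage_price_per_tb_month: float   # placeholder, check current pricing
    drill_compute_per_hour: float       # article quotes $400-800 for 100 VMs
    drill_hours: float                  # article quotes 4-8 hours
    drills_per_year: int

    def annual_licensing(self) -> float:
        return self.protected_instances * self.asr_fee_per_instance_month * 12

    def annual_storage(self) -> float:
        return self.replica_storage_tb * self.storage_price_per_tb_month * 12

    def annual_drill_compute(self) -> float:
        return self.drill_compute_per_hour * self.drill_hours * self.drills_per_year

    def annual_total(self, management_premium: float = 0.0) -> float:
        """Total Azure spend, optionally with a managed-service premium on top."""
        base = (self.annual_licensing() + self.annual_storage()
                + self.annual_drill_compute())
        return base * (1 + management_premium)


if __name__ == "__main__":
    estate = DrCostModel(100, 25.0, 100.0, 40.0, 600.0, 8.0, 4)
    print(estate.annual_licensing(), estate.annual_storage(),
          estate.annual_drill_compute(), estate.annual_total())
```

Passing `management_premium=0.3` to `annual_total` models a managed DRaaS contract at the midpoint of the 20–40% range from the comparison table.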
Hidden costs that catch teams out:
- Cross-region bandwidth on initial replication: the first sync of a 100-TB estate to the secondary region racks up egress charges that dwarf the ongoing replication cost.
- ExpressRoute Premium add-on: required when an ExpressRoute circuit has to reach Azure regions outside its own geopolitical region, which matters if your DR strategy preserves on-prem network connectivity during failover.
- Azure Backup: usually paired with ASR; separate storage cost, billed per protected instance and per GB stored.
Compliance: NIS2, ISO 22301, SOC 2
Three frameworks now treat DR as a documented, tested control rather than a configured feature.
NIS2 Directive (EU member states had to transpose it into national law by 17 October 2024) requires entities in essential and important sectors to implement business continuity and crisis management measures, including documented disaster recovery procedures and tested backups. National authorities (BSI in Germany, NSM in Norway, MSB in Sweden) typically expect evidence of at least one annual full DR drill with executive sign-off.
ISO 22301 codifies business continuity management. Audit evidence required: business impact analysis (BIA), recovery time objectives per service, tested response and recovery procedures, and post-drill improvement records.
SOC 2 (Trust Services Criterion CC9.1) requires the organisation to identify, select, and develop risk mitigation activities for disruptions — DR drills satisfy this when properly evidenced. The auditor will ask for the drill log, not the replication configuration screenshot.
The pattern: a configured ASR replication is no longer compliance evidence on its own. The artefacts that matter are the drill report, the RPO/RTO attestation, and the post-drill action items closed against named owners. Managed DRaaS providers ship those artefacts as part of their monthly reporting; self-managed teams have to write them.
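One way to make those artefacts concrete is to record each drill as structured data and derive the open findings from it. The sketch below is an illustrative data model only; the class names, fields, and sample values are assumptions and do not correspond to any framework's prescribed template.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ActionItem:
    description: str
    owner: str
    closed: bool = False


@dataclass
class DrillReport:
    drill_date: date
    rpo_target_s: int
    rpo_achieved_s: int
    rto_target_min: int
    rto_achieved_min: int
    failback_tested: bool
    action_items: list = field(default_factory=list)

    def findings(self):
        """List the gaps an auditor would be likely to ask about."""
        issues = []
        if self.rpo_achieved_s > self.rpo_target_s:
            issues.append("RPO target missed")
        if self.rto_achieved_min > self.rto_target_min:
            issues.append("RTO target missed")
        if not self.failback_tested:
            issues.append("failback not tested")
        for item in self.action_items:
            if not item.closed:
                issues.append(
                    f"open action item: {item.description} ({item.owner})")
        return issues


if __name__ == "__main__":
    report = DrillReport(
        drill_date=date(2025, 3, 1),
        rpo_target_s=300, rpo_achieved_s=45,
        rto_target_min=60, rto_achieved_min=95,
        failback_tested=False,
        action_items=[ActionItem("Remove hardcoded IP in app config",
                                 "app-team-lead")],
    )
    for issue in report.findings():
        print(issue)
```

A report with an empty findings list, retained alongside the runbook version used, is the kind of evidence that stands in for a replication-configuration screenshot.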
How Opsio Approaches Azure DR
Opsio runs Azure DR programs out of our 24/7 NOC for customers across the EU and India. The standard model is ASR-as-a-managed-service in the customer's own subscription, paired with Azure Backup for point-in-time recovery and a runbook reviewed quarterly. We schedule full DR drills semi-annually with executive sign-off, and rerun the drill after any material change to the application — new tier, dependency, or external integration. The drill report is the artefact that matters; the replication is the prerequisite, not the deliverable.
Frequently Asked Questions
What Azure service can be used to handle disaster recovery?
Azure Site Recovery (ASR) is the primary service for VM-level disaster recovery — it handles replication, failover, and failback for Azure-to-Azure, on-prem-to-Azure, and AWS-to-Azure scenarios. For PaaS services, the right native feature varies: geo-redundant storage for blobs, active geo-replication for Azure SQL, multi-region writes for Cosmos DB, and zone-redundant configurations for App Service. A complete Azure DR design pairs ASR with PaaS-specific DR features and Azure Backup for point-in-time recovery.
What is the difference between BaaS and DRaaS?
Backup-as-a-Service (BaaS) protects against data loss — restoring an individual file, database table, or VM image to a point in time. Recovery is measured in hours-to-days and the goal is the data, not the service. Disaster Recovery as a Service (DRaaS) protects against service outage — keeping a replica of the entire production environment hot enough to fail over within minutes-to-hours. Mature programs run both: DRaaS covers regional outages and infrastructure failures; BaaS covers ransomware, accidental deletion, and corruption that DR replication would faithfully replicate to the secondary site.
What RPO and RTO can Azure Site Recovery actually deliver?
For most VM workloads, ASR delivers an RPO of 30 seconds to 5 minutes depending on disk write throughput and network bandwidth. RTO depends on the application: a single-tier web service can recover in 15 minutes, a well-orchestrated 3-tier application in 30–60 minutes, and a complex application with manual cutover steps in 2–4 hours. The RTO ceiling is almost always set by the application's startup sequence, not by ASR itself.
Is Azure DRaaS cheaper than self-managed ASR?
Direct Azure cost is the same — the underlying ASR licensing, storage, and compute charges flow through to the customer's Azure subscription either way. Managed DRaaS adds a 20–40% management premium on top of those costs. The premium buys monitoring, runbook ownership, scheduled drills, and audit-grade evidence. The break-even is operational: if your team would skip drills, miss replication-health alerts, or fail to update the runbook after application changes, the managed premium is cheaper than the missed-recovery scenario it prevents.
How often should we drill Azure DR?
Quarterly for non-regulated workloads; semi-annually with executive sign-off and full failback for workloads under NIS2, ISO 22301, or SOC 2. The drill must include real failover (not a sandbox-only test failover) for at least one tier per environment annually. Drills more frequent than quarterly tend to lose executive attention and become check-the-box exercises; less frequent than annually fails most current compliance frameworks.
About the Author

Country Manager, India at Opsio
Editorial standards: This article was written by a certified practitioner and peer-reviewed by our engineering team. We update content quarterly to ensure technical accuracy. Opsio maintains editorial independence — we recommend solutions based on technical merit, not commercial relationships.