Quick Answer
Mean time to detect (MTTD) and mean time to resolve (MTTR) are the two metrics most directly tied to user pain during incidents. Targets vary widely by industry because operating tempo, regulatory exposure, and cost of downtime differ enormously. Strong programs commit to MTTD and MTTR by service tier, not as a single estate-wide number, and they measure against a realistic baseline before promising aggressive targets.
Key Terms
MTTD is mean time to detect, from the start of an incident to when it is identified by monitoring or a human report. MTTR is mean time to resolve, from detection to confirmed restoration of service. MTTA (mean time to acknowledge) sits between detection and active response and is a useful intermediate metric. Dwell time is the security-incident equivalent of MTTD, often measured in days or weeks for undetected breaches.
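As a minimal sketch of how these definitions turn into arithmetic, the Python below computes MTTD, MTTA, and MTTR from incident timestamps. The `Incident` record and its field names are hypothetical, not taken from any particular tool; adapt them to whatever your incident tracker exports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass
class Incident:
    # Hypothetical record; the field names are illustrative assumptions.
    started: datetime       # when the fault actually began
    detected: datetime      # when monitoring or a user report flagged it
    acknowledged: datetime  # when a responder took ownership
    resolved: datetime      # when service was confirmed restored


def mean_minutes(deltas: list[timedelta]) -> float:
    """Average a list of durations and return the result in minutes."""
    return mean(d.total_seconds() for d in deltas) / 60


def summarize(incidents: list[Incident]) -> dict[str, float]:
    """Apply the definitions above: MTTD from start, MTTA and MTTR from detection."""
    return {
        "mttd_min": mean_minutes([i.detected - i.started for i in incidents]),
        "mtta_min": mean_minutes([i.acknowledged - i.detected for i in incidents]),
        "mttr_min": mean_minutes([i.resolved - i.detected for i in incidents]),
    }


# Two made-up incidents, both starting at the same reference time.
t0 = datetime(2024, 1, 1, 12, 0)
sample = [
    Incident(t0, t0 + timedelta(minutes=4), t0 + timedelta(minutes=6), t0 + timedelta(minutes=64)),
    Incident(t0, t0 + timedelta(minutes=2), t0 + timedelta(minutes=7), t0 + timedelta(minutes=32)),
]
metrics = summarize(sample)
print(metrics)
```

Note that MTTR here is measured from detection, matching the definition in this article; some teams measure it from incident start, so state which convention a report uses.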
Typical Ranges by Industry
| Industry / Workload | MTTD Target | MTTR Target |
|---|---|---|
| Tier-1 web and SaaS (consumer-facing) | 1 to 5 minutes | 30 to 60 minutes |
| BFSI (banking, capital markets) | 2 to 10 minutes | 1 to 2 hours |
| Healthcare clinical systems | 5 to 15 minutes | 1 to 4 hours |
| Manufacturing OT / industrial | 10 to 30 minutes | 2 to 4 hours |
| Retail e-commerce (peak season) | 1 to 5 minutes | 15 to 60 minutes |
| Internal enterprise apps | 15 to 60 minutes | 4 to 8 hours |
| Security incidents (well-tuned SOC) | Under 1 hour | Hours to days |
| Security incidents (average enterprise) | ~200 days to identify (~280 including containment) | Weeks to months |
Security MTTD (dwell time) is dramatically worse industry-wide than IT operations MTTD because attackers actively hide and many enterprises lack mature detection. Closing this gap is the central case for investing in managed detection and response (MDR) and a security operations center (SOC).
What to Look For in Your Own Numbers
Measure MTTD and MTTR by severity tier and by service. Aggregate numbers hide problems; the per-service breakdown shows where to invest. Pair MTTD with detection coverage: the percentage of incidents detected by monitoring rather than reported by users. User-reported incidents raise effective MTTD because the clock starts at the actual outage, not at the moment the user reports it. The reverse also holds: a program with strong tooling but weak user reporting paths can look better than it is, because incidents that monitoring misses and users cannot easily report never enter the numbers.
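The sketch below illustrates how an aggregate MTTD can hide a weak tier and how detection coverage can be computed alongside it. The rows, service names, and tier labels are made-up illustration data, not benchmarks.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical incident rows: (service, tier, minutes_to_detect, detected_by)
rows = [
    ("checkout", "tier-1", 3, "monitoring"),
    ("checkout", "tier-1", 5, "monitoring"),
    ("intranet", "tier-3", 90, "user"),
    ("intranet", "tier-3", 30, "monitoring"),
]


def mttd_by_tier(rows):
    """Group detection times by tier and average each group."""
    grouped = defaultdict(list)
    for _service, tier, minutes, _source in rows:
        grouped[tier].append(minutes)
    return {tier: mean(values) for tier, values in grouped.items()}


def detection_coverage(rows):
    """Share of incidents first caught by monitoring rather than by users."""
    by_monitoring = sum(1 for row in rows if row[3] == "monitoring")
    return by_monitoring / len(rows)


overall_mttd = mean([row[2] for row in rows])  # the single estate-wide number
per_tier = mttd_by_tier(rows)                  # the breakdown that shows where to invest
coverage = detection_coverage(rows)

print(overall_mttd, per_tier, coverage)
```

In this made-up data the estate-wide MTTD of 32 minutes describes neither tier well: tier-1 averages 4 minutes and tier-3 averages 60.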
How to Close the Gap
- For MTTD: add synthetic checks against critical user journeys, tighten alert thresholds where past incidents went undetected (false negatives), prune noisy false-positive alerts that bury real signals, and instrument SLIs aligned to user-visible behavior.
- For MTTR: author runbooks for the top 20 ticket categories, automate the safe remediation steps, and ensure on-call engineers have one-click access to recovery actions.
- For both: review every incident with a blameless post-mortem and track action items to closure. Trends improve when learning becomes a discipline, not an aspiration.
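A synthetic check, as mentioned in the list above, can be sketched roughly as follows. This is an illustration under stated assumptions: the `run_synthetic_check` helper and the two journey steps are hypothetical stand-ins, and a real check would call your application over HTTP or through a browser driver.

```python
import time
from typing import Callable


def run_synthetic_check(step: Callable[[], bool], max_seconds: float) -> dict:
    """Run one user-journey step and decide whether it should raise an alert."""
    start = time.monotonic()
    try:
        ok = bool(step())
    except Exception:
        # Any exception counts as a failed journey step.
        ok = False
    elapsed = time.monotonic() - start
    # Alert when the step fails outright or is slower than the latency budget.
    return {"ok": ok, "elapsed_s": elapsed, "alert": (not ok) or elapsed > max_seconds}


# Hypothetical journey steps standing in for real HTTP or browser calls.
def login_succeeds() -> bool:
    return True


def checkout_fails() -> bool:
    raise TimeoutError("payment provider did not respond")


healthy = run_synthetic_check(login_succeeds, max_seconds=2.0)
broken = run_synthetic_check(checkout_fails, max_seconds=2.0)
print(healthy["alert"], broken["alert"])
```

Running a check like this on a short schedule bounds MTTD for the journey it covers by roughly the check interval, which is why synthetic checks tend to move the metric quickly.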
A common pitfall is setting estate-wide MTTR targets that are unrealistic for low-tier workloads, which causes teams to either game the metric or burn out trying to meet it. Tier-aware targets keep the program honest.
How Opsio Helps
Opsio's 24/7 managed troubleshooting service publishes MTTD and MTTR by service tier in monthly reports and tracks action items from post-incident reviews. Read the pillar on 24/7 IT incident response, compare with incident response as a service, or contact us to baseline your current numbers.
Frequently Asked Questions
Are these benchmarks the same globally?
Targets are broadly consistent across mature markets but vary with regulatory regime and customer expectations. For example, BFSI MTTR targets in the EU are influenced by DORA reporting windows, while US healthcare is shaped by HIPAA breach notification. Use the table as a starting reference and adjust for your jurisdiction and contractual commitments.
Why is security MTTD so much worse than IT MTTD?
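Attackers actively evade detection, while infrastructure failures are usually passive and visible. The often-cited industry figure of around 280 days comes from breach-cost surveys such as IBM's Cost of a Data Breach report and covers the average time to both identify and contain a breach; identification alone accounts for roughly 200 days of it. These long timelines are driven by enterprises without dedicated SOC capability or mature EDR. Programs with mature MDR or SOC operations achieve sub-hour MTTD by combining endpoint telemetry, SIEM correlation, and proactive threat hunting.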
Should we report MTTD and MTTR to the board?
Yes, by service tier with trend lines. Board-level reporting forces honest conversations about underinvestment in low-tier coverage and overcommitment on aspirational targets. Pair the metrics with cost of downtime per minute to translate technical numbers into financial impact.
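A rough way to translate the metrics into money is to multiply user-visible downtime by cost per minute. The sketch below assumes downtime is approximately MTTD plus MTTR, using the definitions in this article; the figures are placeholders, not benchmarks.

```python
def downtime_cost(mttd_min: float, mttr_min: float, cost_per_min: float) -> float:
    """Estimate the cost of one incident as (detection + resolution time) x cost per minute."""
    return (mttd_min + mttr_min) * cost_per_min


# Hypothetical tier-1 service: 5 minutes to detect, 45 minutes to resolve,
# and an assumed cost of 1,000 per minute of downtime.
example_cost = downtime_cost(mttd_min=5, mttr_min=45, cost_per_min=1_000)
print(example_cost)
```

Multiplying the per-incident figure by the expected number of incidents per quarter gives the board a number to compare against the cost of the proposed investment.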
How quickly can we improve MTTD and MTTR?
Most programs see meaningful MTTR improvement within one quarter of focused runbook investment, often a 30% to 50% reduction in the covered categories. MTTD can improve faster, within weeks, once synthetic checks and tuned alerts are in place. Sustained improvement requires running the post-mortem loop consistently, month after month.
Is automating remediation safe?
For well-understood failure modes, yes. Restart a stuck service, rotate a credential, scale out a queue: these are low-risk and high-impact. Avoid automating actions with broad blast radius (mass deletes, region failover) until both detection accuracy and rollback paths are mature. The honest test is whether the action is safe to run at 3 a.m. without a human in the loop.
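One way to encode this rule is an explicit catalogue of actions with a blast-radius label, where only low-radius actions with a tested rollback run unattended. The sketch below is an assumption-laden illustration: the action names, the `BlastRadius` enum, and the `decide` function are hypothetical, not part of any specific automation tool.

```python
from enum import Enum


class BlastRadius(Enum):
    LOW = "low"
    BROAD = "broad"


# Hypothetical catalogue; classify your own actions before automating them.
ACTIONS = {
    "restart_service": BlastRadius.LOW,
    "rotate_credential": BlastRadius.LOW,
    "scale_out_queue": BlastRadius.LOW,
    "mass_delete": BlastRadius.BROAD,
    "region_failover": BlastRadius.BROAD,
}


def decide(action: str, rollback_tested: bool) -> str:
    """Return 'auto', 'needs_human', or 'reject' for a requested remediation."""
    radius = ACTIONS.get(action)
    if radius is None:
        return "reject"  # unknown actions never run automatically
    if radius is BlastRadius.LOW and rollback_tested:
        return "auto"  # considered safe to run at 3 a.m. without a human
    return "needs_human"  # broad blast radius or no tested rollback path


print(decide("restart_service", rollback_tested=True))
print(decide("region_failover", rollback_tested=True))
```

Defaulting unknown actions to rejection keeps the catalogue an allowlist, so new automation has to be classified deliberately before it can run without a human.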
Written By

Country Manager, Sweden at Opsio
Johan leads Opsio's Sweden operations, driving AI adoption, DevOps transformation, security strategy, and cloud solutioning for Nordic enterprises. With 12+ years in enterprise cloud infrastructure, he has delivered 200+ projects across AWS, Azure, and GCP — specialising in Well-Architected reviews, landing zone design, and multi-cloud strategy.
Editorial standards: This article was written by cloud practitioners and peer-reviewed by our engineering team. We update content quarterly for technical accuracy. Opsio maintains editorial independence.