| Metric | Value | Trend |
|---|---|---|
| Active incidents | 7 | Down 12% vs. last week |
| Open Sev0/1 | 2 | One new Sev1 overnight |
| Automation coverage | 68% | +4% this quarter |
| MTTR (rolling 30 days) | 41 min | Down 6 min week over week |
Live signals
Rolling telemetry view across our critical workloads (updated 2 minutes ago).
P95 latency: 112 ms
Availability: 99.982%
Active responders: 42
| Service | Health | Change summary | Owning team |
|---|---|---|---|
| Northwind Checkout | CRITICAL | Error rate +8% vs. baseline (last 3 min) | Commerce Ops |
| Contoso Auth Broker | MEDIUM | Login P95 latency trending up 220 ms | Identity Platform |
| Fabric Telemetry | LOW | Backfill completed; queues nominal | Telemetry Core |
| Azure Event Bus | HIGH | West Europe routing degraded; mitigation in progress | Messaging |
Strategic initiatives
Programs underway to harden our operational posture and reduce incident load.
Regional failover simulation
Coordinated run across EU clusters to validate blue/green policies and manual recovery playbooks.
Telemetry contract hardening
Backfill missing resource tags and enforce schema validation in ingestion pipelines.
ICM triage assistant rollout
Pilot an LLM-based copilot that summarizes incident threads and recommends the fastest mitigations.
Action queue
Operational follow-ups tracked for the current rotation.
Publish weekly health note
Aggregate outage narratives and risk callouts for the executive briefing.
Confirm DR drill staffing
Verify that responders have signed up for the November 3 West Coast failover exercise.
Close Sev1 #4189231
Ensure the closure checklist is complete, the postmortem is scheduled, and customer communications have been sent.