Post-Mortem
Project: {{PROJECT_NAME}}
Version: {{VERSION}}
Date: {{DATE}}
Author: {{AUTHOR}}
Status: Draft | In Review | Approved
Reviewers: {{REVIEWERS}}
Document History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | {{DATE}} | {{AUTHOR}} | Initial draft |
Blameless Culture Statement
This post-mortem is conducted in a blameless spirit. Our goal is to understand how and why the incident occurred — not to assign fault to individuals. People make the best decisions they can with the information and tools available at the time. When things go wrong, we look for systemic improvements that make the right action easier and the wrong action harder for everyone.
1. Incident Reference & Metadata
| Field | Value |
|---|---|
| Incident ID | INC-{{YYYY}}-{{SEQ}} |
| Severity | P{{SEVERITY}} |
| Incident Report | INC-{{YYYY}}-{{SEQ}} |
| Post-Mortem Facilitator | {{FACILITATOR}} |
| Post-Mortem Date | {{PM_DATE}} |
| Attendees | {{ATTENDEES}} |
| Status | Draft / In Review / Final |
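The `INC-{{YYYY}}-{{SEQ}}` pattern can be generated and checked mechanically so that IDs stay consistent across the incident report and this document. The sketch below is illustrative only: the helper names, the zero-padded width of 4, and the assumption that `{{SEQ}}` is purely numeric are not specified by this template.

```python
import re

# Assumption: YYYY is a four-digit year and SEQ is one or more digits.
INCIDENT_ID = re.compile(r"INC-(\d{4})-(\d+)")


def format_incident_id(year: int, seq: int, width: int = 4) -> str:
    """Build an ID such as INC-2024-0007 (the padding width is hypothetical)."""
    return f"INC-{year:04d}-{seq:0{width}d}"


def is_valid_incident_id(value: str) -> bool:
    """Return True when the whole string matches the INC-YYYY-SEQ shape."""
    return INCIDENT_ID.fullmatch(value) is not None


print(format_incident_id(2024, 7))
print(is_valid_incident_id("INC-2024-0007"))
print(is_valid_incident_id("INC-24-7"))
```

Adjust the regular expression and padding to match whatever numbering scheme your incident tracker actually uses.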
2. Executive Summary
{{EXECUTIVE_SUMMARY}}
Example: "A database index was dropped during a migration on {{DATE}}, causing query performance to degrade by 50× under load. This resulted in a 1h 23min degraded service period affecting {{USERS}} users. We have restored the index, added migration validation tooling, and created safeguards to prevent similar incidents."
3. Impact Summary
| Metric | Value |
|---|---|
| Total duration | {{DURATION}} (detected at {{DETECTED}}, resolved at {{RESOLVED}}) |
| Users affected | {{USER_COUNT}} ({{USER_PERCENT}}% of user base) |
| Requests affected | {{REQUEST_COUNT}} ({{REQUEST_PERCENT}}% error rate during incident) |
| Estimated revenue impact | ${{REVENUE}} |
| SLA breach | {{SLA_BREACH}} |
| SLA credits owed | ${{CREDITS}} |
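The percentage fields in the table above are simple ratios, and computing them from raw counts avoids transcription errors. The following is a minimal sketch; the function name and all sample figures are hypothetical and should be replaced with numbers from your own metrics.

```python
def percent(part: int, whole: int) -> float:
    """Return part as a percentage of whole, rounded to two decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100 * part / whole, 2)


# Hypothetical counts for illustration only.
users_affected = 1_200
total_users = 48_000
failed_requests = 9_500
requests_during_incident = 190_000

# {{USER_PERCENT}}: share of the user base that was affected.
user_percent = percent(users_affected, total_users)
# {{REQUEST_PERCENT}}: error rate over requests made during the incident window.
request_percent = percent(failed_requests, requests_during_incident)

print(f"Users affected: {users_affected} ({user_percent}% of user base)")
print(f"Requests affected: {failed_requests} ({request_percent}% error rate)")
```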
4. Detailed Timeline
```mermaid
timeline
    title Incident Timeline
    {{TIME_1}} : {{EVENT_1}}
    {{TIME_2}} : {{EVENT_2}}
    {{TIME_3}} : {{EVENT_3}}
    {{TIME_4}} : {{EVENT_4}}
    {{TIME_5}} : {{EVENT_5}}
```
| Time | Event | MTTD/MTTR Marker |
|---|---|---|
| {{T1}} | {{EVENT}} | ← Incident start |
| {{T2}} | {{EVENT}} | |
| {{T3}} | {{EVENT}} | ← Detection (MTTD = T3 - T1) |
| {{T4}} | {{EVENT}} | |
| {{T5}} | {{EVENT}} | |
| {{T6}} | {{EVENT}} | |
| {{T7}} | {{EVENT}} | |
| {{T8}} | {{EVENT}} | ← Resolved (MTTR = T8 - T1) |
MTTD (Mean Time to Detect): {{MTTD}} minutes
MTTR (Mean Time to Resolve): {{MTTR}} minutes
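The two values follow directly from the timeline markers: MTTD is T3 - T1 and MTTR is T8 - T1. A minimal sketch of that arithmetic is shown here, assuming all timestamps are recorded in the same time zone; the function name, the timestamp format, and the sample times are hypothetical.

```python
from datetime import datetime


def minutes_between(start: str, end: str, fmt: str = "%Y-%m-%d %H:%M") -> float:
    """Return the elapsed minutes from start to end (same time zone assumed)."""
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds() / 60


# Hypothetical timestamps standing in for T1, T3, and T8 in the table.
incident_start = "2024-01-15 14:02"  # T1: incident start
detected = "2024-01-15 14:19"        # T3: detection
resolved = "2024-01-15 15:25"        # T8: resolved

mttd = minutes_between(incident_start, detected)  # MTTD = T3 - T1
mttr = minutes_between(incident_start, resolved)  # MTTR = T8 - T1

print(f"MTTD: {mttd:.0f} minutes")
print(f"MTTR: {mttr:.0f} minutes")
```

Both intervals are measured from the incident start rather than from detection, matching the markers in the table.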
5. Root Cause Analysis
5.1 5 Whys Analysis
| Why # | Question | Answer |
|---|---|---|
| Why 1 | Why did users experience {{SYMPTOM}}? | {{WHY_1}} |
| Why 2 | Why did {{WHY_1_ANSWER}} happen? | {{WHY_2}} |
| Why 3 | Why did {{WHY_2_ANSWER}} happen? | {{WHY_3}} |
| Why 4 | Why did {{WHY_3_ANSWER}} happen? | {{WHY_4}} |
| Why 5 | Why did {{WHY_4_ANSWER}} happen? | {{WHY_5}} |
Root cause: {{ROOT_CAUSE}}
5.2 Contributing Factors
| Factor | Type | Action Required |
|---|---|---|
| {{FACTOR_1}} | Technical / Process / Human | Yes / No |
| {{FACTOR_2}} | Technical / Process / Human | Yes / No |
| {{FACTOR_3}} | Technical / Process / Human | Yes / No |
5.3 Trigger Event
The specific trigger for this incident: {{TRIGGER}}
6. What Went Well
- {{CATEGORY_1}}: {{DESCRIPTION}}
- {{CATEGORY_2}}: {{DESCRIPTION}}
- {{CATEGORY_3}}: {{DESCRIPTION}}
7. What Went Wrong
- {{CATEGORY_1}}: {{DESCRIPTION}}
- {{CATEGORY_2}}: {{DESCRIPTION}}
- {{CATEGORY_3}}: {{DESCRIPTION}}
8. Where We Got Lucky
- {{LUCKY_1}}
- {{LUCKY_2}}
- {{LUCKY_3}}
9. Action Items
Short-Term Fixes (This Sprint)
| # | Action | Owner | Due | Priority | Ticket |
|---|---|---|---|---|---|
| 1 | {{SHORT_TERM_1}} | {{OWNER}} | {{DATE}} | Critical | {{TICKET}} |
| 2 | {{SHORT_TERM_2}} | {{OWNER}} | {{DATE}} | High | {{TICKET}} |
| 3 | {{SHORT_TERM_3}} | {{OWNER}} | {{DATE}} | Medium | {{TICKET}} |
Long-Term Improvements (Next Quarter)
| # | Action | Owner | Due | Priority | Ticket |
|---|---|---|---|---|---|
| 1 | {{LONG_TERM_1}} | {{OWNER}} | {{DATE}} | High | {{TICKET}} |
| 2 | {{LONG_TERM_2}} | {{OWNER}} | {{DATE}} | Medium | {{TICKET}} |
Process Changes
| # | Change | Owner | Implementation Date |
|---|---|---|---|
| 1 | {{PROCESS_1}} | {{OWNER}} | {{DATE}} |
| 2 | {{PROCESS_2}} | {{OWNER}} | {{DATE}} |
10. Follow-Up Tracking
Follow-up review date: {{FOLLOWUP_DATE}} (4 weeks after incident)
Follow-up owner: {{FOLLOWUP_OWNER}}
| Action Item | Expected Completion | Verified Complete | Effective |
|---|---|---|---|
| {{ACTION_1}} | {{DATE}} | Yes / No | Yes / No / TBD |
| {{ACTION_2}} | {{DATE}} | Yes / No | Yes / No / TBD |
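The follow-up review date is defined as four weeks after the incident, so it can be derived rather than entered by hand. This is a small sketch under that assumption; the function name and the sample incident date are hypothetical.

```python
from datetime import date, timedelta


def followup_date(incident_date: date, weeks: int = 4) -> date:
    """Return the review date that falls the given number of weeks after the incident."""
    return incident_date + timedelta(weeks=weeks)


# Hypothetical incident date for illustration only.
incident = date(2024, 1, 15)
print(followup_date(incident).isoformat())
```

If the computed date lands on a weekend or holiday, move it to the nearest working day according to your team's convention.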
11. Recurrence Prevention
Before this incident: {{BEFORE_STATE}}
After implementing action items: {{AFTER_STATE}}
Confidence in prevention: {{CONFIDENCE}} / 10
Residual risk: {{RESIDUAL_RISK}}
12. Review & Sign-Off
Post-mortem presented at: {{MEETING}} on {{MEETING_DATE}}
Meeting recording: {{RECORDING_LINK}}
Meeting notes: {{NOTES_LINK}}
Related Documents
Approval
| Role | Name | Date | Signature |
|---|---|---|---|
| Author | | | |
| Reviewer | | | |
| Approver | | | |