A postmortem is the short, honest write-up you produce after an incident: what happened, why, and what you will change so it is less likely to happen again. Its only job is learning. If it turns into a search for someone to blame, people stop being candid, and the write-up stops being useful.
Why blameless
Blameless does not mean no accountability. It means you assume everyone acted reasonably given what they knew at the time, and you focus on the systems and conditions that let the failure happen rather than the person who happened to be holding the keyboard. People who fear blame hide detail, and hidden detail is exactly what you need to fix the real cause.
The structure
- Summary: two or three sentences a busy reader can take away.
- Impact: who was affected, how badly, and for how long.
- Timeline: what happened and when, from first signal to resolution.
- Root cause: the underlying condition, not just the trigger.
- What went well and what did not: detection, response, communication.
- Action items: concrete changes, each with an owner and a due date.
A template you can copy
- Summary: On [date], [service] was [impact] for [duration]. The cause was [root cause]. We have applied [key fix] and are tracking [follow-ups].
- Impact: [who/what was affected], [scope], [start] to [end].
- Timeline: [time] first alert. [time] acknowledged. [time] cause identified. [time] fix applied. [time] resolved.
- Root cause: [the underlying condition that made this possible].
- What went well: [e.g. detection was fast]. What did not: [e.g. the alert was noisy / comms were late].
- Action items: [change], owner [name], due [date].
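The timeline and the duration are the parts of the template most easily gotten wrong by hand. Below is a minimal Python sketch that derives both from a list of timestamped events. The timestamps, event labels, and function names are all made up for illustration; they do not come from a real incident or tool.

```python
from datetime import datetime

# Hypothetical incident events; every value here is illustrative.
timeline = [
    (datetime(2024, 3, 5, 14, 2), "first alert"),
    (datetime(2024, 3, 5, 14, 6), "acknowledged"),
    (datetime(2024, 3, 5, 14, 31), "cause identified"),
    (datetime(2024, 3, 5, 14, 40), "fix applied"),
    (datetime(2024, 3, 5, 14, 49), "resolved"),
]


def duration_minutes(events):
    """Whole minutes from the earliest event to the latest."""
    ordered = sorted(events)
    return int((ordered[-1][0] - ordered[0][0]).total_seconds() // 60)


def render_timeline(events):
    """Format the events as a single Timeline line in the template's style."""
    parts = [f"{when:%H:%M} {what}." for when, what in sorted(events)]
    return "- Timeline: " + " ".join(parts)


print(render_timeline(timeline))
print(f"Duration: {duration_minutes(timeline)} minutes")
```

Sorting the events before formatting means the timeline reads from first signal to resolution even if the notes were collected out of order.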
Make the action items real
The part that prevents a repeat is the action items, and they only count if they are specific, owned, and dated. Aim for a few changes you will actually make, not a long list you will not. Revisit them; an action item with no follow-through is just a note. Publishing the postmortem, or at least the summary, on your status page or to affected customers is often the right call: it shows you took the incident seriously.
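If you track action items anywhere machine-readable, the "owned and dated" test is easy to automate. A minimal Python sketch, assuming a simple record per item; the ActionItem class, its fields, and the problems function are hypothetical names, not part of any tracker's API. Whether an item is specific enough still needs a human to judge.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    # Hypothetical record shape; adapt the fields to whatever tracker you use.
    change: str
    owner: Optional[str] = None
    due: Optional[date] = None
    done: bool = False


def problems(item, today):
    """Return the reasons an action item is not yet owned, dated, and on track."""
    found = []
    if not item.owner:
        found.append("no owner")
    if item.due is None:
        found.append("no due date")
    elif not item.done and item.due < today:
        found.append("overdue")
    return found


# Illustrative items only.
items = [
    ActionItem("Add a disk-usage alert", owner="dana", due=date(2024, 3, 20)),
    ActionItem("Document the failover steps"),
]
for item in items:
    issues = problems(item, today=date(2024, 4, 1))
    if issues:
        print(f"{item.change}: {', '.join(issues)}")
```

Running a check like this when you revisit the postmortem gives you a short list of items that need a nudge, rather than relying on memory.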