Guides
Practical, honest guides on running a reliable service: status pages, uptime monitoring, incident communication, on-call, and keeping customers in the loop when a provider you depend on has an issue.
What is a status page, and how to set one up
A status page is the public page where you tell customers how your service is doing. Here is what goes on one, when to make it public, and how to set one up in minutes.
Read the guide →
Uptime monitoring, explained
What uptime monitoring actually checks, the difference between HTTP, keyword, heartbeat and TCP checks, and how to avoid the false-positive alerts that train people to ignore the pager.
Read the guide →
Monitoring your dependencies: showing third-party outages on your status page
When a provider you rely on goes down, your customers feel it as your outage. Here is how to monitor your third-party dependencies and post a clear, honest heads-up on your status page.
Read the guide →
Incident communication: writing status updates customers trust
What to say during an outage, how often to say it, and templates for each stage of an incident. Clear, honest updates cost you far less trust than the outage itself.
Read the guide →
On-call and escalation basics
How on-call rotations, escalation ladders and acknowledgements fit together, so the right person is reached when something breaks, without burning out the team.
Read the guide →
How to write a postmortem (blameless, with a template)
A postmortem turns an outage into something your team learns from instead of repeating. Here is the structure, why blameless matters, and a template you can copy.
Read the guide →