Blog
Practical, honest writing on running a reliable service: guides, product comparisons, and straight answers to common questions about status pages, uptime monitoring, incidents, on-call, and the providers you depend on.
- Guide
What is a status page, and how to set one up
A status page is the public page where you tell customers how your service is doing. Here is what goes on one, when to make it public, and how to set one up in minutes.
- Guide
Uptime monitoring, explained
What uptime monitoring actually checks, the difference between HTTP, keyword, heartbeat and TCP checks, and how to avoid the false-positive alerts that train people to ignore the pager.
- Guide
Monitoring your dependencies: showing third-party outages on your status page
When a provider you rely on goes down, your customers feel it as your outage. Here is how to monitor your third-party dependencies and post a clear, honest heads-up on your status page.
- Guide
Incident communication: writing status updates customers trust
What to say during an outage, how often to say it, and templates for each stage of an incident. Clear, honest updates cost you far less trust than the outage itself.
- Guide
On-call and escalation basics
How on-call rotations, escalation ladders and acknowledgements fit together, so the right person is reached when something breaks, without burning out the team.
- Guide
How to write a postmortem (blameless, with a template)
A postmortem turns an outage into something your team learns from instead of repeats. Here is the structure, why blameless matters, and a template you can copy.
- Comparison
Best status page tools in 2026
An honest comparison of the leading status page tools in 2026, by category and use case, including which have built-in monitoring, on-call and third-party dependency tracking. Written by Sentivel, with Sentivel included on the same terms.
- Question
How do I monitor a cron job?
Use a heartbeat (also called a dead-man's switch). Instead of something reaching out to your job, the job pings a unique URL each time it finishes successfully. If a ping does not arrive within the window you expect, the monitor marks it down and alerts you. That is the right tool for cron jobs, workers and backups, which have no public URL for an outside check to hit.
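As a minimal sketch of the job side, assuming the monitor accepts a plain GET on the ping URL: the URL below and the names `send_ping` and `run_with_heartbeat` are hypothetical, made up for illustration, and not any particular product's API.

```python
import urllib.request

# Hypothetical ping URL; a real monitor would issue a unique one per job.
HEARTBEAT_URL = "https://monitor.example.com/ping/your-unique-id"


def send_ping(url: str = HEARTBEAT_URL, timeout: float = 10.0) -> None:
    """Tell the monitor the job finished (assumes a plain GET is enough)."""
    with urllib.request.urlopen(url, timeout=timeout):
        pass


def run_with_heartbeat(job, ping=send_ping) -> None:
    """Run the job and ping only if it finishes without raising."""
    job()  # an exception here propagates, so no ping is sent
    try:
        ping()
    except OSError as exc:
        # A failed ping should not fail the job itself. The monitor will
        # notice the missing heartbeat and alert on its own.
        print(f"heartbeat ping failed: {exc}")
```

The ping comes after the work, not before it, so a job that crashes halfway stays silent and the monitor treats the silence as the failure.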
- Question
What does 99.9% uptime mean (and what is a good SLA)?
99.9% uptime (three nines) allows about 8 hours 46 minutes of downtime per year, or roughly 43 minutes per month. Each extra nine cuts that by a factor of ten. What counts as good depends on what you are running: 99.9% is a common baseline for most SaaS, while infrastructure that other services depend on heavily often targets 99.95% or higher.
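The arithmetic is easy to check yourself. This is a small sketch, with `downtime_budget` as an illustrative name and a 365-day year and 30-day month as assumptions:

```python
from datetime import timedelta


def downtime_budget(uptime_percent: float, period: timedelta) -> timedelta:
    """Downtime allowed in `period` at the given uptime percentage."""
    return period * (1 - uptime_percent / 100)


YEAR = timedelta(days=365)   # assumption: non-leap year
MONTH = timedelta(days=30)   # assumption: 30-day month

for target in (99.9, 99.95, 99.99):
    per_year = downtime_budget(target, YEAR)
    per_month = downtime_budget(target, MONTH)
    print(f"{target}%: {per_year} per year, {per_month} per month")
```

Changing the period lengths shifts the result by a few seconds to a minute, which is why published figures for the same target differ slightly.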
- Question
How often should uptime checks run?
Every one to two minutes is the common choice and a good default. More frequent checks detect an outage sooner but add load and cost; less frequent checks save resources but widen the window before you know. Whatever interval you pick, pair it with a confirmation threshold so a single failed check never pages anyone.
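One way to picture a confirmation threshold is a counter of consecutive failures. The sketch below is an assumption about how such a rule could work, not a description of any specific tool; `ConfirmedMonitor` and its default of three failures are illustrative.

```python
class ConfirmedMonitor:
    """Alert only after `threshold` consecutive failed checks."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.consecutive_failures = 0
        self.down = False

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True when an alert should fire."""
        if check_passed:
            # Any success resets the streak and marks the service up.
            self.consecutive_failures = 0
            self.down = False
            return False
        self.consecutive_failures += 1
        if self.consecutive_failures >= self.threshold and not self.down:
            self.down = True  # alert once per outage, not on every check
            return True
        return False
```

With a one-minute interval and a threshold of three, a real outage pages after about three minutes, while a single dropped request never pages at all.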
- Question
What is the difference between a status page and uptime monitoring?
Uptime monitoring is how you detect a problem: scheduled checks that tell you when a service is failing. A status page is how you communicate it: the public page where customers see current health and incident updates. They work best together, where the monitoring drives the status shown on the page so it stays honest without anyone toggling it by hand.
- Question
How do I tell customers a third-party outage is affecting me?
Map the affected part of your product to the provider it depends on, then post an advisory that attributes the cause to the provider and scopes the impact. For example: "We are aware of an issue with a provider we rely on for payments; some checkouts may be affected until it resolves." Name the dependency, say what is affected, commit to a next update, and avoid promises you cannot verify.
- Question
What is an escalation policy?
An escalation policy is the ordered ladder an alert climbs until someone acknowledges it: page the on-call person, wait, then page them again or page a secondary, then widen to a group or a manager. Each step has a timeout, so a missed alert is backstopped automatically instead of an incident going unanswered. Acknowledging stops the ladder from climbing further.
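The ladder can be modeled as a list of steps, each with a timeout. This is a rough sketch under assumed names (`Step`, `who_has_been_paged`) and made-up timeouts, meant only to show how the timing works:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    target: str           # who gets paged at this step
    timeout_minutes: int  # how long to wait before climbing further


def who_has_been_paged(
    policy: List[Step],
    minutes_elapsed: int,
    acked_at: Optional[int] = None,
) -> List[str]:
    """Return the targets paged so far, stopping at the acknowledgement."""
    paged = []
    step_starts_at = 0
    for step in policy:
        if step_starts_at > minutes_elapsed:
            break  # this step's turn has not come yet
        if acked_at is not None and step_starts_at >= acked_at:
            break  # someone acknowledged before this step fired
        paged.append(step.target)
        step_starts_at += step.timeout_minutes
    return paged


# Example policy with illustrative timeouts.
policy = [Step("primary on-call", 5), Step("secondary", 10), Step("manager", 15)]
print(who_has_been_paged(policy, minutes_elapsed=20))
print(who_has_been_paged(policy, minutes_elapsed=20, acked_at=4))
```

In the second call the primary acknowledges at minute 4, so the secondary and the manager are never paged.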
- Question
What is the difference between HTTP and heartbeat monitoring?
HTTP monitoring reaches out: it requests a URL on a schedule and judges the response by status code, timing and optionally a keyword. Heartbeat monitoring is the inverse: your job pings the monitor on each run, and the absence of an expected ping is the signal that something is wrong. Use HTTP for anything with a public URL, and heartbeats for cron jobs and workers that have none.
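Seen from the monitor's side, the two checks ask different questions. The functions below are a sketch with hypothetical names and default limits (a 5-second response limit, a 5-minute grace period), not any tool's real settings:

```python
from datetime import datetime, timedelta
from typing import Optional


def http_check_passes(
    status_code: int,
    elapsed_seconds: float,
    body: str,
    max_seconds: float = 5.0,
    keyword: Optional[str] = None,
) -> bool:
    """Outbound check: judge a response by status, timing and keyword."""
    if not 200 <= status_code < 300:
        return False
    if elapsed_seconds > max_seconds:
        return False
    if keyword is not None and keyword not in body:
        return False
    return True


def heartbeat_is_overdue(
    last_ping: datetime,
    now: datetime,
    expected_every: timedelta,
    grace: timedelta = timedelta(minutes=5),
) -> bool:
    """Inbound check: the missing ping is the failure signal."""
    return now - last_ping > expected_every + grace
```

The HTTP check needs a response to inspect, while the heartbeat check only needs a clock and the time of the last ping, which is why it suits jobs with no URL to request.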
- Question
Should my status page be public or private?
Public is the norm for customer-facing products: it answers the "are you down?" question for everyone and builds trust with a visible track record. A private page (password- or login-gated) suits internal tools or pre-launch products that are not ready to publish uptime. You can start private and switch to public later.