When things go wrong, having a clear process makes all the difference. Phare’s incident management system takes care of the entire incident lifecycle automatically, from the moment a problem is detected to when everything’s back to normal.

Programmatic incidents

When a monitor detects trouble (respecting your confirmation settings and any keyword/SSL checks), Phare springs into action by creating an incident. Each incident is automatically linked to the affected monitor and includes detailed information about what went wrong. Right away, your team gets notified through your configured alert rules, no manual intervention required. Programmatic incidents track the entire lifecycle from detection through resolution, giving you a complete picture of what happened during the outage.

Manual incidents

Sometimes you need to create an incident even when your monitors haven’t detected a problem. Perhaps a customer reported an issue with a service you’re not actively monitoring, or you want to proactively communicate about planned maintenance. Phare makes it easy to create incidents manually:
  1. Select the affected monitor(s)
  2. Set the impact level
  3. Add a descriptive title
  4. Include details about the situation
Manual incidents follow the same lifecycle as programmatic ones, but you’ll need to resolve them manually when the issue is fixed. This gives you full control over the incident’s duration and communication timeline.

Incident recovery

When your service recovers (again, respecting your recovery confirmation settings), Phare automatically resolves the incident and lets your team know through your alert rules.

Impact

Not all incidents are created equal. That’s why Phare lets your team classify incidents by their impact level:
  • Unknown: Still investigating, not sure how bad it is
  • Operational: False alarm—everything’s working as expected
  • Maintenance: Planned downtime for heavy-duty updates
  • Degraded performance: Things are slower / working less well than usual
  • Partial outage: Some users are affected or certain features are down
  • Major outage: Everything is on fire, users can’t access the service or are severely impacted
Impact levels help your team prioritize responses internally and keep users informed through status pages. If left as “unknown,” status pages will simply show the monitor as “down” without additional details.
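As a rough illustration of the rule above, the six impact levels can be modeled as an enumeration, with the “unknown” case falling back to a plain “down” label. This is only a sketch: `Impact` and `status_page_label` are hypothetical names, not part of Phare, and showing the impact's own name for every other level is an assumption.

```python
from enum import Enum


class Impact(Enum):
    """The six impact levels listed above (hypothetical modeling)."""
    UNKNOWN = "unknown"
    OPERATIONAL = "operational"
    MAINTENANCE = "maintenance"
    DEGRADED_PERFORMANCE = "degraded performance"
    PARTIAL_OUTAGE = "partial outage"
    MAJOR_OUTAGE = "major outage"


def status_page_label(impact: Impact) -> str:
    """Return the label a status page might show for an affected monitor.

    Assumption: every classified impact is shown under its own name,
    while an unclassified incident is shown as a bare "down".
    """
    if impact is Impact.UNKNOWN:
        return "down"
    return impact.value


if __name__ == "__main__":
    for level in Impact:
        print(f"{level.name:<22} {status_page_label(level)}")
```

Setting an impact level early therefore gives status page visitors more context than the default “down” label.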

Event timeline

The event timeline gives you the full picture of what happened during an incident. You can see the status reported by each region, which alert rules fired, and follow the incident’s progression from start to finish. This chronological view helps with post-incident analysis and improves future response procedures.

Incident comments

Team members can add private comments to share insights or document solutions, helping everyone resolve issues faster. These internal notes are perfect for:
  • Documenting troubleshooting steps
  • Sharing relevant links to logs or dashboards
  • Coordinating response efforts between team members
  • Recording root cause analysis findings
The rich text editor supports Markdown, making it easy to include formatted text, code snippets, or links to relevant resources. Comments are only visible to your team members and never appear on public status pages. Each comment is timestamped and attributed to its author, creating a clear record of who did what during incident response.

Incident updates

While comments help your team collaborate privately, incident updates are designed for external communication. These updates allow you to keep your users informed about:
  • What’s happening with the incident
  • What you’re doing to fix it
  • When they can expect resolution
  • Workarounds they can use in the meantime
Each update lets you communicate progress as you mitigate the issue and is automatically published to your connected status pages, helping maintain transparency with your users during outages. Consistent, clear communication during incidents builds trust with your users: they’ll appreciate knowing you’re on top of the situation even when things aren’t working perfectly.

Smart incident merging

Smart incident merging reduces noise by grouping multiple, similar failures into a single incident that evolves over time. Instead of opening a new incident for every affected monitor, Phare can merge them together, keep one timeline, and notify your team more intelligently.

Configuration

Smart incident merging can be configured per-project, allowing you to tailor the behavior to different services or teams. You can also set a preferred time window for merging, balancing responsiveness with noise reduction.
Start with a moderate window of 10 to 30 minutes. If you often experience bursty or cascading failures, a longer window will further reduce duplicate incidents.

How it works

Only incidents belonging to the same project can be merged together. When a new incident is created, even before confirmation, Phare checks for any existing open incidents in the project that share the same reported issue. If it finds one within the configured time window, it merges the new incident into the existing one. This approach, though simple, provides fast, high-quality results for most incident scenarios.
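The merging rule described above can be sketched in a few lines of Python. This is an illustrative model, not Phare's implementation: the `Incident` fields, the function names, the exact-match comparison of the reported issue, and measuring the window from the existing incident's start time are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Set


@dataclass
class Incident:
    """Hypothetical incident record used only for this sketch."""
    project_id: str
    issue: str                      # the reported issue, e.g. "timeout"
    started_at: datetime
    monitors: Set[str] = field(default_factory=set)
    resolved: bool = False


def find_merge_target(new: Incident, open_incidents: List[Incident],
                      window: timedelta) -> Optional[Incident]:
    """Return an open incident the new one can be merged into, if any."""
    for existing in open_incidents:
        if existing.resolved:
            continue
        if existing.project_id != new.project_id:
            continue  # only incidents of the same project can merge
        if existing.issue != new.issue:
            continue  # must share the same reported issue
        # Assumption: the window is measured from the existing incident's start.
        elapsed = new.started_at - existing.started_at
        if timedelta(0) <= elapsed <= window:
            return existing
    return None


def open_or_merge(new: Incident, open_incidents: List[Incident],
                  window: timedelta) -> Incident:
    """Merge the new incident into a matching one, or open it on its own."""
    target = find_merge_target(new, open_incidents, window)
    if target is None:
        open_incidents.append(new)
        return new
    target.monitors |= new.monitors  # one incident, growing set of monitors
    return target


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0)
    window = timedelta(minutes=15)
    incidents: List[Incident] = []
    open_or_merge(Incident("shop", "timeout", start, {"api"}), incidents, window)
    open_or_merge(Incident("shop", "timeout", start + timedelta(minutes=5), {"web"}),
                  incidents, window)
    print(len(incidents), sorted(incidents[0].monitors))
```

In this model, a second failure with the same issue arriving five minutes later joins the first incident instead of opening a new one, while a failure in another project, or one arriving after the window has elapsed, starts its own incident.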

Dedicated alert rule events

Two alert rule events are available to help you react precisely to incident propagation and partial recoveries. Paired with notification threads on the Email, Discord, or Slack integrations, they let you keep notifications focused: one creation, threaded follow-ups for expansions and partial recoveries, and a final recovery when everything is back to normal.