The Monitoring page is where you watch for automation failures and track their recovery. It is organized into three tabs (Incidents, Recoveries, and Rules), with a summary strip across the top.

The recovery strip

The strip at the top of the page answers three questions at a glance:

Open incidents — failures that still need attention.
AI recoveries today — how many failures AI handled, and the success rate.
MTTR — mean time to recover, with a trend sparkline.
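
As a rough illustration of how the two numeric figures in the strip could be derived, the sketch below computes a success rate and an MTTR from a list of recovery records. The `Recovery` record and its field names are hypothetical and are not the product's actual data model. It also assumes that MTTR is averaged over successful recoveries only, which this page does not state.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Recovery:
    # Hypothetical record; field names are illustrative only.
    failed_at: datetime
    recovered_at: Optional[datetime]  # None if the attempt did not recover


def success_rate(recoveries: List[Recovery]) -> float:
    """Fraction of recovery attempts that ended with a recovered automation."""
    if not recoveries:
        return 0.0
    succeeded = sum(1 for r in recoveries if r.recovered_at is not None)
    return succeeded / len(recoveries)


def mttr(recoveries: List[Recovery]) -> Optional[timedelta]:
    """Mean time to recover, over successful recoveries only (an assumption)."""
    durations = [
        r.recovered_at - r.failed_at
        for r in recoveries
        if r.recovered_at is not None
    ]
    if not durations:
        return None
    return sum(durations, timedelta()) / len(durations)
```

For example, two recoveries that took 10 and 20 minutes plus one failed attempt would give an MTTR of 15 minutes and a success rate of two thirds under these assumptions.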

Incidents tab

The Incidents tab lists automation failures. Use the filter chips to narrow the view:

Open — failures not yet resolved.
AI working — a remediation is in progress.
Resolved — failures that recovered (by rerun, by AI, or manually).
All

Click a row to expand it. An expanded incident shows:

The failing automation and the error classification (timeout, query conflict, data volume, syntax, permission, missing object, or unknown).
The error message from SFMC.
If AI ran, the diagnosis and a line-level diff of the proposed SQL or configuration change.
Actions: Review & promote, Re-run, Decline, or Pause.
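
To give a feel for what a line-level diff of a proposed SQL change looks like, here is a small sketch using Python's standard `difflib` module. The query text, the data extension name `Customers_Master`, and the added filter are invented for illustration. The product's own diff view may be rendered differently.

```python
import difflib

# Hypothetical SQL before and after a proposed fix; not from a real incident.
current_sql = """SELECT SubscriberKey, EmailAddress
FROM Customers_Master
WHERE Status = 'Active'
""".splitlines()

proposed_sql = """SELECT SubscriberKey, EmailAddress
FROM Customers_Master
WHERE Status = 'Active'
  AND CreatedDate >= DATEADD(day, -30, GETDATE())
""".splitlines()

# unified_diff marks added lines with "+" and removed lines with "-".
diff = list(
    difflib.unified_diff(
        current_sql,
        proposed_sql,
        fromfile="current",
        tofile="proposed",
        lineterm="",
    )
)
print("\n".join(diff))
```

In this sketch the only changed line is the added date filter, which appears with a leading `+`. The three unchanged lines are shown as context.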

Recoveries tab

The Recoveries tab is a chronological timeline of every recovery action, grouped by day. Filter by AI actions, Auto-reruns, or Resolved, and export the view to CSV for reporting.

Each entry records the original automation, the attempt number, the outcome, and how long recovery took.
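
If you want to summarize an exported CSV outside the product, something like the following sketch could work. The column names (`automation`, `attempt`, `outcome`, `recovery_seconds`) and the sample rows are assumptions made for this example. Check the header row of a real export and adjust the names before reusing it.

```python
import csv
import io
from collections import Counter

# Stand-in for an exported file; column names and values are assumptions.
sample_export = """automation,attempt,outcome,recovery_seconds
Daily Import,1,resolved,120
Daily Import,2,resolved,300
Weekly Segment,1,failed,
"""


def summarize(csv_text):
    """Count outcomes and average the recovery time of rows that report one."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    outcomes = Counter(row["outcome"] for row in rows)
    times = [int(row["recovery_seconds"]) for row in rows if row["recovery_seconds"]]
    average_seconds = sum(times) / len(times) if times else None
    return outcomes, average_seconds


outcomes, average_seconds = summarize(sample_export)
```

For a real file, replace `io.StringIO(csv_text)` with an opened file handle, for example `open(path, newline="")`.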

Rules tab

The Rules tab shows the monitoring rules protecting this Business Unit as a grid of cards. Each card shows the rule type (automation or folder), whether AI recovery is on, and three stats: times triggered in the last 7 days, success rate, and MTTR. Add a rule with the dashed Add rule card.

How a failure becomes an incident

An automation errors in SFMC.
The event reaches Nimbus, and the CloudPage enriches it with per-step detail.
Nimbus checks the failure against your monitoring rules.
If a rule matches, a rerun attempt is created and the incident appears on the Incidents tab.
The incident updates live as the rerun — and, if needed, AI remediation — progresses.

If no rule matches, the failure is still recorded on the Events page; it simply does not trigger automatic recovery.
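
The decision described above can be sketched roughly as follows. This is a simplified model, not the product's implementation: the `Rule` and `FailureEvent` classes, their fields, exact-name matching for folders, and first-match precedence are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Rule:
    rule_type: str             # "automation" or "folder", the two rule types
    target: str                # hypothetical identifier: automation name or folder
    ai_recovery: bool = False  # whether AI recovery is on for this rule


@dataclass
class FailureEvent:
    automation: str
    folder: str
    error_message: str


def match_rule(event: FailureEvent, rules: List[Rule]) -> Optional[Rule]:
    """Return the first rule covering the failed automation, or None."""
    for rule in rules:
        if rule.rule_type == "automation" and rule.target == event.automation:
            return rule
        # Assumption: a folder rule matches only on an exact folder name.
        if rule.rule_type == "folder" and rule.target == event.folder:
            return rule
    return None


def handle_failure(event: FailureEvent, rules: List[Rule]) -> dict:
    """Record every failure; open an incident with a rerun only on a match."""
    result = {"recorded_on_events_page": True, "incident": None}
    rule = match_rule(event, rules)
    if rule is not None:
        result["incident"] = {
            "automation": event.automation,
            "rerun_attempt": 1,
            "ai_recovery_enabled": rule.ai_recovery,
        }
    return result
```

In this model a failure with no matching rule still comes back with `recorded_on_events_page` set to `True` and `incident` set to `None`, which mirrors the behavior described in the last paragraph.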

Manual reruns