AU-5(2): Real-time Alerts
AU-5(2) requires you to alert named responders in real time (within a window you define) when audit logging fails or becomes unreliable, so you can restore logging before you lose forensic visibility. To operationalize it, define failure events, set an alert SLA, route alerts to on-call roles, and retain evidence that alerts fired and were handled.
Key takeaways:
- Define “real-time” in minutes and make it measurable with monitoring and ticketing.
- Alert on logging pipeline failures (agent, transport, storage, parsing, time sync) and on “logging disabled” conditions.
- Keep evidence that alerts triggered, reached the right responders, and resulted in timely remediation.
AU-5(2): Real-time Alerts is a requirement about preserving audit evidence. If audit logging fails quietly, you lose the timeline you need for incident response, insider investigations, and compliance reporting. Auditors typically treat logging failures as high-impact because they undermine the integrity of your monitoring program and your ability to prove what happened.
This control enhancement is deceptively simple: “send an alert in real time.” The operational work is deciding what counts as an audit failure event, where to detect it (endpoint, collector, SIEM, cloud service), who must be notified, and how you prove the alerting works continuously, not just on paper.
For a Compliance Officer, CCO, or GRC lead, the fastest path is to turn AU-5(2) into an implementable control card: objective, owner, event triggers, routing rules, response expectations, and an evidence bundle. Then partner with Security Operations and platform owners to wire up monitoring on the log pipeline’s critical control points. The outcome you want is simple to explain in an exam: “If logging breaks, we know quickly, and we can show you the alerts and the fixes.”
Regulatory text
NIST requirement (AU-5(2)): “Provide an alert within [real-time period] to [personnel, roles, and/or locations] when the following audit failure events occur: [audit logging failure events requiring real-time alerts].” 1
Operator translation (what you must do):
- Define the real-time period (a specific number of minutes) for audit-failure alerts.
- Name the recipients (roles, teams, escalation points, or locations such as an on-call rotation).
- Enumerate audit failure events that trigger alerts (logging disabled, log forwarding stops, SIEM ingestion failures, storage full, agent crash, time sync issues affecting audit integrity, etc.).
- Implement detection and routing so alerts actually fire within your defined window.
- Retain proof that the alerts are generated, delivered, and acted on.
Plain-English interpretation
AU-5(2) is a reliability requirement for your audit logging capability. Your logging system is a control. If it fails, that failure must become visible immediately to the people who can fix it. “Real-time alerts” means three things:
- A defined maximum window from failure to notification.
- An operational pathway (on-call, paging, ticket creation) that drives restoration of logging.
- Evidence that the pathway works under both normal and failure conditions.
A common audit finding is that logs exist, but nobody detects when a subset of sources stops arriving. AU-5(2) targets that exact blind spot.
Who it applies to (entity and operational context)
Applies to:
Operational contexts where auditors expect AU-5(2) to be implemented:
- Centralized logging architectures (SIEM, log analytics platforms, managed detection services).
- Cloud-native logging (cloud audit trails, platform logs, control plane logs) routed to a central store.
- Endpoint and server logging via agents/forwarders.
- High-value systems where loss of audit logs would block investigations (identity systems, privileged access tooling, key business applications).
Control owners (typical):
- Primary: Security Operations / Detection Engineering (alert logic, routing, runbooks).
- Shared: Platform/SRE (log pipeline reliability), IAM (audit sources), GRC (requirements and evidence).
What you actually need to do (step-by-step)
Step 1: Create the AU-5(2) control card (your “operator contract”)
Build a one-page control card that states:
- Objective: Detect audit logging failure quickly and notify responders.
- Scope: Which environments and log sources are in-scope (prod first, then others).
- Real-time period: The maximum allowed time from failure to alert delivery.
- Recipients: On-call role(s), escalation path, and a backup route if paging fails.
- Trigger events: The exact failure events that must generate alerts.
- Response requirements: Expected triage steps and restoration steps.
- Exceptions: Any approved gaps and compensating monitoring.
This directly addresses a frequent risk: teams can’t show ownership, operating cadence, or evidence. 1
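The control card fields above can be captured as a small data structure so that gaps are caught mechanically. This is a minimal sketch, not a prescribed format: the class name `ControlCard`, its field names, the role names, and the 15-minute window are all illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class ControlCard:
    """One-page AU-5(2) control card. All field names are hypothetical."""
    objective: str
    scope: list[str]
    real_time_period: timedelta
    recipients: list[str]
    trigger_events: list[str]
    response_requirements: list[str]
    exceptions: list[str] = field(default_factory=list)  # may legitimately be empty

    def missing_fields(self) -> list[str]:
        """Return the names of required fields that are still empty."""
        required = {
            "objective": self.objective,
            "scope": self.scope,
            "recipients": self.recipients,
            "trigger_events": self.trigger_events,
            "response_requirements": self.response_requirements,
        }
        missing = [name for name, value in required.items() if not value]
        if self.real_time_period <= timedelta(0):
            missing.append("real_time_period")
        return missing


card = ControlCard(
    objective="Detect audit logging failure quickly and notify responders.",
    scope=["prod"],
    real_time_period=timedelta(minutes=15),  # example value, not a NIST figure
    recipients=["soc-oncall", "sre-oncall"],  # hypothetical role names
    trigger_events=[],  # not yet defined, so the card is incomplete
    response_requirements=["triage", "restore logging"],
)
print(card.missing_fields())
```

A check like this can run whenever the card is edited, so an empty trigger list or an undefined window is flagged before an assessor finds it.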
Step 2: Define “audit failure events” in a way engineering can instrument
Use categories that map to where failures occur. Example event list you can adapt:
- Source stopped logging: audit daemon/service stopped; audit policy disabled; endpoint agent not running.
- Forwarding failure: agent queue backed up; forwarder can’t reach collector; authentication failures to collector.
- Collector/ingestion failure: parsing errors spike; ingestion pipeline down; message broker outage.
- Storage failure: log bucket permissions changed; WORM/immutability misconfig; storage full or retention misapplied.
- Integrity failure: system time drift beyond tolerance; gaps or duplicates in log sequence numbers; unexpected log volume drop to near-zero for a critical source.
Write each as: Signal → Threshold/condition → Expected alert. Avoid vague phrasing like “logging issues.”
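One way to make the Signal → Threshold/condition → Expected alert format concrete is to store each failure event as a record that engineering can evaluate against live metrics. The sketch below is illustrative only: the metric names, thresholds, and alert texts are assumptions, not values taken from the control.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class FailureEvent:
    name: str
    signal: str                         # metric the condition is evaluated on
    condition: Callable[[float], bool]  # threshold test for that metric
    expected_alert: str                 # what responders should receive


# Hypothetical event definitions; tune names and thresholds to your pipeline.
EVENTS = [
    FailureEvent(
        name="source_silent",
        signal="seconds_since_last_log",
        condition=lambda v: v > 600,
        expected_alert="Page SOC on-call: no audit logs from critical source",
    ),
    FailureEvent(
        name="forwarder_queue_backlog",
        signal="queue_depth",
        condition=lambda v: v > 50_000,
        expected_alert="Page platform on-call: forwarder queue backed up",
    ),
]


def evaluate(metrics: dict[str, float]) -> list[str]:
    """Return the expected alert for every event whose condition is met."""
    return [
        e.expected_alert
        for e in EVENTS
        if e.signal in metrics and e.condition(metrics[e.signal])
    ]


alerts = evaluate({"seconds_since_last_log": 900, "queue_depth": 120})
print(alerts)
```

Keeping the definitions in one list also gives you an exportable trigger inventory for the evidence bundle.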
Step 3: Implement monitoring at each control point (don’t rely on one signal)
Practical pattern: monitor both heartbeat and volume.
- Heartbeat: “did we receive any logs from this source in the last window?”
- Volume: “did we receive roughly expected log volume for this source and log type?”
Also monitor the alerting system itself (paging integration failures, webhook errors). AU-5(2) is undermined if the alert channel is brittle.
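The heartbeat-plus-volume pattern can be sketched as a single check per source. This is a simplified illustration under stated assumptions: the function name `check_source`, the 10-minute heartbeat window, and the 20% volume floor are hypothetical and should be replaced with values that fit your baseline.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone


def check_source(
    last_seen: datetime,
    observed_count: int,
    expected_count: int,
    now: datetime,
    heartbeat_window: timedelta = timedelta(minutes=10),  # assumed window
    min_volume_ratio: float = 0.2,                        # assumed floor
) -> list[str]:
    """Return the findings for one log source over the current window."""
    findings = []
    # Heartbeat: did we receive any logs from this source recently?
    if now - last_seen > heartbeat_window:
        findings.append("heartbeat_missed")
    # Volume: did we receive roughly the expected number of events?
    if expected_count > 0 and observed_count / expected_count < min_volume_ratio:
        findings.append("volume_drop")
    return findings


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
findings = check_source(
    last_seen=now - timedelta(minutes=3),  # source is still sending something
    observed_count=40,
    expected_count=1000,                   # but far less than its baseline
    now=now,
)
print(findings)
```

In this example the heartbeat alone would report a healthy source; only the volume check reveals that most events have stopped arriving, which is why the two signals are paired.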
Step 4: Route alerts to roles, not people, and enforce acknowledgement
Define:
- Primary on-call (SOC or platform on-call).
- Secondary escalation (security engineering manager, SRE manager).
- Ticket creation rules (every alert creates an incident record).
This is where many programs fail: alerts exist in a dashboard, but nobody is required to acknowledge them. Make acknowledgement part of the evidence trail.
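Role-based routing with enforced acknowledgement can be expressed as an escalation table keyed on time since the alert fired. The sketch below assumes a hypothetical two-tier policy and a 15-minute escalation delay; real paging tools implement this natively, so treat it as a model of the logic rather than a replacement for them.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Alert:
    alert_id: str
    fired_at: datetime
    acknowledged_at: Optional[datetime] = None  # set when a responder acks


# Hypothetical escalation policy: (delay since firing, role to notify).
ESCALATION = [
    (timedelta(0), "soc-oncall"),
    (timedelta(minutes=15), "security-engineering-manager"),
]


def current_targets(alert: Alert, now: datetime) -> list[str]:
    """Roles that should have been notified by `now`; empty once acknowledged."""
    if alert.acknowledged_at is not None:
        return []
    waited = now - alert.fired_at
    return [role for delay, role in ESCALATION if waited >= delay]


fired = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
alert = Alert(alert_id="AU5-0001", fired_at=fired)

unacked_targets = current_targets(alert, fired + timedelta(minutes=20))
alert.acknowledged_at = fired + timedelta(minutes=21)
acked_targets = current_targets(alert, fired + timedelta(minutes=25))
print(unacked_targets, acked_targets)
```

Recording `acknowledged_at` on the alert record is what turns the acknowledgement into evidence rather than a verbal assurance.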
Step 5: Test failure modes and retain results
Run controlled tests (in a non-production environment or an approved change window) such as:
- Stop an agent.
- Block egress to the collector.
- Disable a cloud audit trail export.
Confirm the end-to-end chain: failure occurs → detection → alert delivered → ticket created → responder action recorded → logging restored. Then store the test record as audit evidence.
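A failure-mode test record can be scored automatically by checking that every stage of the chain was timestamped and that alert delivery fell inside the defined real-time period. This is a minimal sketch; the stage names and the 15-minute period are assumptions chosen for illustration.

```python
from __future__ import annotations

from datetime import datetime, timedelta, timezone

# Hypothetical stage names mirroring the end-to-end chain.
STAGES = [
    "failure_injected",
    "detected",
    "alert_delivered",
    "ticket_created",
    "responder_action",
    "logging_restored",
]


def evaluate_test(timestamps: dict[str, datetime], real_time_period: timedelta) -> dict:
    """Score one failure-mode test against the defined real-time period."""
    missing = [s for s in STAGES if s not in timestamps]
    if missing:
        # An unrecorded stage means the chain was not proven end to end.
        return {"passed": False, "missing_stages": missing, "alert_latency": None}
    latency = timestamps["alert_delivered"] - timestamps["failure_injected"]
    return {
        "passed": latency <= real_time_period,
        "missing_stages": [],
        "alert_latency": latency,
    }


t0 = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
result = evaluate_test(
    {
        "failure_injected": t0,
        "detected": t0 + timedelta(minutes=4),
        "alert_delivered": t0 + timedelta(minutes=6),
        "ticket_created": t0 + timedelta(minutes=6),
        "responder_action": t0 + timedelta(minutes=12),
        "logging_restored": t0 + timedelta(minutes=30),
    },
    real_time_period=timedelta(minutes=15),  # example value from a control card
)
print(result["passed"], result["alert_latency"])
```

Storing the scored result alongside the raw timestamps gives assessors both the claim and the data behind it.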
Step 6: Run recurring control health checks and track remediation
Add a recurring review that checks:
- Are all in-scope sources still reporting?
- Did alerts fire for any failures, and were they resolved?
- Were any alert rules disabled or modified, and was that approved?
Track issues to validated closure with due dates. This demonstrates sustained operation, not a one-time build. 1
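The recurring health check can be reduced to two comparisons: in-scope sources versus sources actually reporting, and alert rule changes versus approvals. The sketch below assumes hypothetical input shapes (plain lists and dictionaries with `rule` and `approved_by` keys) that you would populate from your SIEM and change management exports.

```python
from __future__ import annotations


def health_check(
    in_scope_sources: list[str],
    reporting_sources: list[str],
    rule_changes: list[dict],
) -> dict[str, list[str]]:
    """Summarize silent sources and alert rule changes that lack approval."""
    silent = sorted(set(in_scope_sources) - set(reporting_sources))
    unapproved = [c["rule"] for c in rule_changes if not c.get("approved_by")]
    return {"silent_sources": silent, "unapproved_rule_changes": unapproved}


# Hypothetical inputs for one review period.
report = health_check(
    in_scope_sources=["idp", "pam", "payments-api"],
    reporting_sources=["idp", "payments-api"],
    rule_changes=[
        {"rule": "no-logs-idp", "action": "modified", "approved_by": "chg-approver"},
        {"rule": "no-logs-pam", "action": "disabled", "approved_by": None},
    ],
)
print(report)
```

Each non-empty entry in the report becomes a remediation item with an owner and a due date.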
Required evidence and artifacts to retain
Auditors generally want to see design, operation, and proof of response. Maintain an “evidence bundle” per period (monthly or quarterly, aligned to your program), including:
- Control card / runbook for AU-5(2) (owner, scope, triggers, routing, escalation, exceptions).
- Alert rule inventory (export or screenshots) showing: name, logic, enabled status, notification targets.
- Notification configuration evidence (paging/on-call routing, distribution lists, integrations).
- Sample alert artifacts: alert payloads, SIEM events, paging notifications (redact sensitive data).
- Incident/ticket records tied to audit-failure alerts showing triage and resolution notes.
- Test records for failure-mode tests, including date, tester, steps, and results.
- Change management records for any modifications to logging, pipelines, or alert thresholds that could affect detection.
A strong evidence bundle is minimal but complete: inputs (rules/config), outputs (alerts/tickets), and oversight (reviews/tests). 1
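The inputs, outputs, and oversight grouping lends itself to a simple completeness check before each evidence period closes. The artifact keys below are hypothetical labels for the items listed above, not a required naming scheme.

```python
from __future__ import annotations

# Hypothetical artifact labels grouped by inputs, outputs, and oversight.
REQUIRED = {
    "inputs": {"control_card", "alert_rule_inventory", "notification_config"},
    "outputs": {"sample_alerts", "ticket_records"},
    "oversight": {"test_records", "change_records"},
}


def bundle_gaps(collected: set[str]) -> dict[str, list[str]]:
    """Return the missing artifacts per category; an empty dict means complete."""
    return {
        category: sorted(items - collected)
        for category, items in REQUIRED.items()
        if items - collected
    }


gaps = bundle_gaps(
    {
        "control_card",
        "alert_rule_inventory",
        "notification_config",
        "sample_alerts",
        "ticket_records",
        "change_records",
    }
)
print(gaps)
```

Here the bundle has configuration and alert samples but no failure-mode test records, so the oversight category is flagged.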
Common exam/audit questions and hangups
“What is your defined ‘real-time period’?”
Have a written value and show how monitoring meets it (alert timestamps).
“Which audit failure events generate alerts?”
Produce your trigger list and map it to log pipeline components.
“Who receives the alerts, and how do you ensure coverage after hours?”
Show role-based routing, on-call schedule integration, and escalation rules.
“How do you know the alerting still works?”
Show recurring tests and recent real alerts with closure evidence.
“What happens if the SIEM is down?”
Have a secondary detection path for pipeline health (platform monitoring) or an alternate alert route for critical failures.
Frequent implementation mistakes (and how to avoid them)
- Dashboard-only “alerts.” Fix: require push notifications (paging/ticket) to a staffed role.
- Only monitoring the SIEM, not the upstream pipeline. Fix: add monitors at source, forwarder, collector, and storage.
- Undefined “real-time.” Fix: pick a concrete window and measure it with timestamps across alert generation and delivery.
- No ownership split between Security and Platform teams. Fix: document RACI in the control card; define who fixes what class of failures.
- Evidence gaps: can’t show alerts were handled. Fix: link alerts to tickets automatically; require closure notes and restoration confirmation.
Enforcement context and risk implications
No public enforcement cases were provided in the source catalog for this requirement, so this page does not cite specific actions. Practically, AU-5(2) failures translate into:
- Investigation risk: you may not be able to reconstruct events during an incident.
- Control integrity risk: audit logging controls become “paper controls” if failure is undetected.
- Customer and assessor risk: third-party due diligence often probes whether logging failures are detected and escalated, especially for regulated data environments.
Practical 30/60/90-day execution plan
First 30 days (stand up the control design and minimum viable alerting)
- Create the AU-5(2) control card: scope, owners, real-time period, recipients, failure events.
- Inventory critical log sources and the pipeline path for each (source → forwarder → collector → SIEM/storage).
- Implement alerts for the highest-risk failure modes: “no logs received” for critical sources, ingestion pipeline down, storage permission changes.
- Create automatic ticketing for every AU-5(2) alert and assign to on-call roles.
Days 31–60 (expand coverage and prove operation)
- Add monitoring for upstream failure points (agent health, queue depth, forwarding auth failures).
- Add integrity checks relevant to your environment (time sync drift monitors, parser failure rate).
- Run controlled failure-mode tests and store results in the evidence repository.
- Hold a cross-functional review (SOC + SRE + GRC) and document any approved exceptions.
Days 61–90 (stabilize, harden, and audit-proof)
- Tune thresholds to reduce noise without losing true failures; document tuning decisions.
- Implement a recurring control health check and remediation tracker with validated closure.
- Build an “audit ready” evidence packet: last period alert samples, ticket closures, test results, and current rule inventory.
- If you use Daydream for GRC workflows, link the AU-5(2) control card to owners, evidence tasks, and recurring health checks so evidence collection is consistent across cycles.
Frequently Asked Questions
What counts as an “audit failure event” for AU-5(2)?
Any condition that stops required audit logs from being generated, forwarded, ingested, stored, or trusted. Define events across the full logging pipeline (source, transport, collector, storage) so a single point of failure does not hide the problem.
How do we define “real-time” without overpromising?
Pick a concrete alert window your teams can meet consistently and measure it with timestamps (event detection time and notification time). Document the value in the control card and keep evidence that alerts are delivered within that window.
Do alerts need to go to individuals by name?
The requirement allows alerts to personnel, roles, and/or locations. In practice, route to roles (SOC on-call, SRE on-call) so coverage survives turnover and after-hours staffing.
Our SIEM already alerts on “ingestion stopped.” Is that sufficient?
It’s a start, but many failures happen upstream (agent stopped, forwarding blocked, permissions changed on log export). Add upstream monitors and keep at least one detection path that does not depend on the SIEM being fully healthy.
What evidence will an auditor expect for AU-5(2)?
They will ask for alert rule configuration, notification routing, samples of alerts, and linked incident/ticket records showing investigation and restoration. Include test results from controlled logging failure simulations.
How do we handle third-party-managed systems where we don’t control the logging stack?
Contract for audit-log health monitoring and real-time notification requirements in the third party’s obligations, then verify with evidence (alerts, incident reports, service health metrics). Track any gaps as exceptions with compensating monitoring and documented acceptance.