AU-5: Response to Audit Logging Process Failures

AU-5 requires you to detect audit logging process failures and alert the right personnel within a defined time window, using a documented, testable process. To operationalize it quickly: define what “logging failure” means in your environment, wire automated alerts from log sources and pipelines to on-call responders, and retain evidence showing alerts fired and were handled. 1

Key takeaways:

  • Define “audit logging process failure” concretely and assign a single accountable owner. 1
  • Implement automated alerting with a stated time-to-alert, on-call routing, and a runbook tied to incident/problem management.
  • Retain a minimum evidence bundle: alert rules, routing, test results, and tickets proving detection and response.

AU-5 looks small on paper, but it carries real weight during an audit or incident. If audit logs silently stop flowing, you lose visibility into authentication events, admin actions, and security-relevant changes right when you need them most. Examiners and assessors rarely accept “we would notice” as a control; they expect defined triggers, designated responder roles, and proof that alerts fire and get handled.

The operational goal is simple: if any part of your audit logging chain fails (generation, collection, forwarding, parsing, storage, retention, or access), your team gets notified fast enough to reduce the window of missing logs and to preserve investigative integrity. AU-5 is also a reliability requirement. It forces you to treat logging as production infrastructure with monitoring, paging, and operational discipline.

This page translates AU-5 into an implementable checklist a CCO, security lead, or GRC owner can run with: scoping, alert design, ownership, runbooks, testing, and the evidence artifacts that keep audits moving. All requirement language is based on NIST SP 800-53 Rev. 5. 2

Regulatory text

Requirement excerpt (AU-5): “Alert [personnel or roles] within [time period] in the event of an audit logging process failure; and” 1

What the operator must do:

  1. Pick the roles who must be notified (by job function, not a person’s name). Typical roles include Security Operations (SOC), Platform/SRE on-call, and the system owner for the affected service.
  2. Define the time period for alerting (your internal standard). Auditors will expect it to be explicit and applied consistently.
  3. Implement detection and alerting for audit logging failures across the full logging path, not just at the SIEM.
  4. Prove it works through test evidence and operational tickets showing alerts were triggered and handled.

NIST intentionally leaves the bracketed fields open because “right people” and “fast enough” depend on mission impact and system categorization. Your job is to make those placeholders real and defensible. 3

Plain-English interpretation (what AU-5 means in practice)

If audit logs stop being created, forwarded, ingested, or stored, you must know quickly and have an accountable team respond. AU-5 is less about writing a policy and more about building a monitoring-and-response loop for logging.

A practical interpretation that holds up in audits:

  • You have defined failure conditions (examples below).
  • You have automated alerts that route to an on-call responder.
  • You have a runbook that restores logging and documents the gap.
  • You retain evidence that the process runs consistently.

Who AU-5 applies to (entity + operational context)

AU-5 is commonly required where NIST SP 800-53 is the governing control set, including:

  • Federal information systems
  • Contractor systems handling federal data 1

Operationally, AU-5 applies anywhere you rely on audit logging to support:

  • Incident detection and investigation
  • Insider threat monitoring
  • Compliance reporting and forensic readiness
  • Access governance and privileged activity oversight

It also applies to third parties in your environment when they provide critical parts of the logging path (managed SIEM, managed detection, cloud logging backends). If you outsource, you still need contractually defined responsibilities and evidence that failures trigger alerts to your team, theirs, or both.

Define scope: what counts as an “audit logging process failure”

Write these down as your control’s “trigger events.” Auditors will look for precision.

Common failure modes to include:

  • Log generation failure: the system stops emitting audit events (agent stopped, audit daemon down, Windows event forwarding broken).
  • Collection/forwarding failure: collector offline, blocked network path, TLS cert expired, queue backlog, dropped events.
  • Parsing/normalization failure: pipeline change causes events to be rejected or misrouted.
  • Storage/retention failure: SIEM index full, object storage permissions changed, retention policy misconfigured.
  • Integrity/immutability breaks: audit log store becomes writable by unauthorized roles (often treated as a failure because it breaks trust).
  • Coverage loss: a critical log source disappears from inventory (for example, a new production cluster has no forwarding configured).

If you only monitor “SIEM is up,” you will miss silent data loss. AU-5 expects you to notice missing or failed logging, not just a down dashboard.
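One way to keep the trigger list precise is to hold it as data and check that every trigger has at least one monitor mapped to it. The sketch below is illustrative only: the trigger names mirror the failure modes above, while the monitor inventory and the `uncovered_triggers` helper are hypothetical and not part of any NIST material or product.

```python
from enum import Enum


class Trigger(Enum):
    """AU-5 trigger events, mirroring the failure modes listed above."""
    LOG_GENERATION = "log generation failure"
    COLLECTION_FORWARDING = "collection/forwarding failure"
    PARSING_NORMALIZATION = "parsing/normalization failure"
    STORAGE_RETENTION = "storage/retention failure"
    INTEGRITY = "integrity/immutability break"
    COVERAGE_LOSS = "coverage loss"


def uncovered_triggers(monitors: dict[str, set[Trigger]]) -> list[Trigger]:
    """Return triggers that no monitor claims to detect, in declaration order."""
    covered = set().union(*monitors.values())
    return [t for t in Trigger if t not in covered]


# Hypothetical inventory: a SIEM uptime check on its own covers very little.
siem_only = {"siem_uptime": {Trigger.STORAGE_RETENTION}}
print([t.name for t in uncovered_triggers(siem_only)])
```

Running this against a "SIEM is up" check alone leaves five of the six triggers uncovered, which is the silent-loss problem described above expressed as a list you can hand to an owner.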

What you actually need to do (step-by-step)

Step 1: Create a control card (one-page operating spec)

Create a requirement-level control card that includes:

  • Objective: detect audit logging process failures and alert responders within your defined time window.
  • Owner: one accountable role (often Head of Security Operations, Detection Engineering lead, or Platform SRE manager).
  • In-scope systems: list or link to your log source inventory.
  • Trigger events: the failure modes you defined.
  • Response steps: runbook pointer and ticketing requirements.
  • Exceptions: approved cases where logging is intentionally disabled (rare, time-bound, and documented).

This “control card” prevents the common audit failure: everyone agrees AU-5 matters, but nobody can show who runs it or what “good” looks like.
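If you prefer to keep the control card in version control, a small structured record can flag the fields an assessor is likely to ask about. This is a minimal sketch: the field names follow the bullets above, and every value in the example (including the 15-minute window) is a placeholder assumption, not a figure from NIST.

```python
from dataclasses import dataclass, field
from datetime import timedelta


@dataclass
class ControlCard:
    """One-page AU-5 operating spec; all example values are illustrative."""
    objective: str
    owner_role: str
    alert_roles: list[str]
    time_to_alert: timedelta
    in_scope_inventory: str  # link or path to the log source inventory
    trigger_events: list[str]
    runbook: str
    exceptions: list[str] = field(default_factory=list)

    def gaps(self) -> list[str]:
        """Name the fields that are empty or undefined."""
        problems = []
        if not self.owner_role.strip():
            problems.append("owner_role")
        if not self.alert_roles:
            problems.append("alert_roles")
        if self.time_to_alert <= timedelta(0):
            problems.append("time_to_alert")
        if not self.trigger_events:
            problems.append("trigger_events")
        if not self.runbook.strip():
            problems.append("runbook")
        return problems


card = ControlCard(
    objective="Detect audit logging failures and alert within the defined window",
    owner_role="Head of Security Operations",
    alert_roles=["SOC On-Call", "Platform On-Call"],
    time_to_alert=timedelta(minutes=15),  # placeholder internal standard
    in_scope_inventory="inventory/log-sources.csv",  # hypothetical path
    trigger_events=["log generation failure", "collection/forwarding failure"],
    runbook="",  # deliberately blank to show a gap
)
print(card.gaps())
```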

Step 2: Map the logging chain and assign monitoring points

Create a simple logging dataflow: Source → Agent → Forwarder/Collector → Pipeline → SIEM/Storage → Retention/Access

For each stage, define a monitor:

  • Heartbeat/agent health
  • Event throughput (rate drops, sustained zero)
  • Queue depth / backpressure
  • Ingestion errors and rejected events
  • Storage health and permission drift

You want at least one detection that catches silent log loss (for example, “no auth events from system X for a sustained period”) rather than only infrastructure uptime.
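A silent-loss detection can be as simple as comparing each source's newest event time to an allowed silence threshold. The sketch below assumes you can already query a "last seen" timestamp per source from your log platform; the `silent_sources` function, the source names, and the 30-minute threshold are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def silent_sources(last_seen, now, max_silence):
    """Return sources whose newest audit event is older than max_silence.

    A source mapped to None has never reported and is always flagged.
    """
    flagged = []
    for source, seen in last_seen.items():
        if seen is None or now - seen > max_silence:
            flagged.append(source)
    return sorted(flagged)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "idp-prod": now - timedelta(minutes=5),   # healthy
    "bastion-01": now - timedelta(hours=2),   # quiet for too long
    "k8s-new-cluster": None,                  # in inventory, never reported
}
print(silent_sources(last_seen, now, timedelta(minutes=30)))
# prints ['bastion-01', 'k8s-new-cluster']
```

Passing `now` in as a parameter keeps the check deterministic, which also makes it easy to replay during a control test.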

Step 3: Implement alerting and routing (roles + time period)

Operational requirements to specify:

  • Alert destinations: on-call paging tool, email distribution list for secondary notification, ticket auto-creation.
  • Severity rules: which sources trigger paging vs ticket-only (use a tiering model tied to system criticality).
  • Responder roles: SOC on-call, SRE on-call, system owner.
  • Escalation: what happens if the first responder doesn’t acknowledge.

Keep it role-based (SOC On-Call, Platform On-Call), not person-based, so the control survives organizational changes.
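The routing and escalation rules above can be written down as a small lookup, which makes them reviewable alongside the control card. This is a sketch under assumptions: the three-tier model, the role names in the escalation chain, and the 10-minute acknowledgement timeout are illustrative values, and a real paging tool would hold this configuration itself.

```python
from datetime import timedelta

# Hypothetical tiering: tier 1 is the highest system criticality.
ROUTING = {
    1: {"page": ["SOC On-Call", "Platform On-Call"], "ticket": True},
    2: {"page": ["SOC On-Call"], "ticket": True},
    3: {"page": [], "ticket": True},  # ticket-only
}
ESCALATION_CHAIN = ["SOC On-Call", "SOC Lead", "Head of Security Operations"]
ACK_TIMEOUT = timedelta(minutes=10)  # placeholder, not a NIST value


def route(tier: int) -> dict:
    """Pick destinations by tier; unknown tiers fall back to the strictest route."""
    return ROUTING.get(tier, ROUTING[1])


def escalation_target(unacknowledged_for: timedelta) -> str:
    """Return the role that should hold the page after this long without an ack."""
    level = unacknowledged_for // ACK_TIMEOUT  # whole timeouts elapsed
    return ESCALATION_CHAIN[min(level, len(ESCALATION_CHAIN) - 1)]


print(route(2)["page"])
print(escalation_target(timedelta(minutes=25)))
```

Defaulting unknown tiers to the strictest route is a design choice in this sketch: an unclassified source pages someone rather than failing quietly.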

Step 4: Write the runbook for “logging failure”

Your runbook should be short and executable:

  • Confirm the failure (what dashboards/queries prove the gap).
  • Restore logging (restart agent, fix credentials/certs, unblock firewall, roll back pipeline change).
  • Bound the impact window (start/end time of missing logs).
  • Preserve evidence (export relevant system logs, pipeline errors, and change records).
  • Open/relate an incident or problem record.
  • Document compensating monitoring if logs are unavailable (for example, temporarily increase endpoint telemetry or cloud control-plane logging).

Step 5: Test the control and schedule health checks

Auditors will ask, “How do you know this works?” Have a repeatable test:

  • Simulate an agent stop in a non-prod environment.
  • Simulate pipeline rejection (schema change) in a test pipeline.
  • Validate alert fires, routes correctly, and produces a ticket.
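To make the test repeatable, it helps to record each induced failure in a fixed shape and evaluate it the same way every time. The following is a sketch with assumed field names and an assumed 15-minute window; it checks the three outcomes listed above (alert fired in time, routed correctly, ticket created).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class NegativeTest:
    """Record of one induced logging failure; field names are illustrative."""
    failure_induced_at: datetime
    alert_fired_at: Optional[datetime]
    routed_to: Optional[str]
    ticket_id: Optional[str]


def evaluate(test: NegativeTest, window: timedelta, expected_role: str) -> list[str]:
    """Return the reasons the test failed; an empty list means it passed."""
    failures = []
    if test.alert_fired_at is None:
        failures.append("alert did not fire")
    elif test.alert_fired_at - test.failure_induced_at > window:
        failures.append("alert fired outside the time window")
    if test.routed_to != expected_role:
        failures.append("alert routed to the wrong role")
    if not test.ticket_id:
        failures.append("no ticket created")
    return failures


t0 = datetime(2024, 1, 1, 9, 0)
late = NegativeTest(t0, t0 + timedelta(minutes=40), "SOC On-Call", "TICKET-1")
print(evaluate(late, timedelta(minutes=15), "SOC On-Call"))
```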

Then run recurring control health checks:

  • Alert rule review (are monitors still enabled and scoped correctly?)
  • Coverage review (new systems added, decommissioned systems removed)
  • Evidence sampling (recent alerts and response tickets)
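The coverage review reduces to a set comparison between the log source inventory and the sources actually sending events. A minimal sketch, assuming both lists can be exported as sets of source names (the names below are made up):

```python
def coverage_review(inventory: set[str], reporting: set[str]) -> dict[str, list[str]]:
    """Compare the source inventory with the sources seen in the log platform."""
    return {
        # In scope but not sending logs: candidate AU-5 coverage loss.
        "missing": sorted(inventory - reporting),
        # Sending logs but not inventoried: new or decommissioned systems to reconcile.
        "unexpected": sorted(reporting - inventory),
    }


inventory = {"idp-prod", "bastion-01", "k8s-prod"}
reporting = {"idp-prod", "k8s-prod", "legacy-db"}
print(coverage_review(inventory, reporting))
```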

If you manage this in Daydream, treat AU-5 as a living control: store the control card, link monitors to owners, and track health checks and remediation items to closure so audits don’t become a scramble.

Required evidence and artifacts to retain

Aim for an evidence bundle that answers four questions: what the requirement is, who owns it, how it runs, and what proves it operated.

Minimum evidence set:

  1. AU-5 control card (owner, roles to alert, time-to-alert, trigger events, scope, exceptions).
  2. Logging architecture / dataflow diagram (can be lightweight).
  3. Alert configuration evidence: screenshots or exports of alert rules, thresholds, routing targets, escalation policy.
  4. On-call documentation: current on-call schedule reference and role definitions (not personal phone lists).
  5. Runbook for audit logging failures.
  6. Test results: dated test record showing alert fired and response steps were followed.
  7. Operational records: tickets/incidents from real events (with timestamps, actions taken, closure notes).
  8. Exception approvals: time-bound approvals when logging was intentionally disabled, plus compensating controls.
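Before an audit, a quick completeness check over the bundle can show which of the eight artifact types still has nothing attached. This sketch assumes the bundle is tracked as a mapping from artifact type to a list of file paths or links; the key names and paths are hypothetical.

```python
# Keys mirror the minimum evidence set above, in the same order.
REQUIRED_ARTIFACTS = [
    "control_card",
    "dataflow_diagram",
    "alert_configuration",
    "on_call_documentation",
    "runbook",
    "test_results",
    "operational_records",
    "exception_approvals",
]


def missing_artifacts(bundle: dict[str, list[str]]) -> list[str]:
    """Return required artifact types with no files or links attached."""
    return [name for name in REQUIRED_ARTIFACTS if not bundle.get(name)]


bundle = {
    "control_card": ["evidence/au-5-control-card.pdf"],
    "runbook": ["evidence/logging-failure-runbook.md"],
    "test_results": [],  # planned but not yet run
}
print(missing_artifacts(bundle))
```

An empty list for a type counts as missing here, so a planned but unexecuted test still shows up as a gap.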

The AU-5 excerpt above does not specify a retention period; align retention to your broader logging and incident record retention requirements rather than inventing a number.

Common exam/audit questions and hangups

Expect these:

  • “Define audit logging process failure.” If your answer is vague, you will get a finding. Bring your trigger list.
  • “Who is alerted, and how fast?” Auditors want roles and a concrete time window.
  • “Show me evidence an alert fired and was handled.” Provide tickets and alert history, not just a policy.
  • “How do you detect silent failures?” If you only monitor SIEM uptime, expect pushback.
  • “How do you cover cloud-native logs and SaaS?” Show coverage for cloud control-plane logs and critical SaaS audit logs if they are in scope.

Frequent implementation mistakes (and how to avoid them)

  1. Mistake: Monitoring the SIEM, not the data.
    Fix: Add “absence of expected events” detections per critical source or service tier.

  2. Mistake: Alerts route to an individual.
    Fix: Route to roles and on-call rotations; keep a backup distribution list.

  3. Mistake: No documented time-to-alert.
    Fix: Set an internal standard and apply it consistently across environments.

  4. Mistake: No evidence of testing.
    Fix: Run a repeatable negative test and retain the record.

  5. Mistake: Exceptions become permanent.
    Fix: Require expiry dates and compensating monitoring documented in the exception.
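The fix for the last mistake lends itself to an automated check: list every logging exception and flag those with no expiry date, a past expiry date, or no compensating monitoring. A minimal sketch with assumed field names and made-up systems:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class LoggingException:
    """An approved case where audit logging is intentionally disabled."""
    system: str
    expires: Optional[date]
    compensating_monitoring: str


def exception_findings(exceptions, today):
    """Return one finding per problem: no expiry, expired, or no compensating control."""
    findings = []
    for exc in exceptions:
        if exc.expires is None:
            findings.append(f"{exc.system}: no expiry date")
        elif exc.expires < today:
            findings.append(f"{exc.system}: expired on {exc.expires.isoformat()}")
        if not exc.compensating_monitoring.strip():
            findings.append(f"{exc.system}: no compensating monitoring documented")
    return findings


register = [
    LoggingException("batch-etl", date(2024, 1, 31), "cloud control-plane logging"),
    LoggingException("legacy-app", None, ""),
]
for finding in exception_findings(register, today=date(2024, 3, 1)):
    print(finding)
```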

Risk implications (why AU-5 gets scrutiny)

Audit logging failures create blind spots that can:

  • Delay detection of unauthorized access
  • Prevent reconstruction of events during incident response
  • Undermine trust in audit records if storage integrity fails

From a governance perspective, the risk is also procedural: if you cannot prove alerts and response are operating, assessors treat logging as uncontrolled even if the tooling exists.

Practical 30/60/90-day execution plan

First 30 days (stabilize and define)

  • Name the AU-5 owner and responders (role-based).
  • Publish the AU-5 control card with your alert time window.
  • Inventory critical log sources and map the logging chain for each.
  • Identify top failure modes and pick the first set of monitors.

Days 31–60 (implement and connect operations)

  • Configure alerts for the highest-risk sources (identity, privileged access, production workloads, boundary devices).
  • Wire alerts to on-call and ticketing with an escalation path.
  • Publish a runbook and train responders.
  • Run at least one controlled test and capture evidence.

Days 61–90 (prove operation and harden)

  • Expand coverage to remaining in-scope sources and cloud/SaaS audit logs where applicable.
  • Add silent-failure detection (missing events) for high-impact sources.
  • Start recurring health checks and track remediation to closure.
  • Package the evidence bundle for audits and customer diligence.

Frequently Asked Questions

What counts as an “audit logging process failure” for AU-5?

Any condition where required audit events are not generated, not delivered, not ingested, or not retained as expected. Define failures across the full chain (source, forwarding, pipeline, storage) and document them as AU-5 trigger events. 1

Do we need paging, or is email enough?

AU-5 requires alerting within your defined time window; the control doesn’t prescribe paging. In practice, use paging for high-impact systems where delayed awareness creates unacceptable visibility gaps, and email or tickets for lower tiers.

How do we show evidence that alerts are working without waiting for a real outage?

Run a controlled test in a safe environment (stop an agent, break forwarding, or force pipeline rejection) and retain the alert record plus the response ticket. Auditors generally accept test evidence when it’s repeatable and dated.

What if a third party runs our SIEM or logging pipeline?

Treat AU-5 as a shared responsibility: contract for failure detection and notification, define who is alerted (your team, theirs, or both), and obtain evidence such as alert history and incident records. You still need to demonstrate the requirement is met for your environment.

How do we handle “silent log loss” where systems appear healthy?

Add detections based on the absence of expected events (for example, no authentication events from a critical source) and coverage checks that flag missing sources. Assessors often probe this gap during technical walkthroughs.

Can we meet AU-5 with a policy and manual checks?

A policy helps, but AU-5 is hard to defend without automated detection plus operational records (alerts, tickets, and tests). Manual checks tend to fail on timeliness and evidence quality during audits.

Footnotes

  1. NIST SP 800-53 Rev. 5 OSCAL JSON

  2. NIST SP 800-53 Rev. 5

  3. NIST SP 800-53 Rev. 5 DOI
