AU-5: Response to Audit Logging Process Failures
AU-5 requires you to detect audit logging process failures and alert the right personnel (or roles) fast enough that you can restore logging before you lose material security evidence. Operationally, that means instrumenting log pipelines end-to-end, defining “failure,” routing alerts to an owned on-call path, and retaining proof that alerts fired and were handled. 1
Key takeaways:
- Define what counts as an “audit logging process failure” across endpoints, apps, collectors, and SIEM, then monitor for it.
- Configure alert routing with clear ownership, escalation, and a response runbook tied to incident management.
- Retain evidence that failures trigger alerts within your defined time window and that responders restore logging and document impact.
The AU-5 “Response to Audit Logging Process Failures” requirement is a control you either operationalize tightly or you discover, mid-incident, that you have blind spots right when you need evidence most. The control text is short, but the scope is not: “audit logging” includes the generation of logs on systems, the transport and collection of those logs, and the downstream processing that makes them searchable and usable for detection and investigations. A failure anywhere in that chain can create an evidentiary gap.
For a Compliance Officer, CCO, or GRC lead, the fastest way to make AU-5 real is to treat it like an availability and integrity requirement for your security telemetry. You need (1) a clear failure definition, (2) monitoring that catches it, (3) alerting that reaches accountable humans quickly, and (4) a repeatable response that restores logging and records what was lost. Your assessment success will hinge less on the tooling brand and more on whether you can show a working, owned process with recurring evidence.
This page gives requirement-level guidance you can hand to Security Operations, Platform, and IT, then track to closure in your control repository.
Regulatory text
Requirement excerpt: “Alert {{ insert: param, au-05_odp.01 }} within {{ insert: param, au-05_odp.02 }} in the event of an audit logging process failure; and” 1
Plain-English interpretation
You must:
- Detect when audit logging breaks, and
- Send an alert to a defined audience (a person, team, or role) within a defined time window that you set for your environment. 1
AU-5 is assessed as a “prove it” control. Auditors will look for objective evidence that failures are detected, alerts are generated, and people respond. The placeholders in the text mean you must choose and document:
- Who gets alerted (au-05_odp.01), and
- How quickly they must be alerted after failure (au-05_odp.02). 1
Who it applies to (entity and operational context)
AU-5 commonly applies to:
- Federal information systems and
- Contractor systems handling federal data (for example, systems supporting federal programs, regulated contracts, or environments aligned to NIST 800-53 baselines). 1
Operationally, it applies anywhere you rely on audit logs for:
- Security detection and response (SIEM/SOAR use cases)
- Investigations and forensics
- Compliance evidence (access logs, admin actions, data access trails)
- Third-party hosted services where logging is delivered via APIs or export jobs
If your logging is partially owned by a third party (cloud provider, managed security provider, SaaS platform), AU-5 still lands on you as the system owner: you must confirm the failure modes and alerting paths are covered contractually and technically.
What you actually need to do (step-by-step)
Step 1: Define “audit logging process failure” in your environment
Create a failure taxonomy that is specific enough to monitor. Include at least:
- Log generation failure: auditd disabled, Windows event logging stopped, app audit module off, agent not running.
- Collection failure: collector/forwarder down, API export job failing, message queue backlog beyond threshold.
- Transport failure: TLS/auth failures to log endpoints, DNS/network path breaks, dropped events.
- Parsing/indexing failure: SIEM ingestion pipeline errors, schema changes, license/volume caps causing drops.
- Time integrity failure: clock drift or timestamp corruption that makes logs unreliable.
Deliverable: a short “AU-5 failure definition” standard that engineering teams can map to monitors.
Step 2: Set alert audience and timing targets (the ODPs)
Document:
- Alert recipients (roles): SOC on-call, platform on-call, security engineering on-call, or a shared on-call rotation; include escalation to a manager if unacknowledged.
- Alert timing window: your defined maximum time from failure detection to alert delivery.
Write this as testable statements:
- “If endpoint audit logging stops on production servers, alert SOC on-call and platform on-call.”
- “If SIEM ingestion drops to zero for a covered source, page SOC on-call.”
This is where many programs fail: they never choose the ODP values, so the requirement stays ambiguous.
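To keep the ODP values unambiguous and testable, you can capture them as configuration and check alert timing against them programmatically. The role names and 15-minute window below are hypothetical examples, not recommended values; choose and document values appropriate to your environment.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Sketch: the two AU-5 ODPs captured as testable configuration.
# Recipient roles and the 15-minute window are hypothetical examples.
@dataclass(frozen=True)
class Au5Odp:
    alert_recipients: tuple[str, ...]   # au-05_odp.01: who is alerted
    max_time_to_alert: timedelta        # au-05_odp.02: how fast

PRODUCTION_ODP = Au5Odp(
    alert_recipients=("soc-oncall", "platform-oncall"),
    max_time_to_alert=timedelta(minutes=15),
)

def alert_within_window(detected_at: datetime, alerted_at: datetime,
                        odp: Au5Odp) -> bool:
    """True if the alert met the documented timing target."""
    return (alerted_at - detected_at) <= odp.max_time_to_alert
```

A helper like `alert_within_window` can run against real incident timestamps during recurring control checks, turning the ODP from prose into a pass/fail test.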
Step 3: Build end-to-end monitoring for logging health
Implement monitoring at multiple layers so a single blind spot does not defeat AU-5.
A practical minimum set:
- Source heartbeat: agents/forwarders emit a heartbeat event; alert on missing heartbeat.
- Volume anomalies: alert on “sudden drop to near-zero” log rates per source category.
- Pipeline health: collector queue depth, error rates, API export job status.
- SIEM ingestion confirmation: synthetic “canary” event generated at the source and verified in the SIEM search/index within an expected time.
Keep monitors scoped to your “covered systems” list (see Step 4) so you can defend completeness.
Step 4: Define the “covered systems” inventory for AU-5
You need a bounded scope to manage:
- Tier 0/1 identity systems
- Production workloads handling regulated data
- Security infrastructure (EDR, IAM, firewalls)
- Admin consoles and privileged access paths
Tie this to your asset inventory or CMDB. If your inventory is weak, start with a prioritized list: “systems in scope for audit logging and AU-5 monitoring.”
Step 5: Route alerts through incident management with ownership
AU-5 alerts should behave like operational incidents:
- Create an alert-to-incident integration (PagerDuty/Opsgenie/ServiceNow/Jira, or equivalent).
- Require acknowledgement and track time-to-acknowledge.
- Define escalation rules for missed acknowledgements.
Add a simple severity rubric:
- High: logging failure on regulated production systems or core security telemetry.
- Medium: partial degradation, single collector down with redundancy intact.
- Low: non-production or low-sensitivity systems.
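The rubric above can be encoded as a routing function so alert severity is assigned consistently rather than judged ad hoc. The asset tags (`regulated`, `core_telemetry`, `production`) are illustrative metadata; map them to your CMDB fields.

```python
# Sketch: the severity rubric as a routing function. Tag names are
# illustrative asset metadata, not a fixed schema.
def au5_severity(asset_tags: set[str], redundancy_intact: bool) -> str:
    if "regulated" in asset_tags or "core_telemetry" in asset_tags:
        return "high"    # regulated production systems or core security telemetry
    if "production" in asset_tags and not redundancy_intact:
        return "high"    # production failure with no redundant path left
    if "production" in asset_tags:
        return "medium"  # partial degradation, redundancy intact
    return "low"         # non-production or low-sensitivity systems
```

Consistent severity assignment also feeds the tuning loop in the mistakes table below: if medium alerts are routinely noise, adjust the rubric inputs instead of muting the alerts.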
Step 6: Create a response runbook (what responders do)
Your runbook must cover:
- Triage: confirm scope (which systems, which log sources, when it started).
- Containment: restore logging quickly (restart service, redeploy agent, fix credentials, expand storage, roll back parser changes).
- Impact assessment: identify time window of missing logs; note whether alternate logs exist (e.g., cloud control plane logs).
- Recovery validation: confirm canary/heartbeat is visible end-to-end again.
- Post-incident action: open a problem ticket for root cause and preventive actions.
Make the runbook explicit about who owns what: SOC verifies visibility; Platform/IT restores services; GRC tracks evidence and updates control operation notes.
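For the impact-assessment step, a small helper can standardize what responders record in the ticket: the gap window, its duration, and any alternate evidence sources. Field names here are illustrative.

```python
from datetime import datetime, timedelta

# Sketch: record the window of missing logs for the incident ticket.
# Field names are illustrative, not a required schema.
def impact_window(last_good_event: datetime, restored_at: datetime,
                  alternate_sources: list[str]) -> dict:
    gap = restored_at - last_good_event
    return {
        "gap_start": last_good_event.isoformat(),
        "gap_end": restored_at.isoformat(),
        "gap_minutes": round(gap / timedelta(minutes=1), 1),
        # Alternate evidence (e.g. cloud control-plane logs) may partially
        # cover the gap -- responders still document the loss itself.
        "alternate_coverage": alternate_sources,
    }
```

Attaching this structure to the closure notes gives auditors the "impact window" evidence item in a consistent shape across incidents.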
Step 7: Test the control and keep recurring evidence
Do at least two types of tests:
- Tabletop: walk through a collector outage and show decision points and comms.
- Functional test: intentionally stop a non-production logging agent (or disable a test log export) and verify alerting, ticketing, and restoration steps.
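The canary idea from Step 3 also makes a good functional test: emit a uniquely tagged event at the source and poll until it is searchable in the SIEM. In this sketch, `emit_event` and `siem_search` are placeholders for your logging call and SIEM search API, not real library functions.

```python
import time
import uuid

# Sketch: end-to-end canary round trip. `emit_event` and `siem_search` are
# placeholders for your own logging call and SIEM search API.
def canary_round_trip(emit_event, siem_search,
                      timeout_s: int = 300, poll_s: int = 15) -> bool:
    canary_id = f"au5-canary-{uuid.uuid4()}"
    emit_event({"type": "au5_canary", "id": canary_id})
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if siem_search(f"type=au5_canary id={canary_id}"):
            return True   # pipeline confirmed healthy end-to-end
        time.sleep(poll_s)
    return False          # treat as an AU-5 failure signal and alert
```

Run it on a schedule against covered sources and retain the timestamps: a failed round trip is itself alert-worthy, and a passed one is recurring operating evidence.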
If you use Daydream as your control system of record, map AU-5 to a named owner, link the runbook, and schedule recurring evidence requests so you are not assembling artifacts during an audit.
Required evidence and artifacts to retain
Keep evidence that demonstrates design and operating effectiveness:
Control design artifacts
- AU-5 control narrative: failure definition, in-scope systems, alert recipients, timing target 1
- Alerting architecture diagram (high-level is fine)
- Runbook / SOP for logging failure response
- RACI or on-call ownership documentation
Operating evidence (recurring)
- Screenshots or exports of alert rules (heartbeat missing, ingestion drop, pipeline errors)
- Incident/ticket records showing:
- alert fired
- acknowledgement
- actions taken to restore logging
- closure notes describing impact window
- SIEM query evidence that canary events appear as expected
- Change records if fixes involved config changes
Evidence hygiene tip: store artifacts with consistent naming (“AU-5 Logging Failure Alert Test - [system] - [date]”) and link them directly to the control record.
Common exam/audit questions and hangups
Expect these questions and prepare short, evidence-backed answers:
- “What do you consider an audit logging process failure?” Have your taxonomy and examples ready.
- “Who is alerted, and how fast?” Show the documented recipients and timing target, plus alert routing configuration. 1
- “How do you know logs are arriving in the SIEM, not just leaving the host?” Show end-to-end confirmation (canary event, ingestion dashboards).
- “Show me an example from the last period.” Be ready with a real incident or a controlled test with tickets and timestamps.
- “What systems are covered?” Provide the in-scope inventory and the rationale for exclusions.
Frequent implementation mistakes (and how to avoid them)
| Mistake | Why it fails AU-5 | Fix |
|---|---|---|
| Monitoring only the SIEM’s health, not sources | You miss endpoint/app logging being disabled | Add source heartbeat and agent status monitoring |
| Alerts go to a shared inbox/Slack channel only | No accountable acknowledgement path | Page an on-call role; enforce escalation |
| “Failure” is undefined | Auditors treat coverage as arbitrary | Document failure conditions and thresholds |
| Tests are informal | No repeatable evidence | Run a planned functional test and retain artifacts |
| Too many false positives | Teams mute alerts | Tune thresholds; separate high/medium; implement deduplication |
Risk implications (why operators treat this as high-stakes)
When audit logging fails, you risk:
- Gaps in detection and delayed incident response
- Inability to reconstruct events for investigations
- Audit findings for control operation failure, especially if you cannot show alerting and response
From a governance perspective, AU-5 is a forcing function: if you cannot reliably detect logging failures, you also cannot credibly claim your downstream detection controls are effective.
Practical execution plan (30/60/90)
Use these phases as a deployment cadence; keep the dates aligned to your operating rhythm.
First 30 days (baseline and ownership)
- Assign AU-5 control owner (Security Ops or Security Engineering) and a GRC coordinator.
- Define failure taxonomy and in-scope systems list.
- Document alert recipients (roles) and timing target. 1
- Inventory current monitors and gaps across endpoints, collectors, and SIEM ingestion.
By 60 days (instrumentation and runbooks)
- Implement missing heartbeat/absence monitoring for top-tier systems.
- Configure alert routing to on-call with escalation and ticket creation.
- Publish the response runbook and train responders (short, scenario-based session).
- Run a tabletop exercise and capture evidence.
By 90 days (prove operating effectiveness)
- Run a functional test in a controlled environment and retain timestamps, alerts, and ticket artifacts.
- Add a recurring control check: periodic review of alert rules, on-call targets, and ingestion dashboards.
- Close the loop with problem management: top root causes, preventive changes, and documentation updates.
- In Daydream, attach the evidence set to AU-5 and schedule the next evidence pull so audits are a retrieval task, not a scramble.
Frequently Asked Questions
What counts as an “audit logging process failure” for AU-5?
Any condition that stops, materially degrades, or corrupts audit log generation, transport, collection, or ingestion into your analysis platform counts. Define the failure modes you will detect and alert on, then treat them as in-scope AU-5 events.
Do I have to alert a person, or is a dashboard alert enough?
AU-5’s text requires you to “alert” a defined audience within a defined time window, so a passive dashboard alone is usually hard to defend. Route alerts to an owned on-call role and keep acknowledgement evidence. 1
We outsource logging to a third party. Are we still responsible?
Yes, you still need assurance that failures are detected and escalated to your team. Cover it in contracts/SLA language and validate with technical checks (for example, canary events or ingestion confirmations).
How do we handle planned maintenance that stops logging?
Treat it as a controlled exception: document the maintenance window, the expected logging impact, compensating monitoring (if any), and restoration validation. Keep the change record and a post-maintenance confirmation that logging resumed.
What evidence is strongest for auditors?
A real incident ticket with timestamps from alert to acknowledgement to restoration is the cleanest. A controlled test that produces the same artifacts is the next best option, as long as it is repeatable and scoped to covered systems.
Our SIEM drops events during volume spikes. Does that trigger AU-5?
If event loss or ingestion failure creates an audit logging gap, treat it as an AU-5-relevant failure condition. Monitor for ingestion errors/quotas and define what “material degradation” means for your environment.
Footnotes
1. NIST SP 800-53 Rev. 5 OSCAL JSON (source of the AU-5 control text and parameters).
Operationalize this requirement
Map requirement text to controls, owners, evidence, and review workflows inside Daydream.
See Daydream