AU-5: Response to Audit Logging Process Failures

9 min read | Last verified: February 2026 | By Isaac Silverman

AU-5 requires you to detect audit logging process failures and alert the right personnel (or roles) fast enough that you can restore logging before you lose material security evidence. Operationally, that means instrumenting log pipelines end-to-end, defining “failure,” routing alerts to an owned on-call path, and retaining proof that alerts fired and were handled. ¹

Key takeaways:

Define what counts as an “audit logging process failure” across endpoints, apps, collectors, and SIEM, then monitor for it.
Configure alert routing with clear ownership, escalation, and a response runbook tied to incident management.
Retain evidence that failures trigger alerts within your defined time window and that responders restore logging and document impact.

AU-5 (Response to Audit Logging Process Failures) is a control you either operationalize tightly or find out, mid-incident, that you have blind spots right when you need evidence most. The control text is short, but the scope is not: “audit logging” includes the generation of logs on systems, the transport and collection of those logs, and the downstream processing that makes them searchable and usable for detection and investigations. A failure anywhere in that chain can create an evidentiary gap.

For a Compliance Officer, CCO, or GRC lead, the fastest way to make AU-5 real is to treat it like an availability and integrity requirement for your security telemetry. You need (1) a clear failure definition, (2) monitoring that catches it, (3) alerting that reaches accountable humans quickly, and (4) a repeatable response that restores logging and records what was lost. Your assessment success will hinge less on the tooling brand and more on whether you can show a working, owned process with recurring evidence.

This page gives requirement-level guidance you can hand to Security Operations, Platform, and IT, then track to closure in your control repository.

Regulatory text

Requirement excerpt: “Alert {{ insert: param, au-05_odp.01 }} within {{ insert: param, au-05_odp.02 }} in the event of an audit logging process failure; and” ¹

Plain-English interpretation

You must:

Detect when audit logging breaks, and
Send an alert to a defined audience (a person, team, or role) within a defined time window that you set for your environment. ¹

AU-5 is assessed as a “prove it” control. Auditors will look for objective evidence that failures are detected, alerts are generated, and people respond. The placeholders in the text mean you must choose and document:

Who gets alerted (au-05_odp.01), and
How quickly they must be alerted after failure (au-05_odp.02). ¹

Who it applies to (entity and operational context)

AU-5 commonly applies to:

Federal information systems and
Contractor systems handling federal data (for example, systems supporting federal programs, regulated contracts, or environments aligned to NIST 800-53 baselines). ¹

Operationally, it applies anywhere you rely on audit logs for:

Security detection and response (SIEM/SOAR use cases)
Investigations and forensics
Compliance evidence (access logs, admin actions, data access trails)
Third-party hosted services where logging is delivered via APIs or export jobs

If your logging is partially owned by a third party (cloud provider, managed security provider, SaaS platform), AU-5 still lands on you as the system owner: you must confirm the failure modes and alerting paths are covered contractually and technically.

What you actually need to do (step-by-step)

Step 1: Define “audit logging process failure” in your environment

Create a failure taxonomy that is specific enough to monitor. Include at least:

Log generation failure: auditd disabled, Windows event logging stopped, app audit module off, agent not running.
Collection failure: collector/forwarder down, API export job failing, message queue backlog beyond threshold.
Transport failure: TLS/auth failures to log endpoints, DNS/network path breaks, dropped events.
Parsing/indexing failure: SIEM ingestion pipeline errors, schema changes, license/volume caps causing drops.
Time integrity failure: clock drift or timestamp corruption that makes logs unreliable.

Deliverable: a short “AU-5 failure definition” standard that engineering teams can map to monitors.
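One way to make the failure definition mappable to monitors is to hold the taxonomy as data and check that every category has at least one monitor behind it. The sketch below is illustrative only: the category names come from the list above, while the monitor names are hypothetical placeholders, not rules from any particular product.

```python
from enum import Enum


class LoggingFailure(Enum):
    """Failure categories taken from the taxonomy above."""
    GENERATION = "log generation failure"
    COLLECTION = "collection failure"
    TRANSPORT = "transport failure"
    PARSING_INDEXING = "parsing/indexing failure"
    TIME_INTEGRITY = "time integrity failure"


# Hypothetical monitor names; replace with the rules in your own tooling.
MONITORS_BY_FAILURE = {
    LoggingFailure.GENERATION: ["agent_heartbeat_missing", "audit_service_stopped"],
    LoggingFailure.COLLECTION: ["collector_down", "export_job_failed"],
    LoggingFailure.TRANSPORT: ["tls_auth_errors"],
    LoggingFailure.PARSING_INDEXING: ["siem_ingestion_errors"],
    LoggingFailure.TIME_INTEGRITY: [],  # left empty on purpose to show a gap
}


def unmonitored_failures(mapping):
    """Return taxonomy categories that have no monitor mapped to them."""
    return [failure for failure in LoggingFailure if not mapping.get(failure)]


if __name__ == "__main__":
    for failure in unmonitored_failures(MONITORS_BY_FAILURE):
        print(f"No monitor mapped for: {failure.value}")
```

A category with no monitor is a documented failure mode you cannot yet detect, which is the kind of gap worth closing before an assessment.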

Step 2: Set alert audience and timing targets (the ODPs)

Document:

Alert recipients (roles): SOC on-call, platform on-call, security engineering on-call, or a shared on-call rotation; include escalation to a manager if unacknowledged.
Alert timing window: your defined maximum time from failure detection to alert delivery.

Write this as testable statements:

“If endpoint audit logging stops on production servers, alert SOC on-call and platform on-call within the documented time window.”
“If SIEM ingestion drops to zero for a covered source, page SOC on-call within the documented time window.”

This is where many programs fail: they never choose the ODP values, so the requirement stays ambiguous.
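A testable statement can be written down as a small record holding the condition, the recipients (au-05_odp.01), and the time window (au-05_odp.02), with a check that compares real timestamps against the window. This is a minimal sketch: the `AlertPolicy` type, the function name, and the 15-minute value are assumptions for illustration, not values prescribed by the control.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class AlertPolicy:
    """One testable AU-5 statement: a condition, who is alerted, and how fast."""
    condition: str
    recipients: tuple       # au-05_odp.01: personnel or roles
    max_delay: timedelta    # au-05_odp.02: time period


def alert_met_policy(policy, failure_detected_at, alerted_at):
    """True if the alert was delivered within the policy window after detection."""
    delay = alerted_at - failure_detected_at
    return timedelta(0) <= delay <= policy.max_delay


# Placeholder values for illustration only; choose and document your own ODPs.
POLICY = AlertPolicy(
    condition="SIEM ingestion drops to zero for a covered source",
    recipients=("SOC on-call",),
    max_delay=timedelta(minutes=15),
)

if __name__ == "__main__":
    detected = datetime(2026, 1, 1, 12, 0)
    print(alert_met_policy(POLICY, detected, detected + timedelta(minutes=10)))
```

An alert timestamp earlier than the detection timestamp is treated as a failed check here, on the assumption that it signals a clock or record-keeping problem rather than a fast alert.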

Step 3: Build end-to-end monitoring for logging health

Implement monitoring at multiple layers so a single blind spot does not defeat AU-5.

A practical minimum set:

Source heartbeat: agents/forwarders emit a heartbeat event; alert on missing heartbeat.
Volume anomalies: alert on “sudden drop to near-zero” log rates per source category.
Pipeline health: collector queue depth, error rates, API export job status.
SIEM ingestion confirmation: synthetic “canary” event generated at the source and verified in the SIEM search/index within an expected time.

Keep monitors scoped to your “covered systems” list (see Step 4) so you can defend completeness.
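The heartbeat and volume checks above reduce to two small comparisons. The sketch below assumes you can already obtain a last-seen timestamp and an event count per source from your own tooling; the function names, host names, and thresholds are hypothetical.

```python
from datetime import datetime, timedelta


def missing_heartbeats(last_seen, now, max_silence):
    """Return sources whose last heartbeat is older than max_silence, sorted by name."""
    return sorted(src for src, seen in last_seen.items() if now - seen > max_silence)


def volume_dropped(current_count, baseline_count, floor_ratio):
    """True if the current event count fell below floor_ratio of the baseline."""
    if baseline_count <= 0:
        return False  # no baseline yet, so there is nothing to compare against
    return current_count < baseline_count * floor_ratio


if __name__ == "__main__":
    now = datetime(2026, 1, 1, 12, 0)
    # Hypothetical hosts and timestamps for illustration.
    last_seen = {
        "web-01": now - timedelta(minutes=2),
        "db-01": now - timedelta(minutes=30),
    }
    print(missing_heartbeats(last_seen, now, timedelta(minutes=10)))
    print(volume_dropped(current_count=3, baseline_count=1000, floor_ratio=0.05))
```

Checking for absence (a heartbeat that did not arrive) matters because a source that has stopped logging produces no error to alert on.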

Step 4: Define the “covered systems” inventory for AU-5

You need a bounded scope to manage:

Tier 0/1 identity systems
Production workloads handling regulated data
Security infrastructure (EDR, IAM, firewalls)
Admin consoles and privileged access paths

Tie this to your asset inventory or CMDB. If your inventory is weak, start with a prioritized list: “systems in scope for audit logging and AU-5 monitoring.”
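To defend completeness, it helps to compare the covered systems list against the sources that actually have a logging health monitor. The following is a rough sketch assuming both lists can be exported as plain identifiers; the function and key names are illustrative.

```python
def coverage_gaps(covered_systems, monitored_sources):
    """Compare the in-scope inventory with the sources that have a health monitor."""
    covered = set(covered_systems)
    monitored = set(monitored_sources)
    return {
        "unmonitored": sorted(covered - monitored),   # in scope, but no monitor
        "not_in_inventory": sorted(monitored - covered),  # monitored, but not listed
    }


if __name__ == "__main__":
    # Hypothetical system names for illustration.
    inventory = ["idp-prod", "payments-api", "fw-edge"]
    monitored = ["idp-prod", "fw-edge", "legacy-batch"]
    print(coverage_gaps(inventory, monitored))
```

Entries under `unmonitored` are AU-5 gaps; entries under `not_in_inventory` usually point to an inventory that needs updating.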

Step 5: Route alerts through incident management with ownership

AU-5 alerts should behave like operational incidents:

Create an alert-to-incident integration (PagerDuty/Opsgenie/ServiceNow/Jira, or equivalent).
Require acknowledgement and track time-to-acknowledge.
Define escalation rules for missed acknowledgements.

Add a simple severity rubric:

High: logging failure on regulated production systems or core security telemetry.
Medium: partial degradation, single collector down with redundancy intact.
Low: non-production or low-sensitivity systems.
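The rubric can be encoded so that alert routing applies it the same way every time. This is one possible reading, not the only one: the order of the checks (sensitivity first, then redundancy) and the parameter names are assumptions you should adjust to your own rubric.

```python
def classify_severity(is_production, regulated_data, core_security_telemetry, redundancy_intact):
    """Map a logging failure to high/medium/low using the rubric above.

    Assumed tie-breaking: a system counts as sensitive if it is regulated
    production or carries core security telemetry; anything else is low.
    A sensitive system with redundancy intact is treated as partial degradation.
    """
    sensitive = (is_production and regulated_data) or core_security_telemetry
    if not sensitive:
        return "low"
    if redundancy_intact:
        return "medium"
    return "high"


if __name__ == "__main__":
    print(classify_severity(True, True, False, False))
    print(classify_severity(True, True, False, True))
    print(classify_severity(False, True, False, False))
```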

Step 6: Create a response runbook (what responders do)

Your runbook must cover:

Triage: confirm scope (which systems, which log sources, when it started).
Containment: restore logging quickly (restart service, redeploy agent, fix credentials, expand storage, roll back parser changes).
Impact assessment: identify time window of missing logs; note whether alternate logs exist (e.g., cloud control plane logs).
Recovery validation: confirm canary/heartbeat is visible end-to-end again.
Post-incident action: open a problem ticket for root cause and preventive actions.

Make the runbook explicit about who owns what: SOC verifies visibility; Platform/IT restores services; GRC tracks evidence and updates control operation notes.
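For the impact assessment step, the window of missing logs can be estimated from event timestamps by looking for silences longer than a source normally produces. The sketch below assumes you can export event times for the affected source; the function name and the threshold are illustrative.

```python
from datetime import datetime, timedelta


def find_log_gaps(event_times, max_expected_gap):
    """Return (last event before, first event after) pairs for each unusually long silence."""
    ordered = sorted(event_times)
    gaps = []
    for earlier, later in zip(ordered, ordered[1:]):
        if later - earlier > max_expected_gap:
            gaps.append((earlier, later))
    return gaps


if __name__ == "__main__":
    base = datetime(2026, 1, 1, 0, 0)
    # Hypothetical event times: a one-hour silence between minute 5 and minute 65.
    events = [base + timedelta(minutes=m) for m in (0, 5, 65, 70)]
    for start, end in find_log_gaps(events, timedelta(minutes=15)):
        print(f"Possible missing logs between {start} and {end}")
```

A quiet source can produce the same pattern as a broken one, so treat the result as a candidate window to confirm against the incident timeline, not as proof of loss.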

Step 7: Test the control and keep recurring evidence

Do at least two types of tests:

Tabletop: walk through a collector outage and show decision points and comms.
Functional test: intentionally stop a non-production logging agent (or disable a test log export) and verify alerting, ticketing, and restoration steps.
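A functional test yields four timestamps worth recording: when logging was stopped, when the alert fired, when it was acknowledged, and when logging was restored. As a sketch (the function and field names are assumptions), they can be turned into durations for the evidence record like this:

```python
from datetime import datetime


def summarize_test(stopped_at, alerted_at, acknowledged_at, restored_at):
    """Turn the four timestamps of a functional test into durations for the evidence record."""
    timeline = [stopped_at, alerted_at, acknowledged_at, restored_at]
    if timeline != sorted(timeline):
        raise ValueError("timestamps are out of order; check the ticket record")
    return {
        "time_to_alert": alerted_at - stopped_at,
        "time_to_acknowledge": acknowledged_at - alerted_at,
        "time_to_restore": restored_at - stopped_at,
    }


if __name__ == "__main__":
    # Hypothetical timestamps for illustration.
    summary = summarize_test(
        stopped_at=datetime(2026, 1, 1, 10, 0),
        alerted_at=datetime(2026, 1, 1, 10, 4),
        acknowledged_at=datetime(2026, 1, 1, 10, 9),
        restored_at=datetime(2026, 1, 1, 10, 30),
    )
    for name, duration in summary.items():
        print(name, duration)
```

Comparing `time_to_alert` with your documented timing target gives a direct pass or fail result for the test.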

If you use Daydream as your control system of record, map AU-5 to a named owner, link the runbook, and schedule recurring evidence requests so you are not assembling artifacts during an audit.

Required evidence and artifacts to retain

Keep evidence that demonstrates design and operating effectiveness:

Control design artifacts

AU-5 control narrative: failure definition, in-scope systems, alert recipients, timing target ¹
Alerting architecture diagram (high-level is fine)
Runbook / SOP for logging failure response
RACI or on-call ownership documentation

Operating evidence (recurring)

Screenshots or exports of alert rules (heartbeat missing, ingestion drop, pipeline errors)
Incident/ticket records showing:
- alert fired
- acknowledgement
- actions taken to restore logging
- closure notes describing impact window
SIEM query evidence that canary events appear as expected
Change records if fixes involved config changes

Evidence hygiene tip: store artifacts with consistent naming (“AU-5 Logging Failure Alert Test - [system] - [date]”) and link them directly to the control record.
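If artifacts are named by a script instead of by hand, the naming pattern stays consistent. A minimal sketch follows; rendering the date in ISO format (YYYY-MM-DD) is an assumption, chosen because it sorts chronologically.

```python
from datetime import date


def evidence_name(system, test_date):
    """Build an artifact name following the naming pattern in the tip above."""
    return f"AU-5 Logging Failure Alert Test - {system} - {test_date.isoformat()}"


if __name__ == "__main__":
    # Hypothetical system name for illustration.
    print(evidence_name("siem-prod", date(2026, 2, 3)))
```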

Common exam/audit questions and hangups

Expect these questions and prepare short, evidence-backed answers:

“What do you consider an audit logging process failure?”
Have your taxonomy and examples ready.
“Who is alerted, and how fast?”
Show the documented recipients and timing target, plus alert routing configuration. ¹
“How do you know logs are arriving in the SIEM, not just leaving the host?”
Show end-to-end confirmation (canary event, ingestion dashboards).
“Show me an example from the last period.”
Be ready with a real incident or a controlled test with tickets and timestamps.
“What systems are covered?”
Provide the in-scope inventory and the rationale for exclusions.

Frequent implementation mistakes (and how to avoid them)

Mistake	Why it fails AU-5	Fix
Monitoring only the SIEM’s health, not sources	You miss endpoint/app logging being disabled	Add source heartbeat and agent status monitoring
Alerts go to a shared inbox/Slack channel only	No accountable acknowledgement path	Page an on-call role; enforce escalation
“Failure” is undefined	Auditors treat coverage as arbitrary	Document failure conditions and thresholds
Tests are informal	No repeatable evidence	Run a planned functional test and retain artifacts
Too many false positives	Teams mute alerts	Tune thresholds; separate high/medium; implement deduplication

Risk implications (why operators treat this as high-stakes)

When audit logging fails, you risk:

Gaps in detection and delayed incident response
Inability to reconstruct events for investigations
Audit findings for control operation failure, especially if you cannot show alerting and response

From a governance perspective, AU-5 is a forcing function: if you cannot reliably detect logging failures, you also cannot credibly claim your downstream detection controls are effective.

Practical execution plan (30/60/90)

Use these phases as a deployment cadence, and align the dates to your operating rhythm.

First 30 days (baseline and ownership)

Assign AU-5 control owner (Security Ops or Security Engineering) and a GRC coordinator.
Define failure taxonomy and in-scope systems list.
Document alert recipients (roles) and timing target. ¹
Inventory current monitors and gaps across endpoints, collectors, and SIEM ingestion.

By 60 days (instrumentation and runbooks)

Implement missing heartbeat/absence monitoring for top-tier systems.
Configure alert routing to on-call with escalation and ticket creation.
Publish the response runbook and train responders (short, scenario-based session).
Run a tabletop exercise and capture evidence.

By 90 days (prove operating effectiveness)

Run a functional test in a controlled environment and retain timestamps, alerts, and ticket artifacts.
Add a recurring control check: periodic review of alert rules, on-call targets, and ingestion dashboards.
Close the loop with problem management: top root causes, preventive changes, and documentation updates.
In Daydream, attach the evidence set to AU-5 and schedule the next evidence pull so audits are a retrieval task, not a scramble.

Frequently Asked Questions

What counts as an “audit logging process failure” for AU-5?

Any condition that stops, materially degrades, or corrupts audit log generation, transport, collection, or ingestion into your analysis platform counts. Define the failure modes you will detect and alert on, then treat them as in-scope AU-5 events.

Do I have to alert a person, or is a dashboard alert enough?

AU-5’s text requires you to “alert” a defined audience within a defined time window, so a passive dashboard alone is usually hard to defend. Route alerts to an owned on-call role and keep acknowledgement evidence. ¹

We outsource logging to a third party. Are we still responsible?

Yes, you still need assurance that failures are detected and escalated to your team. Cover it in contracts/SLA language and validate with technical checks (for example, canary events or ingestion confirmations).

How do we handle planned maintenance that stops logging?

Treat it as a controlled exception: document the maintenance window, the expected logging impact, compensating monitoring (if any), and restoration validation. Keep the change record and a post-maintenance confirmation that logging resumed.

What evidence is strongest for auditors?

A real incident ticket with timestamps from alert to acknowledgement to restoration is the cleanest. A controlled test that produces the same artifacts is the next best option, as long as it is repeatable and scoped to covered systems.

Our SIEM drops events during volume spikes. Does that trigger AU-5?

If event loss or ingestion failure creates an audit logging gap, treat it as an AU-5-relevant failure condition. Monitor for ingestion errors/quotas and define what “material degradation” means for your environment.

Footnotes

¹ NIST SP 800-53 Rev. 5 OSCAL JSON
