Response to Audit Logging Process Failures
To meet the FedRAMP/NIST AU-5 “Response to Audit Logging Process Failures” requirement, you must detect audit logging failures, alert defined responders within a defined time window, and execute defined corrective actions that restore logging and preserve forensic integrity. Operationalize this with monitored health checks, an incident-style runbook, ticketed remediation, and retained evidence of detection and response. 1
Key takeaways:
- Define what counts as a “logging process failure,” who gets paged, and how fast you must page them. 1
- Instrument end-to-end logging health (source → transport → SIEM) and generate actionable alerts, not dashboards. 1
- Treat failures as security-relevant incidents: triage, contain, restore, backfill where feasible, and retain evidence for assessors. 1
Audit logs are only valuable if they are continuously produced, delivered, and retained. AU-5 exists because logging commonly fails quietly: agents stop, queues fill, certificates expire, permissions drift, SIEM ingestion breaks, or storage hits limits. During that gap, you lose visibility and, in a FedRAMP context, you may also lose the operating evidence you need for authorization, assessor testing, and ongoing continuous monitoring.
This requirement is deliberately specific: define the personnel/roles to notify, define the notification timeframe, and define the additional actions to take after a failure is detected. Assessors look for two things: (1) your control is designed with clear, testable parameters and (2) it operates, proven by real alerts and follow-through records.
The goal of this page is to help you implement AU-5 quickly in a way that stands up to a 3PAO assessment and internal audit: concrete detection patterns, a response workflow, documentation expectations, and the artifacts to retain. You can implement AU-5 with common SIEM/monitoring tools; the differentiator is your end-to-end coverage and your evidence trail. 1
Regulatory text
Requirement (AU-5): “Alert organization-defined personnel or roles within an organization-defined time period in the event of an audit logging process failure; and take organization-defined additional actions.” 1
Operator interpretation (plain English)
You must do three concrete things, and each must be “organization-defined” (written down, approved, and testable):
- Detect audit logging process failures (not just security events).
- Notify the right responders within your defined time window.
- Act to restore logging and reduce risk during the gap, using predefined additional actions. 1
In practice, auditors will push you on definitions:
- What is a “failure” versus a “degradation”?
- How do you know logging is working end-to-end (not only that a service is “up”)?
- What happens if failures occur outside business hours?
- Where is the evidence that alerts fired and tickets were resolved?
Who it applies to (entity and operational context)
This applies to:
- Cloud Service Providers (CSPs) operating systems within a FedRAMP authorization boundary.
- Federal Agencies responsible for implementing/maintaining the authorized baseline in their environment and overseeing the CSP. 1
Operationally, AU-5 touches multiple teams and layers:
- Platform/SRE/Infrastructure: log agents, collectors, forwarders, queues, storage, certificates.
- Security Operations: SIEM ingestion, correlation rules, alert routing, incident handling.
- GRC/Compliance: control definition, SSP language, evidence packaging for 3PAO and continuous monitoring.
- Third parties: managed SIEM/SOC providers, log pipeline providers, and any SaaS producing security logs.
Scope it to all systems required to generate, process, store, or transmit audit logs for in-boundary assets, including identity, network, endpoint, database, and cloud control-plane logs.
What you actually need to do (step-by-step)
1) Define “audit logging process failure” in a way you can measure
Write a short standard that classifies failures into categories you can alert on:
- Source failure: audit daemon stopped, agent down, misconfiguration, audit policy disabled.
- Transport failure: forwarder error, TLS/cert failure, dropped events, blocked egress.
- Ingestion failure: SIEM connector broken, parsing errors, rate limiting, API failures.
- Storage/retention failure: disk full, index blocked, retention policy misapplied.
- Integrity failure: unexpected log deletion, tamper indicators, unauthorized config change.
Make this definition consistent across your SSP/control narrative and your runbooks. 1
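To make the taxonomy mechanically testable, it can help to map raw monitoring signals to the categories above. The sketch below is illustrative only: the signal names and the `classify` helper are assumptions for this page, not alert IDs from any real product.

```python
from enum import Enum

# Failure taxonomy mirroring the five categories defined above.
class FailureCategory(Enum):
    SOURCE = "source"
    TRANSPORT = "transport"
    INGESTION = "ingestion"
    STORAGE = "storage"
    INTEGRITY = "integrity"

# Hypothetical mapping from monitoring signal names to AU-5 categories;
# replace the keys with your own tooling's alert identifiers.
SIGNAL_TO_CATEGORY = {
    "agent_down": FailureCategory.SOURCE,
    "audit_policy_disabled": FailureCategory.SOURCE,
    "tls_handshake_failed": FailureCategory.TRANSPORT,
    "forwarder_queue_overflow": FailureCategory.TRANSPORT,
    "siem_connector_error": FailureCategory.INGESTION,
    "parse_error_rate_high": FailureCategory.INGESTION,
    "index_write_blocked": FailureCategory.STORAGE,
    "disk_full": FailureCategory.STORAGE,
    "unexpected_log_deletion": FailureCategory.INTEGRITY,
}

def classify(signal: str) -> FailureCategory:
    """Return the AU-5 failure category for a monitoring signal."""
    try:
        return SIGNAL_TO_CATEGORY[signal]
    except KeyError:
        raise ValueError(f"unclassified signal: {signal}") from None
```

A mapping like this keeps the SSP narrative, the alert catalog, and the runbook pointed at the same category names.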
2) Set “organization-defined” notifications: roles, channels, and timeframe
Document, at minimum:
- Primary on-call role (e.g., Security Operations on-call or SRE on-call)
- Secondary escalation (e.g., Incident Commander, CISO delegate)
- GRC notification for evidence tracking and potential reporting implications
- Notification channels (paging system, ticketing, email for follow-up)
AU-5 requires you to define a time period. Pick one that matches your operating model and can be proven from alert timestamps and pager/ticket records. 1
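The timeframe check an assessor performs against alert timestamps and pager/ticket records is simple enough to state in code. This sketch assumes a 15-minute window purely for illustration; the value must come from your approved AU-5 procedure.

```python
from datetime import datetime, timedelta

# Assumed organization-defined window (illustrative, not prescriptive).
NOTIFICATION_WINDOW = timedelta(minutes=15)

def notified_in_time(alert_fired: datetime, responder_acked: datetime,
                     window: timedelta = NOTIFICATION_WINDOW) -> bool:
    """True when the responder acknowledgment landed inside the defined window.

    This is the same comparison an assessor makes manually from alert
    timestamps and pager/ticket acknowledgment records.
    """
    return timedelta(0) <= (responder_acked - alert_fired) <= window
```

Running this comparison over exported alert/acknowledgment pairs each month gives you a ready-made operating-evidence metric.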
3) Implement end-to-end logging health monitoring (not single-point checks)
Build detection that answers: “Are required logs arriving in the central repository on time?”
Common patterns:
- Heartbeat events: each critical log source emits a periodic “I am alive” audit event; alert when missing.
- Ingestion lag: alert on pipeline lag beyond your defined threshold (queue depth, ingestion delay).
- Volume anomaly checks: sudden drop to near-zero audit events from a normally noisy source.
- Collector health: service down, CPU/memory saturation, disk nearing capacity, index write blocks.
- Credential/cert expiry monitoring: connectors often fail on expired secrets/certs.
Map each pattern to your defined failure categories and ensure coverage for the entire pipeline (source → forwarder → collector → SIEM/storage). 1
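The heartbeat pattern above can be sketched in a few lines. The interval and the missed-beat tolerance here are assumptions for illustration; tune both to each source class.

```python
from datetime import datetime, timedelta

HEARTBEAT_INTERVAL = timedelta(minutes=5)   # assumed per-source emit interval
MISSED_BEATS_BEFORE_ALERT = 3               # assumed tolerance before paging

def stale_sources(last_heartbeat: dict[str, datetime],
                  now: datetime) -> list[str]:
    """Return log sources whose heartbeat is overdue enough to alert on.

    last_heartbeat maps source name -> timestamp of its most recent
    heartbeat event as seen in the central repository.
    """
    cutoff = now - HEARTBEAT_INTERVAL * MISSED_BEATS_BEFORE_ALERT
    return sorted(src for src, seen in last_heartbeat.items() if seen < cutoff)
```

Because the check runs against arrival in the central repository, a failure anywhere in the pipeline (source, transport, or ingestion) surfaces as a missing heartbeat.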
4) Define “additional actions” and make them executable via a runbook
Your additional actions should reduce exposure during the logging gap and restore logging safely. Typical actions to define:
- Immediate triage: confirm scope (which sources), start time, and whether logs are lost or queued.
- Containment controls during the gap: increase monitoring on adjacent telemetry (IDS, auth logs, cloud activity), restrict high-risk admin changes, or require change approval for privileged actions.
- Restore logging: restart agents/services, roll back config, rotate expired secrets, expand storage, fix network routes, re-enable audit policy.
- Backfill and reconciliation: recover buffered logs, re-run exports, or document irrecoverable gaps with rationale.
- Integrity checks: confirm audit settings are correct, validate no unauthorized changes to audit configuration.
- Post-incident documentation: root cause, corrective actions, and preventive controls.
Write these into a single AU-5 response playbook with decision points (e.g., “If logs were unavailable for security-relevant systems, open a security incident ticket and escalate”). 1
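One way to keep the playbook's decision points unambiguous is to express them as an ordered action list. This is a hypothetical sketch of the branching described above, not a prescribed workflow.

```python
def initial_response(impacted_sources: list[str],
                     security_relevant: list[str],
                     logs_recoverable: bool) -> list[str]:
    """Return ordered runbook actions for a detected logging failure.

    Mirrors the decision point: if security-relevant systems lost logs,
    open a security incident and add compensating monitoring.
    """
    actions = ["confirm scope and failure start time"]
    if set(impacted_sources) & set(security_relevant):
        actions.append("open security incident ticket and escalate")
        actions.append("increase monitoring on adjacent telemetry")
    actions.append("restore logging per category-specific steps")
    actions.append("backfill buffered logs" if logs_recoverable
                   else "document irrecoverable gap with rationale")
    actions.append("verify end-to-end flow and audit configuration")
    return actions
```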
5) Operationalize with ticketing, ownership, and closure criteria
Treat every logging failure alert as a tracked work item:
- Auto-create a ticket from the monitoring alert.
- Assign an owner with an SLA aligned to your “time period” definition.
- Require closure notes: root cause, impacted systems, start/end times, and restoration verification steps.
- Attach evidence (screenshots, logs, change records).
Closure criteria should include: “Validated that logs are flowing end-to-end again” and “Validated audit configuration remains enabled.” 1
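Closure criteria are easiest to enforce as required ticket fields. The field names below are assumptions for illustration; align them with your own ticketing schema.

```python
# Hypothetical closure fields mirroring the criteria above.
REQUIRED_CLOSURE_FIELDS = {
    "root_cause", "impacted_systems", "failure_start", "failure_end",
    "restoration_verified", "audit_config_verified",
}

def closure_gaps(ticket: dict) -> set[str]:
    """Return the closure fields still missing or empty on a ticket.

    A ticket should only be closed when this returns an empty set.
    """
    return {f for f in REQUIRED_CLOSURE_FIELDS if not ticket.get(f)}
```

A check like this can run as a ticketing workflow gate so incomplete closures never reach the evidence package.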
6) Prove the control works: recurring tests and evidence packaging
Do controlled tests:
- Stop a log agent in a non-production environment or a test source.
- Break a connector (rotate a secret) in a controlled way.
- Fill a test index to force an ingestion error.
Then collect the artifacts that show: detection → alert → response → restoration. This becomes assessor-ready evidence and reduces scramble during continuous monitoring submissions. 1
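A lightweight completeness check on each controlled test's artifact bundle prevents gaps from surfacing during assessment. The step names here are illustrative.

```python
def evidence_chain_complete(artifacts: dict) -> bool:
    """Check a controlled test produced the full detection-to-restoration trail.

    Each key holds the artifact (or reference) for that step; all must be
    present and non-empty for the package to be assessor-ready.
    """
    required = ("alert", "page_ack", "ticket", "fix_change",
                "restoration_proof")
    return all(artifacts.get(step) for step in required)
```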
Required evidence and artifacts to retain
Keep evidence that demonstrates both control design and control operation:
Design artifacts (what you planned)
- AU-5 procedure/runbook (definitions, roles, timeframe, escalation, additional actions). 1
- Logging architecture diagram (sources, forwarders, SIEM/central store, failover paths).
- Monitoring/alert catalog for logging health (rules, thresholds, routing).
- RACI or on-call schedule mapping roles to responsibilities.
Operating artifacts (what actually happened)
- Alert records (timestamps, impacted source, severity).
- Pager/escalation records (who acknowledged, when).
- Tickets with remediation notes and verification steps.
- Change records linked to fixes (config changes, secret rotations).
- Post-incident reviews for material failures (root cause and prevention).
- Samples of “logging resumed” validation (SIEM search results, ingestion status, heartbeat receipt).
FedRAMP assessors typically accept a curated evidence package: a small set of representative incidents plus a report/export showing logging-failure alerts over a period, with closures. 2
Common exam/audit questions and hangups
Use these as a readiness checklist:
- “What is your defined time period for alerting, and where is it documented?” Expect to show the procedure and a sample alert timeline. 1
- “How do you detect failures across the entire logging pipeline?” A single SIEM “up” check is not sufficient; be ready with end-to-end monitoring logic. 1
- “Show me evidence of a logging failure and your response.” They will ask for tickets, escalation, and restoration proof. 1
- “What additional actions do you take besides alerting?” If you only alert, you have not met AU-5; additional actions must be predefined and repeatable. 1
- “How do you ensure critical logs are not lost during outages?” Have a stance: buffering, retry, backfill process, and gap documentation when not recoverable.
Frequent implementation mistakes and how to avoid them
| Mistake | Why it fails AU-5 | Fix |
|---|---|---|
| Only monitoring SIEM uptime | Logging can fail while the SIEM is “up” | Monitor per-source ingestion, lag, and heartbeat events. 1 |
| Undefined “time period” | Requirement explicitly requires a defined timeframe | Put the timeframe in the AU-5 procedure and align alerting/ticket SLAs to it. 1 |
| Alerts go to a shared inbox | No clear accountability or acknowledgment evidence | Route to on-call paging + ticketing with acknowledgment records. |
| No predefined additional actions | AU-5 requires actions beyond alerting | Publish a runbook with containment/restoration/backfill steps and decision points. 1 |
| No evidence retention | You cannot prove operation during assessment | Keep alert exports, tickets, and escalation logs tied to each event. |
Enforcement context and risk implications
No public enforcement cases specific to this requirement appear in the sources cited on this page, so no enforcement examples are listed.
Risk still matters operationally:
- Logging failures create blind spots for detection and incident investigation.
- During FedRAMP assessments and continuous monitoring, weak AU-5 evidence can lead to findings and remediation work that delays authorization activities. 1
Practical 30/60/90-day execution plan
Treat the following as three phases with clear deliverables, sized so you can execute them quickly.
First 30 days (baseline + definitions)
- Publish AU-5 procedure: failure definitions, roles, alert timeframe, escalation path, additional actions. 1
- Inventory required log sources in the authorization boundary and identify “critical” sources (identity, admin actions, network/security tooling).
- Turn on initial health alerts: collector down, ingestion halted, storage capacity thresholds, connector/auth failures.
- Create ticket templates and closure criteria for “logging process failure.”
Days 31–60 (coverage + evidence)
- Add end-to-end checks per major source class: heartbeat, ingestion lag, and volume anomaly rules.
- Implement routing to on-call and confirm acknowledgments are recorded.
- Run at least one controlled test and save the evidence package (alert → page → ticket → fix). 1
- Add a lightweight monthly review: open/close rates, repeat offenders, and preventive fixes.
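The monthly review metrics above can be computed directly from exported tickets. This sketch assumes each ticket record carries a `source` name and a `closed` flag; adapt the fields to your export format.

```python
from collections import Counter

def monthly_review(tickets: list[dict]) -> dict:
    """Summarize logging-failure tickets for the monthly review.

    Reports open/close counts and sources that failed more than once
    ("repeat offenders") during the period.
    """
    repeats = Counter(t["source"] for t in tickets)
    return {
        "opened": len(tickets),
        "closed": sum(1 for t in tickets if t.get("closed")),
        "repeat_offenders": [s for s, n in repeats.most_common() if n > 1],
    }
```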
Days 61–90 (hardening + assessor readiness)
- Expand coverage to edge cases: multi-region failures, failover collectors, API rate limiting, certificate rotation windows.
- Add backfill/reconciliation guidance: what you can recover, how you document gaps.
- Prepare an assessor-ready AU-5 evidence bundle aligned to FedRAMP templates and expectations. 2
- If a third party runs any part of logging/SOC, update contracts/SLAs to match your defined alerting timeframe and evidence needs.
Tooling note (Daydream, when it fits)
If you struggle with consistent evidence across alerts, tickets, and control narratives, Daydream can help by standardizing the AU-5 requirement language, mapping alerts/tickets to the control, and packaging operating evidence for audits and continuous monitoring without manual stitching.
Frequently Asked Questions
What counts as an “audit logging process failure” under AU-5?
Any condition where required audit events are not being generated, transmitted, ingested, stored, or retained as intended. Define categories (source, transport, ingestion, storage, integrity) so monitoring and response are testable. 1
Do we need a SIEM to meet AU-5?
AU-5 requires detection, alerting, and additional actions, not a specific product. A SIEM or centralized log store makes end-to-end verification and evidence retention much easier during FedRAMP assessments. 1
How specific does the “organization-defined time period” need to be?
It must be explicit enough to test: documented in the AU-5 procedure and supported by evidence like alert timestamps and acknowledgment/ticket times. Choose a timeframe you can consistently meet and prove. 1
What “additional actions” do auditors expect beyond alerting?
They expect predefined steps to restore logging and manage risk during the gap, such as containment guidance, restoration steps, validation, and backfill or gap documentation. Write these actions into a runbook and show tickets that followed it. 1
How do we prove logs were actually restored?
Require a verification step in the ticket: show new events arriving from the impacted source in the central repository and confirm the audit configuration remains enabled. Keep screenshots/exports attached to the ticket. 1
What if a third party operates part of our logging pipeline?
Treat the third party as in-scope for AU-5 dependencies. Contractually require timely notification, operational cooperation, and evidence sharing so you can meet your defined roles/time period and retain artifacts for assessment. 1
Footnotes
1. NIST Special Publication 800-53 Revision 5.