AU-5(4): Shutdown on Failure
AU-5(4) requires you to automatically place a system into shutdown (full or partial) or a controlled degraded mode when audit logging fails, unless an alternate logging capability is available. To operationalize it, define what counts as an audit logging failure, engineer a deterministic fail-action, and prove through testing and evidence that the fail-action triggers reliably. 1
Key takeaways:
- You must choose and implement a fail-action (full shutdown, partial shutdown, or degraded mode) tied to audit logging failure conditions. 1
- “Unless an alternate audit logging capability exists” means you need a designed, verified fallback path, not a hope that logs “usually” show up elsewhere. 1
- Auditors look for evidence of trigger definitions, automated enforcement, and repeatable test results tied to production-like conditions. 2
AU-5(4): Shutdown on Failure exists to prevent a common failure mode: systems keep processing sensitive or high-impact transactions while audit logs are missing, delayed, or silently dropped. That gap turns routine operational issues (disk full, agent crash, misconfiguration, pipeline outage) into a security and compliance incident because you lose non-repudiation and investigation capability during the window of failure.
For a CCO or GRC lead, the fastest path to a defensible implementation is to treat AU-5(4) as an engineering-backed “fail closed for logging” requirement. Your job is to translate the control into (1) clear trigger conditions, (2) a pre-approved fail-action appropriate to the system’s mission, and (3) objective evidence that the action happens automatically unless a real alternate logging capability is active.
This page focuses on requirement-level execution: who owns what, what to build, how to test it, what artifacts to retain, and how to avoid the predictable audit traps. The underlying control text is from NIST SP 800-53 Rev. 5. 3
Regulatory text
NIST AU-5(4) requires: “Invoke a [one or more of: full system shutdown; partial system shutdown; degraded operational mode with limited mission or business functionality available] in the event of [audit logging failures], unless an alternate audit logging capability exists.” 1
Operator translation (what you must do):
- Detect audit logging failure based on defined conditions (not ad hoc judgment).
- Automatically invoke a predefined fail-action: full shutdown, partial shutdown, or degraded mode with limited functionality. 1
- Allow continued operation only if an alternate audit logging capability exists and is actually functioning for the events in scope. 1
Plain-English interpretation
If the system can’t reliably produce audit logs, it should not keep operating normally. You either stop it, stop the risky parts, or force it into a safe mode where only minimal business functions run, and you do this automatically unless you have a verified backup logging path.
Who it applies to (entity + operational context)
This control is commonly applied in:
- Federal information systems and contractor systems handling federal data where NIST SP 800-53 is required by program, contract, or inherited framework mapping. 1
- High-assurance environments where audit logs support incident response, forensics, privileged activity review, and accountability requirements.
Operationally, AU-5(4) is most relevant for:
- Authentication/authorization services (IdP, PAM, API gateways) where missing logs block traceability.
- Systems processing regulated or mission-impacting transactions (financial operations, healthcare workflows, controlled unclassified information environments).
- Shared platforms (Kubernetes clusters, SIEM-forwarding pipelines, central log collectors) where one failure can blind multiple products.
Decide the fail-action: shutdown vs partial shutdown vs degraded mode
Pick the smallest blast radius that still “fails closed” for accountability.
Use this decision matrix:
| System type | Recommended default | Rationale | Example implementation pattern |
|---|---|---|---|
| User-facing app with read/write | Degraded mode | Keep safe reads, stop writes/privileged actions | “Read-only” feature flag when logging unhealthy |
| Admin/privileged plane | Partial shutdown | Stop risky functions first | Disable admin endpoints; block privileged commands |
| Security boundary control (gateway/PAM) | Full shutdown or deny-by-default | Boundary controls without logging are high-risk | Refuse new sessions when audit sink unavailable |
| Batch/ETL pipelines | Stop jobs | Prevent untraceable data changes | Scheduler halts runs; queue paused |
“Degraded mode” must be concrete. “We’ll be careful” does not qualify.
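One way to make the matrix concrete is to encode it as a lookup that engineering and GRC both review. The sketch below is illustrative only: the system-type labels, the `FailAction` names, and the choice to treat "stop jobs" as a partial shutdown and unknown types as full shutdown are assumptions, not part of the control text.

```python
from enum import Enum


class FailAction(Enum):
    FULL_SHUTDOWN = "full_shutdown"
    PARTIAL_SHUTDOWN = "partial_shutdown"
    DEGRADED_MODE = "degraded_mode"


# Hypothetical labels mirroring the rows of the decision matrix above.
DEFAULT_FAIL_ACTION = {
    "user_facing_app": FailAction.DEGRADED_MODE,
    "admin_plane": FailAction.PARTIAL_SHUTDOWN,
    "security_boundary": FailAction.FULL_SHUTDOWN,
    # Assumption: "stop jobs" is modeled as a partial shutdown of the platform.
    "batch_pipeline": FailAction.PARTIAL_SHUTDOWN,
}


def default_fail_action(system_type: str) -> FailAction:
    """Return the recommended default fail-action for a system type."""
    # Assumption: unclassified systems fail closed with the strictest action
    # until an owner assigns them a row in the matrix.
    return DEFAULT_FAIL_ACTION.get(system_type, FailAction.FULL_SHUTDOWN)
```

Per-system overrides still need a documented justification; the lookup only captures the starting point.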
What you actually need to do (step-by-step)
1) Create a control card (make ownership and triggers unambiguous)
Write a one-page “requirement control card” that includes:
- Control objective: enforce fail-action on audit logging failure. 1
- In-scope systems/components: apps, hosts, containers, network devices, cloud services.
- Control owner: typically the platform/SRE lead; GRC owns oversight.
- Trigger events: explicit “audit logging failure” definitions (see below).
- Fail-action: full/partial shutdown or degraded mode.
- Exception rule: alternate audit logging capability exists and is verified.
- Test cadence: when and how you prove it works. 1
This directly addresses a common audit gap: teams cannot show who owns the requirement, how it operates, or what evidence proves it runs. 1
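If you keep control cards as structured data instead of free text, completeness becomes checkable. The following is a minimal sketch, assuming a hypothetical `ControlCard` record whose fields simply mirror the bullet list above; the field names are illustrative, not a prescribed schema.

```python
from dataclasses import dataclass


@dataclass
class ControlCard:
    # Field names are hypothetical; they mirror the control card bullets.
    control_objective: str
    in_scope_components: list[str]
    control_owner: str
    trigger_events: list[str]
    fail_action: str
    exception_rule: str
    test_cadence: str

    def missing_fields(self) -> list[str]:
        """Return the names of fields left empty, in declaration order."""
        return [name for name, value in vars(self).items() if not value]


card = ControlCard(
    control_objective="Enforce fail-action on audit logging failure",
    in_scope_components=["api-gateway"],
    control_owner="",  # intentionally blank to show the gap being reported
    trigger_events=["audit sink unreachable"],
    fail_action="degraded_mode",
    exception_rule="Verified alternate audit logging capability is active",
    test_cadence="After each release",
)
print(card.missing_fields())
```

A card with any missing field is a card an auditor will question, so a check like this can gate approval.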
2) Define “audit logging failure” as measurable health signals
Define failure conditions at the right layer(s). Common options:
- Local audit subsystem failure: audit daemon stopped, Windows event log service down, journald issues.
- Agent/forwarder failure: Fluent Bit/Logstash/CloudWatch agent down, backlog above threshold, dropped events detected.
- Pipeline/sink failure: SIEM endpoint unreachable, TLS failures, auth failures, queue full, destination rejecting.
- Integrity failure: log file permissions changed, tamper alerts, unexpected truncation.
Write them as “if X for Y then fail-action,” where X is a monitored signal and Y is a bounded time window you approve internally. Avoid hardcoding numbers in policy unless engineering can reliably measure and alert on them.
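The "if X for Y then fail-action" form translates directly into a small state machine. This is a sketch under stated assumptions: `FailureTrigger`, its field names, and the idea of feeding it periodic health samples with an explicit timestamp are illustrative, and the window value is whatever you approve internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FailureTrigger:
    signal: str                 # X: the monitored signal, e.g. "sink_unreachable"
    window_seconds: float       # Y: the approved, bounded time window
    unhealthy_since: Optional[float] = None

    def observe(self, healthy: bool, now: float) -> bool:
        """Record one health sample; return True when the fail-action should fire."""
        if healthy:
            # Any healthy sample resets the window.
            self.unhealthy_since = None
            return False
        if self.unhealthy_since is None:
            self.unhealthy_since = now
        return (now - self.unhealthy_since) >= self.window_seconds


trigger = FailureTrigger(signal="sink_unreachable", window_seconds=30)
print(trigger.observe(healthy=False, now=0))   # window just started
print(trigger.observe(healthy=False, now=30))  # window elapsed
```

Passing `now` explicitly keeps the rule deterministic and easy to test, which is the property auditors care about.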
3) Engineer the automatic enforcement mechanism (not a manual runbook)
AU-5(4) expects invocation, which in practice means automation.
Patterns that work:
- Service guardrails: systemd unit dependencies that stop the app if the logger unit is unhealthy.
- Admission controls: Kubernetes policy that prevents pods from running without logging sidecar/daemonset availability.
- Application-level circuit breaker: app checks logging health endpoint at startup and continuously; flips to read-only/deny mode if unhealthy.
- Network controls: gateway denies new sessions if it cannot send audit events to the sink (or alternate sink).
- Immutable infrastructure hooks: CI/CD blocks deployment if required logging components absent.
Document exactly what triggers what. Auditors want determinism.
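As an illustration of the application-level circuit breaker pattern, the sketch below blocks write operations whenever a logging health probe reports unhealthy. `AuditCircuitBreaker` and the `logging_healthy` probe are hypothetical names; how you actually determine logging health depends on your pipeline.

```python
from typing import Callable


class AuditCircuitBreaker:
    """Flips the application to read-only while audit logging is unhealthy."""

    def __init__(self, logging_healthy: Callable[[], bool]):
        # logging_healthy is a hypothetical probe supplied by the platform.
        self._logging_healthy = logging_healthy
        self.read_only = False

    def refresh(self) -> None:
        """Re-evaluate logging health; call at startup and on every guarded write."""
        try:
            healthy = bool(self._logging_healthy())
        except Exception:
            # Fail closed: a probe that errors is treated as a logging failure.
            healthy = False
        self.read_only = not healthy

    def guard_write(self, operation: Callable[[], object]) -> object:
        """Run a write operation only if audit logging is currently healthy."""
        self.refresh()
        if self.read_only:
            raise PermissionError("audit logging unhealthy: writes are blocked")
        return operation()
```

The design choice worth documenting is the fail-closed default: an unknown logging state is handled the same way as a confirmed failure.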
4) Design and prove the “alternate audit logging capability”
The exception clause is narrow: you may continue operating only if an alternate audit logging capability exists. 1
Treat this like a resilience design:
- Alternate sink (secondary SIEM endpoint, secondary log collector cluster, separate account/project).
- Alternate path (agent buffers locally with integrity protection and guaranteed later forwarding).
- Alternate mechanism (out-of-band audit record replication).
Minimum bar for defensibility:
- You can show which events are covered by the alternate capability.
- You can show the alternate capability is monitored and tested.
- You can show the system actually switches to it (automatic failover or defined routing).
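The "actually switches to it" requirement can be sketched as ordered failover across sinks, with the fail-action reserved for the case where every sink rejects the event. The names `emit_audit_event` and `AuditDeliveryError` are hypothetical, and each sink is assumed to be a callable that raises on failure.

```python
from typing import Callable, Sequence


class AuditDeliveryError(Exception):
    """Raised when no configured sink accepted the audit event."""


def emit_audit_event(event: dict, sinks: Sequence[Callable[[dict], None]]) -> int:
    """Try each sink in order; return the index of the sink that accepted the event."""
    errors = []
    for index, sink in enumerate(sinks):
        try:
            sink(event)
            return index
        except Exception as exc:
            # Record the failure and fall through to the next (alternate) sink.
            errors.append(f"sink {index}: {exc}")
    # No primary or alternate capability worked: the caller should invoke the fail-action.
    raise AuditDeliveryError("; ".join(errors) or "no sinks configured")
```

Returning the index of the accepting sink gives you a simple signal to alert on when traffic is running on the alternate path.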
5) Test it the way failures happen in real life
Do not only test “we stopped the logging agent” in a dev lab. Include production-like failure modes:
- Sink unreachable (network ACL/DNS break)
- Auth failure (expired cert, rotated token)
- Disk full / backpressure
- CPU starvation / process crash loop
Capture evidence for each test: trigger, system response, and restoration.
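A small drill harness can capture the trigger, response, and restoration for each scenario in a consistent record. This is a sketch only: `run_failure_drill`, `DrillRecord`, and the three callables (inject the failure, check whether the system is in its fail state, restore) are hypothetical hooks you would wire to your own environment.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable


@dataclass
class DrillRecord:
    scenario: str
    fail_action_observed: bool
    restored: bool
    recorded_at: str


def run_failure_drill(
    scenario: str,
    inject: Callable[[], None],
    in_fail_state: Callable[[], bool],
    restore: Callable[[], None],
) -> DrillRecord:
    """Inject one logging failure, record the system response, then restore."""
    inject()
    try:
        observed = bool(in_fail_state())
    finally:
        # Always attempt restoration, even if the state check itself errors.
        restore()
    restored = not in_fail_state()
    return DrillRecord(
        scenario=scenario,
        fail_action_observed=observed,
        restored=restored,
        recorded_at=datetime.now(timezone.utc).isoformat(),
    )
```

Real pipelines may need a wait between injection and the state check to cover the approved trigger window; that delay is omitted here for brevity.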
6) Build the evidence bundle and retention path
Define the minimum evidence bundle per system or per platform, and store it where auditors can access it without heroics. This is one of the recommended operational controls for sustained compliance. 1
Required evidence and artifacts to retain
Keep artifacts tied to each in-scope system:
- Control card / runbook: ownership, triggers, fail-action, exception rules.
- Architecture diagram: logging pipeline, primary sink, alternate capability (if any), health checks.
- Configuration evidence: policy-as-code snippets, systemd unit files, feature flags, gateway rules, K8s admission policies.
- Monitoring and alert definitions: alerts for logging failure signals and for fail-action invocation.
- Test records: change tickets, test plan, screenshots/log extracts, CI test output, incident-style exercise notes.
- Exception register: if a system cannot shut down or degrade, document risk acceptance, compensating controls, and target remediation date.
- Control health checks: periodic confirmation the control remains in place after changes. 1
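A per-system completeness check over the artifact list above might look like the following sketch. The category keys and the `missing_artifacts` helper are hypothetical, and treating the exception register as required only when an exception has been claimed is an assumption about how you scope it.

```python
# Hypothetical category keys mirroring the artifact list above.
REQUIRED_ARTIFACTS = (
    "control_card",
    "architecture_diagram",
    "configuration_evidence",
    "monitoring_alert_definitions",
    "test_records",
    "control_health_checks",
)


def missing_artifacts(bundle: dict, has_exception: bool = False) -> list:
    """Return artifact categories that have no evidence attached for a system."""
    required = list(REQUIRED_ARTIFACTS)
    if has_exception:
        # Assumption: the exception register applies only to systems with an exception.
        required.append("exception_register")
    return [name for name in required if not bundle.get(name)]


bundle = {
    "control_card": ["au-5-4-card.md"],
    "architecture_diagram": ["logging-pipeline.png"],
    "configuration_evidence": ["app.service"],
    "monitoring_alert_definitions": ["alerts.yaml"],
    "test_records": [],
    "control_health_checks": ["drift-report.txt"],
}
print(missing_artifacts(bundle))
```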
Daydream tip (where it earns its place): use Daydream to standardize the AU-5(4) control card, track evidence by system, and run recurring control health checks with remediation tasks tied to owners and due dates. That directly addresses the “no owner/no cadence/no evidence” failure pattern auditors flag. 1
Common exam/audit questions and hangups
Expect these questions, and prepare short, evidence-backed answers:
- “Show me your definition of audit logging failure.” Provide trigger conditions and where they are monitored.
- “What happens to the system when logging fails?” Demonstrate the automated fail-action and confirm it is not manual.
- “How do you know it happened?” Provide alerts plus system state evidence (service stopped, feature flag set, gateway denies).
- “Do you have an alternate audit logging capability?” Show design and test results; explain scope coverage.
- “Which systems are in scope?” Provide inventory mapping: business service → components → logging enforcement point.
Hangup to avoid: claiming your SIEM is “highly available” without showing how the system responds when it is not available.
Frequent implementation mistakes (and how to avoid them)
- Mistake: Manual response (“on-call will shut down”). Fix: implement automatic enforcement tied to monitored signals.
- Mistake: Treating “agent running” as “logging works.” Fix: monitor end-to-end delivery or at least delivery acknowledgments/queue health.
- Mistake: Alternate logging exists only on paper. Fix: test failover and retain artifacts showing events still reach an audit store.
- Mistake: Degraded mode is vague or unenforced. Fix: enumerate which functions stop (writes, admin actions, exports) and implement technical blocks.
- Mistake: One-size-fits-all shutdown. Fix: use the decision matrix; justify fail-action per system criticality and mission needs.
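To illustrate the difference between "agent running" and "logging works," the sketch below sends a uniquely tagged canary event and then checks that it arrived in the audit store. The `send` and `store_contains` callables are hypothetical hooks for your pipeline and your audit store query; neither refers to a real product API.

```python
import uuid
from typing import Callable


def delivery_confirmed(
    send: Callable[[dict], None],
    store_contains: Callable[[str], bool],
) -> bool:
    """Emit a canary audit event and confirm it reached the audit store."""
    canary_id = str(uuid.uuid4())
    try:
        send({"type": "audit_canary", "id": canary_id})
        return bool(store_contains(canary_id))
    except Exception:
        # Any error while sending or querying counts as unconfirmed delivery.
        return False
```

In a real pipeline, delivery is rarely instantaneous, so the store query would typically be polled with a bounded timeout; that loop is left out to keep the sketch short.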
Enforcement context and risk implications
No public enforcement cases were provided in the source catalog for this requirement, so this page does not list specific cases.
Practically, AU-5(4) reduces two audit-facing risks:
- Accountability gaps: you cannot reconstruct who did what during the failure window.
- Containment gaps: attackers often target logging first; continuing normal operations after logging fails makes detection and response weaker.
Treat logging failure as a security-relevant condition, not only an availability issue.
Practical 30/60/90-day execution plan
Specific day-count timelines are not required for compliance, but operators often need a structured rollout. Use this phased plan and adapt it to your change cycles.
First 30 days (foundation)
- Inventory in-scope systems and identify current logging paths and owners.
- Draft AU-5(4) control cards for the top-risk systems first (admin planes, gateways, data stores).
- Define logging failure triggers and monitoring signals for each platform.
- Decide fail-action per system (full, partial, degraded) and get approval from system owners.
Days 31–60 (engineering + pilot)
- Implement automated fail-action for a pilot set of systems.
- Implement alternate logging capability only where business continuity requires it, and document scope.
- Run controlled failure tests; collect evidence artifacts and store them in your GRC repository.
Days 61–90 (scale + operate)
- Expand implementation to remaining in-scope systems using standard patterns.
- Add recurring control health checks (post-release checks, configuration drift detection).
- Formalize exception handling for legacy systems and track remediation to closure. 1
Frequently Asked Questions
What counts as an “audit logging failure” for AU-5(4)?
Define it as measurable conditions where audit events are not being recorded or cannot be delivered to the approved audit store. Use end-to-end signals when possible (delivery failures, sink unreachable, backlog/drops), not only “agent is running.”
Can we choose degraded mode instead of shutdown?
Yes. AU-5(4) explicitly allows degraded operational mode with limited mission or business functionality available. 1 The degraded mode must be enforced technically and documented (what functions are blocked, how it is triggered, how it is reversed).
What qualifies as an “alternate audit logging capability”?
It is a real backup path that continues capturing the required audit events when the primary logging path fails. Document the architecture and prove it works through testing; otherwise you should invoke the shutdown/partial shutdown/degraded mode.
Does AU-5(4) require a full system shutdown?
No. The control allows full shutdown, partial shutdown, or degraded mode. 1 Pick the least disruptive option that still prevents unlogged high-risk activity.
How do we implement this for SaaS or managed services where we can’t force shutdown?
Put the enforcement point at what you control: your application behavior, gateway, or identity layer. If the managed service’s audit stream fails, your system can block sensitive actions, disable admin operations, or force read-only behavior until logging is restored.
What evidence is most persuasive in an audit?
A control card with triggers and fail-action, configuration proof of automation, and test records that show the system entering shutdown/partial shutdown/degraded mode when logging fails. Evidence that the alternate logging capability (if claimed) works under failure conditions is often the deciding factor.