AU-5(4): Shutdown on Failure

To meet the AU-5(4) Shutdown on Failure requirement, configure systems to automatically invoke a defined shutdown action when audit logging fails, unless a validated alternate audit logging capability preserves audit records. Operationalize this by defining failure conditions, implementing shutdown triggers, documenting exceptions, and testing the behavior on a repeatable schedule.

Key takeaways:

  • Define exactly what “audit logging failure” means in your environment, then map each failure mode to a specific shutdown action.
  • Implement shutdown-on-failure only where it is safe and justified, and document exceptions where alternate logging exists.
  • Evidence is the control: configurations, test results, incident records, and exception approvals must be audit-ready.

AU-5(4) sits in the Audit and Accountability family and is aimed at a specific risk: if audit logging fails silently, you lose the ability to detect and investigate security-relevant activity. This enhancement requires a hard operational guardrail: when logging fails, the system must take a predefined shutdown action, unless you can prove you still have alternate logging coverage.

For a Compliance Officer, CCO, or GRC lead, the fastest path to implementation is to treat AU-5(4) as an engineering requirement with a governance wrapper. You need (1) a clear definition of “audit logging failure” for each in-scope system, (2) a technical mechanism that enforces shutdown (or a defensible alternative logging path), (3) a documented exception process for systems where shutdown is unsafe or impractical, and (4) recurring tests that produce evidence.

This page gives requirement-level guidance you can hand to security engineering, platform owners, and auditors without translating abstract language during an assessment.

Regulatory text

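Requirement (verbatim excerpt): “Invoke a {{ insert: param, au-05.04_odp.01 }} in the event of {{ insert: param, au-05.04_odp.02 }}, unless an alternate audit logging capability exists.” 1

The two placeholders are the OSCAL parameter references. In the published catalog, the first is a selection (full system shutdown, partial system shutdown, or degraded operational mode with limited mission or business functionality available) and the second is an assignment of organization-defined audit logging failures. 2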

How to read this as an operator:

  • You must define the organization-defined shutdown action (the first parameter) and the organization-defined audit logging failure event(s) (the second parameter).
  • If audit logging fails, the system must automatically trigger the shutdown action unless you can demonstrate an alternate audit logging capability that continues capturing required audit events. 1
  • AU-5(4) is an enhancement to AU-5 (Response to Audit Logging Process Failures). It is typically assessed as both a design control (is it built?) and an operating effectiveness control (does it work in real conditions?) under NIST SP 800-53 Rev. 5 programs. 2

Plain-English interpretation (what the requirement really demands)

AU-5(4) requires fail-closed behavior for logging: if you cannot log, you do not continue operating normally. The expectation is that you either:

  1. Stop the system (or stop the risky function) to prevent unlogged activity, or
  2. Prove that another logging pipeline captures the same audit events, so operations can safely continue.

A practical interpretation that works in audits:

  • “Shutdown” does not always mean powering off a data center server. It means an intentional, predefined state change that prevents normal processing when audit logging is not reliable. You define that state change and justify it.
  • “Alternate audit logging capability” must be more than “we’ll look at app logs later.” It should be an engineered path (for example, redundant log agents, buffered forwarding, or independent platform logging) that remains available during the primary failure scenario.

Who it applies to (entity and operational context)

Typical entities:

  • Federal information systems and contractors operating systems that handle federal data where NIST SP 800-53 controls are contractually or programmatically required. 2

Operational contexts where assessors focus:

  • Systems that process sensitive transactions or privileged actions (identity systems, administrative consoles, financial platforms).
  • Centralized logging architectures (SIEM pipelines, log forwarders, agents) where a single failure can create blind spots.
  • Environments with elastic compute (containers, serverless) where logging can fail due to misconfiguration, blocked egress, or exhausted disk/buffer capacity.

Where shutdown-on-failure can be unsafe:
Industrial control systems, patient-care systems, and other safety-critical workloads may need exceptions. AU-5(4) allows continued operation if you have alternate audit logging capability, and your exception package should show how you meet audit intent without creating physical safety risk. 1

What you actually need to do (step-by-step)

Use this as an implementation checklist you can assign to control owners.

1) Set the scope and control ownership

  • Identify in-scope systems where audit logs are required for security monitoring, incident response, or compliance.
  • Assign a primary owner (often Security Engineering or Platform Engineering) and a governance owner (GRC).

Deliverable: AU-5(4) control record in your control library with owners, in-scope systems, and evidence cadence.

2) Define the two “organization-defined parameters”

AU-5(4) forces precision. Define both items explicitly:

  • Audit logging failure events: Examples to consider include log agent stopped, log queue/buffer full, cannot write to disk, cannot reach log collector, logging service disabled, or tamper detection triggers.
  • Shutdown action: Decide what “shutdown” means per system class:
    • Full host shutdown/reboot
    • Stop application/service
    • Disable privileged/admin functions
    • Block inbound traffic at the load balancer
    • Fail authentication closed (deny requests) until logging restored

Decision rule: pick the smallest shutdown action that prevents unlogged security-relevant activity while preserving safety and availability goals.

Deliverable: A system-by-system AU-5(4) parameter register (a table is fine) that names failure conditions and shutdown actions.
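
The register described above can be kept as structured data so that gaps are easy to find. The following is a minimal sketch, not a prescribed format: the class names, the failure-event identifiers, and the "payments-api" system are all hypothetical, and the shutdown actions simply mirror the list above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class ShutdownAction(Enum):
    """Shutdown actions from the list above; member names are illustrative."""
    FULL_HOST_SHUTDOWN = "full host shutdown/reboot"
    STOP_SERVICE = "stop application/service"
    DISABLE_ADMIN = "disable privileged/admin functions"
    BLOCK_INBOUND = "block inbound traffic at the load balancer"
    DENY_AUTH = "fail authentication closed"


@dataclass(frozen=True)
class RegisterEntry:
    system: str
    owner: str
    # Maps each defined failure event to the action it must trigger.
    actions: Dict[str, ShutdownAction]
    # None means no alternate-logging exception is claimed for this system.
    alternate_logging: Optional[str] = None


def unmapped_failures(entry, defined_failures):
    """Return defined failure events that have no shutdown action assigned."""
    return sorted(set(defined_failures) - set(entry.actions))


# Hypothetical failure-event identifiers for one system class.
DEFINED_FAILURES = ["agent_stopped", "buffer_full", "collector_unreachable"]

entry = RegisterEntry(
    system="payments-api",  # hypothetical system name
    owner="platform-engineering",
    actions={
        "agent_stopped": ShutdownAction.STOP_SERVICE,
        "buffer_full": ShutdownAction.DENY_AUTH,
    },
)

# Lists the failure events that still need a shutdown action.
print(unmapped_failures(entry, DEFINED_FAILURES))
```

A check like `unmapped_failures` can run in CI against the register so that a new failure condition cannot be added without a matching shutdown action.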

3) Engineer detection of logging failure

You need a reliable signal that logging is broken. Common patterns:

  • Health checks on logging agents and collectors
  • Heartbeat logs with alerting if absent
  • Local checks for file write failures and disk exhaustion
  • Monitoring of log delivery acknowledgements (queue depth, drop counts)

Deliverable: Monitoring requirements mapped to each failure condition (what detects it, where it alerts, and how quickly engineering sees it).
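
As an illustration of the detection patterns above, the sketch below combines a heartbeat age check, a queue depth check, a drop counter, and a free-space check. The thresholds, function names, and failure labels are assumptions chosen for the example; real values depend on your logging agent and how it exposes these metrics.

```python
import shutil
import time


def heartbeat_is_stale(last_heartbeat_ts, now, max_age_seconds=60.0):
    """True when the last heartbeat log is older than the allowed age."""
    return (now - last_heartbeat_ts) > max_age_seconds


def log_volume_is_low(path, min_free_bytes):
    """True when the filesystem holding `path` has less free space than required."""
    return shutil.disk_usage(path).free < min_free_bytes


def detect_failures(last_heartbeat_ts, queue_depth, dropped,
                    now=None, max_queue_depth=10000):
    """Return the list of failure conditions currently observed (empty if healthy)."""
    now = time.time() if now is None else now
    failures = []
    if heartbeat_is_stale(last_heartbeat_ts, now):
        failures.append("heartbeat_missing")
    if queue_depth > max_queue_depth:
        failures.append("queue_backlog")
    if dropped > 0:
        failures.append("events_dropped")
    return failures


# Example: heartbeat last seen 120 seconds ago and two events dropped.
print(detect_failures(last_heartbeat_ts=0.0, queue_depth=5, dropped=2, now=120.0))
```

Returning named failure conditions, rather than a single boolean, keeps each signal traceable to an entry in the parameter register.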

4) Implement the shutdown trigger and make it hard to bypass

  • Implement automated actions (systemd unit dependencies, orchestrator policies, admission controllers, runtime guards).
  • Restrict who can disable the trigger (role-based access; change control).
  • Ensure the shutdown event itself is logged to the extent possible (for example, to a separate channel, or to the alternate logging path if that is the design).

Deliverable: Configuration artifacts (IaC, policies, service unit files) and access control settings showing anti-tamper.
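
One way to picture an application-level runtime guard is a wrapper that writes the audit record before running the operation and latches closed if the write fails. This is a sketch under stated assumptions: `FailClosedGuard`, `write_audit`, and `shutdown_action` are hypothetical names, and treating `OSError` as the failure signal is a simplification of what a real audit sink would raise.

```python
class AuditLoggingFailure(Exception):
    """Raised when an operation is refused because audit logging is unavailable."""


class FailClosedGuard:
    def __init__(self, write_audit, shutdown_action):
        self._write_audit = write_audit          # callable(event); raises OSError on failure
        self._shutdown_action = shutdown_action  # callable(event); the predefined state change
        self.tripped = False

    def run(self, event, operation):
        """Audit first, then run the operation; refuse everything once tripped."""
        if self.tripped:
            raise AuditLoggingFailure("service is shut down pending logging recovery")
        try:
            self._write_audit(event)
        except OSError as exc:
            # Latch closed so later calls are refused without retrying the sink.
            self.tripped = True
            self._shutdown_action(event)
            raise AuditLoggingFailure("audit write failed; shutdown action invoked") from exc
        return operation()


# Example with a sink that always fails (simulating a full disk).
shutdown_calls = []

def failing_sink(event):
    raise OSError("no space left on device")

guard = FailClosedGuard(failing_sink, shutdown_calls.append)
try:
    guard.run("admin_password_reset", lambda: "done")
except AuditLoggingFailure as err:
    print("refused:", err)
```

The guard deliberately has no reset method in this sketch; restoring service would go through the approval steps in the recovery runbook rather than through application code.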

5) Define and validate “alternate audit logging capability” (if you claim the exception)

If you choose not to shut down during certain failures, document and test the alternate logging capability:

  • What events it captures (and whether it matches your audit requirements)
  • What failure scenarios it survives (collector down, network partition, credential expiry)
  • How you reconcile logs later (time sync, deduplication, chain-of-custody expectations)

Deliverable: An exception memo per system explaining why shutdown is not used, plus test evidence that alternate logging continues under the defined failure conditions. 1
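
The logic of "continue only if an alternate path accepted the record" can be sketched as follows. The function and sink names are hypothetical, and `OSError` stands in for whatever error your logging clients actually raise; the point is only that the caller learns which sink took the record and gets a hard failure when neither did.

```python
def write_with_alternate(record, primary, alternate):
    """Write to the primary sink, falling back to the alternate sink.

    Returns the name of the sink that accepted the record. Raises
    RuntimeError when both sinks fail, which is the point at which the
    shutdown action would apply.
    """
    try:
        primary(record)
        return "primary"
    except OSError:
        pass  # Primary path is down; try the independent path.
    try:
        alternate(record)
        return "alternate"
    except OSError as exc:
        raise RuntimeError("no audit logging capability available") from exc


# Example: the primary collector is unreachable, the alternate is an in-memory list.
alternate_log = []

def broken_primary(record):
    raise OSError("collector unreachable")

sink = write_with_alternate({"event": "admin_login"}, broken_primary, alternate_log.append)
print(sink, alternate_log)
```

Recording which sink accepted each record also helps with later reconciliation, since it shows exactly when the system was running on the alternate path.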

6) Test it like an auditor will

Run controlled tests that simulate the failure conditions you defined:

  • Stop the log agent
  • Block egress to the collector
  • Fill the log volume/buffer
  • Disable the logging service via configuration

For each test, capture:

  • Time and system identifier
  • Failure injected
  • Detection signal
  • Shutdown action observed
  • Recovery steps and approvals

Deliverable: Test scripts/runbooks and test results with screenshots/commands/log excerpts.
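
A small harness can make each test produce the same record fields listed above. In this sketch the `inject` and `observe` callables are hypothetical hooks into your own tooling, and the example run uses in-memory stand-ins rather than a real log agent.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone


@dataclass
class FailureTestRecord:
    timestamp_utc: str
    system_id: str
    failure_injected: str
    detection_signal: str
    shutdown_action_observed: str
    recovery_steps: str
    approved_by: str


def run_failure_test(system_id, failure, inject, observe, recovery_steps, approved_by):
    """Inject one failure, then record what was detected and what shut down.

    `observe` must return a (detection_signal, shutdown_action) tuple.
    """
    inject()
    detection_signal, action = observe()
    return FailureTestRecord(
        timestamp_utc=datetime.now(timezone.utc).isoformat(),
        system_id=system_id,
        failure_injected=failure,
        detection_signal=detection_signal,
        shutdown_action_observed=action,
        recovery_steps=recovery_steps,
        approved_by=approved_by,
    )


# Simulated run: these stand-ins replace real injection and observation tooling.
state = {"agent_running": True}

def stop_agent():
    state["agent_running"] = False

def observe_state():
    if state["agent_running"]:
        return ("none", "none")
    return ("heartbeat_missing", "service stopped")

record = run_failure_test(
    system_id="payments-api",  # hypothetical system
    failure="stop the log agent",
    inject=stop_agent,
    observe=observe_state,
    recovery_steps="restart agent, confirm heartbeat, restore service",
    approved_by="secops-lead",  # hypothetical approver
)
print(json.dumps(asdict(record), indent=2))
```

Storing the JSON output alongside screenshots or log excerpts gives assessors a consistent, dated record for every failure condition.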

7) Operationalize: incident handling, recovery, and governance

  • Create a runbook for “logging failure causes shutdown” scenarios.
  • Define who can approve restoration to service and what checks must pass first (logging restored, backlog drained, alternate logging confirmed).
  • Track repeated failures as reliability issues and feed them into problem management.

Deliverable: Runbook + incident tickets that demonstrate the process works under real operations.
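
The restoration checks named above can be expressed as a simple gate that returns the reasons service may not resume. This is an illustrative sketch; the function name, argument names, and the convention that `None` means "no alternate path is claimed" are assumptions.

```python
def restoration_blockers(logging_restored, backlog_depth, alternate_confirmed, approver):
    """Return reasons the system may not return to service (empty list means clear).

    alternate_confirmed is True or False, or None when no alternate path is claimed.
    """
    blockers = []
    if not logging_restored:
        blockers.append("primary audit logging not restored")
    if backlog_depth > 0:
        blockers.append(f"{backlog_depth} buffered audit records not yet delivered")
    if alternate_confirmed is False:
        blockers.append("alternate logging not confirmed")
    if not approver:
        blockers.append("no restoration approver recorded")
    return blockers


# Example: logging is back, but the backlog has not drained yet.
print(restoration_blockers(True, 250, None, "secops-lead"))
```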

Required evidence and artifacts to retain

Keep evidence aligned to what an assessor asks for: “show me it is defined, implemented, and tested.”

Evidence type | What “good” looks like
AU-5(4) parameters register | Lists each system, failure events, shutdown action, alternate logging (if any), and owner
Technical configuration | Policies/scripts/IaC showing shutdown trigger bound to failure detection
Access control proof | Roles/permissions showing only authorized staff can change logging and shutdown configurations
Test results | Dated test records tied to defined failure conditions and expected shutdown behavior
Exception approvals | Written approvals for alternate logging designs, with rationale and compensating controls
Operational records | Incident tickets, post-incident reviews, and change records tied to logging failures

Common exam/audit questions and hangups

Expect these questions in NIST-based assessments:

  1. “What are your defined failure conditions?” If you cannot name them, you fail the intent of the requirement.
  2. “Show me the shutdown action occurs automatically.” Manual response is usually treated as insufficient for AU-5(4).
  3. “Where is the alternate logging capability documented and tested?” A claim without test evidence is a common finding.
  4. “How do you prevent admins from disabling the mechanism?” Assessors look for configuration control and privileged access governance.
  5. “Which systems are exempt and why?” Your exception list must be complete and approved.

Frequent implementation mistakes (and how to avoid them)

  • Mistake: Defining “shutdown” as an alert. Alerts support AU-5 base response, but AU-5(4) calls for a shutdown action unless alternate logging exists. Tie the alert to an enforced state change. 1
  • Mistake: Relying on the same failing component as your “alternate.” If the same network path or collector failure breaks both, it is not a credible alternate.
  • Mistake: No negative testing. Teams test logging when everything is healthy. Assessors want evidence of failure injection and the observed shutdown behavior.
  • Mistake: Over-scoping shutdown and causing self-inflicted outages. Apply shutdown-on-failure to the smallest component that prevents unlogged sensitive actions (service, endpoint, admin function). Document why that boundary is sufficient.
  • Mistake: Exceptions living in email. Put exceptions in a controlled workflow with approvals, review dates, and test artifacts.
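
The "same failing component" mistake can be caught with a basic dependency comparison between the primary and alternate logging paths. The sketch below assumes you can list each path's dependencies by name; the component names shown are hypothetical.

```python
def shared_dependencies(primary_path, alternate_path):
    """Return components that both logging paths depend on, sorted by name."""
    return sorted(set(primary_path) & set(alternate_path))


# Hypothetical dependency lists for two logging paths.
primary = ["log-agent", "egress-proxy", "central-collector"]
alternate = ["platform-audit-log", "egress-proxy"]

# Any overlap is a single point of failure for both paths.
print(shared_dependencies(primary, alternate))
```

An empty result does not prove independence, but a non-empty one is a clear reason to question the alternate logging claim before an assessor does.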

Enforcement context and risk implications

No public enforcement cases specific to AU-5(4) are cited here.
Operationally, the risk is straightforward: if attackers (or insiders) can disable or degrade logging without consequence, they can operate without detection and you may be unable to reconstruct events during incident response. AU-5(4) is designed to make “turning off the lights” operationally expensive.

Practical execution plan (30/60/90-day)

Use phases rather than rigid timelines if your environment is complex.

First 30 days: Define and decide

  • Inventory in-scope systems and identify where audit logging is security-critical.
  • Create the AU-5(4) parameters register (failure conditions + shutdown action per system).
  • Decide where alternate logging will be the strategy and document the design intent.
  • Stand up a simple evidence folder structure per system (config, tests, exceptions, incidents).
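
The evidence folder structure mentioned in the last item can be created with a few lines of scripting. This sketch uses the four subfolder names from the list above; the root location and system names are placeholders, and the example writes into a temporary directory so it has no lasting effect.

```python
import tempfile
from pathlib import Path

# Subfolder names taken from the list above.
EVIDENCE_SUBFOLDERS = ("config", "tests", "exceptions", "incidents")


def create_evidence_folders(root, systems):
    """Create <root>/<system>/<subfolder> for every system; return the paths."""
    created = []
    for system in systems:
        for sub in EVIDENCE_SUBFOLDERS:
            folder = Path(root) / system / sub
            folder.mkdir(parents=True, exist_ok=True)  # safe to re-run
            created.append(folder)
    return created


# Example run against a throwaway directory with hypothetical system names.
with tempfile.TemporaryDirectory() as tmp:
    paths = create_evidence_folders(tmp, ["payments-api", "identity-service"])
    relative = [str(p.relative_to(tmp)) for p in paths]
    print(relative)
```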

Days 31–60: Build and pilot

  • Implement failure detection and shutdown triggers for a pilot set of systems (pick one high-risk system and one typical system).
  • Write runbooks for detection, shutdown behavior, and recovery.
  • Run tabletop scenarios with SecOps and SRE: “logging fails at 2am, what happens?”

Days 61–90: Expand and harden

  • Roll out to remaining in-scope systems with a standard pattern (IaC modules, baseline policies).
  • Execute negative tests for each defined failure condition and store results as repeatable evidence.
  • Formalize the exception process for systems using alternate logging, and schedule periodic re-validation tests.
  • Add AU-5(4) checks to change management so logging pipeline changes trigger a control review.

Where Daydream fits (without adding process drag)

If your team struggles with keeping AU-5(4) audit-ready, Daydream helps by mapping the requirement to a named control owner, a written implementation procedure, and a recurring evidence checklist so you do not rebuild proof each assessment cycle. This aligns with the recommended best practice to map AU-5(4) to control ownership and recurring artifacts. 1

Frequently Asked Questions

Does “shutdown” mean powering off the server?

Not necessarily. Define a shutdown action that prevents normal processing when audit logging fails, such as stopping the service, blocking requests, or disabling privileged functions, and document that definition for assessors. 1

When can we avoid shutdown and keep running?

Only when an alternate audit logging capability exists and you can show it continues capturing required audit events during the defined logging failure scenarios. Keep an approved exception record and test evidence. 1

What counts as “audit logging failure” for AU-5(4)?

Treat it as organization-defined and be explicit: agent down, collector unreachable, disk full, queue overflow, logging disabled, or dropped events. The key is that you predefine the conditions and test them. 1

How do we implement AU-5(4) in containers or Kubernetes?

Use platform controls that can stop scheduling, kill pods, or block ingress when the logging sidecar/agent is unhealthy or delivery acknowledgements fail. Store the policy/IaC and a failure-injection test record as your evidence.
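
As one possible building block, a container can expose an HTTP health endpoint that reports unhealthy when the logging agent's heartbeat goes stale, and a Kubernetes liveness or readiness probe can be pointed at it. The sketch below assumes the agent touches a heartbeat file; the file path, port, and age threshold are hypothetical, and the probe definition in the pod spec is not shown.

```python
import os
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

HEARTBEAT_FILE = "/var/run/audit-agent.heartbeat"  # hypothetical path touched by the agent
MAX_AGE_SECONDS = 30.0                              # illustrative threshold


def probe_status(heartbeat_mtime, now, max_age=MAX_AGE_SECONDS):
    """Return 200 when the heartbeat is fresh, 503 when it is missing or stale."""
    if heartbeat_mtime is None or now - heartbeat_mtime > max_age:
        return 503
    return 200


def read_heartbeat_mtime(path=HEARTBEAT_FILE):
    """Return the heartbeat file's modification time, or None if unreadable."""
    try:
        return os.path.getmtime(path)
    except OSError:
        return None


class LoggingHealthHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = probe_status(read_heartbeat_mtime(), time.time())
        self.send_response(status)
        self.end_headers()
        self.wfile.write(b"ok\n" if status == 200 else b"audit logging unhealthy\n")


def serve(port=8080):
    """Start the health endpoint; blocks until interrupted."""
    HTTPServer(("", port), LoggingHealthHandler).serve_forever()
```

Whether the failed probe should restart the container (liveness) or only remove the pod from service traffic (readiness) is a design choice that should match the shutdown action recorded in your parameter register.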

What will auditors ask for first?

They usually start with your defined parameters (failure conditions and shutdown action), then ask for configuration proof and a test showing the shutdown actually happened under an induced failure. 1

We have a SIEM. Does that automatically satisfy the alternate logging requirement?

A SIEM only helps if logs reliably reach it during the failure scenario you defined. If the SIEM depends on the same broken pipeline, it is not a defensible alternate logging capability.

Footnotes

  1. NIST SP 800-53 Rev. 5 OSCAL JSON

  2. NIST SP 800-53 Rev. 5
