SC-24: Fail in Known State
SC-24: Fail in Known State requires you to design systems so that when specific components fail, they automatically transition to a predetermined, safe “known state” while still preserving defined, essential functions. To operationalize it quickly, you need a failure-mode map, explicit “known state” definitions per component, tested failover/fail-closed behaviors, and repeatable evidence that the behaviors work in production conditions. 1
Key takeaways:
- Define the “known state” and the “functions preserved in failure” for each critical component, not just for the system overall. 1
- Implement and test deterministic failure behavior (fail-closed or fail-safe) for the failure modes you list, with logs and runbooks that prove it. 1
- Treat SC-24 as an engineering requirement with audit artifacts: design decision records, test results, and change control for failure behavior. 2
SC-24: Fail in Known State is one of those controls that looks “obvious” until an assessor asks a simple question: “Show me what happens when this component fails.” If your answer is a mix of tribal knowledge, best-effort monitoring, and hope that redundancy saves you, you will struggle to demonstrate conformance.
The operational goal is deterministic behavior under failure. Your system must not drift into an unsafe or ambiguous state where security controls silently stop working, data integrity becomes uncertain, or availability collapses without a controlled fallback. SC-24 forces you to decide, ahead of time, what “safe” means for each key component and failure mode, and then build the system so it reliably lands there.
This requirement page focuses on fast operationalization: scoping, ownership, implementation steps, and the evidence you should retain for exams and audits. It also highlights common implementation traps (like “we’re redundant” as a substitute for “we fail in a known state”) and gives a practical execution plan you can hand to engineering and infrastructure teams.
SC-24: Fail in Known State requirement — plain-English meaning
SC-24 requires you to predefine a “known state” and ensure the system transitions to that state when specific failures occur, while still preserving specific essential functions during the failure condition. 1
In practice, this means:
- You identify the failure types that matter (power loss, process crash, dependency outage, storage corruption signals, network partition, identity provider outage, etc.). 1
- You decide what the component must do when that failure happens (stop processing, deny traffic, enter read-only mode, route to a standby, degrade gracefully, or shut down safely). 1
- You ensure the component behaves that way consistently, not “usually.” 2
A “known state” is not a vague promise of resilience. It is a defined operational condition you can test and show.
Regulatory text
“Fail to a {{ insert: param, sc-24_odp.02 }} for the following failures on the indicated components while preserving {{ insert: param, sc-24_odp.03 }} in failure: {{ insert: param, sc-24_odp.01 }}.” 1
What the operator must do with this text
Because the control is parameterized, your implementation must explicitly fill in three things for your environment: (1) the failures and components in scope, (2) the “known state” the component must enter, and (3) the functions you must preserve during failure. Your assessor will look for those filled-in decisions and proof they operate as designed. 1
Who SC-24 applies to
Entity scope
SC-24 commonly applies where you use NIST SP 800-53 as the control baseline for federal information systems and for contractor systems handling federal data. 1
Operational scope (where auditors focus)
Expect scrutiny on components that can create security impact under failure:
- Identity and access components (SSO, MFA, PAM, directory services)
- Key management and secrets systems
- Network security controls (firewalls, gateways, WAFs, proxies)
- Logging and audit pipelines
- Data stores and queues used for sensitive workflows
- Service meshes / API gateways / ingress controllers
- Backup/restore and replication mechanisms
Also include third-party dependencies that behave like “components” in your architecture (managed databases, hosted IdP, managed message queues). You may not control their internals, but you still must define and implement how your system fails when they fail.
What you actually need to do (step-by-step)
Step 1: Assign an owner and define the assessment boundary
- Name a control owner (often the platform/infrastructure lead) and a compliance owner (GRC or security assurance) responsible for evidence quality.
- Define what systems are in scope for SC-24 in your authorization boundary or equivalent system inventory. 2
Fast operator tip: Put the scope in a single “SC-24 Implementation Record” so engineering and audit work from the same document.
Step 2: Build a failure-mode inventory per critical component
For each critical component, document:
- Component name and purpose
- Upstream/downstream dependencies
- Failure modes you will claim coverage for (be explicit)
- Detection signals (health checks, error codes, lost quorum, heartbeat failure)
- The required “known state” behavior
Examples of “known state” definitions that are assessable:
- “API gateway denies requests requiring authentication if the IdP is unreachable.”
- “Payments service enters read-only mode if the ledger database reports corruption indicators.”
- “Secrets injector stops issuing secrets if KMS is unavailable.”
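The inventory entries above can be captured as structured records so every claim is explicit and checkable. A minimal sketch (the field names and example values are illustrative, not taken from the control text):

```python
from dataclasses import dataclass, field

@dataclass
class FailureMode:
    """One row of the SC-24 failure-mode inventory."""
    component: str                 # e.g. "api-gateway"
    failure: str                   # the specific failure claimed in scope
    detection_signal: str          # how the condition is observed
    known_state: str               # required deterministic behavior
    preserved_functions: list[str] = field(default_factory=list)

inventory = [
    FailureMode(
        component="api-gateway",
        failure="IdP unreachable",
        detection_signal="token endpoint health check fails",
        known_state="deny all requests requiring authentication",
        preserved_functions=["local logging of denied requests"],
    ),
]

# Quick completeness check: every entry must define a known state.
assert all(fm.known_state for fm in inventory)
```

Keeping the inventory as data (rather than prose) makes it easy to diff in change control and to drive tests from the same source of truth.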
Step 3: Define “preserved functions” during failure
SC-24 explicitly calls out preserving defined functions during failure. Decide what must continue even in degraded operation, such as:
- Continued authentication for existing sessions only (or none, if that is your known-safe posture)
- Logging of failure events locally for later forwarding
- Integrity checks that prevent processing of uncertain data
- Graceful shutdown that avoids data corruption
Write these as requirements statements. Avoid “system remains available” unless you can defend it for each failure mode.
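One commonly preserved function — local logging of failure events for later forwarding — can be sketched as a bounded buffer. The buffer size and flush behavior here are illustrative design choices, not prescribed by the control:

```python
from collections import deque

class LocalLogBuffer:
    """Buffer failure events locally while the log pipeline is down,
    then flush them when forwarding recovers (a preserved function)."""

    def __init__(self, max_events: int = 10_000):
        # Bounded on purpose: oldest events drop rather than exhausting disk/memory.
        self.buffer = deque(maxlen=max_events)

    def record(self, event: str) -> None:
        self.buffer.append(event)

    def flush(self, forward) -> None:
        """Forward buffered events once the pipeline is reachable again."""
        while self.buffer:
            forward(self.buffer.popleft())

buf = LocalLogBuffer()
buf.record("2024-01-01T00:00:00Z gateway entered deny-all known state")
sent = []
buf.flush(sent.append)  # after recovery, buffered events are forwarded in order
```

The bounded `deque` makes the degraded behavior itself deterministic: you can state, and test, exactly what happens when the buffer fills.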
Step 4: Implement deterministic failure behavior (engineer the default)
Common implementation patterns:
- Fail-closed for security decisions: if authorization cannot be evaluated, deny.
- Fail-safe for data integrity: if consistency cannot be guaranteed, stop writes or stop processing.
- Graceful degradation for non-sensitive features: disable optional functions, keep core protected workflows.
- Circuit breakers and timeouts to prevent cascading failures.
- State pinning: if configuration or policy cannot be loaded, pin to last known good version with a bounded validity window you define internally.
Tie each pattern back to your “known state” definition for the component.
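A fail-closed authorization check can be sketched as follows. The IdP client callable and timeout value are placeholders, not a specific product API; the point is that every evaluation failure collapses to an explicit deny:

```python
import logging

log = logging.getLogger("authz")

def authorize(check_with_idp, request, timeout_s: float = 2.0) -> bool:
    """Deny by default: any failure to evaluate becomes an explicit deny.

    `check_with_idp` is a hypothetical callable that raises on outage,
    timeout, or malformed reply; the known state is 'deny', never an
    ambiguous error propagated to the caller.
    """
    try:
        return bool(check_with_idp(request, timeout=timeout_s))
    except Exception as exc:  # dependency outage, timeout, bad response
        # Known state entered: deny, and record the transition for evidence.
        log.warning("authz fail-closed: %s", exc)
        return False

# Simulated IdP outage: the decision is deterministically False.
def broken_idp(request, timeout):
    raise TimeoutError("IdP unreachable")

print(authorize(broken_idp, {"user": "alice"}))  # False
```

Note the design choice: the exception handler is the known state, so the behavior is the same whether the IdP is slow, down, or returning garbage — which is exactly what makes it assessable.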
Step 5: Test the failures you listed and capture evidence
An assessor will ask how you know the system fails in a known state. You need repeatable tests:
- Tabletop: confirm the decision logic (known state + preserved functions).
- Functional test: simulate dependency outage (block network to IdP, stop a process, revoke a token signing key in a test environment).
- Chaos/fault injection where feasible for production-like validation.
Keep the tests aligned to the specific failures/components you documented. 1
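A functional test for a documented failure mode can be as simple as forcing the outage and asserting the known state. The names and policy below are illustrative — a toy model of gateway logic, not a real gateway API:

```python
def gateway_decision(idp_reachable: bool, session_cached: bool) -> str:
    """Toy model of a gateway's documented known-state logic:
    deny new authentications when the IdP is down, while keeping
    already-established sessions (a policy choice you must document)."""
    if idp_reachable:
        return "allow"
    return "allow-existing-session" if session_cached else "deny"

def test_idp_outage_fails_closed():
    # Simulated outage: new requests must land in the known state.
    assert gateway_decision(idp_reachable=False, session_cached=False) == "deny"
    # Preserved function: existing sessions continue (documented choice).
    assert gateway_decision(idp_reachable=False, session_cached=True) == "allow-existing-session"

test_idp_outage_fails_closed()
```

Each assertion maps one-to-one to a row of the failure-mode inventory, which is what turns a passing test run into audit evidence rather than just a green build.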
Step 6: Operationalize with monitoring, runbooks, and change control
- Alerts that trigger when the system enters the known state.
- Runbooks that explain operator actions, rollback steps, and customer impact.
- Change control that requires review when failure behavior changes (new timeout, new fallback, new cache behavior).
- Post-incident review templates that include “Did we enter the known state? Did we preserve required functions?”
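The "known state entered" signal can be emitted as a structured event so alert rules, dashboards, and post-incident reviews can match on it. The event fields here are an assumption for illustration, not a standard schema:

```python
import json
import time

def emit_known_state_event(component: str, failure: str, known_state: str, sink=print) -> dict:
    """Emit a structured log line when a component enters its known state.
    Alert rules can match on event == 'sc24.known_state_entered'."""
    event = {
        "event": "sc24.known_state_entered",
        "component": component,
        "failure": failure,
        "known_state": known_state,
        "ts": time.time(),
    }
    sink(json.dumps(event))  # one JSON object per line, easy to index
    return event

evt = emit_known_state_event("api-gateway", "IdP unreachable", "deny-all-authn")
```

Emitting the transition explicitly (rather than inferring it from error rates) gives you the log samples and alert history the evidence section below asks for.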
Step 7: Package it for audit in one place
Most teams lose time because evidence is scattered across tickets, wikis, and CI logs. Daydream can help by mapping SC-24 to a named owner, a documented implementation procedure, and a recurring evidence set so you can answer assessor requests without rebuilding the story every cycle. 1
Required evidence and artifacts to retain
Keep evidence that proves design and operation:
Core artifacts
- SC-24 Implementation Record (scope, owners, definitions for known state and preserved functions) 1
- Architecture diagrams showing where failure behavior is enforced (gateway, service, datastore, queue, IAM boundary)
- Failure-mode inventory table (component → failure → known state → preserved functions)
Engineering proof
- Configuration snippets (timeouts, circuit breakers, policy defaults, deny-by-default rules)
- Runbooks for each major failure mode
- Test plans and results (screenshots, CI logs, test reports)
- Incident tickets or postmortems that show the known state occurred and operators followed the runbook
Operational proof
- Monitoring dashboards or alert rules for known-state transitions
- Log samples demonstrating the failure event and the enforced behavior
- Change records for modifications to failover/fail-closed logic
Common exam/audit questions and hangups (with operator answers)
| Auditor question | What they mean | What you show |
|---|---|---|
| “What is the known state for this component?” | You must define it, not imply it | The SC-24 Implementation Record + component requirements statements |
| “Which failures are in scope?” | Parameterization must be filled in | Failure-mode inventory and boundary statement 1 |
| “How do you know it works?” | Evidence of testing/operation | Test results, fault-injection evidence, incident proof |
| “What functions are preserved?” | SC-24 requires preservation in failure | Preserved-functions list + design mapping 1 |
| “What changed since last assessment?” | Control drift | Change tickets, config diffs, updated tests |
Frequent implementation mistakes (and how to avoid them)
- Equating redundancy with known state. Redundancy improves availability but does not define what happens mid-failure or during partial outage. Write the known state behavior even if you have HA.
- Defining known state only at the system level. Auditors will pick a component. Document known state per component and per failure type.
- Fail-open defaults for authz/authn paths. If your IdP is down, a cached “allow” decision can become a silent bypass. Decide explicitly whether you deny all, deny privileged actions, or allow only pre-authorized sessions, and document it.
- No statement of preserved functions. SC-24 requires preserving defined functions in failure. Pick a short list and make it testable (logging, safe shutdown, deny-by-default, etc.). 1
- Tests exist but don’t map to the control claim. A chaos test that “caused errors” is not evidence unless it proves the component entered the known state you defined.
Enforcement context and risk implications
No public enforcement cases were provided in the source catalog for this requirement, so treat SC-24 primarily as an assurance and safety control within NIST-based assessments. 1 Operationally, weak “fail in known state” behavior increases the chance of:
- unauthorized access during dependency outages,
- data integrity issues during partial failures,
- extended recovery time because operators don’t know the expected safe behavior.
Practical execution plan (30/60/90-day)
To move fast, work in phases with concrete deliverables rather than open-ended promises.
First 30 days (Immediate)
- Assign owners and define scope for SC-24. 2
- Build the failure-mode inventory for the top critical components (start with IAM, gateway, secrets/KMS, primary datastore).
- Draft known state and preserved functions statements and get engineering sign-off.
By 60 days (Near-term)
- Implement missing fail-closed / fail-safe behaviors for the scoped failures.
- Write runbooks and align on on-call actions.
- Add monitoring/alerting for “known state entered” signals.
By 90 days (Operationalize and prove)
- Execute tests for each documented failure mode and store results as audit evidence. 1
- Add change control hooks so modifications to failure behavior trigger re-testing and evidence refresh.
- Centralize the control narrative and evidence set (for many teams, Daydream becomes the system of record for the control, owner, procedure, and recurring evidence requests). 1
Frequently Asked Questions
What counts as a “known state” for SC-24?
A known state is a predefined, testable condition the component enters under a specified failure, such as deny-by-default, read-only mode, or controlled shutdown. You must document it per component and failure type. 1
Do we have to preserve availability during failure to meet SC-24?
SC-24 requires preserving specified functions during failure, not blanket availability. You choose the preserved functions and must prove they continue in the failure condition. 1
How do we handle third-party managed services we can’t control?
Treat the third party as a dependency and define your system’s known state when that dependency fails. Your evidence focuses on your fallback behavior, monitoring, and testing of the integration path.
Is fail-closed always required?
No, but you need a defensible decision per workflow. For authorization and integrity-relevant paths, fail-closed or fail-safe is usually the easiest to justify because it is deterministic and testable.
What evidence is most persuasive to an assessor?
A tight mapping from failure mode → known state → preserved functions, plus executed test results that reproduce the failure and show the expected behavior. Add runbooks and monitoring proof to show it’s operational. 1
How do we keep SC-24 from becoming shelfware after the audit?
Put the failure-mode inventory and tests into your change process so updates to timeouts, fallbacks, or dependency architecture trigger re-validation. Centralizing ownership and recurring evidence collection in Daydream reduces control drift across releases. 1
Footnotes
1. NIST SP 800-53 Rev. 5 OSCAL JSON.
Operationalize this requirement
Map requirement text to controls, owners, evidence, and review workflows inside Daydream.
See Daydream