SI-2(7): Root Cause Analysis

To meet the SI-2(7) root cause analysis requirement, you must run a documented root cause analysis (RCA) for security-relevant issues or failures, identify underlying causes (not symptoms), and track corrective actions to completion with evidence. Operationalize it by defining RCA triggers, assigning ownership, standardizing an RCA template, and making remediation measurable and auditable. 1

Key takeaways:

  • SI-2(7) expects repeatable RCA, not ad hoc “lessons learned.” 1
  • Auditors look for proof that causes were identified and fixes were implemented and verified. 2
  • Your fastest path is a lightweight workflow: trigger → analyze → approve → remediate → validate → retain evidence. 2

SI-2 is the “Flaw Remediation” control in the NIST SP 800-53 System and Information Integrity (SI) family; enhancement SI-2(7) adds a specific expectation: perform root cause analysis to find the underlying causes of issues or failures. The practical intent is simple: recurring incidents, repeated patch misses, repeated configuration drift, or “mystery outages” count as control failures unless you can show that you identified why they happened and acted to prevent recurrence. 1

For a Compliance Officer, CCO, or GRC lead, the trap is treating RCA as an engineering-only activity with inconsistent documentation. Assessors typically do not care which RCA methodology you pick (5 Whys, fishbone, fault tree). They care that your process is defined, consistently triggered, produces actionable findings, and results in completed corrective actions with validation. 2

This page gives requirement-level implementation guidance you can put into your control narrative, your incident/problem workflow, and your evidence plan. The goal is quick operationalization: clear triggers, a standard template, defined roles, and an evidence trail you can hand to an assessor without scrambling.

Regulatory text

Requirement (verbatim excerpt): “Conduct root cause analysis to identify underlying causes of issues or failures.” 1

What the operator must do:

  • Define what counts as an “issue or failure” that requires RCA (for example: repeat vulnerabilities due to the same misconfiguration pattern, repeated failed patches, recurring control exceptions, incident recurrences, availability failures tied to configuration changes).
  • Perform an RCA that identifies underlying causes (process gaps, design issues, missing guardrails, unclear ownership), not only the immediate technical trigger.
  • Document findings and corrective actions, assign owners and due dates, and verify closure.
  • Retain evidence that RCA happened and that changes reduced recurrence risk. 2

Plain-English interpretation (what SI-2(7) is really asking)

SI-2(7) expects you to prove that you learn systematically from security and resilience failures. If your scanner repeatedly flags the same class of vulnerability, or a system repeatedly falls out of baseline, “we patched it again” is not a compliant endpoint. You need to show why it kept happening and what you changed so it stops happening (or is materially less likely). 1

Think of SI-2(7) as the bridge between:

  • Detection (you found a flaw or failure), and
  • Prevention of recurrence (you fixed the root cause, not just the symptom). 2

Who it applies to

Entity types (common applicability):

  • Federal information systems
  • Contractor systems handling federal data (including environments subject to federal security requirements through contracts or authorizations) 1

Operational contexts where assessors expect to see SI-2(7) working:

  • Vulnerability management and patch remediation (recurring missing patches, repeated exceptions)
  • Incident response (repeat incident patterns, same kill chain step recurring)
  • Change management and configuration management (repeat outages or security drift)
  • Identity and access problems (repeat privilege creep findings)
  • Third-party delivered components or managed services (repeat failures caused by unclear shared responsibility or weak integration controls)

What you actually need to do (step-by-step)

1) Assign ownership and define the workflow boundary

  • Control owner: Usually Security Assurance/GRC or Security Operations, with shared execution by Engineering/IT.
  • RCA facilitator: Named role (can rotate) responsible for running the meeting and producing the write-up.
  • Approver: A manager accountable for accepting root cause statements and corrective actions.

Operational rule: an RCA is not “done” until actions are implemented or formally accepted as risk with documented rationale.

2) Define RCA triggers (make them objective)

Write triggers into your procedure so RCAs occur consistently. Practical triggers that map well to “issues or failures” include:

  • Recurrence of the same vulnerability class or misconfiguration pattern.
  • Repeat incidents with similar TTPs or control breakdowns.
  • SLA breaches for remediation timelines due to process failures (for example, “patch window missed because ownership unclear”).
  • Material outages or reliability failures with security impact (failed updates, rollback failures, broken logging).
  • Audit findings that repeat across cycles.

Keep triggers tight and testable. If your triggers are vague (“significant issue”), RCAs become optional and evidence becomes inconsistent.
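One way to keep triggers objective is to express them as a small rule that returns both a decision and a reason. The sketch below is illustrative only: the `Finding` fields, the 90-day window, and the two-occurrence threshold are assumptions chosen for the example, not values taken from SI-2(7).

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass(frozen=True)
class Finding:
    """One detected issue. All field names here are hypothetical."""
    asset: str
    pattern: str              # vulnerability class or misconfiguration pattern
    detected_on: date
    sla_breached: bool = False


def rca_required(new: Finding, history: list[Finding],
                 window_days: int = 90, min_repeats: int = 2) -> tuple[bool, str]:
    """Return (decision, reason) so the trigger decision itself is auditable."""
    if new.sla_breached:
        return True, "remediation SLA breached"
    cutoff = new.detected_on - timedelta(days=window_days)
    prior = [f for f in history
             if f.pattern == new.pattern and cutoff <= f.detected_on <= new.detected_on]
    # Count the new finding plus prior occurrences inside the window.
    count = len(prior) + 1
    if count >= min_repeats:
        return True, f"pattern '{new.pattern}' seen {count} times in {window_days} days"
    return False, "no objective trigger met"
```

Storing the returned reason on the ticket gives you a record of why an RCA was or was not opened, which supports the consistency an assessor will look for.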

3) Use a standard RCA template that forces “root cause,” not symptoms

Your template should require, at minimum:

  • Event summary: what happened, impacted assets, dates, and detection source.
  • Customer/data impact assessment: what could have been exposed or disrupted (keep it factual).
  • Timeline: key events and decision points (ticket links or change IDs).
  • Contributing factors: tooling gaps, process gaps, training gaps, documentation gaps, access issues.
  • Root cause statement: a single sentence in the form “X happened because Y control/process/design gap existed.”
  • Corrective actions: preventive and detective improvements, each with owner and due date.
  • Validation plan: what evidence proves the fix worked (test case, control check, monitoring alert, configuration query).
  • Lessons learned and procedural updates: policy/standard/runbook changes required.

If you need a quick win, implement the template inside your existing ticketing system so the record is inherently time-stamped and attributable.
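If your ticketing system cannot enforce required fields on its own, the template can be checked with a few lines of code. The following is a minimal sketch, assuming a hypothetical `RcaRecord` structure whose fields mirror the list above; the "because" check is a crude heuristic for the root cause statement form, not a substitute for review.

```python
from dataclasses import dataclass, field, fields


@dataclass
class CorrectiveAction:
    description: str
    owner: str
    due_date: str             # ISO date string, e.g. "2025-01-31"


@dataclass
class RcaRecord:
    """Hypothetical record mirroring the template fields listed above."""
    event_summary: str = ""
    impact_assessment: str = ""
    timeline: list[str] = field(default_factory=list)
    contributing_factors: list[str] = field(default_factory=list)
    root_cause_statement: str = ""
    corrective_actions: list[CorrectiveAction] = field(default_factory=list)
    validation_plan: str = ""
    procedural_updates: str = ""

    def missing_fields(self) -> list[str]:
        """Names of template fields that are still empty."""
        return [f.name for f in fields(self) if not getattr(self, f.name)]

    def problems(self) -> list[str]:
        """Everything that should block approval of this RCA write-up."""
        issues = [f"missing: {name}" for name in self.missing_fields()]
        # Heuristic for the "X happened because Y" form.
        if self.root_cause_statement and " because " not in self.root_cause_statement:
            issues.append("root cause statement lacks a 'because' clause")
        for action in self.corrective_actions:
            if not action.owner or not action.due_date:
                issues.append(f"action without owner/due date: {action.description}")
        return issues
```

An approver could require `problems()` to return an empty list before signing off on the record.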

4) Run the RCA meeting and document decisions

  • Include Security, the system owner, and any team that owns a contributing control (patching, IAM, network, CI/CD).
  • Keep the meeting outcome-oriented: root cause statement(s) and approved corrective actions.
  • Capture dissent or uncertainty explicitly (for example, “root cause not fully confirmed; additional data collection action created”).

5) Track corrective actions like control obligations, not suggestions

Corrective actions must be managed as first-class work items:

  • Create tickets/epics for each action.
  • Require due dates, owners, and status updates.
  • Escalate overdue actions through your normal risk or issue management forum.

A clean approach is to treat RCA actions as part of your risk register lifecycle. If the root cause cannot be fully remediated (legacy system constraints), document compensating controls and risk acceptance.
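Escalation of overdue actions is easier to sustain when the overdue list is generated rather than assembled by hand. Here is a small sketch, assuming a hypothetical `ActionItem` export from your ticketing system and three assumed status values; adapt both to whatever your tooling actually provides.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ActionItem:
    """Hypothetical export row for one corrective action ticket."""
    rca_id: str               # links the action back to its RCA record
    ticket_id: str
    owner: str
    due: date
    status: str = "open"      # assumed states: "open", "done", "risk_accepted"


def overdue_actions(items: list[ActionItem], today: date) -> list[ActionItem]:
    """Open actions past their due date, oldest first, for escalation."""
    late = [i for i in items if i.status == "open" and i.due < today]
    return sorted(late, key=lambda i: i.due)
```

Feeding this list into your regular risk or issue management forum keeps escalation tied to data instead of memory.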

6) Validate closure and prevent recurrence

Validation is where many programs fail. Require one of the following per action:

  • Evidence of configuration guardrails implemented (policy-as-code rule, baseline enforcement, CI checks).
  • Monitoring/alerting added and tested.
  • A regression test or control check performed post-change.
  • A follow-up scan/report showing the recurring finding no longer appears.

Tie the validation evidence back to the RCA record so an assessor can follow the chain from failure → cause → fix → proof.
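The closure rule (every action is either implemented with linked validation evidence or formally risk-accepted with a documented rationale) can be enforced as a simple gate. This is a sketch under assumed names: the `Action` fields and status values are hypothetical and would map to fields in your own ticketing system.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    """Hypothetical corrective action as tracked against an RCA."""
    ticket_id: str
    status: str                                   # "open", "done", or "risk_accepted"
    evidence_links: list[str] = field(default_factory=list)
    risk_acceptance_ref: str = ""


def closure_blockers(actions: list[Action]) -> list[str]:
    """Reasons the RCA cannot be closed yet; an empty list means closable."""
    blockers = []
    if not actions:
        blockers.append("RCA has no corrective actions")
    for a in actions:
        if a.status == "done" and not a.evidence_links:
            blockers.append(f"{a.ticket_id}: done but no validation evidence linked")
        elif a.status == "risk_accepted" and not a.risk_acceptance_ref:
            blockers.append(f"{a.ticket_id}: risk accepted without documented rationale")
        elif a.status not in ("done", "risk_accepted"):
            blockers.append(f"{a.ticket_id}: still {a.status}")
    return blockers
```

Returning the reasons, rather than a bare true or false, gives the RCA facilitator a concrete list to work through before requesting sign-off.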

7) Retention and readiness (make evidence easy to produce)

Create an “RCA evidence package” per event with consistent naming and storage. If you use Daydream for control operations, map SI-2(7) to a control owner, documented procedure, and recurring evidence artifacts so you can produce a complete audit packet without rebuilding context during the exam window. 1

Required evidence and artifacts to retain

Keep artifacts that prove both process and outcomes:

Core artifacts

  • RCA procedure (triggers, roles, methodology options, required fields)
  • Completed RCA reports/tickets (with approvals)
  • Corrective action tickets (with owners, due dates, closure)
  • Validation evidence (test results, scan outputs, monitoring changes, configuration diffs)

Supporting artifacts

  • Incident/problem/change records linked to the RCA
  • Meeting notes or decision logs (who agreed to what)
  • Updated runbooks/standards/policies that resulted from the RCA
  • Risk acceptance memo(s) where remediation is not feasible

Common exam/audit questions and hangups

Assessors commonly probe:

  • “Show me your RCA criteria. How do you decide when RCA is required?” 2
  • “Provide examples of RCAs from the last period and the corrective actions that closed.” 1
  • “How do you verify the root cause fix worked and reduced recurrence risk?” 2
  • “Who approves the root cause statement and signs off on closure?”
  • “How do you handle third-party-caused failures and shared responsibility gaps?”

Hangup to anticipate: teams provide a post-incident summary with no underlying cause analysis, or actions that are purely “remind the team” with no control change.

Frequent implementation mistakes (and how to avoid them)

  1. Mistake: Confusing root cause with the trigger

    • Symptom: “Root cause was unpatched server.”
    • Fix: Ask why it was unpatched (ownership, tooling coverage, change freeze, asset inventory gaps).
  2. Mistake: RCAs that produce no preventive control change

    • Fix: Require at least one preventive or detective control improvement per RCA, unless risk-accepted with rationale.
  3. Mistake: No linkage between RCA and remediation evidence

    • Fix: Put RCA ID into every corrective action ticket and store validation artifacts in the same record.
  4. Mistake: Treating third-party failures as “out of scope”

    • Fix: Document the integration root cause (contractual SLA gaps, missing monitoring, unclear escalation paths) and create actions for governance changes.
  5. Mistake: RCAs exist, but triggers are discretionary

    • Fix: Define objective triggers and review samples monthly in a governance forum for consistency.

Enforcement context and risk implications

No public enforcement cases specific to SI-2(7) were identified in the source material. 1

Operationally, the risk is still concrete: without RCA discipline, repeat failures accumulate into recurring incidents, recurring audit findings, and chronic exception handling. That pattern increases operational risk and weakens your authorization or customer assurance posture because you cannot show continuous improvement tied to real failures. 2

Practical 30/60/90-day execution plan

First 30 days (stand up the minimum viable control)

  • Name the SI-2(7) control owner and RCA facilitator role.
  • Publish RCA triggers and a single approved RCA template.
  • Implement the RCA record in your ticketing system (fields, required approvals).
  • Train the teams who will be pulled into RCAs (Security Ops, IT, Engineering).

Days 31–60 (run RCAs and harden corrective action tracking)

  • Run RCAs for newly triggered events; do not backfill everything.
  • Create a corrective action tracking view (board/report) with due dates and status.
  • Define validation evidence requirements and add them to “definition of done.”
  • Start a lightweight monthly review to check RCA quality and overdue actions.

Days 61–90 (make it auditable and resilient)

  • Perform a self-assessment: select a sample of RCAs and verify end-to-end traceability.
  • Tighten triggers based on early learning (too many, too few, or unclear).
  • Ensure third-party-related RCAs include governance actions (SLA updates, monitoring, escalation).
  • Package recurring evidence for assessors: procedure, examples, action closure proof, validation proof.
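The self-assessment in this phase can be sketched as a reproducible sample plus a traceability check. The record keys below (`trigger_record`, `root_cause_statement`, `action_tickets`, `validation_evidence`) are hypothetical names for the links in the failure → cause → fix → proof chain; substitute the fields your RCA records actually carry.

```python
import random

# Hypothetical keys representing each link in the traceability chain.
REQUIRED_LINKS = (
    "trigger_record",
    "root_cause_statement",
    "action_tickets",
    "validation_evidence",
)


def traceability_gaps(rca: dict) -> list[str]:
    """Links that are missing or empty on a single RCA record."""
    return [key for key in REQUIRED_LINKS if not rca.get(key)]


def sample_and_check(rcas: list[dict], sample_size: int,
                     seed: int = 0) -> dict[str, list[str]]:
    """Pick a sample of RCAs and report the gaps found in each one."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn later
    chosen = rng.sample(rcas, min(sample_size, len(rcas)))
    return {r["id"]: traceability_gaps(r) for r in chosen}
```

Any RCA that comes back with a non-empty gap list is a candidate for cleanup before the evidence package goes to an assessor.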

Frequently Asked Questions

Do we need a specific RCA methodology (5 Whys vs. fishbone)?

SI-2(7) requires that you conduct RCA and identify underlying causes; it does not mandate a specific method. Pick one approach, document it in your procedure, and apply it consistently with evidence. 1

What types of events should trigger an RCA?

Define objective triggers tied to “issues or failures,” especially recurring vulnerabilities, repeated incidents, and repeat audit findings. The key is consistency: the same pattern should produce the same RCA decision. 2

How deep does “root cause” need to go?

Deep enough to identify a fix that prevents recurrence, such as a control gap in asset inventory, patch ownership, baseline enforcement, or change governance. If the only action is “patch again,” the analysis usually stopped too early. 2

Can we close an RCA if we accept the risk instead of fixing the root cause?

Yes, if you document the rationale, approving authority, and compensating controls (if any), and link that decision to the RCA record. Auditors will still expect traceability and governance, not silence. 2

How do we handle RCAs where a third party caused the failure?

Document the integration and governance root cause (monitoring gaps, unclear responsibility boundaries, missing escalation SLAs) and create corrective actions you control. Keep third-party communications and contract/SOW changes as evidence. 2

What evidence do auditors typically ask for first?

A written RCA procedure, a small sample of completed RCAs, and proof that corrective actions were implemented and validated. Make sure each RCA has links to tickets, changes, and validation outputs. 2

Footnotes

  1. NIST SP 800-53 Rev. 5 OSCAL JSON

  2. NIST SP 800-53 Rev. 5
