SI-13(3): Manual Transfer Between Components

To meet the SI-13(3) Manual Transfer Between Components requirement, you must define a threshold (as a fraction of mean time to failure), manually switch workload from active to standby components when the active component reaches that usage point, and keep evidence that the trigger, decision, and transfer occurred as designed. 1

Key takeaways:

  • You need an operator-initiated failover/switchover procedure tied to MTTF-based usage thresholds, not an ad hoc “when it feels risky” decision. 1
  • Auditors look for defined MTTF inputs, a clear trigger, and repeatable runbooks, plus logs/tickets showing transfers happened when required. 1
  • The hard part is measuring “use” consistently and mapping it to standby readiness without creating outages during manual transfers.

SI-13(3) is a reliability and resilience requirement. It assumes you run systems with active and standby components (hardware, virtual infrastructure, clustered nodes, storage controllers, power supplies, network devices, or service instances) and you want to reduce the chance of failure by switching before the active component approaches its expected failure window.

This control enhancement is operationally specific: it ties the transfer decision to mean time to failure (MTTF) and requires the transfer to be manually initiated once the active component’s “use” hits a defined threshold. 1 That means you need (1) a defensible definition of MTTF for the component, (2) a method to measure usage against that MTTF, (3) a documented trigger value, (4) a human-run switchover process, and (5) evidence.

For a CCO/GRC lead, the fastest path is to treat SI-13(3) like a production control: assign an owner (often SRE/IT Ops), define the metric and trigger, create a runbook, and implement a lightweight evidence trail (tickets + logs + post-change validation). Your goal is repeatability and auditability without turning reliability work into bureaucracy.

Regulatory text

Control requirement (verbatim): “Manually initiate transfers between active and standby system components when the use of the active component reaches {{ insert: param, si-13.03_odp }} of the mean time to failure.” 1

Operator interpretation (what you must do):

  1. Set the organization-defined parameter: decide what fraction of MTTF triggers the transfer (the “{{…si-13.03_odp}}” value). 1
  2. Measure “use” of the active component in a consistent, documentable way (runtime hours, duty cycles, I/O counts, request volume, or vendor-provided wear indicators, depending on the component).
  3. Manually initiate the switchover from active to standby when the active component reaches the threshold. “Manually” means a human authorizes/starts the transfer, even if automation executes steps after approval. 1
  4. Repeat this as an operating practice, not a one-time project, and retain evidence that shows the trigger, the manual initiation, and the outcome.

Plain-English interpretation

You are required to rotate from active to standby before expected failure risk rises, using an MTTF-based trigger, and you must have a human in the loop to initiate the transfer at the defined point. 1 The compliance intent is to prevent avoidable outages caused by running components too long without planned switchover.

Who it applies to

Entities

  • Federal information systems and contractor systems handling federal data that adopt NIST SP 800-53 as the control baseline. 1

Operational contexts where SI-13(3) is relevant

  • High-availability architectures: active/standby clusters, warm standby environments, redundant appliances.
  • Components with known wear-out behavior: storage media, network hardware, power modules, rotating secrets in HSM clusters, or any component where vendors provide lifecycle guidance.
  • Environments where automatic failover exists but you still need planned, operator-initiated switchovers tied to lifecycle/usage thresholds.

Where it’s often out of scope (document the rationale)

  • Purely stateless horizontally scaled services with no clear “component MTTF” signal and no designated standby role. If you claim “not applicable,” keep an architecture note explaining why no active/standby relationship exists and what alternative resilience controls you operate.

What you actually need to do (step-by-step)

1) Establish ownership and system scope

  • Assign a control owner in IT Ops/SRE and a GRC coordinator for evidence collection.
  • Define which components qualify as “active” and “standby” for this control (by system/service).
  • Decide whether you will implement at the platform layer (clusters, hypervisors, storage) or at the application layer (primary/standby app nodes). The control is agnostic; your evidence must map clearly to the selected layer. 1

2) Define MTTF inputs and the “use” metric

Create a short “MTTF & Usage Definition” sheet per component type:

  • MTTF source: vendor specs, historical maintenance data, internal reliability engineering estimates, or service provider guidance.
  • Use metric: the measurable counter that advances toward MTTF (examples: powered-on hours, number of write cycles, request hours at rated load, or a manufacturer health/wear indicator).
  • Collection method: where the metric is pulled from (monitoring tool, device telemetry, CMDB fields, maintenance logs).

Auditor hangup to preempt: “MTTF” is part of the control text; you need a defensible input and a repeatable way to track usage against it. 1
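
As a minimal sketch, the definition sheet can also be captured as a structured record so the same inputs feed monitoring and audit evidence. The class name, field names, and sample values below are hypothetical illustrations; they are not part of the control text or any vendor specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageDefinition:
    """One 'MTTF & Usage Definition' entry for a component type (hypothetical schema)."""
    component_type: str      # illustrative label, e.g. "storage-controller"
    mttf: float              # MTTF expressed in the same unit as the use metric
    mttf_source: str         # where the MTTF input came from
    use_metric: str          # counter that advances toward MTTF
    collection_method: str   # where the counter is pulled from

    def fraction_used(self, accumulated_use: float) -> float:
        """Return accumulated use as a fraction of MTTF."""
        if self.mttf <= 0:
            raise ValueError("MTTF must be positive")
        if accumulated_use < 0:
            raise ValueError("accumulated use cannot be negative")
        return accumulated_use / self.mttf


# Sample values are made up for illustration only.
definition = UsageDefinition(
    component_type="storage-controller",
    mttf=50_000.0,
    mttf_source="vendor specification",
    use_metric="powered-on hours",
    collection_method="device telemetry",
)
print(definition.fraction_used(30_000.0))
```

Keeping the MTTF and the use metric in the same unit is the design choice that matters here: it makes the fraction directly comparable to the approved threshold.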

3) Set the organization-defined threshold (the ODP)

  • Document the chosen fraction of MTTF that triggers manual transfer (the control’s parameter). 1
  • Tie it to risk: critical services may set a more conservative trigger; less critical components may accept a higher threshold.
  • Record approvals: reliability engineering + service owner + security/compliance sign-off.

Practical tip: Put the threshold in a governed location (standard + runbook + monitoring alert) so it can’t drift.
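
One way to check for drift, sketched under the assumption that you can read the threshold value out of each governed location, is to compare every deployed copy against the approved value. The function name, location labels, and the 0.75 and 0.80 values are hypothetical and are not recommended settings.

```python
from typing import Dict


def find_threshold_drift(approved: float, deployed: Dict[str, float],
                         tolerance: float = 1e-9) -> Dict[str, float]:
    """Return the locations whose threshold differs from the approved ODP value."""
    return {
        location: value
        for location, value in deployed.items()
        if abs(value - approved) > tolerance
    }


# Illustrative values only; 0.75 is not a recommended threshold.
drift = find_threshold_drift(
    approved=0.75,
    deployed={"standard": 0.75, "runbook": 0.75, "monitoring_alert": 0.80},
)
print(drift)
```

An empty result means every location matches the approved value; anything returned is a candidate finding to fix before an assessor spots it.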

4) Build the manual transfer procedure (runbook)

Your runbook should be short, executable, and evidence-friendly:

Runbook minimum contents

  • Preconditions: standby health checks, replication status, capacity, patch parity.
  • Manual initiation step: “Approve change” in ITSM and “execute switchover” command or console action.
  • Safety controls: maintenance window rules, backout plan, max tolerable impact, communications steps.
  • Validation: post-transfer checks (service health, error rates, data integrity checks, cluster state).
  • Post-change actions: update active/standby designation, reset counters, schedule next review.

If your engineering team uses ChatOps or pipelines, keep the “manual initiation” explicit: approval in ticketing, a change record, or an approval gate that starts the workflow. The requirement is “manually initiate,” not “type every command by hand.” 1
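
The approval-gate idea can be sketched as follows. This is an illustration, not a reference implementation: the ticket class, the error type, and the stand-in switchover function are all hypothetical, and a real deployment would read approval state from your ITSM tool rather than from an in-memory object.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ChangeTicket:
    ticket_id: str
    reason: str
    approved_by: Optional[str] = None  # set only when a human approves


class NotApprovedError(RuntimeError):
    """Raised when a transfer is attempted without a recorded human approval."""


def initiate_transfer(ticket: ChangeTicket, switchover: Callable[[], str]) -> str:
    """Run the automated switchover only after a human approval is on the ticket."""
    if not ticket.approved_by:
        raise NotApprovedError(f"{ticket.ticket_id} has no recorded approver")
    return switchover()


def demo_switchover() -> str:
    # Stand-in for the real automation (cluster command, pipeline job, etc.).
    return "standby promoted"


ticket = ChangeTicket(ticket_id="CHG-0001", reason="threshold reached")
ticket.approved_by = "operator@example.org"  # the manual initiation step
print(initiate_transfer(ticket, demo_switchover))
```

The automation never runs on its own: the only path to the switchover goes through a recorded approver, which is also the evidence an assessor will ask for.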

5) Implement monitoring + alerting for the trigger

  • Create an alert that fires when “use ≥ threshold × MTTF” for the active component.
  • Route alerts to the on-call team and create an auto-generated change ticket draft (recommended).
  • Add a dashboard widget showing remaining margin to threshold for each covered component.
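
The alert condition and the dashboard margin above can be sketched in a few lines. The function names and sample numbers are hypothetical, and the sketch assumes the threshold is stored as a fraction between 0 and 1 and that use and MTTF share the same unit.

```python
def trigger_point(mttf: float, threshold_fraction: float) -> float:
    """Usage value at which the manual transfer should be initiated."""
    if mttf <= 0:
        raise ValueError("MTTF must be positive")
    if not 0 < threshold_fraction <= 1:
        # Assumption: the ODP is expressed as a fraction in (0, 1].
        raise ValueError("threshold fraction must be in (0, 1]")
    return threshold_fraction * mttf


def threshold_reached(use: float, mttf: float, threshold_fraction: float) -> bool:
    """Alert condition: use >= threshold x MTTF."""
    return use >= trigger_point(mttf, threshold_fraction)


def remaining_margin(use: float, mttf: float, threshold_fraction: float) -> float:
    """Usage left before the trigger; zero or negative means the alert should fire."""
    return trigger_point(mttf, threshold_fraction) - use


# Illustrative values only; 0.75 is not a recommended threshold.
MTTF_HOURS = 50_000.0
THRESHOLD = 0.75
print(threshold_reached(36_000.0, MTTF_HOURS, THRESHOLD))  # False
print(remaining_margin(36_000.0, MTTF_HOURS, THRESHOLD))   # 1500.0
```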

6) Operate the control and capture evidence every time

Each time the threshold is met:

  • Open/confirm a change ticket.
  • Document the trigger (metric value and threshold).
  • Manually initiate switchover.
  • Attach logs/screenshots/monitoring snapshots.
  • Record validation results and any incidents.

7) Governance: review and improve

  • Periodically review whether MTTF inputs and use metrics still reflect reality (vendor changes, architecture changes, cloud migrations).
  • Review failed/aborted switchovers as reliability incidents with corrective actions.

Required evidence and artifacts to retain

Keep evidence that shows design and operation. A tight evidence set:

Design-time artifacts

  • Control narrative mapping SI-13(3) to the system and owner. 1
  • “MTTF & Usage Definition” document per component category.
  • Documented ODP threshold approval (the fraction of MTTF). 1
  • Runbook / standard operating procedure for manual transfer.

Run-time artifacts 1

  • Change ticket (manual initiation proof): requester/approver, timestamp, reason “threshold reached.”
  • Monitoring evidence: alert snapshot showing use vs threshold.
  • System logs: switchover execution logs, cluster role changes, failover events.
  • Post-transfer validation checklist results.
  • Exception record if transfer is delayed (with compensating controls and approval).
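
A lightweight completeness check on each transfer's evidence packet might look like the sketch below. The artifact keys and file names are hypothetical labels for the run-time artifacts listed above; adapt them to however your ticketing system stores attachments.

```python
from typing import Dict, List

# Hypothetical keys mirroring the run-time artifacts listed above.
REQUIRED_ARTIFACTS = (
    "change_ticket",
    "monitoring_snapshot",
    "system_logs",
    "validation_results",
)


def missing_artifacts(packet: Dict[str, str], delayed: bool = False) -> List[str]:
    """List required artifacts that are absent or empty in an evidence packet."""
    required = list(REQUIRED_ARTIFACTS)
    if delayed:
        # A delayed transfer also needs an approved exception record.
        required.append("exception_record")
    return [name for name in required if not packet.get(name)]


packet = {
    "change_ticket": "CHG-0001",
    "monitoring_snapshot": "alert-snapshot.png",
    "system_logs": "switchover.log",
}
print(missing_artifacts(packet))  # ['validation_results']
```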

Daydream fit (earned mention): Daydream is useful as the system of record to map SI-13(3) to an owner, store the runbook link, and define the recurring evidence checklist so every transfer produces an audit-ready packet without chasing engineers. 1

Common exam/audit questions and hangups

Auditors and assessors tend to press on these points:

  1. What is the exact threshold value and who approved it? They want the organization-defined parameter filled in and governed. 1
  2. How do you calculate “use” and MTTF? Expect scrutiny if “use” is defined informally or tracked in manual spreadsheets that lack controls.
  3. Show me evidence of manual initiation. A ticket approval, change record, or explicit manual gate is the simplest. 1
  4. Show recent transfer events. They will sample events and confirm trigger timing, execution, and validation.
  5. How do you ensure standby is actually ready? If standby is stale, a manual transfer can create an outage.

Frequent implementation mistakes (and how to avoid them)

| Mistake | Why it fails SI-13(3) | Fix |
|---|---|---|
| No defined ODP threshold | Control requires a defined fraction of MTTF. 1 | Set and approve the threshold; embed it in alerts + runbooks. |
| “Manual transfer” is verbal, not recorded | You can’t prove initiation was manual. 1 | Require a change ticket approval or manual pipeline gate. |
| “Use” metric is ambiguous | Trigger becomes subjective; audits fail on repeatability | Define a single metric per component type and document collection. |
| Standby not maintained | Transfer increases outage risk | Add standby readiness checks as runbook preconditions. |
| Transfer happens after the threshold | Control says initiate when use reaches the threshold. 1 | Alert earlier, schedule maintenance windows, allow justified exception handling. |

Enforcement context and risk implications

No public enforcement cases tied to this specific control enhancement appear in the source catalog. From a risk perspective, weak operation of SI-13(3) usually shows up as avoidable downtime, failed disaster recovery tests, or an inability to demonstrate planned resilience activities during an assessment. The practical exposure is operational: you can pass policy review but fail an audit sample if you cannot produce transfer records tied to the MTTF trigger. 1

Practical execution plan (30/60/90)

Use phases instead of date promises; the work depends on system complexity and telemetry maturity.

First 30 days (Immediate foundation)

  • Confirm in-scope systems with active/standby components.
  • Assign control owner and publish a one-page control narrative for SI-13(3). 1
  • Pick initial MTTF sources and define “use” metrics for the top critical components.
  • Draft the runbook template and evidence checklist.

Next 60 days (Operationalize)

  • Set and approve the ODP threshold; configure monitoring alerts for “use vs threshold.” 1
  • Implement ITSM change templates that capture manual initiation and required attachments.
  • Run a planned switchover exercise for at least one representative system and refine the runbook based on outcomes.

By 90 days (Prove repeatability)

  • Expand coverage to remaining in-scope components and standardize evidence capture.
  • Perform an internal audit-style sampling: pick recent tickets and verify trigger evidence, initiation proof, and post-checks.
  • Add exception handling workflow (deferral approvals, compensating controls, and rescheduled transfer date).
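
For the internal sampling step, a reproducible random draw keeps the selection defensible if someone later asks how tickets were chosen. This is a sketch only: the function name, ticket identifiers, and seed are hypothetical, and the sample size is your own decision.

```python
import random
from typing import List, Sequence


def sample_tickets(ticket_ids: Sequence[str], sample_size: int, seed: int) -> List[str]:
    """Draw a reproducible random sample of transfer tickets for review."""
    if sample_size < 0:
        raise ValueError("sample size cannot be negative")
    rng = random.Random(seed)  # fixed seed so the draw can be repeated later
    size = min(sample_size, len(ticket_ids))
    return rng.sample(list(ticket_ids), size)


# Hypothetical ticket identifiers.
tickets = [f"CHG-{n:04d}" for n in range(1, 21)]
selected = sample_tickets(tickets, sample_size=5, seed=2024)
print(selected)
```

Record the seed and the ticket population alongside the review results so the same sample can be regenerated.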

Frequently Asked Questions

Does “manual transfer” prohibit automation?

No. The requirement is that you manually initiate the transfer, which can be satisfied by an operator approval or explicit start action even if automation performs the steps afterward. 1

What counts as “use” for virtualized or cloud components?

Define “use” as the best available lifecycle indicator you can measure consistently, such as instance runtime, managed service health indicators, or platform telemetry. Document the metric and how it maps to MTTF for your environment. 1

We don’t have vendor MTTF data. Can we still comply?

Yes, if you document a defensible MTTF input based on internal reliability data or engineering estimates and apply it consistently. Auditors mainly test whether you defined MTTF and operate the threshold-based manual transfer process. 1

What if the system can’t be switched at the threshold due to business constraints?

Create an exception record with approver, rationale, compensating controls (heightened monitoring, standby validation), and a rescheduled transfer plan. Keep the exception tied to the same “use vs MTTF” trigger evidence. 1

How do we prove the transfer happened “when” the threshold was reached?

Preserve the alert/metric snapshot that shows the threshold condition and the ticket timestamps showing manual initiation and completion. Correlate them with system logs that record the role change. 1
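
A small sketch of that correlation, assuming you can extract three timestamps (alert, manual initiation, and logged role change) from your own tools. The function name and the sample timestamps are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict


def transfer_timeline(alert_at: datetime, initiated_at: datetime,
                      role_changed_at: datetime) -> Dict[str, timedelta]:
    """Check event ordering and return the gaps between the three evidence points."""
    if not (alert_at <= initiated_at <= role_changed_at):
        raise ValueError("timestamps are out of order; investigate before using as evidence")
    return {
        "alert_to_initiation": initiated_at - alert_at,
        "initiation_to_role_change": role_changed_at - initiated_at,
    }


# Made-up timestamps for illustration.
timeline = transfer_timeline(
    alert_at=datetime(2024, 1, 10, 9, 0),
    initiated_at=datetime(2024, 1, 10, 9, 45),
    role_changed_at=datetime(2024, 1, 10, 9, 52),
)
print(timeline["alert_to_initiation"])
```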

Is this requirement relevant if we have active/active architecture?

Often less directly. If you do not have a defined standby component, document the architecture and justify how resilience is achieved through other mechanisms; otherwise, identify an equivalent “standby” role within the design and apply SI-13(3) there. 1

Footnotes

  1. NIST SP 800-53 Rev. 5 OSCAL JSON

