PR.IR-04: Adequate resource capacity to ensure availability is maintained

To meet PR.IR-04 ("Adequate resource capacity to ensure availability is maintained"), you must prove you have enough people, technology headroom, and third-party support to keep critical services available during normal peaks and disruptive events. Operationalize it by setting availability targets, forecasting capacity, stress-testing constraints, and keeping auditable evidence of monitoring, decisions, and remediation tied to specific services.

Key takeaways:

  • Define service-specific availability and capacity thresholds, then monitor against them continuously.
  • Treat capacity as a cross-functional control spanning infrastructure, application teams, SRE/IT operations, third-party management, and incident response.
  • Keep evidence that you forecasted demand, tested limits, and executed corrective actions before availability was materially impacted.

PR.IR-04 is an availability control dressed as a resource planning requirement. Examiners and auditors read it as: “Show me you can keep the lights on, even when demand spikes or something breaks.” That means you need defensible capacity management across compute, storage, network, licenses, people, on-call coverage, and third-party dependencies that can bottleneck service availability.

This requirement applies whether you run a traditional data center, cloud-native workloads, SaaS platforms, or business-critical internal systems. It also applies when availability depends on third parties (cloud providers, managed service providers, payment processors, telecom, critical SaaS). If your organization claims alignment to NIST CSF 2.0, PR.IR-04 becomes a testable expectation: you should be able to point to documented targets, capacity models, monitoring, and an operating rhythm for scaling decisions.

The fastest way to implement PR.IR-04 is to pick your “crown jewel” services, map their dependencies, define measurable capacity and availability thresholds, and build an evidence trail that shows you actively manage headroom—not just react to outages.

Regulatory text

Excerpt: “Adequate resource capacity to ensure availability is maintained.” 1

What an operator must do: Translate “adequate resource capacity” into measurable service-level commitments and supporting capacity thresholds, then operate a repeatable process that (1) forecasts demand, (2) monitors leading indicators, (3) scales resources or reduces load before service degradation, and (4) validates readiness through testing and reviews. Your audit posture depends on evidence that you planned for capacity and executed actions tied to maintaining availability. 1

Plain-English interpretation

PR.IR-04 requires you to avoid “availability surprises.” You are expected to know which systems must remain available, what “available” means for each, what capacity constraints can cause downtime or severe degradation, and how you will add capacity (or shed load) in time.

“Adequate” is contextual. For a customer-facing SaaS, it can mean autoscaling policies, resilient architecture, and 24/7 on-call coverage. For an internal finance system, it can mean scheduled capacity reviews, tested failover, and vendor support SLAs that match your recovery expectations. The common denominator is that availability is maintained through planned capacity, not luck.

Who it applies to

Entity scope: Any organization operating a cybersecurity program aligned to NIST CSF 2.0, especially those that deliver or depend on technology services where availability affects business operations. 2

Operational contexts where PR.IR-04 becomes exam-critical:

  • Customer-facing digital services: e-commerce, banking portals, healthcare apps, marketplaces, APIs.
  • Operational technology / critical operations: manufacturing execution systems, logistics platforms, call center telephony.
  • Cloud migrations: capacity risk shifts from hardware procurement to quotas, regional dependencies, and scaling design.
  • Third-party-dependent availability: managed hosting, CDN/WAF, identity providers, payment processors, and key SaaS platforms.
  • Regulated/contractual availability commitments: where SLAs, customer contracts, or internal risk appetite define availability expectations.

What you actually need to do (step-by-step)

1) Define the availability scope and owners

  1. Create (or confirm) a list of critical services (business services, not just servers).
  2. Assign each service a service owner and an operations owner (SRE/IT ops).
  3. Map each service to its availability objective (internal target and/or customer SLA) and the time windows that matter (business hours vs 24/7).

Operator tip: If you cannot name the top services whose outage would trigger executive escalation, PR.IR-04 will stall.

2) Build a dependency and constraint map (include third parties)

For each critical service, document:

  • Primary infrastructure components (compute, database, storage, network).
  • Key platform controls (load balancer, queue, caching layer).
  • External dependencies: identity, payments, email/SMS, CDN, cloud regions, DNS, managed service providers.

Then identify the realistic constraints:

  • Performance ceilings (CPU, memory pressure, IOPS, connection pools).
  • Hard limits (cloud quotas, license counts, rate limits).
  • People limits (on-call coverage, single points of knowledge).
  • Third-party constraints (support hours, failover options).
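A dependency-and-constraint map from the steps above can be as simple as a structured record per service. The sketch below is illustrative: the service name, components, and limits are assumptions, not values from the requirement.

```python
# Illustrative dependency-and-constraint map for one critical service.
# Component names and limits are assumptions; substitute your own inventory.
payments_api = {
    "infrastructure": ["app-servers", "postgres-primary", "object-storage"],
    "platform": ["load-balancer", "task-queue", "redis-cache"],
    "third_parties": ["identity-provider", "payment-processor", "dns", "cdn"],
    "constraints": {
        "db_connections": {"hard_limit": 500, "type": "performance ceiling"},
        "regional_vcpu_quota": {"hard_limit": 256, "type": "cloud quota"},
        "on_call_engineers": {"hard_limit": 3, "type": "people limit"},
        "vendor_support": {"hours": "24/7", "type": "third-party constraint"},
    },
}
```

Keeping the map in a reviewable format (a repo file, a CMDB export) makes it easy to show an auditor that third parties and people limits were considered, not just servers.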

3) Set capacity thresholds and leading indicators

Availability failures often have a runway. Define and monitor:

  • Saturation indicators: CPU, memory, disk, I/O wait, DB connections, queue depth.
  • Error indicators: timeouts, 5xx rates, dependency failures.
  • Scaling signals: autoscaling triggers, quota consumption, storage growth rates.
  • Operational indicators: ticket backlog for performance issues, on-call alert fatigue, patch windows that reduce capacity.

Attach thresholds to each indicator:

  • “Warning” threshold that triggers investigation.
  • “Action” threshold that triggers scaling or load-shedding.
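The warning/action pattern above can be encoded so each indicator carries both thresholds and the evaluation is mechanical. This is a minimal sketch; the indicator names and numbers are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Threshold:
    indicator: str   # e.g., "db_connections_pct" (illustrative name)
    warning: float   # triggers investigation
    action: float    # triggers scaling or load-shedding per runbook

def evaluate(threshold: Threshold, reading: float) -> str:
    """Return the response level a capacity reading calls for."""
    if reading >= threshold.action:
        return "action"   # execute the scaling/load-shedding runbook
    if reading >= threshold.warning:
        return "warning"  # open an investigation ticket
    return "ok"

# Illustrative thresholds for one service's CPU saturation indicator
cpu = Threshold("cpu_utilization_pct", warning=70.0, action=85.0)
print(evaluate(cpu, 92.0))  # -> action
```

In practice these thresholds usually live in your monitoring tool's alert definitions; the point is that each alert maps to a documented response, which is the evidence auditors look for.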

4) Establish your capacity management operating rhythm

Implement a recurring process with clear accountability:

  • Regular review of demand trends and upcoming events (product launches, marketing campaigns, seasonal spikes).
  • Review of headroom vs thresholds.
  • Decisions logged: scale up/out, raise quotas, optimize, or implement rate limiting.
  • Post-incident actions: capacity fixes tracked to completion.

This is where many teams fail PR.IR-04: they monitor, but they do not document decisions and follow-through.
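One lightweight way to close that gap is a structured decision log that every capacity review appends to. The schema below is a hypothetical sketch, not a prescribed format; any ticketing system that captures the same fields works equally well.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class CapacityDecision:
    """One logged outcome of a capacity review (illustrative schema)."""
    service: str
    review_date: date
    decision: str         # e.g., "raise cloud quota", "add read replica"
    owner: str
    due: date
    status: str = "open"  # open -> done; tracked to completion

# Example review output; names and dates are made up for illustration
decisions = [
    CapacityDecision("payments-api", date(2024, 5, 1),
                     "raise regional vCPU quota", "ops-lead", date(2024, 5, 15)),
]

# Open items become the follow-through evidence for the next review
open_items = [d for d in decisions if d.status == "open"]
```

The design choice that matters is the `status` field: a decision without a tracked closure is exactly the "monitoring without follow-through" gap described above.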

5) Prove you can respond under stress (testing and exercises)

Validate capacity and availability assumptions through:

  • Load testing for critical transaction paths.
  • Failover testing (where architecture supports it).
  • Tabletop scenarios that include third-party outages and quota exhaustion.
  • On-call drills for scaling and rollback procedures.

Record results, findings, and remediation. Treat repeated “known issues” without a plan as a control weakness.
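Even a minimal load-test harness produces retainable evidence. The sketch below drives concurrent calls against a stub and records latency percentiles; `handle_request` is a placeholder assumption you would replace with a real call to a critical transaction path.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def handle_request(i: int) -> float:
    """Stub for a critical transaction path; replace with a real call."""
    start = time.perf_counter()
    time.sleep(0.001)  # simulated work
    return time.perf_counter() - start

# Drive concurrent load and collect per-request latencies
with ThreadPoolExecutor(max_workers=20) as pool:
    latencies = list(pool.map(handle_request, range(200)))

latencies.sort()
p95 = latencies[int(0.95 * len(latencies))]
print(f"requests={len(latencies)} p95={p95 * 1000:.1f}ms")
```

Save the output alongside the test plan and date; comparing p95 against your "action" threshold turns a one-off test into PR.IR-04 evidence. Purpose-built tools (k6, Locust, JMeter) do the same job at scale.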

6) Integrate third-party capacity into your due diligence and oversight

For third parties that can impact your availability:

  • Confirm contractual support: escalation paths, support hours, incident communications.
  • Validate resilience claims during onboarding and periodically thereafter.
  • Monitor third-party status pages and incident notifications.
  • Maintain workarounds: alternate providers, manual processes, cached modes, rate limiting.

Where Daydream fits: If your capacity evidence is scattered across monitoring tools, tickets, and vendor folders, Daydream can help you map PR.IR-04 to an owner, a procedure, and a recurring evidence checklist so you can produce consistent audit-ready artifacts on demand. 1

Required evidence and artifacts to retain

Keep artifacts that show design and operation:

Governance and scope

  • Service inventory with criticality ratings and named owners
  • Availability objectives/SLA statements per critical service
  • Dependency maps including third parties

Capacity planning and monitoring

  • Capacity models/forecasts (spreadsheets, reports, or tool exports)
  • Monitoring dashboards and alert definitions for key indicators
  • Records of cloud quota requests and approvals (or capacity procurement tickets)
  • Runbooks for scaling, failover, and load-shedding

Operational execution

  • Change tickets or Git records showing scaling/optimization changes
  • Incident and problem records where capacity was a factor, with corrective actions
  • Meeting notes or decision logs from capacity review forums
  • Test plans and results for load/failover exercises

Third-party oversight (availability-relevant)

  • Third-party SLAs and support terms
  • Third-party incident communications and your internal response records
  • Documented contingency plans for critical third-party failures

Common exam/audit questions and hangups

Auditors typically probe four areas:

  1. “Which services are in scope and why?”
    Hangup: teams list infrastructure assets, not business services.

  2. “How do you know capacity is adequate?”
    Hangup: no thresholds tied to availability objectives, only raw monitoring.

  3. “Show me actions you took before an outage.”
    Hangup: decisions happen in chat; no ticket, no change record, no evidence.

  4. “What about third parties?”
    Hangup: vendor SLAs exist, but there is no monitoring, no contingency plan, and no tested assumption about dependency failure modes.

Frequent implementation mistakes and how to avoid them

  • Mistake: Treating capacity as a one-time sizing exercise.
    Fix: create a recurring capacity review cadence with logged outputs and tracked actions.

  • Mistake: Monitoring without thresholds.
    Fix: define warning/action thresholds per service and tie them to runbooks.

  • Mistake: Ignoring people capacity.
    Fix: document on-call coverage, escalation, and cross-training for critical services; track single points of failure as risk items.

  • Mistake: Assuming cloud autoscaling equals availability.
    Fix: validate quotas, regional dependencies, database scaling constraints, and scaling time. Document test evidence.

  • Mistake: No evidence trail.
    Fix: standardize an evidence packet per service: dashboard link, last forecast, last review notes, last scaling change, last test result.

Enforcement context and risk implications

There are no public enforcement cases tied specifically to this NIST CSF subcategory. Practically, PR.IR-04 becomes high-risk during customer-impacting outages, SLA breaches, safety events, and material incident reporting. Even without a “fine,” availability failures trigger contractual disputes, regulator scrutiny under broader operational resilience expectations, and reputational harm. Treat PR.IR-04 as part of your defensible operational resilience story. 2

Practical 30/60/90-day execution plan

First 30 days (stabilize and define)

  • Confirm your critical service list and owners.
  • Set availability objectives for each critical service (even if interim).
  • Stand up a minimum dashboard and alert set for each service’s top constraints.
  • Create a PR.IR-04 evidence checklist template (what you will retain each review cycle). 2

Days 31–60 (operationalize and produce evidence)

  • Build dependency maps, including third-party dependencies.
  • Establish capacity thresholds and link them to runbooks.
  • Start recurring capacity review meetings with logged decisions and tracked actions.
  • Identify your top capacity risks (quotas, database ceilings, single points of knowledge) and open remediation work items.

Days 61–90 (test, harden, and make it repeatable)

  • Run load tests and at least one failover or dependency-outage exercise for the highest criticality service.
  • Validate third-party escalation paths and update contingency plans.
  • Close the loop on remediation items and document outcomes.
  • Centralize evidence collection in your GRC workflow (Daydream or equivalent) so PR.IR-04 stays “always ready,” not “audit scramble.” 3

Frequently Asked Questions

Does PR.IR-04 require a formal SLA for every system?

No, but you need an availability objective for each critical service and a way to show capacity supports it. For non-critical systems, a lighter-weight target and monitoring set can be appropriate.

What counts as “resources” under PR.IR-04?

Include infrastructure capacity (compute, storage, network), platform limits (quotas, licenses, connection pools), staffing and on-call coverage, and third-party capacity/support that can constrain availability.

We use cloud autoscaling. Is that enough evidence?

Autoscaling configuration helps, but auditors usually want proof you validated constraints like quotas and scaling time. Keep test results, quota monitoring, and records of scaling actions tied to service health.

How do we handle third-party dependencies we can’t control?

Document the dependency, contract terms, monitoring approach, and your contingency plan (fallback provider, manual workaround, degraded mode). Evidence that you planned for the failure mode matters.

What’s the minimum evidence pack we should keep per critical service?

A service profile (owner, objective, dependencies), a dashboard with thresholds, a capacity forecast snapshot, and records of the last review actions and the last resilience test.

How do we keep PR.IR-04 from becoming a quarterly scramble?

Put capacity review and evidence capture on a recurring schedule and assign a control owner. Use a GRC workflow to collect the same artifacts each cycle and track exceptions to closure. 2

Footnotes

  1. NIST CSWP 29; NIST CSF 1.1 to 2.0 Core Transition Changes

  2. NIST CSWP 29

  3. NIST CSF 1.1 to 2.0 Core Transition Changes

Operationalize this requirement

Map requirement text to controls, owners, evidence, and review workflows inside Daydream.

See Daydream