BAI04: Managed Availability and Capacity

BAI04: Managed Availability and Capacity requires you to run IT services with defined availability and performance targets, continuously monitor real capacity demand, and take planned actions to prevent outages and degradation. To operationalize it quickly, set SLO/SLAs for critical services, instrument monitoring, establish capacity thresholds and forecasting, and retain evidence that proves the process runs. 1

Key takeaways:

  • Define service availability/performance targets and tie them to business impact, not IT preferences. 2
  • Monitor and forecast capacity, then execute documented actions before thresholds are breached. 3
  • Audit success depends on evidence: targets, monitoring outputs, trend reviews, decisions, and completed remediation. 2

The BAI04 (Managed Availability and Capacity) requirement is one of the fastest ways auditors separate “we think we’re reliable” from “we can prove we manage reliability.” Under COBIT 2019, BAI04 is an implementation expectation focused on two operational outcomes: services stay available at the level the business needs, and the underlying technology capacity scales predictably so performance doesn’t collapse under normal growth or predictable peaks. 1

For a Compliance Officer, CCO, or GRC lead, the practical goal is straightforward: make availability and capacity a governed process with clear ownership, measurable targets, and repeatable reviews that drive action. This page gives you requirement-level implementation guidance you can put into a control narrative, assign to IT operations, test internally, and defend in an audit. It prioritizes artifacts and decision points because BAI04 failures usually show up as “no one owned it,” “monitoring existed but wasn’t reviewed,” or “capacity was handled ad hoc after incidents.”

You do not need to rebuild your entire IT operations model to meet BAI04. You need a minimal, complete system: service criticality, targets, monitoring, thresholding, trend reviews, and tracked actions.

Regulatory text

Provided excerpt (framework summary): “COBIT 2019 objective BAI04 implementation expectation.” 1

Operator interpretation: You must implement and operate controls that manage (1) availability of IT-enabled services and (2) capacity of the infrastructure and platforms those services depend on, with enough governance and evidence to show the process is intentional and repeatable. 2

What an auditor will expect you to demonstrate: documented ownership, documented procedures, and retained evidence mapped to BAI04 that shows the controls run in practice, not just on paper. 1

Plain-English interpretation (what BAI04 means in practice)

BAI04 expects you to:

  1. Define what “good” looks like for service availability and performance (targets that match business needs).
  2. Measure reality through monitoring and reporting.
  3. Plan capacity so systems have enough headroom for expected demand, changes, and known peaks.
  4. Act early when trends or thresholds indicate risk, and track those actions to completion.
  5. Prove it with artifacts that link targets → monitoring → review → decisions → outcomes. 1

Who it applies to

Entities: Any enterprise IT organization adopting COBIT 2019 practices, including regulated organizations that use COBIT as a governance framework or map COBIT objectives to internal controls. 2

Operational scope (what systems/processes are in):

  • Production business services (customer-facing and internal critical services)
  • Core infrastructure (compute, storage, network)
  • Platforms (cloud services, container platforms, databases)
  • Supporting operations processes (incident, change, problem, monitoring, capacity planning)
  • Third parties that materially affect service availability or capacity (cloud providers, managed service providers, critical SaaS). Apply third-party oversight wherever their performance determines yours.

What you actually need to do (step-by-step)

1) Establish ownership and a control boundary

  • Assign an Availability & Capacity Control Owner (often SRE/IT Ops leader) and a GRC Control Steward (you) to maintain the control narrative and evidence map.
  • Define which services are in-scope by business criticality (tiering). A simple tier model is acceptable if it drives differentiated targets and monitoring.

Output: RACI + service inventory with criticality tiers.

2) Define availability and performance targets per service

  • For each in-scope service, document:
    • Availability target (SLO/SLAs as appropriate)
    • Performance target (latency, throughput, job completion time, API error rate, or user experience proxy)
    • Support window and dependencies (databases, message queues, third parties)
    • Customer impact statement (what breaks when it degrades)

Practical pattern: Start with a short “Service Reliability Profile” template. Keep it one page per service.

Output: Approved service targets and dependency map.
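
As a sketch, a Service Reliability Profile can be captured as structured data rather than free text, so targets are easy to review, approve, and diff over time. The field names and example service below are illustrative assumptions, not part of COBIT:

```python
# Hypothetical one-page Service Reliability Profile captured as data.
from dataclasses import dataclass, field

@dataclass
class ServiceReliabilityProfile:
    service: str
    tier: int                       # criticality tier from step 1
    availability_slo: float         # target availability, e.g. 99.9 (%)
    latency_p95_ms: int             # performance target (p95 latency)
    support_window: str             # e.g. "24x7" or "business hours"
    dependencies: list = field(default_factory=list)
    impact_statement: str = ""      # what breaks when it degrades

# Example entry (all values invented for illustration).
payments = ServiceReliabilityProfile(
    service="payments-api",
    tier=1,
    availability_slo=99.9,
    latency_p95_ms=300,
    support_window="24x7",
    dependencies=["postgres-primary", "card-gateway (third party)"],
    impact_statement="Checkout fails; revenue impact within minutes.",
)
print(payments.availability_slo)  # → 99.9
```

Keeping the profile in version control also gives you approval history for free, which is exactly the evidence auditors ask for.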

3) Implement monitoring that matches the targets

  • Confirm monitoring covers:
    • Service health checks (synthetic and/or real-user)
    • Key resource metrics (CPU, memory, storage, IOPS, network saturation)
    • Queue depth / worker utilization for async systems
    • Error rates and timeouts at the edge (load balancer/API gateway)
  • Ensure alerts are:
    • Tied to user-impacting thresholds where possible
    • Routed to an on-call or incident channel with clear ownership
    • Tuned to avoid chronic noise (auditors notice alert fatigue when tickets show no action)

Output: Monitoring configuration evidence + alert routing + runbooks.
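
A minimal sketch of “tied to user-impacting thresholds”: evaluate the edge error rate over a window rather than alerting on raw resource noise. The 1% warn and 5% critical thresholds are hypothetical and should come from your own targets:

```python
# Hypothetical alert evaluation: fire on user-impacting error rate,
# not on individual resource spikes.
def evaluate(window):
    """window: list of (requests, errors) samples for the last N minutes."""
    total_req = sum(r for r, _ in window)
    total_err = sum(e for _, e in window)
    error_rate = total_err / total_req if total_req else 0.0
    if error_rate >= 0.05:
        return "critical"   # page on-call
    if error_rate >= 0.01:
        return "warn"       # route to incident channel
    return "ok"

samples = [(1000, 4), (1200, 6), (900, 2)]   # 12 errors / 3100 requests
print(evaluate(samples))  # → ok
```

The same pattern (aggregate over a window, compare to a documented threshold, route by severity) is what alert-rule exports should evidence.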

4) Create a capacity management procedure with thresholds and forecasts

  • Define:
    • Capacity thresholds (e.g., “warn” and “critical”) for each key resource class.
    • Forecast cadence (how you review trends) and methodology (trend lines, seasonal adjustment, known business events).
    • Trigger criteria for action (for example, when forecasted demand crosses a threshold inside your procurement/change lead time).
  • Tie the procedure to change management: capacity expansions should be planned changes with approvals and rollback considerations.

Output: Capacity management SOP + threshold register + forecast reports.
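
The trigger criteria above can be sketched as a simple linear forecast: fit a trend to recent peak utilization, estimate when it crosses a threshold, and compare that horizon to your procurement/change lead time. The figures below are invented for illustration; real forecasts should also account for seasonality and known business events:

```python
# Hypothetical capacity forecast: least-squares linear trend over
# monthly peak utilization, solved for the threshold crossing point.
def months_until_threshold(history, threshold):
    """history: chronological monthly peak utilization (0-100%).
    Returns months from the last data point until the linear trend
    crosses `threshold`, or None if the trend is flat or declining."""
    n = len(history)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(history) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, history)) / \
            sum((x - mean_x) ** 2 for x in xs)
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Solve slope * x + intercept = threshold, relative to the last month.
    return (threshold - intercept) / slope - (n - 1)

# Storage utilization grew ~2 points/month; "warn" threshold is 80%.
history = [62, 64, 66, 68, 70, 72]
print(months_until_threshold(history, 80))  # → 4.0
```

If the crossing point (4 months here) falls inside your lead time for expansion, the procedure should require a planned change now, not at the breach.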

5) Run operational reviews that produce decisions

Set two recurring review types:

  • Reliability review: target attainment, recurring incidents, top contributors to downtime, planned resilience work.
  • Capacity review: forecast vs actual, headroom risks, upcoming launches or campaigns, required scaling actions.

Each review must end with:

  • decisions,
  • owners,
  • due dates,
  • tracked completion.

Output: Meeting notes, action logs, tickets/epics.

6) Integrate with incident, problem, and change management

  • After major incidents, require a specific check: “Was capacity or saturation a contributing factor?”
  • If yes, open a problem record or backlog item to fix the underlying capacity planning/limits, not just patch the symptom.
  • Require post-change validation to confirm the change did not reduce availability or effective capacity (for example, resource limits, autoscaling settings, DB connection pools).

Output: Linked incident/problem/change records with clear traceability.

7) Evidence mapping (make the audit easy)

Build a simple evidence index mapped to BAI04:

  • what the control is,
  • who runs it,
  • how often it runs,
  • where evidence lives,
  • how exceptions are handled. 1

Output: BAI04 control narrative + evidence map (your “one-stop” audit packet).
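
A minimal sketch of such an index as data, answering the five questions above in a form that can live in a GRC repository. Control IDs, owners, and locations are placeholders:

```python
# Hypothetical BAI04 evidence index: one entry per control.
evidence_index = {
    "BAI04-CAP-01": {
        "control": "Quarterly capacity review with forecast vs. actual",
        "owner": "Head of SRE",
        "frequency": "quarterly",
        "evidence_location": "wiki/capacity-reviews/",
        "exception_handling": "Missed reviews logged as GRC exceptions",
    },
    "BAI04-AVL-01": {
        "control": "Monthly reliability review against service SLOs",
        "owner": "IT Operations Manager",
        "frequency": "monthly",
        "evidence_location": "grc/bai04/reliability-reviews/",
        "exception_handling": "SLO misses tracked to remediation tickets",
    },
}

# An audit sample then becomes a lookup, not an email chase.
print(evidence_index["BAI04-CAP-01"]["frequency"])  # → quarterly
```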

Required evidence and artifacts to retain

Use this as your minimum evidence checklist:

Each entry lists the evidence type, what “good” looks like, and where it usually lives:

  • Control narrative + RACI: named owners, scope statement, cadence. GRC repository.
  • Service inventory + tiering: in-scope services tagged by criticality. CMDB/service catalog.
  • Service Reliability Profiles: targets, dependencies, support window. Wiki/GRC attachment.
  • Monitoring/alert evidence: screenshots or exports of key dashboards and alert rules. Monitoring tool.
  • On-call/runbooks: “If X alert, do Y” with escalation paths. Runbook repo.
  • Capacity SOP + thresholds: documented thresholds and action triggers. SOP repository.
  • Forecast/trend outputs: monthly/quarterly reports with annotated decisions. Capacity review deck.
  • Action tracking: tickets with owners, dates, and completion proof. ITSM/Jira.
  • Incident/problem linkage: evidence that capacity issues are analyzed. ITSM.

Common exam/audit questions and hangups

  • “Show me the availability targets for your critical services and who approved them.”
  • “How do you know you will have capacity for the next major release or business event?”
  • “Which dashboards and alerts prove you detect saturation before users notice?”
  • “Give examples of capacity risks identified in review meetings and the actions taken.”
  • “Where is the evidence that reviews happen consistently, not only after an incident?”
  • Hangup: Teams have monitoring but no documented review cadence and no retained outputs.

Frequent implementation mistakes (and how to avoid them)

  1. Mistake: Targets exist only as third-party SLAs.
    Fix: Define internal SLOs per service; third-party SLAs become an input, not the whole requirement.

  2. Mistake: Capacity is treated as procurement only.
    Fix: Include configuration constraints (autoscaling limits, DB pools, rate limits) in capacity planning.

  3. Mistake: “We have dashboards” with no evidence of action.
    Fix: Keep a standing action log from reviews and link it to tickets.

  4. Mistake: One-size-fits-all thresholds.
    Fix: Set thresholds per resource and service tier; document rationale in the threshold register.

  5. Mistake: No mapping to BAI04 for audit readiness.
    Fix: Maintain a BAI04 evidence index and refresh it during each review cycle. 2

Enforcement context and risk implications

No public enforcement cases are tied directly to this requirement, so treat BAI04 primarily as a governance and auditability expectation under COBIT rather than a directly enforceable regulation. The real risk is indirect: poor availability/capacity practices drive outages, SLA breaches, customer harm, and failed audits where you cannot demonstrate control operation. 1

A practical 30/60/90-day execution plan

First 30 days (stand up the minimum viable control)

  • Name owners (control owner + GRC steward) and publish the BAI04 scope.
  • Tier services by criticality and pick the initial in-scope list.
  • Create the Service Reliability Profile template and complete it for the highest-criticality services.
  • Identify the monitoring “source of truth” and capture baseline dashboard/alert evidence for those services.

By 60 days (make it operational and repeatable)

  • Publish the capacity management SOP and threshold register.
  • Start recurring reliability and capacity reviews with agendas and action logs.
  • Integrate review outputs with ITSM tickets (planned capacity expansions, tuning, resilience work).
  • Create the BAI04 evidence index and store artifacts in consistent locations.

By 90 days (prove it works and close gaps)

  • Run at least one full review cycle where actions are created and completed.
  • Test audit readiness: sample a service, trace targets → monitoring → reviews → actions → improved outcomes.
  • Add third-party dependencies for critical services and document how you monitor their impact (status feeds, synthetic checks, escalation paths).
  • If you use Daydream, centralize the BAI04 control narrative and evidence requests so IT Ops can attach dashboards, tickets, and review notes without email chasing.

Frequently Asked Questions

Do we need formal SLAs for every application to meet BAI04?

You need defined targets for in-scope services; they can be internal SLOs rather than external SLAs. Start with critical services and expand based on tiering. 2

What’s the minimum acceptable “capacity planning” for a small team?

A written SOP, thresholds for key resources, and a recurring trend review that produces tracked actions. Keep it lightweight, but make it repeatable and evidenced. 3

Our monitoring is strong, but we don’t keep meeting notes. Is that a problem?

Yes, often. Audits fail on missing evidence of review and decision-making; retain notes, action logs, and ticket links as proof the process runs. 2

How do we treat cloud autoscaling under BAI04?

Treat autoscaling configuration as capacity control. Document scaling limits, quotas, and trigger conditions, and review trend data to confirm scaling keeps pace with demand.

Does BAI04 apply to third-party SaaS outages that affect our service?

If a third party materially impacts your service availability, include dependency monitoring and escalation paths in the Service Reliability Profile and review process.

What should the control narrative say in one paragraph?

Describe scope (which services), targets (availability/performance), monitoring approach, review cadence, and how capacity risks become tracked actions with ownership and completion evidence. 2

Footnotes

  1. ISACA COBIT overview; OSA COBIT 2019 objective mapping

  2. ISACA COBIT overview

  3. OSA COBIT 2019 objective mapping


Operationalize this requirement

Map requirement text to controls, owners, evidence, and review workflows inside Daydream.

See Daydream