DSS03: Managed Problems
To meet the DSS03 (Managed Problems) requirement, you need a documented, repeatable problem management process that finds the root causes behind recurring incidents, tracks problems through remediation, and proves the fixes worked. Operationalize DSS03 by defining ownership, intake and prioritization rules, RCA standards, known error records, and metrics, then retain evidence that problems are identified, resolved, and reviewed.
Key takeaways:
- Separate incident restoration from problem root-cause removal, and show both in tickets and reports.
- Require consistent RCA and remediation tracking, including “known error” and workaround documentation.
- Evidence matters: auditors will look for end-to-end traceability from recurring incidents to verified fixes.
DSS03 (Managed Problems) in COBIT 2019 is where many IT and security programs either mature quickly or stall. Most teams are already good at incident response: restore service, communicate impact, close the ticket. DSS03 asks for something different: a control system that prevents the same failure mode from returning by managing problems as first-class work, with governance, prioritization, root cause analysis (RCA), and verified remediation.
For a Compliance Officer, CCO, or GRC lead, the fastest path is to treat DSS03 as an evidence-backed workflow requirement. You are not writing a “policy” to satisfy a framework; you are proving that the organization reliably detects patterns, assigns accountability, performs RCA with defined standards, manages known errors and workarounds, and reduces recurrence risk over time.
This page gives requirement-level guidance you can implement quickly: who owns what, how to structure the workflow in an ITSM tool, what artifacts to retain, what auditors ask for, and how to avoid the most common failure modes (like “RCA done in someone’s head” or “problem tickets opened but never closed”).
Regulatory text
Framework excerpt (provided): “COBIT 2019 objective DSS03 implementation expectation.” 1
Operator meaning: You must implement DSS03 as an operating capability, not a statement of intent. For exam readiness, your program should show:
- A defined scope and ownership model for problem management.
- A repeatable method to identify and log problems (often from recurring/high-impact incidents).
- RCA and corrective action practices tied to changes and preventative controls.
- Measurable outcomes and review, backed by durable evidence.
1
Plain-English interpretation (what DSS03 requires)
DSS03 requires you to manage the underlying causes of incidents so the organization reduces recurrence and systemic outages. Incidents are “restore service fast.” Problems are “stop this from happening again.”
In practice, auditors and internal assurance teams typically treat DSS03 as satisfied only when you can demonstrate:
- Problems are consistently identified (not ad hoc).
- Problems are prioritized based on risk and impact.
- RCA is performed using a documented standard.
- Remediation actions are assigned, tracked, and linked to change management.
- Outcomes are verified (fix worked) and knowledge is captured (known errors/workarounds).
1
Who it applies to
Entity scope
- Enterprise IT organizations implementing COBIT 2019 governance and management objectives. 1
Operational context (where it matters most)
You should scope DSS03 to any environment where incident recurrence creates material operational risk, including:
- Production applications and infrastructure
- Identity and access services
- Security tooling and monitoring pipelines
- Third-party provided systems where you depend on their uptime (treat as “shared problem ownership” via contracts and SLAs)
If your organization runs an ITSM platform (ServiceNow, Jira Service Management, etc.), DSS03 should be implemented directly in the ticketing workflow so evidence is created as work happens.
What you actually need to do (step-by-step)
Step 1: Assign ownership and define the minimum workflow
Create a one-page Problem Management Standard with:
- Process owner (ITSM owner, SRE/IT Ops leader, or Service Delivery)
- Required roles: Problem Manager, service owners, resolver groups
- RACI for: opening a problem, approving RCA, accepting risk, closing the problem
Decision point: If you do not have a dedicated Problem Manager, assign the role to Service Reliability/SRE or IT Ops and make it explicit.
Step 2: Define triggers (when a problem must be opened)
Write objective triggers so teams cannot “forget” to do problem management. Common triggers:
- Recurring incidents with the same failure signature
- High-impact incidents (business critical service or security control failure)
- Trending alert patterns (capacity, latency, errors)
- Third-party recurring outages affecting your service commitments
Document these triggers in the standard and configure ITSM routing rules where possible.
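The first two triggers can be expressed as a simple rule over incident data. The following is a minimal sketch, not an ITSM vendor API: the `Incident` fields, the `problem_triggers` function, and the recurrence threshold of three are all assumptions made for illustration, and the real values belong in your Problem Management Standard.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical threshold: incidents with the same signature needed to count as "recurring".
RECURRENCE_THRESHOLD = 3


@dataclass(frozen=True)
class Incident:
    incident_id: str
    service: str
    failure_signature: str  # e.g. a normalized error code or alert name
    high_impact: bool = False


def problem_triggers(incidents, threshold=RECURRENCE_THRESHOLD):
    """Map (service, failure_signature) to the trigger reasons that apply to it."""
    counts = Counter((i.service, i.failure_signature) for i in incidents)
    reasons = {}
    for incident in incidents:
        key = (incident.service, incident.failure_signature)
        if incident.high_impact:
            reasons.setdefault(key, set()).add("high_impact")
        if counts[key] >= threshold:
            reasons.setdefault(key, set()).add("recurring")
    # Sort the reasons so the output is stable for reports and tests.
    return {key: sorted(value) for key, value in reasons.items()}
```

Any key returned by `problem_triggers` is a candidate for a problem record; signatures that appear only once and are not high impact are left out.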
Step 3: Standardize classification and prioritization
Define fields required on every problem record:
- Service/application affected (from service catalog/CMDB if you have it)
- Business impact statement
- Risk rating method (simple qualitative tiers are acceptable if consistent)
- Linkage to related incidents (“child incidents”)
Audit reality: If prioritization is “tribal knowledge,” you will struggle to defend why systemic risks sat unaddressed.
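One way to keep prioritization out of tribal knowledge is to validate the required fields before a problem record is accepted. The sketch below assumes a plain dictionary per record; the field names and the four risk tiers are hypothetical and should be replaced with whatever your standard defines.

```python
# Hypothetical field names mirroring the required fields listed above.
REQUIRED_FIELDS = ("service", "business_impact", "risk_tier", "related_incidents")
# Hypothetical qualitative tiers, ordered from lowest to highest risk.
RISK_TIERS = ("low", "medium", "high", "critical")


def validate_problem_record(record):
    """Return a list of validation errors; an empty list means the record is complete."""
    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if not record.get(name)]
    tier = record.get("risk_tier")
    if tier and tier not in RISK_TIERS:
        errors.append(f"unknown risk tier: {tier}")
    return errors


def sort_by_priority(records):
    """Order validated records from highest to lowest risk tier."""
    return sorted(records, key=lambda r: RISK_TIERS.index(r["risk_tier"]), reverse=True)
```

`sort_by_priority` expects records that already passed validation, since an unknown tier would raise a `ValueError`.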
Step 4: Set an RCA requirement that produces usable output
Create an RCA template that requires:
- Incident timeline (what happened, when)
- Fault domain (app, infra, network, identity, third party)
- Root cause statement (single sentence, testable)
- Contributing factors (monitoring gaps, runbook gaps, config drift)
- Corrective actions (permanent fixes) and preventive actions (controls/monitoring)
- Owner and due date for each action
- Verification method (what proof shows the fix works)
Keep RCA lightweight for low-risk problems, deeper for critical services. The key is consistency and evidence.
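The template can also be modeled as a structured record with a completeness check, so that an incomplete RCA is caught before approval. This is an illustrative sketch only: the `Action` and `RCA` classes and the `rca_gaps` function are hypothetical names, and the checks cover only a subset of the template (a corrective action exists, every action has an owner, a verification method is stated).

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Action:
    description: str
    kind: str   # "corrective" (permanent fix) or "preventive" (control/monitoring)
    owner: str
    due: date


@dataclass
class RCA:
    timeline: list            # ordered notes on what happened and when
    fault_domain: str         # app, infra, network, identity, third party
    root_cause: str           # single, testable sentence
    contributing_factors: list = field(default_factory=list)
    actions: list = field(default_factory=list)
    verification_method: str = ""


def rca_gaps(rca):
    """Return the names of template sections that are still incomplete."""
    gaps = []
    if not rca.timeline:
        gaps.append("timeline")
    if not rca.fault_domain.strip():
        gaps.append("fault_domain")
    if not rca.root_cause.strip():
        gaps.append("root_cause")
    if not any(a.kind == "corrective" for a in rca.actions):
        gaps.append("corrective_action")
    if any(not a.owner.strip() for a in rca.actions):
        gaps.append("action_owner")
    if not rca.verification_method.strip():
        gaps.append("verification_method")
    return gaps
```

A tiered model could relax some of these checks for low-risk problems while keeping the same field names, which preserves consistency across records.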
Step 5: Manage Known Errors and workarounds
Maintain a Known Error Record (KER) pattern:
- Problem ticket references a known error entry when root cause is understood but not yet fixed
- Workaround steps are documented (runbook link or embedded steps)
- Communications plan exists for frontline support (so incidents are handled consistently)
This is where DSS03 improves incident handling quality while remediation is in progress.
Step 6: Link remediation to change management and engineering work
Make the problem ticket the umbrella record, then link:
- Change requests (CAB/standard change)
- Engineering epics/stories
- Configuration changes
- Third-party support cases
Control expectation: A fix that is not tracked through change control is hard to defend in audit and hard to sustain operationally. 2
Step 7: Verify resolution and close with quality gates
Before closure, require:
- Evidence the corrective action deployed (change record link)
- Evidence the fix reduced recurrence (monitoring trend, incident count comparison, or post-implementation review notes)
- Knowledge captured (KER closed or updated, runbook updated)
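These closure gates can be checked mechanically before a problem record is allowed to close. The sketch below is an assumption-laden illustration: the dictionary keys (`change_record_links`, `incidents_before_fix`, `incidents_after_fix`, `post_implementation_review`, `knowledge_updated`) are hypothetical, and "reduced recurrence" is simplified to a lower incident count after the fix than before it.

```python
def closure_blockers(problem):
    """Return the reasons a problem record may not be closed yet (empty list = ready)."""
    blockers = []

    # Gate 1: the corrective action must be traceable to a change record.
    if not problem.get("change_record_links"):
        blockers.append("no change record linked to the corrective action")

    # Gate 2: either the incident count dropped, or review notes are attached.
    before = problem.get("incidents_before_fix")
    after = problem.get("incidents_after_fix")
    has_trend = before is not None and after is not None and after < before
    if not (has_trend or problem.get("post_implementation_review")):
        blockers.append("no evidence that recurrence dropped")

    # Gate 3: the known error record or runbook must reflect the outcome.
    if not problem.get("knowledge_updated"):
        blockers.append("known error record or runbook not updated")

    return blockers
```

Returning the list of blockers, rather than a bare true/false, gives the problem owner a concrete message to act on and leaves a readable trail in the ticket.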
Step 8: Establish metrics and governance review
Define a monthly or quarterly review cadence with:
- Top recurring incident categories and open problems by risk
- Aging problems and overdue corrective actions
- Themes (e.g., capacity planning, patching gaps, third-party instability)
- Lessons learned and control improvements
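The first two review items can be produced from an export of problem records. The following sketch assumes each record is a dictionary with hypothetical keys (`id`, `status`, `risk_tier`, `opened`, `actions`) and treats a problem as "aging" after 90 days, a threshold chosen only for illustration.

```python
from datetime import date

# Hypothetical threshold: open problems older than this many days count as aging.
AGING_DAYS = 90


def review_snapshot(problems, today, aging_days=AGING_DAYS):
    """Summarize open problems by risk, aging problems, and overdue corrective actions."""
    open_problems = [p for p in problems if p["status"] != "closed"]

    open_by_risk = {}
    for problem in open_problems:
        tier = problem["risk_tier"]
        open_by_risk[tier] = open_by_risk.get(tier, 0) + 1

    aging = [p["id"] for p in open_problems if (today - p["opened"]).days > aging_days]
    overdue_actions = [
        action["id"]
        for p in open_problems
        for action in p["actions"]
        if not action["done"] and action["due"] < today
    ]
    return {"open_by_risk": open_by_risk, "aging": aging, "overdue_actions": overdue_actions}
```

Saving the returned dictionary alongside the meeting notes gives each governance review a dated metrics snapshot.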
If you use Daydream for control management, map each artifact (standard, ticket samples, RCA template, metrics, governance minutes) directly to DSS03 so evidence collection is continuous rather than a scramble during audit.
Required evidence and artifacts to retain
Keep evidence in the systems where work occurs, then export for audit packets when needed.
Core artifacts (minimum viable evidence):
- Problem Management Standard (owner, scope, triggers, workflow)
- RCA template and several completed RCAs that follow it
- Problem tickets with:
  - links to related incidents
  - impact and prioritization fields completed
  - corrective/preventive actions with owners
  - links to change records and/or engineering work items
- Known Error / workaround documentation (runbooks or KER entries)
- Governance evidence: review meeting notes, action item logs, KPI/KRI snapshots
1
Retention tip: Store immutable exports (PDF snapshots or audit exports) for representative samples, because ITSM records can change over time.
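One lightweight way to show that an exported record has not changed since it was captured is to store a hash next to the export. This is a sketch under stated assumptions, not a substitute for write-once storage: `snapshot` and `is_unchanged` are hypothetical helper names, the record is assumed to be JSON-serializable, and a digest only detects later changes if it is kept somewhere the record's editors cannot alter.

```python
import hashlib
import json


def snapshot(record):
    """Serialize a record deterministically and return (payload bytes, SHA-256 hex digest)."""
    # sort_keys makes the output independent of dictionary key order.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return payload, hashlib.sha256(payload).hexdigest()


def is_unchanged(record, expected_digest):
    """Check whether a record still matches the digest captured at export time."""
    return snapshot(record)[1] == expected_digest
```

At audit time, re-exporting the ticket and comparing digests shows whether the sampled record is the same one that was reviewed.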
Common exam/audit questions and hangups
Auditors typically test DSS03 by sampling incident history and asking you to prove the root cause loop was closed. Expect questions like:
- “Show recurring incidents and the related problem records.”
- “How do you decide a problem must be opened?”
- “Where is RCA documented, and who approves it?”
- “How do you verify the fix worked?”
- “How do you manage known errors and communicate workarounds?”
- “How do you ensure changes tied to problems follow change control?”
Hangup: Teams provide incident tickets only. DSS03 needs a separate problem record with RCA and corrective actions.
Frequent implementation mistakes (and how to avoid them)
- Problem tickets opened as placeholders, then abandoned.
  Fix: require a governance review of aging problems and an explicit risk-acceptance path for deprioritized items.
- RCA is inconsistent or performed only after major outages.
  Fix: publish a lightweight RCA tiering model and enforce the same required fields every time.
- No linkage between incidents, problems, and changes.
  Fix: make the link fields mandatory in the ITSM tool; reject closure without links.
- Workarounds live in chat threads.
  Fix: require runbook updates or known error entries as a closure gate.
- Third-party-caused incidents are excluded.
  Fix: open problems that track vendor/third-party root causes, contract escalations, and compensating controls; include the third-party ticket ID as evidence.
Risk implications (why operators treat DSS03 as material)
Without managed problems, you accumulate repeat outages, operational toil, and control failures that surface as availability incidents, security monitoring gaps, and missed SLAs. In regulated environments, repeat incidents also invite scrutiny about governance effectiveness: leadership knew the pattern existed, but the organization did not remove the root cause.
DSS03 evidence also supports broader assurance narratives: operational resilience, change governance, and service reliability.
Practical 30/60/90-day execution plan
First 30 days (stand up the control)
- Assign the DSS03 process owner and name a Problem Manager function.
- Publish a one-page Problem Management Standard and RCA template.
- Configure ITSM minimum fields and linkage requirements (incidents ↔ problems ↔ changes).
- Start a pilot with one or two critical services.
Days 31–60 (make it repeatable)
- Train incident managers, service owners, and resolver teams on triggers and RCA expectations.
- Create a Known Error / workaround publishing path (runbook library or knowledge base).
- Run the first governance review; produce a metrics snapshot and action log.
Days 61–90 (make it auditable)
- Expand scope to all tier-1 services and common shared platforms (identity, network, monitoring).
- Perform a self-audit: sample recurring incidents and confirm a corresponding problem record exists with RCA and verified remediation.
- Map DSS03 artifacts in Daydream (or your GRC system) so evidence is continuously collected and tied to control performance.
Frequently Asked Questions
Do we need a separate “problem ticket” if we already do post-incident reviews?
Yes, if your post-incident review (PIR) is not tracked as a managed record with owners, corrective actions, and closure evidence. You can implement DSS03 using PIRs, but you still need traceability from incidents to remediation and verification.
What counts as acceptable root cause analysis for DSS03?
A consistent RCA output that identifies a testable root cause, contributing factors, and corrective/preventive actions with ownership. The key is repeatability and evidence, not a specific RCA methodology. 2
How do we handle problems where the root cause is a third party?
Open a problem anyway and track your side of the remediation: escalation records, vendor ticket IDs, contractual remedies, and compensating controls (monitoring, failover, workaround procedures). Auditors want to see management of the risk, not blame assignment.
We’re a small team. What is the minimum viable DSS03 implementation?
Define triggers, require a lightweight RCA template, and run a regular review of open problems and corrective actions. Keep the evidence in your ticketing system and export samples for audit.
What evidence is strongest in an audit?
End-to-end traceability: recurring incidents linked to a problem record, an RCA with clear corrective actions, a change record showing the fix, and verification that recurrence dropped or the control gap closed.
How does DSS03 connect to change management?
Most permanent fixes require changes. Tie each corrective action to a change record or engineering work item, and require that linkage before you close the problem.
Footnotes
2. Source: ISACA COBIT overview