MEASURE-3.2: Risk tracking approaches are considered for settings where AI risks are difficult to assess using currently available measurement techniques or where metrics are not yet available.
MEASURE-3.2 requires you to define and run a “risk tracking” method for AI risks that you cannot reliably quantify yet, either because measurement techniques are immature or because no useful metrics exist. Operationally, you stand up qualitative and proxy indicators, escalation triggers, and a review cadence so these risks are actively monitored, owned, and reassessed until measurable metrics become available.
Key takeaways:
- Treat “unmeasurable” AI risk as trackable risk: assign owners, indicators, triggers, and review intervals.
- Use proxy metrics, structured expert judgment, and operational signals (incidents, complaints, drift flags) to track risk movement.
- Keep evidence that you identified measurement gaps, chose a tracking approach, reviewed it, and updated it as your measurement maturity improved.
MEASURE-3.2 addresses a common failure mode in AI governance: teams wait for perfect metrics before they manage a risk. The NIST AI RMF expects the opposite. When an AI risk is hard to assess with currently available measurement techniques, or when there is no metric yet, you still need a disciplined way to track that risk over time.
For a Compliance Officer, CCO, or GRC lead, the practical goal is audit-ready proof that your organization recognized the measurement gap, selected an appropriate tracking approach, and integrated it into your standard risk processes. That means owners, routines, thresholds, and decision points that generate evidence. It also means connecting “risk tracking” to real operational controls: incident intake, change management, model monitoring, and third-party oversight for AI services you do not fully control.
This page gives requirement-level implementation guidance you can execute quickly: a step-by-step procedure, artifacts to retain, exam questions to prepare for, and a 30/60/90-day plan. It is written to help you operationalize the MEASURE-3.2 requirement.
Regulatory text
NIST AI RMF (MEASURE-3.2) excerpt: “Risk tracking approaches are considered for settings where AI risks are difficult to assess using currently available measurement techniques or where metrics are not yet available.”
What the operator must do:
- Identify AI risk topics where measurement is currently weak or unavailable.
- Choose a tracking approach that still allows oversight (even if it is qualitative or proxy-based).
- Run that approach on an ongoing basis with named ownership.
- Revisit the approach as better measurement becomes possible.
Your documentation should show the decision logic and the operating rhythm, not just a statement of intent.
Plain-English interpretation
If you cannot measure an AI risk well, you still need to manage it. MEASURE-3.2 expects you to track “direction and exposure” using structured methods, such as proxy indicators, structured expert judgment, scenario tracking, and event-driven signals (incidents, complaints, unexpected outputs), until you have mature metrics.
Think of this as a bridge between:
- Known, measurable risks (you have performance, bias, robustness, security metrics), and
- Known, hard-to-measure risks (you suspect harms or uncertainty but you cannot quantify them reliably yet).
Your program needs a defined “what we do when we can’t measure it” path.
Who it applies to
Entity scope: Any organization developing or deploying AI systems, including those embedding third-party AI services into products or internal operations.
Operational contexts where MEASURE-3.2 shows up most:
- New AI use cases where ground truth is scarce (early rollout, novel domains)
- Generative AI features where harms are contextual (brand, misinformation, IP, consumer deception)
- Safety, fairness, or misuse risks where you can’t test all edge cases
- Third-party AI where you lack full transparency into training data, model internals, or monitoring telemetry
- Low-volume decisioning where statistical metrics are unstable but stakes are high
Functions you will need involved:
- Model/product owners (accountable for operation and change)
- GRC/compliance (controls, evidence, and challenge function)
- Security (abuse monitoring, incident response)
- Legal/privacy (rights, consumer impact, sensitive data)
- Customer support/operations (complaints and escalation signals)
- Procurement/TPRM (third-party controls and reporting)
What you actually need to do (step-by-step)
Step 1: Build an “AI measurement gap register”
Create a short register (spreadsheet or GRC workflow) listing AI risks that cannot be assessed well today. For each entry capture:
- System/use case, owner, and deployment context
- Risk statement (harm + actor + mechanism)
- Why existing measurement is insufficient (no metric, low data volume, unclear ground truth, weak test method)
- Impacted stakeholders (customers, employees, public)
- Current control coverage (policies, guardrails, human review, rate limits)
This is the core proof that you recognized the MEASURE-3.2 condition.
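A register entry like the one above works best as a structured record rather than free text, because structured entries make the change history diffable for audit. A minimal sketch in Python; the class name and field names are illustrative assumptions, not a prescribed schema (use whatever your spreadsheet or GRC tool supports):

```python
from dataclasses import dataclass, field

@dataclass
class GapRegisterEntry:
    """One row in a hypothetical AI measurement gap register."""
    system: str                  # system/use case and deployment context
    owner: str                   # named accountable owner
    risk_statement: str          # harm + actor + mechanism
    measurement_gap: str         # why existing measurement is insufficient
    stakeholders: list = field(default_factory=list)  # impacted parties
    controls: list = field(default_factory=list)      # current control coverage

# Example entry (all values are invented for illustration)
entry = GapRegisterEntry(
    system="Support chatbot (customer-facing, early rollout)",
    owner="jane.doe",
    risk_statement="Users misled by confident but wrong answers (consumer deception)",
    measurement_gap="No ground-truth labels for answer correctness at low volume",
    stakeholders=["customers"],
    controls=["human review of escalations", "rate limits"],
)
```

Keeping each field mandatory (no defaults for system, owner, risk statement, or gap rationale) enforces the minimum content the register needs before an entry counts as evidence.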
Step 2: Select a tracking approach per risk type
Pick a tracking method that creates signals over time. A practical menu:
A) Proxy indicators (leading signals)
Examples: rate of user overrides, percentage of outputs routed to human review, “high-risk prompt” detections, policy-violation classifier flags, or abnormal topic clusters in logs.
B) Outcome signals (lagging signals)
Examples: incident tickets, customer complaints, chargebacks/returns linked to AI decisions, downstream remediation volume, abuse reports.
C) Structured expert judgment
- A small review panel uses a consistent rubric (severity, likelihood, detectability, exposure) to re-score risk at each review.
- Keep the rubric stable so movement is meaningful.
D) Scenario and “watch list” tracking
- Maintain a list of foreseeable harm scenarios and mark observed triggers, near-misses, and control gaps.
Your choice should be documented per risk entry: why this approach is appropriate, what it can detect, and what it cannot.
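To make option C concrete: structured expert judgment can be operated as a small, repeatable scoring routine so that movement between reviews is meaningful. The four dimensions come from the rubric above; the 1–5 scale, equal weighting, and tolerance band are illustrative assumptions, not part of MEASURE-3.2:

```python
# Minimal sketch of structured expert judgment with a stable rubric.
RUBRIC_DIMENSIONS = ("severity", "likelihood", "detectability", "exposure")

def rubric_score(ratings: dict) -> float:
    """Average the four rubric dimensions; reject incomplete scoring."""
    missing = [d for d in RUBRIC_DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"incomplete rubric: missing {missing}")
    return sum(ratings[d] for d in RUBRIC_DIMENSIONS) / len(RUBRIC_DIMENSIONS)

def trend(prior: float, current: float, tolerance: float = 0.25) -> str:
    """Classify movement between review cycles; small deltas read as stable."""
    if current - prior > tolerance:
        return "worsening"
    if prior - current > tolerance:
        return "improving"
    return "stable"

# Two review cycles on the same risk, scored with the same rubric
q1 = rubric_score({"severity": 4, "likelihood": 2, "detectability": 3, "exposure": 2})
q2 = rubric_score({"severity": 4, "likelihood": 3, "detectability": 3, "exposure": 3})
```

Because the rubric and scale never change between reviews, the `trend` output is a defensible signal even though no direct metric exists yet.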
Step 3: Define triggers and escalation paths
For each tracked risk, define:
- Trigger event: what causes reassessment or escalation (incident, spike in complaints, model update, new data source, expansion to a new population)
- Action: what happens (pause feature, increase sampling, add human-in-the-loop, open CAPA ticket, notify risk committee)
- Decision owner: who can accept risk, who must be informed, who can stop deployment
- Time expectations: set internal targets for response and review; treat these as your governance commitments (guidance, not regulatory numbers)
This is where MEASURE-3.2 becomes operational rather than theoretical.
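The trigger-and-action mapping above can be sketched as a small lookup that a monitoring job or review meeting runs against the period's signals. The trigger names, thresholds, and actions here are hypothetical examples; your register defines the real ones:

```python
# Illustrative trigger evaluation for one tracked risk.
TRIGGERS = {
    "complaint_spike": {"threshold": 10, "action": "increase human-review sampling"},
    "model_update":    {"threshold": 1,  "action": "reassess register entry"},
    "new_data_source": {"threshold": 1,  "action": "notify risk committee"},
}

def evaluate_triggers(signals: dict) -> list:
    """Return the escalation actions fired by this period's signal counts."""
    actions = []
    for name, count in signals.items():
        rule = TRIGGERS.get(name)
        if rule and count >= rule["threshold"]:
            actions.append(rule["action"])
    return actions
```

For example, `evaluate_triggers({"complaint_spike": 12, "model_update": 0})` fires only the complaint-spike action. The point of encoding triggers explicitly is that every escalation decision leaves a reproducible record of why it fired.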
Step 4: Integrate tracking into BAU routines
Hard-to-measure risks die in one-off spreadsheets. Connect tracking to existing workflows:
- Change management: every model update prompts review of the measurement gap register entries
- Incident management: tag AI-related incidents and map them to tracked risks
- Third-party management: require vendors to provide telemetry or periodic risk reports for relevant risks
- Model monitoring: even basic monitoring (volume shifts, drift indicators) can serve as proxy signals when direct harm metrics are missing
If you already use a GRC tool, make the register a controlled object with ownership and recurring tasks. Daydream can house the control mapping, owners, and recurring evidence requests so the “risk tracking approach” actually produces audit-ready artifacts.
Step 5: Reassess and graduate from proxies to metrics
Set a rule: each tracked risk must have a path to improved measurability, even if the date is unknown. Examples:
- Collect labeled outcomes to enable future measurement
- Run periodic red teaming to generate test cases
- Improve logging to measure exposure
- Add post-deployment evaluation studies where feasible
Document each maturity improvement and update the tracking plan accordingly.
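A "graduation" rule can also be made explicit so the decision is consistent across risks. This is a hypothetical check, with invented criteria and thresholds, showing the shape of such a rule: enough labeled outcomes plus a validated test method before a risk moves from proxy tracking to defined metrics:

```python
# Hypothetical graduation check for moving a tracked risk from
# proxy/qualitative tracking to defined metrics. The criteria and
# the 500-label threshold are illustrative assumptions.
def can_graduate(labeled_outcomes: int,
                 test_method_validated: bool,
                 min_labels: int = 500) -> bool:
    """True when a reliable measurement technique exists and has enough data."""
    return labeled_outcomes >= min_labels and test_method_validated
```

Whatever the rule, record the graduation decision and the new metric owner, as described in the FAQ below.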
Required evidence and artifacts to retain
Auditors and internal reviewers will ask for proof of operation. Retain:
- Policy/control mapping: where MEASURE-3.2 is implemented (control statement, owner, scope)
- AI measurement gap register: current and prior versions (change history matters)
- Tracking plan per gap: indicators, rubric, data sources, limitations, triggers
- Review records: meeting notes, risk committee decisions, tickets, sign-offs
- Event evidence: incidents/complaints mapped to tracked risks, with actions taken
- Third-party artifacts: contract clauses or attestations supporting needed telemetry/reporting
- Exceptions: when tracking is not possible, document compensating controls and approval
Common exam/audit questions and hangups
Expect questions like:
- “Show me risks you could not measure and how you tracked them anyway.”
- “Who owns each tracked risk, and what causes escalation?”
- “How do you know the proxy indicators correlate to real-world harm?”
- “What changed after incidents or near-misses?”
- “For third-party AI, what reporting do you receive and how do you act on it?”
- “Where is the evidence that reviews happened on schedule?”
Hangup to avoid: presenting a monitoring dashboard that only shows model performance metrics while the actual concern is a non-performance harm (consumer deception, misuse, reputational harm). MEASURE-3.2 is about tracking the risk you cannot directly measure.
Frequent implementation mistakes and how to avoid them
- Mistake: “No metric” becomes “no control.”
  Fix: require a tracking plan entry for every measurement gap before production approval.
- Mistake: proxies with no decision use.
  Fix: define triggers and actions. If an indicator can’t drive a decision, it’s noise.
- Mistake: inconsistent expert scoring.
  Fix: use a stable rubric, keep reviewer training notes, and store prior scoring to show trend.
- Mistake: third-party black box with zero telemetry.
  Fix: add contractual reporting requirements, minimum incident notification, and audit/assurance rights where feasible. If you cannot get signals, document compensating controls (restricted use, human review, limited data exposure).
- Mistake: tracking isn’t tied to change management.
  Fix: treat model changes, prompt/template changes, and data source changes as trigger events for reassessing measurement gaps.
Enforcement context and risk implications
NIST AI RMF is a framework, not a regulator, so you should not expect “MEASURE-3.2 fines.” Your risk is indirect: if an AI harm occurs and you cannot show you tracked known-but-unmeasured risks, you will struggle to defend your governance, oversight, and reasonableness to regulators, customers, and plaintiffs. MEASURE-3.2 is also a procurement and third-party risk issue: black-box AI increases the chance that you cannot measure or detect emerging harms.
A practical 30/60/90-day execution plan
First 30 days (stand up the control)
- Assign a control owner and define scope: which AI systems and third-party AI services are in scope.
- Create the AI measurement gap register and seed it using:
- existing risk assessments
- incident/complaint themes
- known model limitations
- Draft a standard tracking plan template (indicators, rubric, triggers, evidence).
Days 31–60 (operate it for real)
- For each register entry, select a tracking approach (proxy, outcome, expert rubric, scenario watch list).
- Implement intake pipes:
- a way to tag AI incidents and complaints
- a change-management trigger for AI updates
- Run the first formal review meeting and record decisions, deltas, and action items.
Days 61–90 (make it audit-ready and scalable)
- Add recurring evidence collection (review notes, dashboards/screenshots, ticket exports) into your GRC calendar.
- Formalize third-party requirements for AI telemetry and incident notification in procurement checklists.
- Publish a short internal standard: “What we do when AI risks aren’t measurable yet,” mapped to MEASURE-3.2.
- If you use Daydream, configure workflows for owners, reminders, and evidence requests so the register, reviews, and artifacts stay current without manual chasing.
Frequently Asked Questions
What counts as a “risk tracking approach” if we don’t have metrics?
A documented method that produces consistent signals over time, like proxy indicators, structured expert scoring, scenario watch lists, and incident/complaint trend review. You also need triggers and defined actions so tracking drives decisions.
How do we justify that a proxy indicator is valid?
Document the rationale and limitations, then validate qualitatively by checking whether proxy movement aligns with real events (incidents, escalations, human-review outcomes). Keep the validation notes as part of the tracking plan evidence.
Does MEASURE-3.2 require human review?
No specific control is mandated in the text. Human review is one common compensating control when direct measurement is weak, but you can also use technical monitoring, restricted deployment, or structured expert judgment as long as it is defined and operated.
How should we handle third-party AI systems where we can’t access logs?
Treat lack of telemetry as a measurement gap, then require reporting through contract terms, periodic assurance artifacts, and incident notification. If you cannot obtain signals, restrict use cases and document compensating controls and approvals.
What evidence is strongest for auditors?
A measurement gap register with named owners, tracking plans with triggers, and dated review outputs that show decisions and follow-through (tickets closed, controls added, scope restricted). Static policies without operating records rarely hold up.
When can we retire a measurement gap entry?
When you can demonstrate a reliable measurement technique exists and is in operation, and you have transitioned from qualitative/proxy tracking to defined metrics with monitoring and thresholds. Record the “graduation” decision and the new metric owner.
Operationalize this requirement
Map requirement text to controls, owners, evidence, and review workflows inside Daydream.
See Daydream