MEASURE-3.2: Risk tracking approaches are considered for settings where AI risks are difficult to assess using currently available measurement techniques or where metrics are not yet available.
MEASURE-3.2 requires you to define and run a “risk tracking” method for AI risks that you cannot reliably quantify yet, either because measurement techniques are immature or because no useful metrics exist. Operationally, you stand up qualitative and proxy indicators, escalation triggers, and a review cadence so these risks are actively monitored, owned, and reassessed until measurable metrics become available.
Key takeaways:
- Treat “unmeasurable” AI risk as trackable risk: assign owners, indicators, triggers, and review intervals.
- Use proxy metrics, structured expert judgment, and operational signals (incidents, complaints, drift flags) to track risk movement.
- Keep evidence that you identified measurement gaps, chose a tracking approach, reviewed it, and updated it as your measurement maturity improved.
MEASURE-3.2 addresses a common failure mode in AI governance: teams wait for perfect metrics before they manage a risk. The NIST AI RMF expects the opposite. When an AI risk is hard to assess with currently available measurement techniques, or when there is no metric yet, you still need a disciplined way to track that risk over time.
For a Compliance Officer, CCO, or GRC lead, the practical goal is audit-ready proof that your organization recognized the measurement gap, selected an appropriate tracking approach, and integrated it into your standard risk processes. That means owners, routines, thresholds, and decision points that generate evidence. It also means connecting “risk tracking” to real operational controls: incident intake, change management, model monitoring, and third-party oversight for AI services you do not fully control.
This page gives requirement-level implementation guidance you can execute quickly: a step-by-step procedure, artifacts to retain, exam questions to prepare for, and a 30/60/90-day plan. It is written to help you operationalize the MEASURE-3.2 requirement.
Regulatory text
NIST AI RMF (MEASURE-3.2) excerpt: “Risk tracking approaches are considered for settings where AI risks are difficult to assess using currently available measurement techniques or where metrics are not yet available.”
What the operator must do:
- Identify AI risk topics where measurement is currently weak or unavailable.
- Choose a tracking approach that still allows oversight (even if it is qualitative or proxy-based).
- Run that approach on an ongoing basis with named ownership.
- Revisit the approach as better measurement becomes possible.
Your documentation should show the decision logic and the operating rhythm, not just a statement of intent.
Plain-English interpretation
If you cannot measure an AI risk well, you still need to manage it. MEASURE-3.2 expects you to track “direction and exposure” using structured methods, such as proxy indicators, structured expert judgment, scenario tracking, and event-driven signals (incidents, complaints, unexpected outputs), until you have mature metrics.
Think of this as a bridge between:
- Known, measurable risks (you have performance, bias, robustness, security metrics), and
- Known, hard-to-measure risks (you suspect harms or uncertainty but you cannot quantify them reliably yet).
Your program needs a defined “what we do when we can’t measure it” path.
Who it applies to
Entity scope: Any organization developing or deploying AI systems, including those embedding third-party AI services into products or internal operations.
Operational contexts where MEASURE-3.2 shows up most:
- New AI use cases where ground truth is scarce (early rollout, novel domains)
- Generative AI features where harms are contextual (brand, misinformation, IP, consumer deception)
- Safety, fairness, or misuse risks where you can’t test all edge cases
- Third-party AI where you lack full transparency into training data, model internals, or monitoring telemetry
- Low-volume decisioning where statistical metrics are unstable but stakes are high
Functions you will need involved:
- Model/product owners (accountable for operation and change)
- GRC/compliance (controls, evidence, and challenge function)
- Security (abuse monitoring, incident response)
- Legal/privacy (rights, consumer impact, sensitive data)
- Customer support/operations (complaints and escalation signals)
- Procurement/TPRM (third-party controls and reporting)
What you actually need to do (step-by-step)
Step 1: Build an “AI measurement gap register”
Create a short register (spreadsheet or GRC workflow) listing AI risks that cannot be assessed well today. For each entry capture:
- System/use case, owner, and deployment context
- Risk statement (harm + actor + mechanism)
- Why existing measurement is insufficient (no metric, low data volume, unclear ground truth, weak test method)
- Impacted stakeholders (customers, employees, public)
- Current control coverage (policies, guardrails, human review, rate limits)
This is the core proof that you recognized the MEASURE-3.2 condition.
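A register entry like the one above works best as a structured record rather than free text, because structured entries make the change history diffable for audit. A minimal sketch in Python; the class name and field names are illustrative assumptions, not a prescribed schema (use whatever your spreadsheet or GRC tool supports):

```python
from dataclasses import dataclass, field

@dataclass
class GapRegisterEntry:
    """One row in a hypothetical AI measurement gap register."""
    system: str                  # system/use case and deployment context
    owner: str                   # named accountable owner
    risk_statement: str          # harm + actor + mechanism
    measurement_gap: str         # why existing measurement is insufficient
    stakeholders: list = field(default_factory=list)  # impacted parties
    controls: list = field(default_factory=list)      # current control coverage

# Example entry (all values are invented for illustration)
entry = GapRegisterEntry(
    system="Support chatbot (customer-facing, early rollout)",
    owner="jane.doe",
    risk_statement="Users misled by confident but wrong answers (consumer deception)",
    measurement_gap="No ground-truth labels for answer correctness at low volume",
    stakeholders=["customers"],
    controls=["human review of escalations", "rate limits"],
)
```

Keeping each field mandatory (no defaults for system, owner, risk statement, or gap rationale) enforces the minimum content the register needs before an entry counts as evidence.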
Step 2: Select a tracking approach per risk type
Pick a tracking method that creates signals over time. A practical menu:
A) Proxy indicators (leading signals)
Examples: rate of user overrides, percentage of outputs routed to human review, “high-risk prompt” detections, policy-violation classifier flags, or abnormal topic clusters in logs.
B) Outcome signals (lagging signals)
Examples: incident tickets, customer complaints, chargebacks/returns linked to AI decisions, downstream remediation volume, abuse reports.
C) Structured expert judgment
- A small review panel uses a consistent rubric (severity, likelihood, detectability, exposure) to re-score risk at each review.
- Keep the rubric stable so movement is meaningful.
D) Scenario and “watch list” tracking
- Maintain a list of foreseeable harm scenarios and mark observed triggers, near-misses, and control gaps.
Your choice should be documented per risk entry: why this approach is appropriate, what it can detect, and what it cannot.
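To make option C concrete: structured expert judgment can be operated as a small, repeatable scoring routine so that movement between reviews is meaningful. The four dimensions come from the rubric above; the 1–5 scale, equal weighting, and tolerance band are illustrative assumptions, not part of MEASURE-3.2:

```python
# Minimal sketch of structured expert judgment with a stable rubric.
RUBRIC_DIMENSIONS = ("severity", "likelihood", "detectability", "exposure")

def rubric_score(ratings: dict) -> float:
    """Average the four rubric dimensions; reject incomplete scoring."""
    missing = [d for d in RUBRIC_DIMENSIONS if d not in ratings]
    if missing:
        raise ValueError(f"incomplete rubric: missing {missing}")
    return sum(ratings[d] for d in RUBRIC_DIMENSIONS) / len(RUBRIC_DIMENSIONS)

def trend(prior: float, current: float, tolerance: float = 0.25) -> str:
    """Classify movement between review cycles; small deltas read as stable."""
    if current - prior > tolerance:
        return "worsening"
    if prior - current > tolerance:
        return "improving"
    return "stable"

# Two review cycles on the same risk, scored with the same rubric
q1 = rubric_score({"severity": 4, "likelihood": 2, "detectability": 3, "exposure": 2})
q2 = rubric_score({"severity": 4, "likelihood": 3, "detectability": 3, "exposure": 3})
```

Because the rubric and scale never change between reviews, the `trend` output is a defensible signal even though no direct metric exists yet.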
Step 3: Define triggers and escalation paths
For each tracked risk, define:
- Trigger event: what causes reassessment or escalation (incident, spike in complaints, model update, new data source, expansion to a new population)
- Action: what happens (pause feature, increase sampling, add human-in-the-loop, open CAPA ticket, notify risk committee)
- Decision owner: who can accept risk, who must be informed, who can stop deployment
- Time expectations: set internal targets for response and review; treat these as your governance commitments (guidance, not regulatory numbers)
This is where MEASURE-3.2 becomes operational rather than theoretical.
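The trigger-and-action mapping above can be sketched as a small lookup that a monitoring job or review meeting runs against the period's signals. The trigger names, thresholds, and actions here are hypothetical examples; your register defines the real ones:

```python
# Illustrative trigger evaluation for one tracked risk.
TRIGGERS = {
    "complaint_spike": {"threshold": 10, "action": "increase human-review sampling"},
    "model_update":    {"threshold": 1,  "action": "reassess register entry"},
    "new_data_source": {"threshold": 1,  "action": "notify risk committee"},
}

def evaluate_triggers(signals: dict) -> list:
    """Return the escalation actions fired by this period's signal counts."""
    actions = []
    for name, count in signals.items():
        rule = TRIGGERS.get(name)
        if rule and count >= rule["threshold"]:
            actions.append(rule["action"])
    return actions
```

For example, `evaluate_triggers({"complaint_spike": 12, "model_update": 0})` fires only the complaint-spike action. The point of encoding triggers explicitly is that every escalation decision leaves a reproducible record of why it fired.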
Step 4: Integrate tracking into BAU routines
Hard-to-measure risks die in one-off spreadsheets. Connect tracking to existing workflows:
- Change management: every model update prompts review of the measurement gap register entries
- Incident management: tag AI-related incidents and map them to tracked risks
- Third-party management: require vendors to provide telemetry or periodic risk reports for relevant risks
- Model monitoring: even basic monitoring (volume shifts, drift indicators) can serve as proxy signals when direct harm metrics are missing
If you already use a GRC tool, make the register a controlled object with ownership and recurring tasks. Daydream can house the control mapping, owners, and recurring evidence requests so the “risk tracking approach” actually produces audit-ready artifacts.
Step 5: Reassess and graduate from proxies to metrics
Set a rule: each tracked risk must have a path to improved measurability, even if the date is unknown. Examples:
- Collect labeled outcomes to enable future measurement
- Run periodic red teaming to generate test cases
- Improve logging to measure exposure
- Add post-deployment evaluation studies where feasible
Document each maturity improvement and update the tracking plan accordingly.
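A "graduation" rule can also be made explicit so the decision is consistent across risks. This is a hypothetical check, with invented criteria and thresholds, showing the shape of such a rule: enough labeled outcomes plus a validated test method before a risk moves from proxy tracking to defined metrics:

```python
# Hypothetical graduation check for moving a tracked risk from
# proxy/qualitative tracking to defined metrics. The criteria and
# the 500-label threshold are illustrative assumptions.
def can_graduate(labeled_outcomes: int,
                 test_method_validated: bool,
                 min_labels: int = 500) -> bool:
    """True when a reliable measurement technique exists and has enough data."""
    return labeled_outcomes >= min_labels and test_method_validated
```

Whatever the rule, record the graduation decision and the new metric owner, as described in the FAQ below.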
Required evidence and artifacts to retain
Auditors and internal reviewers will ask for proof of operation. Retain:
- Policy/control mapping: where MEASURE-3.2 is implemented (control statement, owner, scope)
- AI measurement gap register: current and prior versions (change history matters)
- Tracking plan per gap: indicators, rubric, data sources, limitations, triggers
- Review records: meeting notes, risk committee decisions, tickets, sign-offs
- Event evidence: incidents/complaints mapped to tracked risks, with actions taken
- Third-party artifacts: contract clauses or attestations supporting needed telemetry/reporting
- Exceptions: when tracking is not possible, document compensating controls and approval
Common exam/audit questions and hangups
Expect questions like:
- “Show me risks you could not measure and how you tracked them anyway.”
- “Who owns each tracked risk, and what causes escalation?”
- “How do you know the proxy indicators correlate to real-world harm?”
- “What changed after incidents or near-misses?”
- “For third-party AI, what reporting do you receive and how do you act on it?”
- “Where is the evidence that reviews happened on schedule?”
Hangup to avoid: presenting a monitoring dashboard that only shows model performance metrics while the actual concern is a non-performance harm (consumer deception, misuse, reputational harm). MEASURE-3.2 is about tracking the risk you cannot directly measure.
Frequent implementation mistakes and how to avoid them
- Mistake: “No metric” becomes “no control.”
  Fix: require a tracking plan entry for every measurement gap before production approval.
- Mistake: proxies with no decision use.
  Fix: define triggers and actions. If an indicator can’t drive a decision, it’s noise.
- Mistake: inconsistent expert scoring.
  Fix: use a stable rubric, keep reviewer training notes, and store prior scoring to show trend.
- Mistake: third-party black box with zero telemetry.
  Fix: add contractual reporting requirements, minimum incident notification, and audit/assurance rights where feasible. If you cannot get signals, document compensating controls (restricted use, human review, limited data exposure).
- Mistake: tracking isn’t tied to change management.
  Fix: treat model changes, prompt/template changes, and data source changes as trigger events for reassessing measurement gaps.
Enforcement context and risk implications
NIST AI RMF is a framework, not a regulator, so you should not expect “MEASURE-3.2 fines.” Your risk is indirect: if an AI harm occurs and you cannot show you tracked known-but-unmeasured risks, you will struggle to defend your governance, oversight, and reasonableness to regulators, customers, and plaintiffs. MEASURE-3.2 is also a procurement and third-party risk issue: black-box AI increases the chance that you cannot measure or detect emerging harms.
A practical 30/60/90-day execution plan
First 30 days (stand up the control)
- Assign a control owner and define scope: which AI systems and third-party AI services are in scope.
- Create the AI measurement gap register and seed it using:
- existing risk assessments
- incident/complaint themes
- known model limitations
- Draft a standard tracking plan template (indicators, rubric, triggers, evidence).
Days 31–60 (operate it for real)
- For each register entry, select a tracking approach (proxy, outcome, expert rubric, scenario watch list).
- Implement intake pipes:
- a way to tag AI incidents and complaints
- a change-management trigger for AI updates
- Run the first formal review meeting and record decisions, deltas, and action items.
Days 61–90 (make it audit-ready and scalable)
- Add recurring evidence collection (review notes, dashboards/screenshots, ticket exports) into your GRC calendar.
- Formalize third-party requirements for AI telemetry and incident notification in procurement checklists.
- Publish a short internal standard: “What we do when AI risks aren’t measurable yet,” mapped to MEASURE-3.2.
- If you use Daydream, configure workflows for owners, reminders, and evidence requests so the register, reviews, and artifacts stay current without manual chasing.
Frequently Asked Questions
What counts as a “risk tracking approach” if we don’t have metrics?
A documented method that produces consistent signals over time, like proxy indicators, structured expert scoring, scenario watch lists, and incident/complaint trend review. You also need triggers and defined actions so tracking drives decisions.
How do we justify that a proxy indicator is valid?
Document the rationale and limitations, then validate qualitatively by checking whether proxy movement aligns with real events (incidents, escalations, human-review outcomes). Keep the validation notes as part of the tracking plan evidence.
Does MEASURE-3.2 require human review?
No specific control is mandated in the text. Human review is one common compensating control when direct measurement is weak, but you can also use technical monitoring, restricted deployment, or structured expert judgment as long as it is defined and operated.
How should we handle third-party AI systems where we can’t access logs?
Treat lack of telemetry as a measurement gap, then require reporting through contract terms, periodic assurance artifacts, and incident notification. If you cannot obtain signals, restrict use cases and document compensating controls and approvals.
What evidence is strongest for auditors?
A measurement gap register with named owners, tracking plans with triggers, and dated review outputs that show decisions and follow-through (tickets closed, controls added, scope restricted). Static policies without operating records rarely hold up.
When can we retire a measurement gap entry?
When you can demonstrate a reliable measurement technique exists and is in operation, and you have transitioned from qualitative/proxy tracking to defined metrics with monitoring and thresholds. Record the “graduation” decision and the new metric owner.
Operationalize this requirement
Map requirement text to controls, owners, evidence, and review workflows inside Daydream.
See Daydream