MEASURE-2.7: AI system security and resilience – as identified in the map function – are evaluated and documented.
MEASURE-2.7 requires you to evaluate the AI system security and resilience risks you already identified in your MAP activities, then document the results in a repeatable, reviewable way. Operationalize it by running a scoped security/resilience assessment for each AI system (and its key third parties), recording test methods, findings, decisions, and residual risk, and keeping that evidence current through change management.
Key takeaways:
- Treat MEASURE-2.7 as an assessment-and-evidence requirement tied to your MAP-identified threats and dependencies.
- “Evaluated” means you performed tests or structured reviews; “documented” means an auditor can retrace scope, method, results, and sign-off.
- The fastest path is to map MEASURE-2.7 to an owner, a procedure, and a recurring evidence collection cadence per AI system.
MEASURE-2.7 sits in the “MEASURE” function of the NIST AI Risk Management Framework and focuses on security and resilience for AI systems, specifically the security and resilience issues you already surfaced during the “MAP” function (system context, dependencies, threats, intended use, and operational environment). The operational implication is straightforward: you do not get credit for having a threat model or architecture diagram alone. You need to show that you assessed the identified security and resilience concerns and can produce documentation that supports governance decisions.
For a Compliance Officer, CCO, or GRC lead, this requirement is easiest to execute if you frame it as a control: “For each AI system in scope, the organization performs and documents a security and resilience evaluation aligned to MAP-identified risks, with defined owners and recurring evidence.” That framing makes audits simpler because it creates a consistent file structure, standard assessment template, and clear accountability.
This page gives requirement-level implementation guidance you can put into a runbook, assign to control owners, and verify with evidence in an assessment repository or GRC system.
Regulatory text
Text (excerpt): “AI system security and resilience – as identified in the map function – are evaluated and documented.”
Operator interpretation (what you must do):
- Use MAP outputs as the authoritative input set (threats, dependencies, failure modes, system boundaries, data flows, third parties, and operational environment).
- Evaluate security and resilience against those MAP-identified items using tests, reviews, and/or structured analyses appropriate to the AI system and its deployment context.
- Document the evaluation so a reviewer can reconstruct: what was in scope, what method was used, what was found, what was decided, and who approved the residual risk.
Plain-English interpretation of the requirement
MEASURE-2.7 is asking: “Did you actually check whether the AI system can withstand realistic security threats and operational disruptions, based on the risks you already mapped, and can you prove it with documentation?”
Security here commonly includes issues such as:
- Model/service compromise, abuse, or unauthorized access paths
- Data poisoning or integrity loss across the ML supply chain
- Prompt injection or tool/function misuse for AI systems with agentic behavior
- Secrets exposure, training data leakage, and unsafe logging
Resilience covers:
- Ability to continue operating safely under degraded conditions
- Robust monitoring, fallback modes, rollback plans, and recovery procedures
- Dependency failures (upstream APIs, model providers, feature stores, identity systems, key management)
These examples are implementation-relevant interpretations of “security and resilience,” and your MAP function should tell you which ones matter for your system.
Who it applies to (entity and operational context)
Applies to:
- Any organization developing, integrating, or deploying AI systems where security and resilience are relevant to mission, customers, safety, compliance, or critical business processes.
Operational contexts where auditors will focus:
- Externally facing AI features (customer support, recommendations, identity/fraud, underwriting, content moderation)
- High-impact internal decision support (HR screening, credit risk, compliance surveillance)
- AI systems that depend on third parties (hosted model APIs, managed vector databases, labeling providers, data brokers, cloud platforms)
- AI systems with privileged integrations (tools that can execute actions, send emails, move funds, change records, or open tickets)
What you actually need to do (step-by-step)
1) Define scope per AI system (tie to MAP inventory)
Create an “AI System MEASURE-2.7 Assessment Record” for each in-scope AI system:
- System name, owner, environment(s), version/model identifier
- Intended use and user groups
- Critical dependencies and third parties (model provider, hosting, data pipelines, monitoring tools)
- MAP-identified security and resilience risks as the starting checklist (paste them in, don’t restate from memory)
Practical tip: If your MAP output is scattered (tickets, docs, whiteboards), consolidate it into one canonical MAP register per system before you assess MEASURE-2.7. MEASURE can’t be repeatable if MAP isn’t retrievable.
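The assessment record above can be sketched as a simple schema so every system's file has the same shape. Everything here (the `AssessmentRecord` class, field names, and sample values) is an illustrative assumption, not a mandated format:

```python
from dataclasses import dataclass, field

@dataclass
class AssessmentRecord:
    """One MEASURE-2.7 assessment record per AI system (illustrative schema)."""
    system_name: str
    owner: str
    environments: list      # e.g. ["staging", "prod"]
    model_identifier: str   # version/model identifier
    intended_use: str
    dependencies: list      # model provider, hosting, data pipelines, monitoring
    map_risks: list         # pasted verbatim from the canonical MAP register
    evaluations: list = field(default_factory=list)  # filled in during step 3

# Hypothetical example entry for one AI system
record = AssessmentRecord(
    system_name="support-chat",
    owner="ai-product-security",
    environments=["prod"],
    model_identifier="provider-model-v3",
    intended_use="Customer support triage",
    dependencies=["hosted model API", "vector DB", "identity provider"],
    map_risks=["prompt injection via user tickets", "model API outage"],
)
```

A typed record like this makes the "paste them in" rule enforceable: `map_risks` is a required field, so a record cannot be created without its MAP inputs.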
2) Select evaluation methods that match the risks
You need a defensible method-to-risk mapping. Common evaluation methods include:
- Secure architecture review and threat modeling refresh (based on current deployment)
- Access control review (service accounts, secrets handling, RBAC, network segmentation)
- Adversarial testing appropriate to the system (abuse cases, prompt injection tests, data exfil paths)
- Dependency resilience review (rate limits, failover, circuit breakers, model/provider outage plan)
- Logging/monitoring validation (alerts, detection rules, audit logging, data retention)
Your deliverable: a table that maps each MAP-identified risk to one or more evaluation activities and expected evidence.
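That mapping table is easiest to defend when you can lint it for gaps. A minimal sketch, with hypothetical risk and test names, where any risk with an empty test list is a traceability gap:

```python
# Map each MAP-identified risk to its evaluation activities (illustrative names).
risk_to_tests = {
    "prompt injection via user tickets": ["adversarial prompt test suite"],
    "model API outage": ["dependency failover drill", "circuit-breaker review"],
    "over-privileged service account": [],  # no evaluation mapped yet
}

# Any MAP risk with no mapped evaluation breaks traceability to MEASURE-2.7.
unmapped = [risk for risk, tests in risk_to_tests.items() if not tests]
```

Running this check before sign-off surfaces exactly the failure auditors probe for: a mapped risk that was never tested.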
3) Execute the evaluation and record results in a standard template
For each evaluation activity, record:
- Date performed and performer (team/role; third party involvement if any)
- Tooling used (scanner, test harness, red-team script set, checklist)
- Scope boundaries (what was tested, what was excluded, and why)
- Findings (rated against your internal severity rubric; avoid vague “passed/failed” results)
- Impact analysis (security, availability, integrity, and downstream business effects)
- Recommended remediation and owner
- Residual risk decision and approver (risk acceptance, mitigation, transfer, or decommission path)
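A completeness gate keeps evaluation entries from reaching sign-off with empty fields. The field names below mirror the bullets above; the `missing_fields` helper itself is an illustrative convention, not a prescribed schema:

```python
# Fields every evaluation entry must carry before residual-risk sign-off
# (names mirror the template bullets above; adapt to your own schema).
REQUIRED_FIELDS = [
    "date_performed", "performer", "tooling", "scope_boundaries",
    "findings", "impact_analysis", "remediation_owner", "residual_risk_approver",
]

def missing_fields(entry: dict) -> list:
    """Return the required fields that are absent or empty in an entry."""
    return [f for f in REQUIRED_FIELDS if not entry.get(f)]

# A partially completed entry: not ready for approval
entry = {"date_performed": "2024-05-01", "performer": "AppSec", "findings": "..."}
gaps = missing_fields(entry)  # non-empty list means the record is not sign-off ready
```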
4) Document resilience behaviors and “safe failure” paths
Resilience documentation is where many programs fail. For each AI system, document:
- What happens on dependency failure (model API down, vector DB degraded, identity outage)
- Degraded mode behavior (fallback to rules, cached responses, human review, or feature disablement)
- Rollback plan for model releases and prompt/tool changes
- Incident response hooks (on-call ownership, escalation path, customer comms trigger points)
Keep it concrete. A diagram plus a written narrative is usually easiest to audit.
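As a sketch of degraded-mode behavior, the following shows a cached-response fallback guarded by a simple failure counter acting as a circuit breaker. Function names, the cache, and the threshold are assumptions for illustration; a production system would use a hardened circuit-breaker library plus alerting:

```python
def answer_with_fallback(query, call_model, cached_answers, failures, threshold=3):
    """Serve from the model; on repeated failures, fall back to cached
    responses or route to human review (degraded mode)."""
    if failures["count"] >= threshold:          # circuit open: skip the model call
        return cached_answers.get(query, "ESCALATE_TO_HUMAN")
    try:
        result = call_model(query)
        failures["count"] = 0                   # a healthy call resets the counter
        return result
    except Exception:
        failures["count"] += 1                  # count the failure, then degrade
        return cached_answers.get(query, "ESCALATE_TO_HUMAN")

# Simulated provider outage: every call raises, so the fallback path is exercised
def flaky_model(query):
    raise TimeoutError("provider outage")

state = {"count": 0}
cache = {"reset password": "Use the self-service reset link."}
replies = [answer_with_fallback("reset password", flaky_model, cache, state)
           for _ in range(4)]
```

The documentation ask in this step is exactly the behavior this sketch encodes: which queries degrade to cache, which escalate to a human, and at what failure threshold the model is bypassed.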
5) Close the loop with change management (keep evidence current)
MEASURE-2.7 is not “one and done.” Define triggers that require reassessment and evidence refresh:
- Model or prompt/tooling changes that alter behavior
- New data sources, new integrations, or new third parties
- Significant architecture changes (new gateways, new auth paths, new hosting)
- New abuse patterns detected in production
Your procedure should state how changes are reviewed, who approves, and where updated evidence is stored.
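The reassessment triggers above can be encoded as a small gate, for example in release tooling. The trigger categories and the 365-day staleness default below are illustrative assumptions, not framework requirements:

```python
from datetime import date

# Change classes that force a MEASURE-2.7 reassessment (illustrative set)
TRIGGER_CHANGES = {"model", "prompt", "tooling", "data_source", "integration"}

def needs_reassessment(changed: set, last_assessed: date, today: date,
                       max_age_days: int = 365) -> bool:
    """Reassess if a trigger-class change occurred or the evidence is stale."""
    if changed & TRIGGER_CHANGES:               # any trigger change -> reassess
        return True
    return (today - last_assessed).days > max_age_days
```

Wired into release governance, a `True` result blocks the change until a refreshed (or delta) assessment is filed.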
6) Assign ownership and recurring evidence collection
Make this auditable by design:
- Control owner (usually AI product/security owner; GRC coordinates)
- Assessment performers (AppSec, ML engineering, SRE, privacy, third-party risk)
- Evidence repository location and naming convention
- A recurring collection plan aligned to your internal governance cycles and meaningful change events
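A naming convention is easiest to enforce when paths are generated, not hand-typed. The `<system>/<control>/<date>_<artifact>` layout below is one hypothetical convention, not a standard:

```python
def evidence_path(system: str, control: str, artifact: str, collected: str) -> str:
    """Build a predictable evidence path: <system>/<control>/<date>_<artifact>."""
    slug = artifact.lower().replace(" ", "-")   # "Assessment Report" -> "assessment-report"
    return f"{system}/{control}/{collected}_{slug}"

path = evidence_path("support-chat", "MEASURE-2.7", "Assessment Report", "2024-05-01")
```

Date-prefixed file names within a per-system, per-control folder make "show me the current evidence" a sort, not a search.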
Where Daydream fits (practitioner use case): If you struggle to keep MEASURE-2.7 evidence consistent across many AI systems and third parties, Daydream can track the control mapping, owner assignments, and recurring evidence requests so you can show auditors a complete, current record without chasing files across tools.
Required evidence and artifacts to retain
Retain artifacts that prove both evaluation and documentation quality:
Core artifacts:
- MAP outputs referenced (system context, dependency list, threat register)
- MEASURE-2.7 assessment plan (risk-to-test mapping table)
- Completed assessment report with findings and remediation tickets
- Resilience runbook: failover/degraded mode, rollback steps, monitoring/alert coverage
- Risk acceptance/exception records (with approver identity and rationale)
- Change management records showing reassessment triggers were honored
Third-party artifacts (where applicable):
- Third-party dependency register for the AI system (model/API providers, data providers)
- Due diligence results tied to AI security/resilience risks (contract/SLA clauses, incident notification, availability commitments where relevant)
- Evidence of ongoing monitoring (security advisories, provider status tracking, material change notices)
Common exam/audit questions and hangups
Auditors and internal reviewers often ask:
- “Show me the MAP-identified risks and the specific tests you ran against them.”
- “How do you know the documentation is current after model updates?”
- “What did you exclude from scope, and who approved the exclusion?”
- “Where are the resilience procedures tested or validated?”
- “How do third-party dependencies affect resilience, and what controls exist?”
Hangup to anticipate: “We have a SOC 2 report / cloud provider assurances.” That does not substitute for an AI-system-specific security and resilience evaluation. You still need to document how you assessed your configuration, integrations, and AI-specific abuse paths.
Frequent implementation mistakes and how to avoid them
| Mistake | Why it fails MEASURE-2.7 | How to avoid it |
|---|---|---|
| Treating MEASURE-2.7 as a policy statement | Policy doesn’t prove evaluation occurred | Require a per-system assessment record with dated execution evidence |
| Generic security checklist not tied to MAP | Breaks traceability to identified risks | Build a risk-to-test mapping table using MAP outputs as inputs |
| No resilience content beyond “we have backups” | Resilience is system behavior under stress | Document degraded mode, rollback, dependency failure handling, monitoring and escalation paths |
| Evidence scattered across tools | Audits fail on retrieval and completeness | Standardize folder structure, naming conventions, and ownership; centralize in a GRC repository |
| No third-party angle | Many AI systems inherit risk from providers | Include third-party dependencies in scope and document assurance gaps and compensating controls |
Enforcement context and risk implications
No public enforcement cases were identified for this requirement, and the NIST AI RMF is a voluntary framework rather than a penalty-backed regulation.
Operationally, MEASURE-2.7 still matters because weak security/resilience documentation increases:
- The chance of missed control gaps (prompt injection paths, insecure tool calls, over-privileged service accounts)
- The chance of inconsistent risk acceptance decisions across teams
- The time-to-respond during incidents, since you lack a clear resilience plan and system boundaries
Practical 30/60/90-day execution plan
First 30 days (Immediate: establish control structure)
- Name a control owner and define RACI across AppSec, ML engineering, SRE, privacy, and third-party risk.
- Inventory AI systems in scope and link each to its MAP artifacts (or create a minimal MAP register if missing).
- Publish a standard MEASURE-2.7 template: scope, MAP risks, evaluation methods, findings, approvals, and evidence links.
- Decide where evidence lives and standardize naming.
Next 60 days (Near-term: execute evaluations for highest-risk systems)
- Prioritize systems by business impact and exposure (external-facing, privileged integrations, sensitive data).
- Run security and resilience evaluations on the prioritized set and create remediation tickets with owners.
- Write resilience runbooks for each evaluated system, focused on dependency failures and rollback paths.
- Establish change triggers that require reassessment and document them in your SDLC/ML lifecycle procedures.
By 90 days (Ongoing operations: make it repeatable)
- Expand MEASURE-2.7 evaluations to remaining in-scope systems.
- Add a recurring evidence collection workflow and periodic management review for open findings and risk acceptances.
- Integrate MEASURE-2.7 checks into release governance (model/prompt/tool changes) and third-party onboarding.
- Run a tabletop incident scenario for one critical AI system and update resilience docs based on lessons learned.
Frequently Asked Questions
Do we need to perform penetration testing specifically for the AI model?
MEASURE-2.7 doesn’t mandate a specific test type, but you must evaluate the MAP-identified security and resilience risks with a method that is defensible and documented. If your MAP risks include adversarial abuse paths, you should test those paths directly and retain the results. (Source: NIST AI RMF Core)
What counts as “documented” for auditors?
Documentation should allow a reviewer to reconstruct scope, methods, findings, remediation, and residual risk approval without informal tribal knowledge. A completed assessment template plus linked evidence (tickets, logs, test outputs) is the usual standard. (Source: NIST AI RMF Core)
How do we handle third-party model APIs under MEASURE-2.7?
Include the third party as a dependency in scope and evaluate resilience and security at the integration layer: auth, rate limits, failover, monitoring, and data handling. Document what you could not test directly and what assurances you obtained through third-party due diligence.
We already did MAP work. Why isn’t that enough?
MEASURE-2.7 is explicitly about evaluation and documentation of security and resilience items identified in MAP. MAP identifies and frames risks; MEASURE demonstrates you assessed them and made governed decisions. (Source: NIST AI RMF Core)
What if the system changes weekly (prompts, tools, models)?
Define change triggers that require a refreshed MEASURE-2.7 evaluation and keep a lightweight “delta assessment” process for smaller changes. The key is traceable evidence that the evaluation stays current relative to operational changes.
How can we operationalize this across many AI systems without drowning in spreadsheets?
Standardize templates, centralize evidence, and assign clear owners per system so evidence collection is routine. If you need workflow, reminders, and audit-ready packaging across systems and third parties, a GRC platform such as Daydream can track control mapping and recurring evidence requests.
Operationalize this requirement
Map requirement text to controls, owners, evidence, and review workflows inside Daydream.
See Daydream