MANAGE-3.2: Pre-trained models which are used for development are monitored as part of AI system regular monitoring and maintenance.

MANAGE-3.2 requires you to treat any pre-trained model you rely on during development (third-party foundation models, open-source checkpoints, or internal base models) as a monitored dependency within your AI system lifecycle. Operationalize it by inventorying those models, defining health and risk signals (performance drift, safety regressions, version changes), and running recurring reviews with documented decisions and rollback paths. 1

Key takeaways:

  • Put every pre-trained model used in development into scope for ongoing monitoring, not just the final deployed model. 1
  • Monitor for version changes, performance/safety drift, and supplier updates, and tie results to change management and incident response. 1
  • Keep auditable evidence: model lineage, monitoring reports, review tickets, and maintenance actions. 1

Pre-trained models often enter your environment “quietly”: a data science team pulls an open-source embedding model, a product team prototypes with a hosted LLM, or engineering adopts a third-party vision model to bootstrap a feature. MANAGE-3.2 closes the gap where teams monitor the deployed application but forget to monitor the upstream model they depended on during development. The risk is practical: a model update can change behavior, a newly discovered vulnerability can alter your threat picture, and drift can show up first in development pipelines long before a production incident.

For a Compliance Officer, CCO, or GRC lead, the fastest path is to convert MANAGE-3.2 into a clear control statement: “All pre-trained models used to develop or maintain an AI system are identified, assigned owners, and monitored on a recurring basis as part of the AI system monitoring and maintenance program.” Then build a lightweight operating rhythm: inventory, signals, review cadence, change triggers, and evidence collection. This aligns with the NIST AI RMF’s focus on managing AI risks across the system lifecycle, not just at go-live. 2

Regulatory text

Excerpt: “Pre-trained models which are used for development are monitored as part of AI system regular monitoring and maintenance.” 1

Operator interpretation: If your teams use a pre-trained model at any point to build, tune, evaluate, or maintain an AI system, you must include that model in your normal monitoring and maintenance activities. “Monitored” here is operational: you define what you watch, who watches it, what triggers action, and how you document outcomes. “Regular monitoring and maintenance” means this is not a one-time due diligence event; it’s an ongoing control integrated with model lifecycle management. 1

Plain-English interpretation (what the requirement really means)

You are responsible for the ongoing risk posture of pre-trained models used during development because they shape the resulting system’s behavior and risk profile. That includes:

  • Behavioral changes over time (drift, regressions, unexpected output patterns).
  • Upstream changes (new versions, altered weights, changed inference endpoints, licensing changes).
  • Safety/security posture changes (newly disclosed issues, compromised supply chain, poisoning concerns).
  • Operational reliability (availability of hosted endpoints, latency changes that alter downstream controls).

If you already have “AI monitoring,” expand its scope so it explicitly includes development dependencies, not only the production model artifact. 1

Who it applies to

Entities: Any organization developing or deploying AI systems, including those building with third-party or open-source pre-trained models. 1

Operational contexts where MANAGE-3.2 is commonly missed:

  • RAG and search: embedding models used for indexing/retrieval are pre-trained dependencies even if the “main model” is separate.
  • Fine-tuning and adapters: LoRA/adapters on top of a foundation model still inherit upstream behavior and changes.
  • Model-as-a-service: hosted LLM APIs where the provider updates models behind a stable name.
  • AutoML and model hubs: fast experimentation that pulls many checkpoints without long-term ownership.

If a pre-trained model is “only used in dev,” it still matters because it can influence evaluation baselines, feature selection, labeling strategy, and what ultimately ships. 1

What you actually need to do (step-by-step)

1) Define “in-scope pre-trained model” for your program

Write a short scoping rule that compliance, engineering, and data science can apply consistently:

  • Any external or internal model weights/checkpoints used for initialization, feature extraction, embeddings, fine-tuning, distillation, evaluation, or safety filtering.
  • Any hosted model endpoint used to generate training data (including synthetic data) or labels.

Deliverable: a one-page MANAGE-3.2 control statement mapped to your AI governance policy. 1

2) Build and maintain an inventory (system-of-record)

Create an inventory table (CMDB-style) for pre-trained models with:

  • Model name and provider/source (third party, open-source repo, internal).
  • Version or immutable identifier (hash/checksum when possible).
  • How it is used (embeddings, base LLM, classifier, moderation model).
  • Systems/products it supports (link to AI system register entry).
  • Owner (technical) and accountable risk owner (product/business).
  • Update mechanism (pinned artifact vs rolling endpoint).
  • License/terms reference and usage constraints.

Practical tip: treat model inventory as a dependency graph. One AI system may inherit risk from multiple upstream models. 1
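One way to make the inventory machine-checkable is a typed record per model. A minimal sketch in Python (the field names and the `is_pinned` rule are illustrative choices for this sketch, not prescribed by MANAGE-3.2):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PretrainedModelRecord:
    """One inventory entry for a pre-trained model dependency (illustrative schema)."""
    name: str                   # model name as registered
    source: str                 # "third-party", "open-source", or "internal"
    version: str                # pinned version tag, or "rolling" for hosted aliases
    artifact_sha256: Optional[str]  # immutable hash when weights are downloadable
    usage: str                  # embeddings, base LLM, classifier, moderation, ...
    systems: list = field(default_factory=list)  # AI system register entries it supports
    technical_owner: str = ""   # engineering/ML owner
    risk_owner: str = ""        # accountable product/business owner
    update_mechanism: str = "pinned"  # "pinned" artifact vs "rolling" endpoint
    license_ref: str = ""       # license/terms reference

    def is_pinned(self) -> bool:
        # Rolling endpoints need change-detection controls; pinned artifacts
        # need integrity checks against the recorded hash.
        return self.update_mechanism == "pinned" and self.artifact_sha256 is not None
```

Records like this make the dependency graph queryable: each AI system register entry lists the records it inherits risk from.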

3) Define monitoring signals that match the risk

Avoid “monitor everything.” Pick signals that you can run repeatedly and defend in an exam.

Recommended signal categories (mix as needed):

  • Version/change signals: provider release notes, model card updates, endpoint alias changes, dependency lockfile changes.
  • Performance signals: task accuracy/quality on a fixed holdout set; latency and error rates for hosted inference.
  • Safety signals: toxic content rates, policy-violation rates, jailbreak susceptibility checks relevant to your use case.
  • Security/supply chain signals: integrity checks for downloaded artifacts; alerts on compromised repositories; dependency scanning outputs.
  • Data compatibility signals: embedding distribution shifts; retrieval relevance degradation; new failure modes after upstream updates.

Deliverable: a Monitoring Specification per pre-trained model or per model class, tied to AI system monitoring. 1
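A monitoring specification can be encoded as a threshold per signal and checked every cycle, so a breach is a mechanical finding rather than a judgment call. A minimal sketch, with invented signal names and thresholds:

```python
# Illustrative monitoring spec: signal -> (threshold, direction).
# "min" means an observed value below the threshold breaches; "max" means above.
MONITORING_SPEC = {
    "holdout_accuracy":    (0.90, "min"),  # performance signal on a fixed eval set
    "toxic_output_rate":   (0.01, "max"),  # safety signal
    "p95_latency_ms":      (800,  "max"),  # operational reliability signal
    "retrieval_relevance": (0.80, "min"),  # data compatibility signal
}

def breached_signals(observed: dict) -> list:
    """Return signals whose observed value breaches the spec threshold."""
    breaches = []
    for signal, (threshold, direction) in MONITORING_SPEC.items():
        value = observed.get(signal)
        if value is None:
            # A signal that did not run is itself a finding, not a pass.
            breaches.append(signal + " (missing)")
        elif direction == "min" and value < threshold:
            breaches.append(signal)
        elif direction == "max" and value > threshold:
            breaches.append(signal)
    return breaches
```

Treating a missing signal as a breach keeps “the job silently stopped running” from passing a review cycle.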

4) Set an operating cadence plus event-based triggers

You need two types of monitoring:

  • Recurring reviews (scheduled): review dashboards/reports, confirm signals are running, record outcomes and actions.
  • Triggered reviews (event-based): initiate when key events occur, such as provider model updates, major upstream incidents, or significant performance regressions.

Hard requirement for audit readiness: define what counts as a “material change” that forces reassessment and potentially blocks release. 1
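The materiality rubric can be made deterministic by enumerating the events that force a triggered review. A sketch under assumed event categories (the categories and the blocking rule are illustrative, not framework text):

```python
# Illustrative event classification: upstream events that force a triggered review.
MATERIAL_EVENTS = {
    "provider_model_update",       # provider swaps weights behind a stable alias
    "artifact_hash_changed",       # checkpoint no longer matches the pinned hash
    "upstream_security_advisory",  # disclosed vulnerability or supply-chain compromise
    "performance_regression",      # recurring evaluation breached a threshold
}

def requires_triggered_review(event: str) -> bool:
    return event in MATERIAL_EVENTS

def blocks_release(event: str, remediated: bool) -> bool:
    """A material change blocks release until remediated or covered by a documented exception."""
    return requires_triggered_review(event) and not remediated
```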

5) Connect monitoring to change management and maintenance actions

Monitoring that doesn’t change decisions is theater. Wire it into:

  • SDLC gates: new model onboarding, version bump approval, release readiness checks.
  • Rollback plan: pinned prior version, alternate model, feature flag to disable AI feature.
  • Issue management: tickets for drift/regression, root cause analysis, and corrective action.
  • Exceptions: documented risk acceptance when you cannot remediate quickly, with expiry and compensating controls.

Deliverable: evidence that monitoring results create maintenance actions, not just reports. 1

6) Assign clear ownership (RACI)

Minimum:

  • Control owner: GRC or AI governance lead who ensures the control runs and evidence exists.
  • Model owner(s): engineering/ML owner accountable for signals, thresholds, and remediation.
  • Product risk owner: decides whether to accept residual risk and authorizes exceptions.

This is where many programs fail: nobody owns the pre-trained model because “we didn’t build it.” MANAGE-3.2 expects you to manage it anyway. 1

7) Make evidence collection automatic where possible

If you rely on screenshots and ad hoc narratives, you will miss cycles. Common automations:

  • CI checks that block merges when model artifacts change without an approved ticket.
  • Scheduled evaluation jobs that publish metrics to a dashboard and store immutable logs.
  • A lightweight template that creates a review record each cycle.
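The first automation above (blocking changes to model artifacts without an approved ticket) can be sketched as a CI step that recomputes artifact hashes against an approved lockfile. The lockfile format and file paths here are assumptions for illustration:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large checkpoints never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_model_lock(lockfile: Path) -> list:
    """Return artifacts whose on-disk hash no longer matches the approved lockfile.

    Assumed lockfile format: {"models/embedder.bin": "<sha256>", ...}
    A non-empty return should fail the CI job until a change ticket is approved.
    """
    approved = json.loads(lockfile.read_text())
    return [
        artifact for artifact, expected in approved.items()
        if sha256_of(Path(artifact)) != expected
    ]
```

The same hash comparison doubles as the integrity check for open-source checkpoints pulled from a model hub.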

Daydream can help by mapping MANAGE-3.2 to a named control owner, embedding recurring evidence tasks, and keeping model monitoring artifacts linked to each AI system’s record for audit retrieval. 1

Required evidence and artifacts to retain

Keep evidence that proves both design (you defined monitoring) and operation (you run it and act on it):

Inventory and lineage

  • Pre-trained model inventory (with versions/hashes where feasible)
  • Model lineage diagram linking upstream model(s) to each AI system
  • Source records (repo URL or provider reference, internal approval to use)

Monitoring design

  • Monitoring specification (signals, thresholds, cadence, triggers)
  • Test/evaluation plan and fixed benchmark dataset description (where applicable)
  • Change classification rubric (“material change” criteria)

Monitoring operation

  • Monitoring run logs and dashboard exports (timestamped)
  • Periodic review meeting notes or tickets with approvals
  • Incidents and corrective action records tied to monitoring findings
  • Exception/risk acceptance records with approvals and expirations

Maintenance

  • Version upgrade tickets and approvals
  • Rollback execution evidence (if used)
  • Post-change validation results

A clean evidence package lets you answer, fast: “Which pre-trained models do you depend on, what do you monitor, what happened last cycle, and what did you do about it?” 1

Common exam/audit questions and hangups

Auditors and internal risk reviewers tend to probe these points:

  1. “Show me all pre-trained models used for this system.”
    Hangup: teams only list the final deployed model and miss embedding, moderation, reranker, or label-generation models.

  2. “How do you know the provider didn’t change the model?”
    Hangup: reliance on rolling aliases (for example, “latest”) without a detection control.

  3. “What triggers a re-validation?”
    Hangup: no defined materiality threshold; decisions become subjective.

  4. “Prove you ran monitoring regularly and reviewed results.”
    Hangup: dashboards exist, but no evidence of review/decision.

  5. “What do you do when monitoring fails?”
    Hangup: no documented rollback, feature kill switch, or exception path.

Frequent implementation mistakes (and how to avoid them)

  • Mistake: Inventory is incomplete. Why it fails: hidden model dependencies create blind spots. Fix: require registration for any model import or endpoint call in dev and prod pipelines. 1
  • Mistake: Monitoring focuses only on accuracy. Why it fails: safety, security, and reliability regressions still harm users. Fix: add signals for safety policy violations, integrity, and operational KPIs relevant to your use case. 1
  • Mistake: No linkage to change management. Why it fails: findings don’t drive maintenance. Fix: require a ticket for model updates and attach monitoring evidence to the change record. 1
  • Mistake: “One-and-done” due diligence. Why it fails: the requirement is ongoing monitoring. Fix: establish recurring review records and event-based triggers. 1
  • Mistake: No owner for third-party models. Why it fails: accountability gap. Fix: assign a named model owner and a business risk owner; track exceptions with expiry. 1

Enforcement context and risk implications

NIST AI RMF is a framework, not a regulator. The practical risk of failing MANAGE-3.2 is governance failure: you cannot explain or control how upstream model behavior changes affected your system. That exposure shows up during internal audits, customer due diligence, and incident response after harmful outputs or outages. The most defensible posture is documented, repeatable monitoring tied to maintenance decisions. 2

A practical 30/60/90-day execution plan

First 30 days (stand up the control)

  • Publish the MANAGE-3.2 control statement and scope rule for “pre-trained model.”
  • Build the initial inventory for top AI systems and identify owners.
  • Decide the minimum monitoring signals per model class (LLM, embeddings, vision, moderation).
  • Create templates: monitoring spec, review record, exception record. 1

Days 31–60 (run monitoring and connect it to operations)

  • Implement monitoring jobs or dashboards for in-scope models.
  • Add event triggers: provider updates, dependency changes, performance regression alerts.
  • Integrate with change management: require an approved record to update models.
  • Run at least one review cycle and document actions taken. 1

Days 61–90 (audit hardening and scaling)

  • Expand inventory coverage across all product lines and dev teams.
  • Add lineage mapping so each AI system shows upstream pre-trained dependencies.
  • Test rollback paths and document post-change validation.
  • Centralize evidence storage and reporting, so you can answer audit requests quickly. 1

Frequently Asked Questions

Does MANAGE-3.2 apply if the pre-trained model is only used during experimentation and never ships?

If it informs development decisions for an AI system, treat it as in scope and monitor it under your regular monitoring and maintenance program. A practical compromise is lighter-weight monitoring for prototypes, with stricter requirements once a system is on a path to production. 1

We use a hosted LLM endpoint where the provider can change the underlying model. How do we “monitor” that?

Track provider change notices and run canary evaluations that detect behavioral shifts relevant to your use case. Pair this with change triggers so a detected shift forces review before you keep using the endpoint for development or releases. 1
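A canary evaluation can be sketched with the standard library alone: replay a fixed set of prompts against the endpoint and flag responses that diverge from a stored baseline. The lexical-similarity measure and the 0.8 threshold below are crude placeholders; production canaries typically use task-specific metrics or embedding similarity:

```python
from difflib import SequenceMatcher

def canary_drift(baseline: dict, current: dict, min_similarity: float = 0.8) -> list:
    """Return canary prompts whose current response drifted from the recorded baseline.

    baseline/current map canary prompt -> model response. Any flagged prompt
    should open a triggered review before the endpoint keeps being used.
    """
    drifted = []
    for prompt, expected in baseline.items():
        actual = current.get(prompt, "")
        ratio = SequenceMatcher(None, expected, actual).ratio()
        if ratio < min_similarity:
            drifted.append(prompt)
    return drifted
```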

What’s the difference between monitoring the AI system and monitoring the pre-trained model?

AI system monitoring focuses on end-to-end outcomes; pre-trained model monitoring focuses on the upstream dependency that can change behavior, performance, safety, and reliability. You need both, and the evidence should show the linkage from upstream signals to system maintenance actions. 1

Do we need a model card for every third-party pre-trained model?

You need documentation sufficient to support monitoring and maintenance decisions, which can include a provider model card when available plus your internal monitoring specification and review records. If the provider documentation is thin, your internal evaluation and monitoring evidence becomes more important. 1

How do we handle open-source checkpoints pulled from a model hub?

Pin versions or hashes where feasible, record the source, and perform integrity checks and recurring evaluations as part of monitoring. Treat the model hub as a third-party dependency from a governance perspective, and document any exceptions if you cannot pin versions. 1

What evidence is most persuasive in an audit?

A complete inventory linked to each AI system, monitoring run logs, and review tickets showing decisions and corrective actions. Auditors want to see that monitoring leads to maintenance outcomes, not just dashboards. 1

Footnotes

  1. NIST AI RMF Core

  2. NIST AI RMF Core; Source: NIST AI RMF program page

Operationalize this requirement

Map requirement text to controls, owners, evidence, and review workflows inside Daydream.

See Daydream