MANAGE-3.2: Pre-trained models which are used for development are monitored as part of AI system regular monitoring and maintenance.

MANAGE-3.2 requires you to treat any pre-trained model you rely on during development (third-party foundation models, open-source checkpoints, or internal base models) as a monitored dependency within your AI system lifecycle. Operationalize it by inventorying those models, defining health and risk signals (performance drift, safety regressions, version changes), and running recurring reviews with documented decisions and rollback paths. 1

Key takeaways:

  • Put every pre-trained model used in development into scope for ongoing monitoring, not just the final deployed model. 1
  • Monitor for version changes, performance/safety drift, and supplier updates, and tie results to change management and incident response. 1
  • Keep auditable evidence: model lineage, monitoring reports, review tickets, and maintenance actions. 1

Pre-trained models often enter your environment “quietly”: a data science team pulls an open-source embedding model, a product team prototypes with a hosted LLM, or engineering adopts a third-party vision model to bootstrap a feature. MANAGE-3.2 closes the gap where teams monitor the deployed application but forget to monitor the upstream model they depended on during development. The risk is practical: a model update can change behavior, a newly discovered vulnerability can alter your threat picture, and drift can show up first in development pipelines long before a production incident.

For a Compliance Officer, CCO, or GRC lead, the fastest path is to convert MANAGE-3.2 into a clear control statement: “All pre-trained models used to develop or maintain an AI system are identified, assigned owners, and monitored on a recurring basis as part of the AI system monitoring and maintenance program.” Then build a lightweight operating rhythm: inventory, signals, review cadence, change triggers, and evidence collection. This aligns with the NIST AI RMF’s focus on managing AI risks across the system lifecycle, not just at go-live. 2

Regulatory text

Excerpt: “Pre-trained models which are used for development are monitored as part of AI system regular monitoring and maintenance.” 1

Operator interpretation: If your teams use a pre-trained model at any point to build, tune, evaluate, or maintain an AI system, you must include that model in your normal monitoring and maintenance activities. “Monitored” here is operational: you define what you watch, who watches it, what triggers action, and how you document outcomes. “Regular monitoring and maintenance” means this is not a one-time due diligence event; it’s an ongoing control integrated with model lifecycle management. 1

Plain-English interpretation (what the requirement really means)

You are responsible for the ongoing risk posture of pre-trained models used during development because they shape the resulting system’s behavior and risk profile. That includes:

  • Behavioral changes over time (drift, regressions, unexpected output patterns).
  • Upstream changes (new versions, altered weights, changed inference endpoints, licensing changes).
  • Safety/security posture changes (newly disclosed issues, compromised supply chain, poisoning concerns).
  • Operational reliability (availability of hosted endpoints, latency changes that alter downstream controls).

If you already have “AI monitoring,” expand its scope so it explicitly includes development dependencies, not only the production model artifact. 1

Who it applies to

Entities: Any organization developing or deploying AI systems, including those building with third-party or open-source pre-trained models. 1

Operational contexts where MANAGE-3.2 is commonly missed:

  • RAG and search: embedding models used for indexing/retrieval are pre-trained dependencies even if the “main model” is separate.
  • Fine-tuning and adapters: LoRA/adapters on top of a foundation model still inherit upstream behavior and changes.
  • Model-as-a-service: hosted LLM APIs where the provider updates models behind a stable name.
  • AutoML and model hubs: fast experimentation that pulls many checkpoints without long-term ownership.

If a pre-trained model is “only used in dev,” it still matters because it can influence evaluation baselines, feature selection, labeling strategy, and what ultimately ships. 1

What you actually need to do (step-by-step)

1) Define “in-scope pre-trained model” for your program

Write a short scoping rule that compliance, engineering, and data science can apply consistently:

  • Any external or internal model weights/checkpoints used for initialization, feature extraction, embeddings, fine-tuning, distillation, evaluation, or safety filtering.
  • Any hosted model endpoint used to generate training data (including synthetic data) or labels.

Deliverable: a one-page MANAGE-3.2 control statement mapped to your AI governance policy. 1

2) Build and maintain an inventory (system-of-record)

Create an inventory table (CMDB-style) for pre-trained models with:

  • Model name and provider/source (third party, open-source repo, internal).
  • Version or immutable identifier (hash/checksum when possible).
  • How it is used (embeddings, base LLM, classifier, moderation model).
  • Systems/products it supports (link to AI system register entry).
  • Owner (technical) and accountable risk owner (product/business).
  • Update mechanism (pinned artifact vs rolling endpoint).
  • License/terms reference and usage constraints.

Practical tip: treat model inventory as a dependency graph. One AI system may inherit risk from multiple upstream models. 1
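One way to make the inventory machine-checkable is a typed record per model. A minimal sketch in Python (the field names and the `is_pinned` rule are illustrative choices for this sketch, not prescribed by MANAGE-3.2):

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class PretrainedModelRecord:
    """One inventory entry for a pre-trained model dependency (illustrative schema)."""
    name: str                   # model name as registered
    source: str                 # "third-party", "open-source", or "internal"
    version: str                # pinned version tag, or "rolling" for hosted aliases
    artifact_sha256: Optional[str]  # immutable hash when weights are downloadable
    usage: str                  # embeddings, base LLM, classifier, moderation, ...
    systems: list = field(default_factory=list)  # AI system register entries it supports
    technical_owner: str = ""   # engineering/ML owner
    risk_owner: str = ""        # accountable product/business owner
    update_mechanism: str = "pinned"  # "pinned" artifact vs "rolling" endpoint
    license_ref: str = ""       # license/terms reference

    def is_pinned(self) -> bool:
        # Rolling endpoints need change-detection controls; pinned artifacts
        # need integrity checks against the recorded hash.
        return self.update_mechanism == "pinned" and self.artifact_sha256 is not None
```

Records like this make the dependency graph queryable: each AI system register entry lists the records it inherits risk from.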

3) Define monitoring signals that match the risk

Avoid “monitor everything.” Pick signals that you can run repeatedly and defend in an exam.

Recommended signal categories (mix as needed):

  • Version/change signals: provider release notes, model card updates, endpoint alias changes, dependency lockfile changes.
  • Performance signals: task accuracy/quality on a fixed holdout set; latency and error rates for hosted inference.
  • Safety signals: toxic content rates, policy-violation rates, jailbreak susceptibility checks relevant to your use case.
  • Security/supply chain signals: integrity checks for downloaded artifacts; alerts on compromised repositories; dependency scanning outputs.
  • Data compatibility signals: embedding distribution shifts; retrieval relevance degradation; new failure modes after upstream updates.

Deliverable: a Monitoring Specification per pre-trained model or per model class, tied to AI system monitoring. 1
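A monitoring specification can be encoded as a threshold per signal and checked every cycle, so a breach is a mechanical finding rather than a judgment call. A minimal sketch, with invented signal names and thresholds:

```python
# Illustrative monitoring spec: signal -> (threshold, direction).
# "min" means an observed value below the threshold breaches; "max" means above.
MONITORING_SPEC = {
    "holdout_accuracy":    (0.90, "min"),  # performance signal on a fixed eval set
    "toxic_output_rate":   (0.01, "max"),  # safety signal
    "p95_latency_ms":      (800,  "max"),  # operational reliability signal
    "retrieval_relevance": (0.80, "min"),  # data compatibility signal
}

def breached_signals(observed: dict) -> list:
    """Return signals whose observed value breaches the spec threshold."""
    breaches = []
    for signal, (threshold, direction) in MONITORING_SPEC.items():
        value = observed.get(signal)
        if value is None:
            # A signal that did not run is itself a finding, not a pass.
            breaches.append(signal + " (missing)")
        elif direction == "min" and value < threshold:
            breaches.append(signal)
        elif direction == "max" and value > threshold:
            breaches.append(signal)
    return breaches
```

Treating a missing signal as a breach keeps “the job silently stopped running” from passing a review cycle.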

4) Set an operating cadence plus event-based triggers

You need two types of monitoring:

  • Recurring reviews (scheduled): review dashboards/reports, confirm signals are running, record outcomes and actions.
  • Triggered reviews (event-based): initiate when key events occur, such as provider model updates, major upstream incidents, or significant performance regressions.

Hard requirement for audit readiness: define what counts as a “material change” that forces reassessment and potentially blocks release. 1
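The materiality rubric can be made deterministic by enumerating the events that force a triggered review. A sketch under assumed event categories (the categories and the blocking rule are illustrative, not framework text):

```python
# Illustrative event classification: upstream events that force a triggered review.
MATERIAL_EVENTS = {
    "provider_model_update",       # provider swaps weights behind a stable alias
    "artifact_hash_changed",       # checkpoint no longer matches the pinned hash
    "upstream_security_advisory",  # disclosed vulnerability or supply-chain compromise
    "performance_regression",      # recurring evaluation breached a threshold
}

def requires_triggered_review(event: str) -> bool:
    return event in MATERIAL_EVENTS

def blocks_release(event: str, remediated: bool) -> bool:
    """A material change blocks release until remediated or covered by a documented exception."""
    return requires_triggered_review(event) and not remediated
```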

5) Connect monitoring to change management and maintenance actions

Monitoring that doesn’t change decisions is theater. Wire it into:

  • SDLC gates: new model onboarding, version bump approval, release readiness checks.
  • Rollback plan: pinned prior version, alternate model, feature flag to disable AI feature.
  • Issue management: tickets for drift/regression, root cause analysis, and corrective action.
  • Exceptions: documented risk acceptance when you cannot remediate quickly, with expiry and compensating controls.

Deliverable: evidence that monitoring results create maintenance actions, not just reports. 1

6) Assign clear ownership (RACI)

Minimum:

  • Control owner: GRC or AI governance lead who ensures the control runs and evidence exists.
  • Model owner(s): engineering/ML owner accountable for signals, thresholds, and remediation.
  • Product risk owner: decides whether to accept residual risk and authorizes exceptions.

This is where many programs fail: nobody owns the pre-trained model because “we didn’t build it.” MANAGE-3.2 expects you to manage it anyway. 1

7) Make evidence collection automatic where possible

If you rely on screenshots and ad hoc narratives, you will miss cycles. Common automations:

  • CI checks that block merges when model artifacts change without an approved ticket.
  • Scheduled evaluation jobs that publish metrics to a dashboard and store immutable logs.
  • A lightweight template that creates a review record each cycle.
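The first automation above (blocking changes to model artifacts without an approved ticket) can be sketched as a CI step that recomputes artifact hashes against an approved lockfile. The lockfile format and file paths here are assumptions for illustration:

```python
import hashlib
import json
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Stream a file through SHA-256 so large checkpoints never load fully into memory."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def check_model_lock(lockfile: Path) -> list:
    """Return artifacts whose on-disk hash no longer matches the approved lockfile.

    Assumed lockfile format: {"models/embedder.bin": "<sha256>", ...}
    A non-empty return should fail the CI job until a change ticket is approved.
    """
    approved = json.loads(lockfile.read_text())
    return [
        artifact for artifact, expected in approved.items()
        if sha256_of(Path(artifact)) != expected
    ]
```

The same hash comparison doubles as the integrity check for open-source checkpoints pulled from a model hub.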

Daydream can help by mapping MANAGE-3.2 to a named control owner, embedding recurring evidence tasks, and keeping model monitoring artifacts linked to each AI system’s record for audit retrieval. 1

Required evidence and artifacts to retain

Keep evidence that proves both design (you defined monitoring) and operation (you run it and act on it):

Inventory and lineage

  • Pre-trained model inventory (with versions/hashes where feasible)
  • Model lineage diagram linking upstream model(s) to each AI system
  • Source records (repo URL or provider reference, internal approval to use)

Monitoring design

  • Monitoring specification (signals, thresholds, cadence, triggers)
  • Test/evaluation plan and fixed benchmark dataset description (where applicable)
  • Change classification rubric (“material change” criteria)

Monitoring operation

  • Monitoring run logs and dashboard exports (timestamped)
  • Periodic review meeting notes or tickets with approvals
  • Incidents and corrective action records tied to monitoring findings
  • Exception/risk acceptance records with approvals and expirations

Maintenance

  • Version upgrade tickets and approvals
  • Rollback execution evidence (if used)
  • Post-change validation results

A clean evidence package lets you answer, fast: “Which pre-trained models do you depend on, what do you monitor, what happened last cycle, and what did you do about it?” 1

Common exam/audit questions and hangups

Auditors and internal risk reviewers tend to probe these points:

  1. “Show me all pre-trained models used for this system.”
    Hangup: teams only list the final deployed model and miss embedding, moderation, reranker, or label-generation models.

  2. “How do you know the provider didn’t change the model?”
    Hangup: reliance on rolling aliases (for example, “latest”) without a detection control.

  3. “What triggers a re-validation?”
    Hangup: no defined materiality threshold; decisions become subjective.

  4. “Prove you ran monitoring regularly and reviewed results.”
    Hangup: dashboards exist, but no evidence of review/decision.

  5. “What do you do when monitoring fails?”
    Hangup: no documented rollback, feature kill switch, or exception path.

Frequent implementation mistakes (and how to avoid them)

  • Mistake: Inventory is incomplete. Why it fails: hidden model dependencies create blind spots. Fix: require registration for any model import or endpoint call in dev and prod pipelines. 1
  • Mistake: Monitoring focuses only on accuracy. Why it fails: safety, security, and reliability regressions still harm users. Fix: add signals for safety policy violations, integrity, and operational KPIs relevant to your use case. 1
  • Mistake: No linkage to change management. Why it fails: findings don’t drive maintenance. Fix: require a ticket for model updates and attach monitoring evidence to the change record. 1
  • Mistake: “One-and-done” due diligence. Why it fails: the requirement is ongoing monitoring. Fix: establish recurring review records and event-based triggers. 1
  • Mistake: No owner for third-party models. Why it fails: accountability gap. Fix: assign a named model owner and a business risk owner; track exceptions with expiry. 1

Enforcement context and risk implications

NIST AI RMF is a framework, not a regulator. The practical risk of failing MANAGE-3.2 is governance failure: you cannot explain or control how upstream model behavior changes affected your system. That exposure shows up during internal audits, customer due diligence, and incident response after harmful outputs or outages. The most defensible posture is documented, repeatable monitoring tied to maintenance decisions. 2

A practical 30/60/90-day execution plan

First 30 days (stand up the control)

  • Publish the MANAGE-3.2 control statement and scope rule for “pre-trained model.”
  • Build the initial inventory for top AI systems and identify owners.
  • Decide the minimum monitoring signals per model class (LLM, embeddings, vision, moderation).
  • Create templates: monitoring spec, review record, exception record. 1

Days 31–60 (run monitoring and connect it to operations)

  • Implement monitoring jobs or dashboards for in-scope models.
  • Add event triggers: provider updates, dependency changes, performance regression alerts.
  • Integrate with change management: require an approved record to update models.
  • Run at least one review cycle and document actions taken. 1

Days 61–90 (audit hardening and scaling)

  • Expand inventory coverage across all product lines and dev teams.
  • Add lineage mapping so each AI system shows upstream pre-trained dependencies.
  • Test rollback paths and document post-change validation.
  • Centralize evidence storage and reporting, so you can answer audit requests quickly. 1

Frequently Asked Questions

Does MANAGE-3.2 apply if the pre-trained model is only used during experimentation and never ships?

If it informs development decisions for an AI system, treat it as in scope and monitor it under your regular monitoring and maintenance program. A practical compromise is lighter-weight monitoring for prototypes, with stricter requirements once a system is on a path to production. 1

We use a hosted LLM endpoint where the provider can change the underlying model. How do we “monitor” that?

Track provider change notices and run canary evaluations that detect behavioral shifts relevant to your use case. Pair this with change triggers so a detected shift forces review before you keep using the endpoint for development or releases. 1
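A canary evaluation can be sketched with the standard library alone: replay a fixed set of prompts against the endpoint and flag responses that diverge from a stored baseline. The lexical-similarity measure and the 0.8 threshold below are crude placeholders; production canaries typically use task-specific metrics or embedding similarity:

```python
from difflib import SequenceMatcher

def canary_drift(baseline: dict, current: dict, min_similarity: float = 0.8) -> list:
    """Return canary prompts whose current response drifted from the recorded baseline.

    baseline/current map canary prompt -> model response. Any flagged prompt
    should open a triggered review before the endpoint keeps being used.
    """
    drifted = []
    for prompt, expected in baseline.items():
        actual = current.get(prompt, "")
        ratio = SequenceMatcher(None, expected, actual).ratio()
        if ratio < min_similarity:
            drifted.append(prompt)
    return drifted
```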

What’s the difference between monitoring the AI system and monitoring the pre-trained model?

AI system monitoring focuses on end-to-end outcomes; pre-trained model monitoring focuses on the upstream dependency that can change behavior, performance, safety, and reliability. You need both, and the evidence should show the linkage from upstream signals to system maintenance actions. 1

Do we need a model card for every third-party pre-trained model?

You need documentation sufficient to support monitoring and maintenance decisions, which can include a provider model card when available plus your internal monitoring specification and review records. If the provider documentation is thin, your internal evaluation and monitoring evidence becomes more important. 1

How do we handle open-source checkpoints pulled from a model hub?

Pin versions or hashes where feasible, record the source, and perform integrity checks and recurring evaluations as part of monitoring. Treat the model hub as a third-party dependency from a governance perspective, and document any exceptions if you cannot pin versions. 1

What evidence is most persuasive in an audit?

A complete inventory linked to each AI system, monitoring run logs, and review tickets showing decisions and corrective actions. Auditors want to see that monitoring leads to maintenance outcomes, not just dashboards. 1

Footnotes

  1. NIST AI RMF Core

  2. NIST AI RMF Core; Source: NIST AI RMF program page

Operationalize this requirement

Map requirement text to controls, owners, evidence, and review workflows inside Daydream.

See Daydream