SI-19(8): Motivated Intruder

To meet the SI-19(8) (Motivated Intruder) requirement, you must run a “motivated intruder” re-identification test against each de-identified dataset to confirm that it cannot be re-identified and that no direct identifiers remain. Treat it as an adversarial validation step: document the test scope, methods, results, remediation, and an approved release decision. 1

Key takeaways:

  • Run an explicit re-identification attempt against the de-identified data, not just a checklist review. 1
  • Define “motivated intruder” assumptions (access, tools, time, auxiliary data) and make them repeatable. 1
  • Keep assessment-ready evidence: test plan, logs/queries, findings, fixes, and approval to publish/share the dataset. 1

SI-19(8) is a de-identification reality check. Your team may apply masking, generalization, suppression, tokenization, or other techniques and call the dataset “de-identified,” but the control requires proof that the resulting dataset resists re-identification attempts by a plausible attacker. NIST names that attacker the “motivated intruder” and requires a test to determine whether identified data remains or whether the de-identified dataset can be re-identified. 1

For a Compliance Officer, CCO, or GRC lead, operationalizing this control is mainly about governance: set ownership, define the testing standard, require a repeatable test plan, and gate dataset release on passing results. The operational friction is real because “re-identification” is contextual; outcomes depend on what an intruder could realistically access and what auxiliary data exists. Your job is to force those assumptions into writing, then make engineering and data teams execute the test and retain defensible evidence.

This page translates SI-19(8) into a practical implementation you can deploy across analytics, data science, product telemetry, research datasets, and third-party data sharing.

Regulatory text

Requirement (excerpt): “Perform a motivated intruder test on the de-identified dataset to determine if the identified data remains or if the de-identified data can be re-identified.” 1

What the operator must do:

  1. Identify each dataset you treat as “de-identified.” This includes datasets shared externally, used for research, used for model training, or published internally beyond the original need-to-know boundary.
  2. Attempt to re-identify individuals (or sensitive entities) using a “motivated intruder” model and record the methods used.
  3. Decide, based on results, whether the dataset is safe to release for the intended use or requires additional de-identification controls and retesting.
  4. Retain evidence that the test occurred and that failures drove remediation before release. 1

Plain-English interpretation

SI-19(8) means “prove your de-identified data behaves like de-identified data.” A motivated intruder test is an adversarial exercise: someone tries to reverse the de-identification or connect quasi-identifiers (like age band + geography + dates + rare attributes) to re-identify people. The point is not perfection; the point is a documented, repeatable attempt that is credible given the data environment and the dataset’s intended distribution. 1

In audits, a “we removed names” story does not hold. You need a test narrative: what the intruder could access, what they tried, what worked, what failed, and what you changed as a result.

Who it applies to

Entity types (typical):

  • Federal information systems implementing NIST SP 800-53 controls. 1
  • Contractor systems handling federal data where 800-53 controls are contractually flowed down or used as the security baseline. 1

Operational contexts where it shows up:

  • Publishing “anonymous” datasets for research, transparency reporting, or program evaluation.
  • Sharing de-identified data with third parties (analytics providers, academic partners, product partners).
  • Creating de-identified extracts for internal users outside the original system boundary (BI teams, data science, innovation labs).
  • Using de-identified training datasets in ML pipelines where re-identification could create privacy, security, or contractual exposure.

What you actually need to do (step-by-step)

1) Assign control ownership and define the release gate

  • Control owner: usually Privacy + Security Engineering + Data Governance (one accountable owner, multiple contributors).
  • Release gate: no de-identified dataset is shared or published until a motivated intruder test is completed, reviewed, and approved with a recorded decision. 1

Practical tip: Make the gate operational by binding it to your data request workflow (ticketing) and your data platform (approval tags, catalog status, or access policy).

2) Build and maintain an inventory of de-identified datasets

Minimum fields that make audits survivable:

  • Dataset name, system of record, data steward
  • Intended use and intended recipients (internal group vs external third party)
  • De-identification method(s) applied
  • Link to last motivated intruder test package
  • Release approval status and conditions (expiration, re-test triggers)
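The inventory entry and release gate above can be sketched as a minimal record structure. Everything here is illustrative: the field names, status values, and gate logic are this sketch's choices, not anything SI-19(8) prescribes.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional

@dataclass
class DeidentifiedDataset:
    """One inventory entry for a de-identified dataset (illustrative fields)."""
    name: str
    system_of_record: str
    data_steward: str
    intended_use: str
    intended_recipients: str          # e.g., "internal BI team" or "external partner"
    deid_methods: list = field(default_factory=list)
    last_test_package: str = ""       # link/ID of the last motivated intruder test package
    approval_status: str = "pending"  # "pending" | "approved" | "rejected"
    approval_expires: Optional[date] = None

def release_allowed(ds: DeidentifiedDataset, today: date) -> bool:
    """Enforce the gate: no completed test package and unexpired approval, no release."""
    if ds.approval_status != "approved" or not ds.last_test_package:
        return False
    if ds.approval_expires is not None and today > ds.approval_expires:
        return False
    return True
```

In practice this check would live in your data request workflow or catalog, not in a standalone script; the point is that the gate is evaluated mechanically rather than by convention.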

3) Define your “motivated intruder” threat model (in writing)

The control does not prescribe one universal model. You need a documented model that is credible for your environment and dataset distribution. Cover:

  • Intruder access level: what the attacker gets (only the de-identified dataset, or also data dictionaries, schema, sample queries, or documentation).
  • Auxiliary data assumptions: what outside data sources are reasonably accessible (public records, social media, prior breaches, commercially available datasets, organizational knowledge).
  • Attacker capabilities: skills and tools you assume (basic analytics, scripting, record linkage techniques).
  • Success definition: what counts as “re-identified” for your organization (unique matching with high confidence; linkage to a named individual; confirmation of sensitive attribute for a person). 1
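One way to make those assumptions repeatable is to capture the threat model as versioned structured data that test plans reference. Every field name, value, and threshold below is an assumption of this sketch, not control language.

```python
# Illustrative motivated-intruder threat model, captured as structured data so
# it can be versioned alongside test plans. Field names are this sketch's
# choices, not terms mandated by SI-19(8).
THREAT_MODEL = {
    "version": "1.0",
    "intruder_access": [
        "de-identified dataset only",
        "published data dictionary",
    ],
    "auxiliary_data": [
        "public records",
        "social media profiles",
        "commercially available marketing datasets",
    ],
    "capabilities": ["SQL", "scripting", "record linkage techniques"],
    # Success definition: what counts as "re-identified" for this organization.
    "success_criteria": {
        "unique_match": True,           # singling out one individual
        "min_confidence": 0.9,          # linkage confidence threshold
        "sensitive_attribute_inference": True,
    },
}
```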

Common hangup: teams skip this step and run generic tests that do not match the dataset’s actual exposure. Auditors will ask “re-identify against what?” Your threat model is the answer.

4) Create a motivated intruder test plan template

Standardize the plan so tests are repeatable across teams:

  • Dataset version/hash and extraction date
  • Fields present (including derived fields) and de-identification transformations applied
  • Test hypotheses: “Could an intruder single out someone?” “Could an intruder link to external data?” “Could an intruder infer sensitive attributes?”
  • Test methods: uniqueness analysis, outlier detection, linkage attempts using quasi-identifiers, join attacks against known reference datasets (as permitted), small-cell discovery, temporal reconstruction
  • Tools used (SQL queries, notebooks, scripts)
  • Pass/fail criteria and escalation path (who decides, what happens on failure)

5) Execute the test and record reproducible work

The single best evidence improvement: keep the work reproducible.

  • Store queries/notebooks/scripts in a controlled repository.
  • Capture results tables and linkage attempts.
  • Record any successful re-identification paths, even if you believe they are edge cases. 1

Example test activities (choose what fits the dataset):

  • Direct identifier scan: confirm obvious identifiers are absent (names, emails, IDs, device identifiers) and that no “shadow identifiers” remain in free text fields.
  • Uniqueness checks: identify records unique on combinations of quasi-identifiers (example: geography + age band + event date).
  • Small cell risk: find rare categories or tiny groups that allow singling out.
  • Linkage attempt: try joining to permitted auxiliary data on plausible keys (example: location + date + event type).
  • Inference checks: test whether sensitive attributes can be inferred from correlated fields.
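The uniqueness, small-cell, and linkage checks above can be sketched with the standard library alone. The quasi-identifier columns, the k threshold, and the toy records are all illustrative; a real test would run against the actual dataset version named in the test plan.

```python
from collections import Counter

# Toy de-identified extract; the quasi-identifier columns are illustrative.
records = [
    {"zip3": "021", "age_band": "30-39", "event_month": "2024-01"},
    {"zip3": "021", "age_band": "30-39", "event_month": "2024-01"},
    {"zip3": "945", "age_band": "60-69", "event_month": "2024-02"},
    {"zip3": "945", "age_band": "60-69", "event_month": "2024-02"},
    {"zip3": "100", "age_band": "80-89", "event_month": "2024-03"},
]
QUASI_IDS = ("zip3", "age_band", "event_month")

# Group sizes over the quasi-identifier combination (a k-anonymity view).
sizes = Counter(tuple(r[q] for q in QUASI_IDS) for r in records)

# Uniqueness check: k = 1 groups can be singled out.
singletons = [g for g, n in sizes.items() if n == 1]

# Small-cell check: groups below a chosen threshold are risky.
K = 3
small_cells = [g for g, n in sizes.items() if n < K]

print(f"{len(singletons)} singleton group(s); {len(small_cells)} group(s) below k={K}")

# Linkage attempt: join to a permitted auxiliary source on plausible keys.
aux = {("100", "80-89"): "(known individual)"}
linked = [r for r in records if (r["zip3"], r["age_band"]) in aux]
print(f"{len(linked)} record(s) linkable to the auxiliary source")
```

The same logic translates directly to SQL (`GROUP BY` the quasi-identifiers, `HAVING COUNT(*) = 1` for singletons, a join for the linkage attempt); what matters for evidence is that the queries and their outputs are retained with the test package.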

6) Remediate and re-test until risk is acceptable

If the intruder can re-identify, you need changes and a retest. Typical remediations:

  • Further generalization (broader age bands, coarse geography)
  • Suppression of rare categories
  • Noise addition or perturbation for certain measures
  • Remove or coarsen dates/times
  • Redact or transform free text fields that leak identity
  • Reduce granularity or sample size for public release
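Several of these remediations reduce to simple transformations. The band width, date granularity, and k threshold below are illustrative defaults, not prescribed values; the right settings come from your threat model and retest results.

```python
from collections import Counter

def age_to_band(age: int, width: int = 10) -> str:
    """Generalization: replace exact age with a band, e.g., 34 -> '30-39'."""
    lo = (age // width) * width
    return f"{lo}-{lo + width - 1}"

def coarsen_date(iso_date: str) -> str:
    """Coarsening: reduce 'YYYY-MM-DD' to 'YYYY-MM', removing day-level detail."""
    return iso_date[:7]

def suppress_rare(values: list, k: int = 3, fill: str = "OTHER") -> list:
    """Suppression: collapse categories appearing fewer than k times
    (small-cell remediation) into a generic fill value."""
    counts = Counter(values)
    return [v if counts[v] >= k else fill for v in values]
```

After applying changes like these, the motivated intruder test runs again against the remediated extract; the audit trail pairs each transformation with the finding that motivated it.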

Document what you changed and why. The audit story is “test → failure → fix → retest → approval,” not “we de-identified it and hoped.”

7) Approve, publish/share, and set retest triggers

Set retest triggers that match real-world risk:

  • Material dataset refresh (new time period, new fields, new sources)
  • New recipient population (internal to external, restricted to broad)
  • New auxiliary data becomes available to the likely intruder (for example, a related dataset is published elsewhere)
  • De-identification technique changes or pipeline changes
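Most of these triggers can be checked mechanically by diffing dataset metadata between releases. The metadata keys below are hypothetical, and the auxiliary-data trigger still requires human review because new outside datasets rarely show up in your own change records.

```python
def retest_triggers(prev: dict, curr: dict) -> list:
    """Return which retest triggers fire when dataset metadata changes.
    Metadata keys ('period', 'fields', 'recipients', 'deid_pipeline') are
    illustrative; the new-auxiliary-data trigger is not covered here because
    it depends on events outside the dataset's own metadata."""
    reasons = []
    if (curr.get("period") != prev.get("period")
            or set(curr.get("fields", [])) != set(prev.get("fields", []))):
        reasons.append("material dataset refresh")
    if curr.get("recipients") != prev.get("recipients"):
        reasons.append("new recipient population")
    if curr.get("deid_pipeline") != prev.get("deid_pipeline"):
        reasons.append("de-identification technique or pipeline change")
    return reasons
```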

8) Operationalize with control mapping and recurring evidence

A frequent gap is evidence sprawl: tests happen but aren’t packaged for assessment. Map SI-19(8) to:

  • named owner
  • written procedure
  • evidence checklist
  • recurring review cadence and triggers
  • repository location and access controls

Daydream can help by turning this into a tracked control with assigned ownership, standardized evidence requests, and a recurring evidence calendar so you do not rebuild the audit package each cycle.

Required evidence and artifacts to retain

Keep an “SI-19(8) test package” per dataset/version:

  • Dataset inventory entry (with intended use and distribution)
  • Motivated intruder threat model document (versioned)
  • Test plan (dataset-specific)
  • Execution evidence: queries/scripts/notebooks, outputs, analyst notes
  • Findings register: what re-identification paths were attempted and outcomes
  • Remediation record and retest results
  • Formal approval to release (sign-off with conditions)
  • Change log tying dataset modifications to retest triggers 1

Common exam/audit questions and hangups

Auditors and assessors tend to press on these points:

  • “Which datasets are de-identified, and how do you know you found them all?” Expect to show inventory coverage and the intake workflow.
  • “Define ‘motivated intruder’ for your organization.” If you cannot explain assumptions, the test loses credibility.
  • “Show me a failed test and what you did about it.” Passing-only evidence looks curated.
  • “How do you prevent teams from bypassing the test?” Gating controls, approvals, and technical enforcement matter.
  • “How do you handle dataset updates?” You need retest triggers and change management. 1

Frequent implementation mistakes and how to avoid them

  1. Treating de-identification as a one-time transformation. Avoid this by tying the test to each dataset version and distribution context.
  2. No documented success criteria. Define what “re-identified” means for your environment and what level of linkage is unacceptable.
  3. Testing without auxiliary data assumptions. Write down what the intruder can reasonably access; otherwise the test is arbitrary.
  4. Over-scoping into an academic exercise. Keep the test aligned to real release risk. Focus on plausible linkage paths and singling-out risk.
  5. Weak evidence packaging. Store artifacts in one place with a consistent naming standard and a clear approval record. 1

Enforcement context and risk implications

No public enforcement cases were provided in the source catalog for this requirement, so this page does not cite specific actions or penalties. Practically, failure modes show up as (a) privacy breaches via re-identification, (b) contractual violations when de-identified data was promised, and (c) control failures in federal assessments where you cannot prove the test occurred. SI-19(8) targets that proof gap directly. 1

Practical execution plan (30/60/90)

Use this phased plan to stand up the control quickly; the 30/60/90 milestones are sequencing targets, not fixed time-to-complete commitments.

First 30 days (stand up governance)

  • Assign the SI-19(8) owner and backups.
  • Publish a one-page motivated intruder threat model for your organization.
  • Create the dataset inventory fields and intake workflow (ticket form or catalog entry).
  • Build the motivated intruder test plan template and an evidence checklist.
  • Pick one high-visibility dataset and run a pilot test package end-to-end. 1

By 60 days (make it repeatable)

  • Expand inventory coverage to all known de-identified datasets in scope.
  • Train data stewards/analytics leads on the template and approval gate.
  • Implement a release gate in the workflow (no approval, no share).
  • Run tests for the highest-risk datasets first (public releases, broad third-party sharing, sensitive attributes). 1

By 90 days (operationalize and audit-proof)

  • Standardize where test artifacts live (repo + ticket + catalog links).
  • Add retest triggers to change management (dataset refreshes and schema changes).
  • Create a simple metrics view: which datasets have a current test package and approval.
  • Conduct an internal assessment: pull a sample of datasets and verify evidence completeness against the checklist. 1

Frequently Asked Questions

What qualifies as a “motivated intruder” test versus a basic privacy review?

A motivated intruder test includes an explicit attempt to re-identify or link records, with documented assumptions and reproducible steps. A basic review usually stops at “identifiers removed,” which does not satisfy SI-19(8). 1

Do we need a third party to run the motivated intruder test?

SI-19(8) requires the test, not a specific party. Internal testing is acceptable if the method is credible, independent enough for your governance model, and fully evidenced. 1

How do we scope the auxiliary data an intruder can use?

Scope it to what is realistically available to the likely recipient or attacker given the dataset distribution. Put those assumptions in the threat model and reuse them consistently across tests. 1

What is the “pass” criterion for re-identification resistance?

SI-19(8) does not define a numeric threshold. You must define “unacceptable re-identification” for your context, document it, and show the test results meet it before release. 1

How often do we need to re-run motivated intruder tests?

Re-run when the dataset changes materially, the audience changes, or new linkage data becomes available under your threat model. Encode those as retest triggers in your data change process. 1

What evidence do auditors ask for most often?

They usually ask for a complete test package tied to a specific dataset version: the plan, the executed work (queries/scripts), results, remediation if needed, and the approval decision to share. 1

Footnotes

  1. NIST SP 800-53 Rev. 5 OSCAL JSON

Operationalize this requirement

Map requirement text to controls, owners, evidence, and review workflows inside Daydream.

See Daydream