SI-19(1): Collection

9 min read | Last verified: February 2026 | By Isaac Silverman

SI-19(1): Collection requires you to de-identify data at the point of collection by designing intake so personally identifiable information (PII) is not collected in the first place. Operationally, this means you must inventory collection points, define “PII not allowed,” implement technical and procedural guardrails, and retain evidence that collection pathways and schemas prevent PII capture. ¹

Key takeaways:

Build “no-PII collection” into forms, APIs, logs, telemetry, and third-party data feeds, not just into downstream masking.
Treat this as an engineering control with compliance evidence: schemas, field allowlists, and test results beat policy-only approaches.
Map ownership and recurring evidence so audits can confirm the control operates continuously. ¹

The SI-19(1) collection requirement is a design constraint: collect only what you need, and explicitly do not collect PII, so the dataset is de-identified upon collection. In real programs, teams fail this control in predictable places: free-text fields, “notes” fields in case tools, verbose application logs, customer support attachments, analytics SDKs that capture identifiers automatically, and third-party data sources that arrive pre-populated with personal data.

For a CCO or GRC lead, the fastest path to operationalizing SI-19(1) is to treat “collection” as a set of concrete ingestion mechanisms (UI forms, APIs, batch pipelines, sensors, log shippers, agent telemetry, manual uploads) and to put enforceable rules at each boundary. This page gives requirement-level implementation guidance: who must comply, what to change in systems and processes, what evidence to keep, what auditors ask, and the common mistakes that create silent PII accumulation.

If you need repeatable audit readiness, assign a control owner, publish an implementation procedure, and define the evidence you will re-collect on a schedule so the control stays provable as systems change. ¹

Regulatory text

Requirement (verbatim): “De-identify the dataset upon collection by not collecting personally identifiable information.” ¹

Operator interpretation: You meet SI-19(1) by preventing PII from entering the dataset at ingestion time. Downstream tokenization, masking, hashing, or anonymization can be helpful for other controls, but they do not satisfy the plain reading of “upon collection” if raw PII was collected first. ¹

Plain-English interpretation (what “collection” means in practice)

“Collection” is any mechanism that creates a record in your environment. For most organizations, that includes:

Web and mobile forms (registration, contact us, lead gen, surveys)
Application APIs receiving payloads from clients or third parties
Event pipelines and analytics SDKs
Customer support tooling (tickets, attachments, call transcripts)
System/application logs, APM traces, and security telemetry
Data imports (CSV uploads, partner feeds, SFTP drops)
RPA/manual entry into internal systems

“De-identify upon collection” means designing those mechanisms to avoid PII fields, block PII-like inputs, and minimize free-text where people tend to paste personal data.

Who it applies to

SI-19(1) is relevant to:

Federal information systems and system components subject to NIST SP 800-53 control selection. ²
Contractor systems handling federal data, including service providers that collect or process data on behalf of federal agencies or in regulated federal programs. ²

Operationally, it applies most directly to product teams, platform/infra, data engineering, security engineering, and any business function that sponsors intake of data (marketing ops, CX/support, fraud, risk, HR), because they control collection points.

What you actually need to do (step-by-step)

1) Name the control owner and define scope boundaries

Assign a single accountable owner (often Security Engineering or Data Governance) with named delegates for each major data-collection surface.
Define the “dataset(s)” in scope: which analytics tables, operational data stores, or data lakes are subject to the no-PII intake rule.
Document the system boundaries where you can enforce controls (edge gateway, API layer, ingestion service, logging pipeline).
Recommended practice: map SI-19(1) to an owner, an implementation procedure, and recurring evidence artifacts. ¹

2) Create a “PII not allowed” data rule set

You need a practical, testable definition. Produce:

A PII taxonomy for your environment (examples: names, emails, phone numbers, government IDs, exact address, biometric identifiers).
An allowlist of permitted fields for each collection point (data contract), plus explicit “do not collect” fields.
Rules for ambiguous fields (free text, “notes,” “description,” “message”): either remove them, constrain them, or implement detection and blocking.

Keep this grounded in implementation. The goal is not a perfect legal definition; the goal is field-level enforcement.
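
One way to make the rule set testable is to express it as a data contract per collection point. The sketch below is illustrative only. The taxonomy entries, the `DataContract` class, and the `support_contact_form` fields are hypothetical and do not form a standard list. The sketch refuses any contract that allowlists a field from the do-not-collect set, and it treats every field outside the allowlist as not permitted.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical PII taxonomy for illustration; replace with your organization's list.
PII_TAXONOMY: frozenset[str] = frozenset(
    {"name", "email", "phone", "government_id", "street_address", "biometric_id"}
)


@dataclass(frozen=True)
class DataContract:
    """Field-level rules for one collection point (illustrative sketch)."""

    collection_point: str
    allowed_fields: frozenset[str]
    do_not_collect: frozenset[str] = PII_TAXONOMY

    def __post_init__(self) -> None:
        # A contract is invalid if it allowlists a field it also prohibits.
        overlap = self.allowed_fields & self.do_not_collect
        if overlap:
            raise ValueError(
                f"{self.collection_point}: allowlist includes prohibited fields {sorted(overlap)}"
            )

    def permits(self, field_name: str) -> bool:
        """Only allowlisted fields may be collected; everything else is refused."""
        return field_name in self.allowed_fields


# Hypothetical contract for a support contact form.
support_form = DataContract(
    collection_point="support_contact_form",
    allowed_fields=frozenset({"product_area", "issue_category", "severity"}),
)
```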

3) Inventory collection points and rank them by PII likelihood

Build a simple register (spreadsheet is fine) with:

Collection point name (form/API/log/feed)
Owner team
Data destination (table/bucket/index)
Fields collected (or sample payload)
Presence of free text
Third-party involvement (SDK, vendor form tool, call center)

Prioritize the areas where humans enter free text and where logs capture request bodies.
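
The register can start as a spreadsheet, but keeping it as code makes the prioritization repeatable. The following is a minimal sketch that assumes a hypothetical scoring scheme, which weights human free text and request-body logging most heavily. The collection points, teams, and destinations are made-up examples.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CollectionPoint:
    """One row of the collection-point register."""

    name: str
    owner_team: str
    destination: str
    fields: list[str] = field(default_factory=list)
    has_free_text: bool = False
    third_party: Optional[str] = None  # SDK, vendor form tool, call center, etc.
    logs_request_bodies: bool = False

    def pii_likelihood(self) -> int:
        """Hypothetical scoring: human free text and body logging weigh most."""
        score = 0
        if self.has_free_text:
            score += 3
        if self.logs_request_bodies:
            score += 3
        if self.third_party:
            score += 1
        return score


register = [
    CollectionPoint("partner_order_feed", "data-eng", "raw.partner_orders",
                    ["order_id", "sku", "quantity"], third_party="partner SFTP drop"),
    CollectionPoint("api_gateway_log", "platform", "logs.gateway",
                    ["route", "status", "latency_ms"], logs_request_bodies=True),
    CollectionPoint("support_ticket_form", "cx", "support.tickets",
                    ["category", "description"], has_free_text=True,
                    third_party="ticketing vendor"),
]

# Review the highest-scoring collection points first.
ranked = sorted(register, key=lambda cp: cp.pii_likelihood(), reverse=True)
for cp in ranked:
    print(cp.pii_likelihood(), cp.name, cp.owner_team, cp.destination)
```

The weights are a judgment call. For evidence purposes, what matters is that the ranking rule is written down and applied the same way each time the register is refreshed.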

4) Engineer guardrails at ingestion (preferred order)

Implement controls in the path where they prevent PII intake:

A. Remove or redesign the fields

Delete unneeded fields.
Replace free-text with structured options (dropdowns, categories).
Replace “name” with non-identifying attributes (for example, a customer segment or region) if the business need is segmentation rather than identification.
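
Replacing a free-text field with a closed set of options means the system rejects anything a user pastes in instead of storing it. The following is a minimal sketch that assumes a hypothetical “contact reason” field that previously accepted free text:

```python
from enum import Enum


class ContactReason(str, Enum):
    """Hypothetical closed set of options replacing a free-text 'message' field."""

    BILLING = "billing"
    LOGIN_PROBLEM = "login_problem"
    BUG_REPORT = "bug_report"
    FEATURE_REQUEST = "feature_request"


def parse_contact_reason(raw: str) -> ContactReason:
    """Accept only a known option; anything else raises ValueError and is not stored."""
    return ContactReason(raw.strip().lower())
```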

B. Schema enforcement and contract tests

Enforce JSON schema / protobuf contracts at the API gateway or ingestion service.
Reject payloads containing fields not on the allowlist.
Add CI tests that fail builds if new fields appear without data governance review.
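
The allowlist and reject-unknown-fields behavior can be sketched with the standard library alone. If you already use JSON Schema, setting `"additionalProperties": false` on the object expresses the same rule declaratively. The field names, `enforce_contract`, and the governance-approved set below are hypothetical. The last function is written as a pytest-style test, so it can fail a CI build when someone adds a field to the allowlist without review.

```python
from __future__ import annotations

from typing import Any, Mapping

# Hypothetical allowlist (data contract) for one ingestion endpoint: field -> type.
ALLOWED_FIELDS: dict[str, type] = {
    "event_type": str,
    "product_area": str,
    "duration_ms": int,
}

# Fields that data governance has reviewed; in practice kept in a separately reviewed file.
GOVERNANCE_APPROVED: frozenset[str] = frozenset({"event_type", "product_area", "duration_ms"})


class RejectedPayload(ValueError):
    """Raised when a payload violates the contract; nothing is persisted."""


def enforce_contract(payload: Mapping[str, Any],
                     allowed: Mapping[str, type] = ALLOWED_FIELDS) -> dict[str, Any]:
    """Return a copy of the payload only if every field is allowlisted and typed correctly."""
    unknown = set(payload) - set(allowed)
    if unknown:
        raise RejectedPayload(f"unknown fields rejected: {sorted(unknown)}")
    for key, expected_type in allowed.items():
        if key in payload and not isinstance(payload[key], expected_type):
            raise RejectedPayload(f"{key}: expected {expected_type.__name__}")
    return dict(payload)


def test_allowlist_matches_governance_approval() -> None:
    """CI check (pytest-style): fails the build if a field is added without review."""
    unreviewed = set(ALLOWED_FIELDS) - GOVERNANCE_APPROVED
    assert not unreviewed, f"fields added without data governance review: {sorted(unreviewed)}"
```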

C. Input validation and PII detection

For remaining risky inputs, implement validation and redaction-at-edge (before persistence).
Add pattern-based checks for common identifiers (email-like strings, phone formats) and block or strip.
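
The following is a minimal sketch of pattern-based detection that assumes the hypothetical patterns below. They are deliberately simple. Expect false positives, because any 10-digit number looks like a phone number. Expect false negatives too, such as international formats and obfuscated addresses. Treat the patterns as a starting point to tune against your own data. `find_pii` supports blocking a submission, and `strip_pii` supports redaction before persistence.

```python
import re

# Deliberately simple, hypothetical patterns. Expect false positives (any 10-digit
# number looks like a phone) and false negatives (international formats).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "phone": re.compile(r"(?:\+1[\s.-]?)?\(?\b\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}


def find_pii(text: str) -> list:
    """Return the label of every pattern found; a non-empty result blocks the submission."""
    return [label for label, pattern in PII_PATTERNS.items() if pattern.search(text)]


def strip_pii(text: str) -> str:
    """Replace matches with a placeholder; use where blocking is not practical."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} removed]", text)
    return text
```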

D. Logging and telemetry minimization

Disable request/response body logging by default.
Implement structured logging that captures operational metadata, not user content.
Add log filters in shippers to drop sensitive keys.
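
At the application layer, Python's standard `logging` module lets a `logging.Filter` modify records before they are formatted. The sketch assumes two illustrative conventions, neither of which is a library default: structured context is passed via `extra={"context": {...}}`, and `SENSITIVE_KEYS` is your deny list. Shipper-side filters are configured in the shipper itself and are not shown.

```python
import logging

# Hypothetical deny list of keys that must never reach log storage.
SENSITIVE_KEYS = frozenset({"name", "email", "phone", "request_body", "response_body", "authorization"})


class DropSensitiveKeys(logging.Filter):
    """Strip sensitive keys from a structured 'context' dict attached to each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        context = getattr(record, "context", None)
        if isinstance(context, dict):
            record.context = {k: v for k, v in context.items()
                              if str(k).lower() not in SENSITIVE_KEYS}
        else:
            # Missing or non-dict context is replaced so %(context)s always formats.
            record.context = {}
        return True  # keep the record; only the sensitive keys are dropped


handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s %(context)s"))
handler.addFilter(DropSensitiveKeys())

logger = logging.getLogger("ingest")
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Operational metadata is logged; the email key is dropped before formatting.
logger.info("request handled",
            extra={"context": {"route": "/v1/events", "status": 202, "email": "jane@example.com"}})
# stderr: INFO request handled {'route': '/v1/events', 'status': 202}
```

Because the filter is attached to the handler, it cleans every record that handler emits, regardless of which module logged it.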

E. Third-party controls

Configure third-party SDKs and tools to disable automatic collection of identifiers.
Contractually restrict what third parties may send you in feeds and support exports, and test their payloads on receipt.
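
Testing third-party payloads on receipt can be as simple as checking the header against the contracted columns and scanning values before loading. The following is a minimal sketch that assumes a hypothetical partner CSV feed with three contracted columns and a rough email pattern. The function returns rows or files that fail for quarantine rather than loading them.

```python
from __future__ import annotations

import csv
import io
import re

# Hypothetical columns the partner contract permits, in order.
EXPECTED_COLUMNS = ("order_id", "sku", "quantity")
EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}")


def split_feed(csv_text: str) -> tuple[list[dict], list[dict]]:
    """Validate a partner CSV on receipt. Returns (accepted_rows, quarantined_rows)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    if tuple(reader.fieldnames or ()) != EXPECTED_COLUMNS:
        # Header does not match the contract (e.g. an extra customer_email column):
        # quarantine the whole file rather than loading any of it.
        return [], rows
    accepted: list[dict] = []
    quarantined: list[dict] = []
    for row in rows:
        surplus = row.get(None)  # DictReader stores cells beyond the header under key None
        values = [v for k, v in row.items() if k is not None and v is not None]
        if surplus or any(EMAIL_LIKE.search(v) for v in values):
            quarantined.append(row)
        else:
            accepted.append(row)
    return accepted, quarantined
```

Retain the quarantined rows (in a restricted location) and a log of each rejection. Together they show that unexpected data was not accepted into the dataset.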

5) Add a governance gate for new collection

New data fields are the main drift vector. Put in place:

A lightweight intake review for schema changes (ticket + approval).
A checklist item: “Does this introduce PII at collection?” with required sign-off.
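
You can enforce the checklist item mechanically by diffing schema versions and requiring a recorded sign-off for every added field. This is a sketch under stated assumptions: `FieldApproval` stands in for whatever your review ticket records, and the field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FieldApproval:
    """One sign-off from the schema-change review ticket (hypothetical record)."""

    field_name: str
    introduces_pii: bool  # answer to "Does this introduce PII at collection?"
    approver: str


def blocking_fields(old_fields: set[str], new_fields: set[str],
                    approvals: list[FieldApproval]) -> list[str]:
    """Return newly added fields that lack a sign-off confirming they are not PII."""
    cleared = {a.field_name for a in approvals if a.approver and not a.introduces_pii}
    return sorted((new_fields - old_fields) - cleared)


old = {"event_type", "product_area"}
new = {"event_type", "product_area", "plan_tier", "contact_email"}
approvals = [FieldApproval("plan_tier", False, "data-governance"),
             FieldApproval("contact_email", True, "data-governance")]
print(blocking_fields(old, new, approvals))  # ['contact_email']
```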

Daydream can help by keeping the SI-19(1) requirement mapped to the owner, procedure, and evidence set, so every release cycle has a clear “what to show the auditor” target. ¹

6) Validate with tests and continuous monitoring

Prove operation, not intent:

Run synthetic submissions that attempt to enter PII into forms/APIs and confirm rejection.
Sample production events/logs for PII indicators (within your authorized monitoring practices) and document results.
Track exceptions and remediation tickets to closure.
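
Synthetic negative submissions fit naturally into a standard test suite. In the sketch below, `submit_contact_form` and its rules are hypothetical stand-ins. In practice, the tests would call your real form or API endpoint in a test environment and assert that it rejects the payload.

```python
import re
import unittest

# Hypothetical stand-ins for the real intake endpoint and its rules.
ALLOWED_FIELDS = frozenset({"issue_category", "product_area"})
PII_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[A-Za-z]{2,}|\b\d{3}[\s.-]?\d{3}[\s.-]?\d{4}\b")


def submit_contact_form(payload: dict) -> bool:
    """Return True if the payload would be accepted, False if it is rejected."""
    if set(payload) - ALLOWED_FIELDS:
        return False  # unknown field, e.g. "email"
    return not any(PII_LIKE.search(str(value)) for value in payload.values())


class NegativePiiSubmissionTests(unittest.TestCase):
    """Synthetic submissions that try to push PII through the intake path."""

    def test_rejects_prohibited_email_field(self):
        self.assertFalse(submit_contact_form({"issue_category": "billing", "email": "jane@example.com"}))

    def test_rejects_email_pasted_into_allowed_field(self):
        self.assertFalse(submit_contact_form({"product_area": "contact me at jane@example.com"}))

    def test_rejects_phone_number(self):
        self.assertFalse(submit_contact_form({"product_area": "call 555-123-4567"}))

    def test_accepts_clean_payload(self):
        self.assertTrue(submit_contact_form({"issue_category": "billing", "product_area": "checkout"}))
```

Run it with `python -m unittest` or with pytest, which also collects `unittest.TestCase` classes. Keep the run output as release evidence.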

Required evidence and artifacts to retain

Keep artifacts that show prevention at collection time:

Design and governance

SI-19(1) control statement and scope definition. ¹
Data collection inventory/register with owners and destinations.
Approved data schemas / data contracts with allowlisted fields.
“PII not allowed” standard and field classification notes.

Technical configuration

API gateway / ingestion service validation rules (configs, code snippets, screenshots).
Form definitions showing removed/restricted fields.
Logging configuration showing payload/body suppression and sensitive-key filtering.
Third-party SDK configuration settings and documentation of disabled ID collection.

Operational proof

Test cases and results (negative tests trying to submit emails/phones into prohibited fields).
Monitoring or sampling reports showing no PII observed in collected datasets.
Change management records for schema changes and approvals.
Exception register (if any) with compensating controls and expiration dates.
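
A sampling report is more useful as evidence when it is reproducible. The following is a minimal sketch that assumes records are available as strings (for example, JSON lines) and uses hypothetical indicator patterns. The fixed seed lets you re-draw the same sample for an assessor. A clean sample supports, but does not prove, the absence of PII.

```python
from __future__ import annotations

import random
import re
from datetime import date

# Hypothetical indicator patterns; tune them to your data before relying on them.
INDICATORS = {
    "email": re.compile(r'[^@\s"]+@[^@\s"]+\.[A-Za-z]{2,}'),
    "phone": re.compile(r"\b\d{3}[\s.-]?\d{3}[\s.-]?\d{4}\b"),
}


def sampling_report(records: list[str], sample_size: int, seed: int = 0) -> dict:
    """Scan a reproducible random sample of collected records and summarize the result."""
    rng = random.Random(seed)  # fixed seed so the same sample can be re-drawn later
    sample = rng.sample(records, min(sample_size, len(records)))
    hits = {label: sum(1 for r in sample if pattern.search(r))
            for label, pattern in INDICATORS.items()}
    return {
        "report_date": date.today().isoformat(),
        "population": len(records),
        "sampled": len(sample),
        "seed": seed,
        "indicator_hits": hits,
        "pii_observed": any(hits.values()),
    }
```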

Common exam/audit questions and hangups

Auditors tend to probe four things:

“Show me where collection is blocked.” Policies won’t satisfy them; expect requests for schema enforcement and validation proof. ¹
“What counts as PII here?” If teams can’t articulate a consistent rule, expect a finding that the control is applied inconsistently.
“What about logs?” Many environments collect PII inadvertently through debugging logs.
“How do you stop drift?” They will look for a change gate and recurring evidence, not a one-time cleanup.

Frequent implementation mistakes (and how to avoid them)

Mistake: Relying on downstream masking/tokenization. Fix: move controls to the edge and ingestion layer so raw PII never lands. ¹
Mistake: Ignoring free-text. Fix: remove it, constrain it, or implement pre-persistence detection and rejection for prohibited patterns.
Mistake: Forgetting vendor/third-party feeds. Fix: add contract requirements and inbound validation; quarantine unexpected fields.
Mistake: Treating “logs” as out of scope. Fix: include logs and telemetry in the collection inventory; disable body logging.
Mistake: No evidence cadence. Fix: define what you will re-collect after releases and configuration changes; store it centrally.

Risk implications (why operators care)

Collecting PII you do not need increases breach impact, expands retention and deletion obligations, complicates incident response scoping, and increases third-party exposure when data is shared. SI-19(1) pushes the risk reduction upstream: if you never collect PII, you reduce what can be lost, subpoenaed, misrouted, or accidentally exposed in analytics and logs. ²

Practical 30/60/90-day execution plan

First 30 days (stabilize and map)

Assign control owner and publish the SI-19(1) control statement, scope, and “PII not allowed” rule set. ¹
Build the inventory of collection points, including logs and third-party sources.
Identify top risk ingestion paths (free-text forms, request body logs, support attachments) and implement immediate containment (disable body logging, restrict fields, add temporary filters).

Next 60 days (implement enforceable guardrails)

Add schema enforcement at API gateway/ingestion services with allowlists and “reject unknown fields.”
Refactor high-risk forms to structured inputs and remove unnecessary fields.
Implement inbound validation/quarantine for third-party feeds.
Establish change management gates for new data fields and collection mechanisms.

By 90 days (prove operation and make it repeatable)

Build a test suite for negative PII submissions and run it on a schedule tied to releases.
Produce recurring evidence packets: configs, test runs, sampling reports, and exception logs.
Operationalize reporting to GRC: open exceptions, remediation progress, and confirmation that new schemas were reviewed.
If you use Daydream, map SI-19(1) to the owner, procedure, and evidence artifacts so collection controls stay auditable release after release. ¹

Frequently Asked Questions

Does SI-19(1) allow collecting PII if we hash it immediately?

The text calls for de-identification “upon collection” by “not collecting” PII, so hashing after ingestion is hard to defend as compliant if raw PII was collected first. Design the intake so the PII never enters the dataset. ¹
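
Hashing also offers weak protection in practice for low-entropy identifiers. Anyone can re-identify an unsalted hash of an email address by hashing a list of candidate addresses and looking for a match. The following is a minimal sketch with hypothetical values:

```python
import hashlib


def sha256_hex(value: str) -> str:
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# What a pipeline might store after "hashing at ingestion" (hypothetical record).
stored_token = sha256_hex("jane.doe@example.com")

# Anyone holding a list of candidate emails (a customer list, a breach dump) can
# hash each one and look for matches, so the person is still identifiable.
candidates = ["john@example.com", "jane.doe@example.com", "ops@example.com"]
lookup = {sha256_hex(c): c for c in candidates}
print(lookup.get(stored_token))  # jane.doe@example.com
```

A keyed hash (for example, HMAC with a secret key) makes this lookup harder, but the raw value still passed through the collection path.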

What about IP addresses or device IDs in telemetry?

Treat telemetry as a collection point and decide, field by field, what is permitted for your dataset. If an identifier can identify a person in your context, handle it under your PII rule set and minimize or block it at collection. ¹

Our support team needs free-text to solve issues. How do we comply?

Keep free-text out of the de-identified dataset and route it to a separate, access-controlled system with its own rules, or redesign intake with structured categories plus a controlled attachment workflow. Add warnings and validation to reduce the chance that users paste personal data.
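
You can sketch the routing option as a split at intake. Structured fields go to the de-identified dataset, and everything else goes only to the restricted support system. The field names and the `route_ticket` function are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical structured fields permitted in the de-identified dataset.
DEIDENTIFIED_FIELDS = frozenset({"category", "product_area", "severity"})


@dataclass
class RoutedTicket:
    deidentified: dict  # written to the de-identified (analytics) dataset
    restricted: dict    # written only to the separate, access-controlled support system


def route_ticket(ticket: dict) -> RoutedTicket:
    """Split one ticket at intake so free text never reaches the de-identified dataset."""
    deidentified = {k: v for k, v in ticket.items() if k in DEIDENTIFIED_FIELDS}
    restricted = {k: v for k, v in ticket.items() if k not in DEIDENTIFIED_FIELDS}
    return RoutedTicket(deidentified, restricted)


routed = route_ticket({
    "category": "billing",
    "product_area": "invoices",
    "severity": "low",
    "description": "Hi, I'm Jane (jane@example.com) and my invoice is wrong",
})
```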

Do application logs count as “collection” for SI-19(1)?

Yes in practice, because logs persist data. Disable request/response body logging, filter sensitive keys, and keep configuration evidence that prevents PII from being captured in logs. ¹

How do we handle third parties that send us PII in a data feed?

Put contractual limits in place, then enforce them technically by validating inbound payloads against an allowlist and quarantining violations. Keep rejection/quarantine logs as proof that PII is not accepted into the dataset.

What evidence will an assessor accept for this requirement?

Expect to show (1) collection-point inventory, (2) enforced schemas/validation configs, (3) logging/telemetry suppression settings, and (4) test results demonstrating rejection of PII patterns. Also show ownership and a recurring evidence cadence. ¹

Footnotes

¹ NIST SP 800-53 Rev. 5 OSCAL JSON