SI-19(1): Collection
SI-19(1): Collection requires you to de-identify data at the point of collection by designing intake so personally identifiable information (PII) is not collected in the first place. Operationally, this means you must inventory collection points, define “PII not allowed,” implement technical and procedural guardrails, and retain evidence that collection pathways and schemas prevent PII capture. 1
Key takeaways:
- Build “no-PII collection” into forms, APIs, logs, telemetry, and third-party data feeds, not just into downstream masking.
- Treat this as an engineering control with compliance evidence: schemas, field allowlists, and test results beat policy-only approaches.
- Map ownership and recurring evidence so audits can confirm the control operates continuously. 1
The SI-19(1) Collection requirement is a design constraint: collect only what you need, and explicitly do not collect PII, so the dataset is de-identified upon collection. In real programs, teams fail this control in predictable places: free-text fields, “notes” fields in case tools, verbose application logs, customer support attachments, analytics SDKs that scoop up identifiers, and third-party data sources that arrive pre-populated with personal data.
For a CCO or GRC lead, the fastest path to operationalizing SI-19(1) is to treat “collection” as a set of concrete ingestion mechanisms (UI forms, APIs, batch pipelines, sensors, log shippers, agent telemetry, manual uploads) and to put enforceable rules at each boundary. This page gives requirement-level implementation guidance: who must comply, what to change in systems and processes, what evidence to keep, what auditors ask, and the common mistakes that create silent PII accumulation.
If you need repeatable audit readiness, assign a control owner, publish an implementation procedure, and define the evidence you will re-collect on a schedule so the control stays provable as systems change. 1
Regulatory text
Requirement (verbatim): “De-identify the dataset upon collection by not collecting personally identifiable information.” 1
Operator interpretation: You meet SI-19(1) by preventing PII from entering the dataset at ingestion time. Downstream tokenization, masking, hashing, or anonymization can be helpful for other controls, but they do not satisfy the plain reading of “upon collection” if raw PII was collected first. 1
Plain-English interpretation (what “collection” means in practice)
“Collection” is any mechanism that creates a record in your environment. For most organizations, that includes:
- Web and mobile forms (registration, contact us, lead gen, surveys)
- Application APIs receiving payloads from clients or third parties
- Event pipelines and analytics SDKs
- Customer support tooling (tickets, attachments, call transcripts)
- System/application logs, APM traces, and security telemetry
- Data imports (CSV uploads, partner feeds, SFTP drops)
- RPA/manual entry into internal systems
“De-identify upon collection” means designing those mechanisms to avoid PII fields, block PII-like inputs, and minimize free-text where people tend to paste personal data.
Who it applies to
SI-19(1) is relevant to:
- Federal information systems and system components subject to NIST SP 800-53 control selection. 2
- Contractor systems handling federal data, including service providers that collect or process data on behalf of federal agencies or in regulated federal programs. 2
Operationally, it applies most directly to product teams, platform/infra, data engineering, security engineering, and any business function that sponsors intake of data (marketing ops, CX/support, fraud, risk, HR), because they control collection points.
What you actually need to do (step-by-step)
1) Name the control owner and define scope boundaries
- Assign a single accountable owner (often Security Engineering or Data Governance) with named delegates for each major data-collection surface.
- Define the “dataset(s)” in scope: which analytics tables, operational data stores, or data lakes are subject to the no-PII intake rule.
- Document the system boundaries where you can enforce controls (edge gateway, API layer, ingestion service, logging pipeline).
Recommended practice: map SI-19(1) to an owner, an implementation procedure, and recurring evidence artifacts. 1
2) Create a “PII not allowed” data rule set
You need a practical, testable definition. Produce:
- A PII taxonomy for your environment (examples: names, emails, phone numbers, government IDs, exact address, biometric identifiers).
- An allowlist of permitted fields for each collection point (data contract), plus explicit “do not collect” fields.
- Rules for ambiguous fields (free text, “notes,” “description,” “message”): either remove them, constrain them, or implement detection and blocking.
Keep this grounded in implementation. The goal is not a perfect legal definition; the goal is field-level enforcement.
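To make the rule set testable, one option is to express the taxonomy and each allowlist as data and lint them against each other. The sketch below is illustrative only: the taxonomy entries, the field spec shape, and the `lint_contract` helper are assumptions made for this example, not anything defined by SI-19(1).

```python
# Hypothetical PII taxonomy: category -> field names that fall under it.
PII_TAXONOMY = {
    "name": {"first_name", "last_name", "full_name"},
    "contact": {"email", "phone"},
    "government_id": {"ssn", "passport_number"},
}

# Handling options for free-text fields that stay on an allowlist.
# (The third option in the text, removal, simply means the field is not allowlisted.)
FREE_TEXT_HANDLING = {"constrain", "detect_and_block"}


def lint_contract(allowed_fields: dict) -> list:
    """Check one collection point's allowlist against the rule set.

    allowed_fields maps field name -> spec dict, for example
    {"type": "free_text", "free_text_handling": "detect_and_block"}.
    Returns human-readable problems; an empty list means the contract passes.
    """
    pii_fields = set().union(*PII_TAXONOMY.values())
    problems = []
    for name in sorted(allowed_fields):
        spec = allowed_fields[name]
        if name in pii_fields:
            problems.append(f"{name}: listed in PII taxonomy, cannot be allowlisted")
        if (spec.get("type") == "free_text"
                and spec.get("free_text_handling") not in FREE_TEXT_HANDLING):
            problems.append(f"{name}: free-text field has no handling rule")
    return problems
```

A contract that passes this lint is still only a statement of intent; enforcement happens at the ingestion boundary.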
3) Inventory collection points and rank them by PII likelihood
Build a simple register (spreadsheet is fine) with:
- Collection point name (form/API/log/feed)
- Owner team
- Data destination (table/bucket/index)
- Fields collected (or sample payload)
- Presence of free text
- Third-party involvement (SDK, vendor form tool, call center)
Start with the areas where humans enter free text and where logs capture request bodies.
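If the register outgrows a spreadsheet, the same columns can be kept as structured records and ranked automatically. This is a minimal sketch: the `CollectionPoint` fields mirror the register columns above, while the weights in `risk_score` are arbitrary assumptions chosen only to put free text and body logging first.

```python
from dataclasses import dataclass


@dataclass
class CollectionPoint:
    name: str
    owner: str
    destination: str
    has_free_text: bool
    logs_request_bodies: bool
    third_party: bool


def risk_score(cp: CollectionPoint) -> int:
    # Illustrative weights: free text and request-body logging rank highest.
    return 3 * cp.has_free_text + 3 * cp.logs_request_bodies + 1 * cp.third_party


def prioritize(register: list) -> list:
    """Return the register sorted from highest to lowest PII likelihood."""
    return sorted(register, key=risk_score, reverse=True)


# Hypothetical register entries.
REGISTER = [
    CollectionPoint("billing_api", "payments", "orders table", False, False, False),
    CollectionPoint("support_ticket_form", "cx", "tickets index", True, False, True),
    CollectionPoint("edge_access_logs", "platform", "log bucket", False, True, False),
]
```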
4) Engineer guardrails at ingestion (preferred order)
Implement controls in the path where they prevent PII intake:
A. Remove or redesign the fields
- Delete unneeded fields.
- Replace free-text with structured options (dropdowns, categories).
- Replace identifying fields such as “name” with non-identifying attributes (for example, a segment or category) if the business need is segmentation rather than identification.
B. Schema enforcement and contract tests
- Enforce JSON schema / protobuf contracts at the API gateway or ingestion service.
- Reject payloads containing fields not on the allowlist.
- Add CI tests that fail builds if new fields appear without data governance review.
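The "reject fields not on the allowlist" rule can be sketched in a few lines. In practice this would more likely be a JSON Schema with `additionalProperties` set to false, or a protobuf contract, enforced at the gateway; the plain-Python version below only illustrates the behavior. `EVENT_CONTRACT`, `RejectedPayload`, and `validate_payload` are hypothetical names.

```python
class RejectedPayload(ValueError):
    """Raised when a payload violates the data contract."""


# Hypothetical contract: field name -> expected Python type.
EVENT_CONTRACT = {"event_type": str, "plan_tier": str, "duration_ms": int}


def validate_payload(payload: dict, contract: dict = EVENT_CONTRACT) -> dict:
    """Return the payload unchanged if it conforms; raise before persistence otherwise."""
    unknown = sorted(set(payload) - set(contract))
    if unknown:
        # Unknown fields are rejected outright rather than silently dropped,
        # so the sender learns the contract and the rejection can be logged as evidence.
        raise RejectedPayload(f"fields not on allowlist: {unknown}")
    for name, value in payload.items():
        if not isinstance(value, contract[name]):
            raise RejectedPayload(f"{name}: expected {contract[name].__name__}")
    return payload
```

Rejecting rather than stripping is a design choice: stripping keeps intake flowing but hides contract violations from the team that introduced them.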
C. Input validation and PII detection
- For remaining risky inputs, implement validation and redaction-at-edge (before persistence).
- Add pattern-based checks for common identifiers (email-like strings, phone formats) and block or strip.
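A pattern-based check for the remaining free-text inputs might look like the following. The regular expressions are deliberately loose assumptions (a generic email shape and a North American-style phone shape); they will miss some identifiers and flag some harmless strings, so treat them as a backstop rather than a definition of PII.

```python
import re

# Illustrative patterns only; not exhaustive and not locale-aware.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(
    r"(?<!\d)(?:\+?1[\s.-]?)?\(?\d{3}\)?[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"
)


def find_pii(text: str) -> list:
    """Return the kinds of PII-like patterns found in the text."""
    hits = []
    if EMAIL_RE.search(text):
        hits.append("email")
    if PHONE_RE.search(text):
        hits.append("phone")
    return hits


def redact(text: str) -> str:
    """Strip matches before persistence; callers may prefer to reject instead."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    return PHONE_RE.sub("[REDACTED_PHONE]", text)
```

Whether to block or strip depends on the collection point: blocking gives the cleanest evidence, while stripping is sometimes the only workable option for high-volume intake.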
D. Logging and telemetry minimization
- Disable request/response body logging by default.
- Implement structured logging that captures operational metadata, not user content.
- Add log filters in shippers to drop sensitive keys.
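Shipper-side filtering is tool-specific, but the same idea can be shown at the application logger using Python's standard `logging.Filter`. This sketch assumes structured context is passed under a `fields` key via `extra`; that convention, the `SENSITIVE_KEYS` set, and the in-memory `ListHandler` are assumptions for the example.

```python
import logging

# Hypothetical deny list of keys that must never reach log storage.
SENSITIVE_KEYS = {"email", "phone", "request_body", "authorization"}


class DropSensitiveKeys(logging.Filter):
    """Remove sensitive keys from the structured 'fields' dict on each record."""

    def filter(self, record: logging.LogRecord) -> bool:
        fields = getattr(record, "fields", None)
        if isinstance(fields, dict):
            record.fields = {
                k: v for k, v in fields.items() if k.lower() not in SENSITIVE_KEYS
            }
        return True  # keep the record, minus the dropped keys


class ListHandler(logging.Handler):
    """Collects records in memory so the example can be inspected."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


logger = logging.getLogger("intake_example")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
handler.addFilter(DropSensitiveKeys())
logger.addHandler(handler)

logger.info(
    "request handled",
    extra={"fields": {"route": "/signup", "status": 201, "email": "jo@example.com"}},
)
```

A key-based filter only catches PII that arrives under a known key, which is why disabling body logging by default remains the primary control.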
E. Third-party controls
- Configure third-party SDKs and tools to disable automatic collection of identifiers.
- Contractually restrict what third parties may send you in feeds and support exports, and test their payloads on receipt.
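Testing third-party payloads on receipt can be as simple as checking a feed's header row before any row is loaded. The sketch below assumes a CSV feed and a hypothetical `ALLOWED_COLUMNS` contract, and quarantines the whole file if an unexpected column appears.

```python
import csv
import io

# Hypothetical contract for one partner feed.
ALLOWED_COLUMNS = {"account_tier", "region", "signup_month"}


def split_feed(csv_text: str):
    """Return (accepted_rows, quarantine_reason).

    If the header contains any column outside the contract, no rows are
    accepted and the reason is returned so it can be logged as evidence.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    unexpected = sorted(set(reader.fieldnames or []) - ALLOWED_COLUMNS)
    if unexpected:
        return [], f"unexpected columns: {unexpected}"
    return list(reader), None
```

Quarantining the whole file, rather than dropping the offending column, keeps raw PII out of the dataset and forces a conversation with the sender.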
5) Add a governance gate for new collection
New data fields are the main drift vector. Put in place:
- A lightweight intake review for schema changes (ticket + approval).
- A checklist item: “Does this introduce PII at collection?” with required sign-off.
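The gate can be backed by an automated drift check that compares the fields currently declared in code against the last approved baseline. This is a sketch under assumptions: where the two dictionaries come from (for example, a reviewed file in the repository and a scan of schema definitions) is left open, and `unreviewed_fields` is a hypothetical helper.

```python
def unreviewed_fields(approved: dict, current: dict) -> dict:
    """Map each collection point to fields present now but absent from the baseline.

    Both arguments map collection point name -> list of field names.
    A CI job could fail the build whenever the result is non-empty.
    """
    drift = {}
    for point, fields in current.items():
        new = sorted(set(fields) - set(approved.get(point, [])))
        if new:
            drift[point] = new
    return drift


# Hypothetical baseline and current state.
APPROVED = {"signup_api": ["plan_tier", "region"]}
CURRENT = {
    "signup_api": ["plan_tier", "region", "referrer_note"],
    "survey_form": ["score"],
}
```

Note that an entirely new collection point (`survey_form` here) shows up as drift too, which is the intended behavior.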
Daydream can help by keeping the SI-19(1) requirement mapped to the owner, procedure, and evidence set, so every release cycle has a clear “what to show the auditor” target. 1
6) Validate with tests and continuous monitoring
Prove operation, not intent:
- Run synthetic submissions that attempt to enter PII into forms/APIs and confirm rejection.
- Sample production events/logs for PII indicators (within your authorized monitoring practices) and document results.
- Track exceptions and remediation tickets to closure.
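Synthetic negative tests can be kept as a small table of cases that is replayed against each intake path, with the results stored as evidence. In this sketch `submit` is a stand-in for a real endpoint call, and the allowed fields, status codes, and test cases are all assumptions.

```python
import re

EMAIL_LIKE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


def submit(payload: dict) -> int:
    """Stand-in for a real intake endpoint; returns an HTTP-style status code."""
    allowed = {"topic", "rating"}
    if set(payload) - allowed:
        return 422  # field not on allowlist
    if any(isinstance(v, str) and EMAIL_LIKE.search(v) for v in payload.values()):
        return 422  # PII-like content in an allowed field
    return 201


# Synthetic data only; never use real personal data in these cases.
SYNTHETIC_PII_CASES = [
    ("extra email field", {"topic": "billing", "email": "test.user@example.com"}),
    ("email pasted into allowed field",
     {"topic": "reach me at test.user@example.com", "rating": 3}),
]


def run_negative_tests() -> list:
    """Return one evidence row per case: (case name, status, passed)."""
    results = []
    for name, payload in SYNTHETIC_PII_CASES:
        status = submit(payload)
        results.append((name, status, status != 201))
    return results
```

Each row gives an assessor a named case, the observed response, and a pass/fail outcome, which is easier to review than a screenshot.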
Required evidence and artifacts to retain
Keep artifacts that show prevention at collection time:
Design and governance
- SI-19(1) control statement and scope definition. 1
- Data collection inventory/register with owners and destinations.
- Approved data schemas / data contracts with allowlisted fields.
- “PII not allowed” standard and field classification notes.
Technical configuration
- API gateway / ingestion service validation rules (configs, code snippets, screenshots).
- Form definitions showing removed/restricted fields.
- Logging configuration showing payload/body suppression and sensitive-key filtering.
- Third-party SDK configuration settings and documentation of disabled ID collection.
Operational proof
- Test cases and results (negative tests trying to submit emails/phones into prohibited fields).
- Monitoring or sampling reports showing no PII observed in collected datasets.
- Change management records for schema changes and approvals.
- Exception register (if any) with compensating controls and expiration dates.
Common exam/audit questions and hangups
Auditors tend to probe four things:
- “Show me where collection is blocked.” Policies won’t satisfy them; expect requests for schema enforcement and validation proof. 1
- “What counts as PII here?” If teams can’t articulate a consistent rule, you will fail on consistency.
- “What about logs?” Many environments collect PII inadvertently through debugging logs.
- “How do you stop drift?” They will look for a change gate and recurring evidence, not a one-time cleanup.
Frequent implementation mistakes (and how to avoid them)
- Mistake: Relying on downstream masking/tokenization. Fix: move controls to the edge and ingestion layer so raw PII never lands. 1
- Mistake: Ignoring free-text. Fix: remove it, constrain it, or implement pre-persistence detection and rejection for prohibited patterns.
- Mistake: Forgetting vendor/third-party feeds. Fix: add contract requirements and inbound validation; quarantine unexpected fields.
- Mistake: Treating “logs” as out of scope. Fix: include logs and telemetry in the collection inventory; disable body logging.
- Mistake: No evidence cadence. Fix: define what you will re-collect after releases and configuration changes; store it centrally.
Risk implications (why operators care)
Collecting PII you do not need increases breach impact, expands retention and deletion obligations, complicates incident response scoping, and increases third-party exposure when data is shared. SI-19(1) pushes the risk reduction upstream: if you never collect PII, you reduce what can be lost, subpoenaed, misrouted, or accidentally exposed in analytics and logs. 2
Practical 30/60/90-day execution plan
First 30 days (stabilize and map)
- Assign control owner and publish the SI-19(1) control statement, scope, and “PII not allowed” rule set. 1
- Build the inventory of collection points, including logs and third-party sources.
- Identify top risk ingestion paths (free-text forms, request body logs, support attachments) and implement immediate containment (disable body logging, restrict fields, add temporary filters).
Next 60 days (implement enforceable guardrails)
- Add schema enforcement at API gateway/ingestion services with allowlists and “reject unknown fields.”
- Refactor high-risk forms to structured inputs and remove unnecessary fields.
- Implement inbound validation/quarantine for third-party feeds.
- Establish change management gates for new data fields and collection mechanisms.
By 90 days (prove operation and make it repeatable)
- Build a test suite for negative PII submissions and run it on a schedule tied to releases.
- Produce recurring evidence packets: configs, test runs, sampling reports, and exception logs.
- Operationalize reporting to GRC: open exceptions, remediation progress, and confirmation that new schemas were reviewed.
- If you use Daydream, map SI-19(1) to the owner, procedure, and evidence artifacts so collection controls stay auditable release after release. 1
Frequently Asked Questions
Does SI-19(1) allow collecting PII if we hash it immediately?
The text calls for de-identification “upon collection” by “not collecting” PII, so hashing after ingestion is hard to defend as compliant if raw PII was collected first. Design the intake so the PII never enters the dataset. 1
What about IP addresses or device IDs in telemetry?
Treat telemetry as a collection point and decide, field by field, what is permitted for your dataset. If an identifier can identify a person in your context, handle it under your PII rule set and minimize or block it at collection. 1
Our support team needs free-text to solve issues. How do we comply?
Keep free-text out of the de-identified dataset and route it to a separate, access-controlled system with its own rules, or redesign intake with structured categories plus a controlled attachment workflow. Add warnings and validation to reduce users pasting personal data.
Do application logs count as “collection” for SI-19(1)?
Yes, in practice, because logs persist data. Disable request/response body logging, filter sensitive keys, and keep evidence of the configuration that prevents PII from being captured in logs. 1
How do we handle third parties that send us PII in a data feed?
Put contractual limits in place, then enforce them technically by validating inbound payloads against an allowlist and quarantining violations. Keep rejection/quarantine logs as proof that PII is not accepted into the dataset.
What evidence will an assessor accept for this requirement?
Expect to show (1) collection-point inventory, (2) enforced schemas/validation configs, (3) logging/telemetry suppression settings, and (4) test results demonstrating rejection of PII patterns. Also show ownership and a recurring evidence cadence. 1
Footnotes
1. NIST SP 800-53 Rev. 5 OSCAL JSON