Developer Testing and Evaluation
The SA-11 Developer Testing and Evaluation requirement means you must contractually require system developers (including third-party developers) to run ongoing security and privacy assessments and to execute unit, integration, system, and regression testing after design, at a depth and coverage you define and can defend. You also need durable evidence that testing happens continuously through change.
Key takeaways:
- You must set and document what “depth and coverage” means for your environment, then enforce it through SDLC and contracts.
- Testing must span the post-design SDLC and include unit, integration, system, and regression testing with security and privacy assessment activities.
- Auditors will look for traceability: requirements → test plan → test runs → results → remediation → retest → release approval.
SA-11 is one of the controls that separates “we have security tools” from “we can prove we ship securely.” In FedRAMP Moderate environments, the requirement is not satisfied by a single annual penetration test or a generic QA process. The control is explicit: you must require the developer of the system, system component, or system service to plan for ongoing security and privacy assessments and to perform multiple layers of testing and evaluation after the design stage.
Operationally, this lands in two places. First, your SDLC: you need a test strategy that defines what you test (and how deeply), when you test, and how you decide to block a release. Second, your third-party governance: if any part of your system is developed by a third party, your contracts and intake requirements must force the same testing expectations and deliver the evidence you need for authorization and continuous monitoring.
The fastest path to “audit-ready” is to define depth/coverage in measurable terms (scope, environments, pass/fail gates, defect severity handling, retest expectations), connect it to change management, and keep artifacts in a single, retrievable evidence set.
Regulatory text
Requirement (excerpt): “Require the developer of the system, system component, or system service at all post-design stages of the system development life cycle to develop and implement a plan for ongoing security and privacy assessments; perform unit, integration, system, and regression testing and evaluation at an organization-defined depth and coverage.” (NIST Special Publication 800-53 Revision 5)
What the operator must do:
You must (1) impose a documented, ongoing security and privacy assessment plan on the developer(s), and (2) ensure testing occurs after design across unit, integration, system, and regression levels. You decide the testing depth and coverage, but you must define it explicitly, implement it consistently, and keep evidence that it is followed through changes and releases. (NIST Special Publication 800-53 Revision 5)
Plain-English interpretation
SA-11 requires repeatable, risk-based testing that matches how you build and change the system. “Developer” includes internal engineering teams and external developers building components or services you rely on. “Ongoing” means it can’t be a one-time event; it must be built into release cycles and change control. (NIST Special Publication 800-53 Revision 5)
Your “depth and coverage” definition is the control’s hinge. If you leave it vague (“we do testing”), you will struggle in assessments. If you define it in operational terms (“every change triggers unit tests; high-risk changes require integration and security tests; releases require documented approvals”), you can execute and defend it.
Who it applies to
Entities
- Cloud Service Providers (CSPs) operating FedRAMP Moderate systems.
- Federal agencies responsible for systems and services in scope. (NIST Special Publication 800-53 Revision 5)
Operational contexts that trigger SA-11 work
- Internal application development, infrastructure-as-code, and platform engineering.
- Any third party that develops a system component or service used in the boundary (including subcontractors).
- CI/CD pipelines, hotfix workflows, emergency changes, and product upgrades.
- Major configuration changes that can affect security/privacy behavior.
What you actually need to do (step-by-step)
1) Define “developer” and map it to your SDLC reality
Create a short scope statement:
- What repositories, pipelines, and services are in the authorization boundary.
- Which teams develop them.
- Which third parties provide code, packaged software, managed services, or development work.
Output: SA-11 scope memo (1–2 pages) tied to system boundary documentation.
2) Write the Ongoing Security and Privacy Assessment Plan (developer-owned, you-approved)
SA-11 calls for a plan that is implemented post-design. Require, at minimum:
- Assessment activities (security and privacy) that align to your delivery model.
- Triggers (new feature, patch, dependency update, config change, environment change).
- Roles (engineering, security, privacy, QA, release manager).
- Evidence produced per activity and where it is stored.
Practical tip: treat this as the umbrella document that points to your detailed test plans and CI/CD controls, rather than duplicating them.
Output: Ongoing Security and Privacy Assessment Plan approved by the system owner and security leadership. (NIST Special Publication 800-53 Revision 5)
3) Define “depth and coverage” in measurable terms
You choose the depth and coverage. Make it auditable by specifying:
- Depth: what techniques are required (e.g., negative testing, abuse-case testing, authz/authn test cases, input validation test cases).
- Coverage: what must be included (critical services, privileged paths, admin functions, data flows handling sensitive data, shared libraries, APIs).
- Environments: where tests run (developer branch, staging, pre-prod).
- Release gates: what blocks deployment (failed test suite, open high-severity security defects, missing regression evidence).
- Traceability: link requirements/user stories to test cases and results.
Output: Testing Standard (a policy/procedure) plus Release Gate Checklist owned by engineering with security sign-off.
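The release-gate idea above can be sketched as a small check that turns your gate criteria into an auditable pass/block decision. This is an illustrative sketch only: the `ReleaseCandidate` fields, suite names, and severity labels are hypothetical stand-ins for whatever your Testing Standard actually defines.

```python
from dataclasses import dataclass, field

@dataclass
class ReleaseCandidate:
    """Hypothetical release record; field names are illustrative, not a mandated schema."""
    version: str
    suites_passed: dict                               # e.g. {"unit": True, "regression": False}
    open_defects: list = field(default_factory=list)  # e.g. [{"id": "SEC-12", "severity": "high"}]
    regression_evidence: bool = False                 # is regression evidence attached?

REQUIRED_SUITES = {"unit", "integration", "regression"}
BLOCKING_SEVERITIES = {"critical", "high"}

def release_gate(rc: ReleaseCandidate) -> list:
    """Return human-readable reasons the release is blocked (empty list == gate passes)."""
    reasons = []
    for suite in sorted(REQUIRED_SUITES):
        if not rc.suites_passed.get(suite, False):
            reasons.append(f"{suite} suite missing or failed")
    for defect in rc.open_defects:
        if defect["severity"] in BLOCKING_SEVERITIES:
            reasons.append(f"open {defect['severity']}-severity defect {defect['id']}")
    if not rc.regression_evidence:
        reasons.append("no regression test evidence attached")
    return reasons
```

Because the gate emits explicit reasons rather than a bare boolean, the same output can be attached to the change ticket as part of the approval record.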
4) Implement required test layers in your pipeline
SA-11 explicitly calls out these layers. Implement them with clear ownership and artifacts:
- Unit testing: developer-run tests for functions/classes/modules; include security-relevant unit tests where applicable.
- Integration testing: service-to-service, API, database, identity provider integration, logging and monitoring integration.
- System testing: end-to-end workflows in a representative environment, including security controls behavior (session management, access control decisions, encryption behavior, logging).
- Regression testing: rerun defined suites on changes to ensure security/privacy controls and core workflows still behave as intended.
Output: CI/CD pipeline configurations, test run logs, and test reports for each layer. (NIST Special Publication 800-53 Revision 5)
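To make "security-relevant unit tests" concrete, here is a minimal sketch of what that looks like for an access-control helper. The function, role names, and permissions are hypothetical; the point is the pattern: positive tests plus negative tests that assert least privilege and deny-by-default behavior.

```python
# Hypothetical role-to-permission map for illustration only.
ROLE_PERMISSIONS = {
    "admin": {"read", "write", "delete"},
    "analyst": {"read"},
}

def is_authorized(role: str, action: str) -> bool:
    """Deny by default: unknown roles or actions get no access."""
    return action in ROLE_PERMISSIONS.get(role, set())

def test_admin_can_delete():
    assert is_authorized("admin", "delete")

def test_analyst_cannot_write():
    # Negative test: least privilege holds for non-admin roles.
    assert not is_authorized("analyst", "write")

def test_unknown_role_denied():
    # Negative test: an unrecognized role must be denied, not granted a default.
    assert not is_authorized("intern", "read")
```

Tests like these run in the unit stage of the pipeline and later form the core of the regression suite, since they encode security control behavior that must survive every change.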
5) Make defects and remediation part of the control, not an afterthought
Auditors will follow the thread: test → finding → fix → retest → approval.
- Standardize severity levels and response expectations.
- Require retesting evidence for closed defects.
- Ensure exceptions are documented with compensating controls and explicit risk acceptance.
Output: defect tickets, security findings register, retest evidence, risk acceptance records.
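The test → finding → fix → retest → approval thread can be spot-checked mechanically. The sketch below assumes a hypothetical defect-ticket schema (the field names are illustrative) and reports gaps in a closed defect's evidence chain.

```python
def audit_defect_chain(defect: dict) -> list:
    """Check a defect record for the evidence auditors follow when a ticket is closed.

    Field names ("status", "fix_commit", "retest_result", "exception",
    "risk_acceptance_ref") are illustrative, not a mandated schema.
    """
    gaps = []
    if defect.get("status") == "closed":
        if not defect.get("fix_commit"):
            gaps.append("closed without a linked fix")
        if defect.get("retest_result") != "pass":
            gaps.append("closed without passing retest evidence")
        if defect.get("exception") and not defect.get("risk_acceptance_ref"):
            gaps.append("exception lacks documented risk acceptance")
    return gaps
```

Running a check like this over a sample of closed tickets before an assessment is a cheap way to find broken evidence threads early.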
6) Extend requirements to third parties via contract and intake
Because SA-11 says “require the developer,” you need enforceable obligations for third parties that develop components/services:
- Contract clauses requiring the assessment plan and specified testing layers.
- Deliverable list: test plans, test results summaries, remediation evidence, and release notes for changes affecting your boundary.
- Right-to-audit or evidence-on-request language.
Output: contract addendum, third-party due diligence checklist, evidence intake folder.
If you run third-party risk management in Daydream, configure a “Developer Testing and Evaluation” evidence request pack that triggers during onboarding and at change events (new releases, major patches, or boundary-impacting updates). Keep the request narrowly scoped to SA-11 artifacts so engineering partners can comply quickly.
7) Tie SA-11 to change management and continuous monitoring
SA-11 fails most often when teams test “sometimes,” but changes ship “all the time.”
- Tag changes as low/medium/high impact.
- Define required testing bundles by change impact.
- Require testing evidence as part of change approval.
Output: change tickets with attached testing evidence and release approvals.
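The impact-to-testing mapping above can be expressed as a small lookup that fails closed. The bundle contents and tag names here are one hypothetical arrangement; substitute your own standard.

```python
# Hypothetical mapping from change-impact tag to the required testing bundle.
TEST_BUNDLES = {
    "low":    {"unit"},
    "medium": {"unit", "integration", "regression"},
    "high":   {"unit", "integration", "system", "regression", "security_assessment"},
}

def required_tests(impact: str) -> set:
    """Unknown or missing impact tags fall back to the strictest bundle (fail closed)."""
    return TEST_BUNDLES.get(impact, TEST_BUNDLES["high"])

def missing_evidence(impact: str, attached: set) -> set:
    """Testing evidence still owed before the change can be approved."""
    return required_tests(impact) - attached
```

Wiring `missing_evidence` into change approval means a ticket cannot move to "approved" while the returned set is non-empty, which is exactly the evidence trail an assessor will sample.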
Required evidence and artifacts to retain
Keep artifacts retrievable by release/version and date:
- Ongoing Security and Privacy Assessment Plan (current and prior versions). (NIST Special Publication 800-53 Revision 5)
- Testing standard defining depth/coverage and release gates.
- Test plans and test case inventories for unit/integration/system/regression.
- Automated test results (build logs, reports) and manual test sign-offs where used.
- Defect/finding records with remediation and retest proof.
- Release notes and approvals that reference testing completion.
- Third-party contracts/addenda and received test evidence for third-party-developed components/services.
- Exception/risk acceptance records tied to specific releases.
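"Retrievable by release/version and date" is easiest to enforce with a deterministic naming convention. The layout below is one illustrative scheme, not a required structure; the value is that every artifact's location is computable from the release metadata.

```python
from datetime import date

def evidence_path(system: str, version: str, artifact: str, when: date) -> str:
    """Build a deterministic storage path: system/releases/version/date/artifact.

    This layout is an assumption for illustration; adapt it to your own
    evidence store (object storage prefix, GRC tool folder, etc.).
    """
    return f"{system}/releases/{version}/{when.isoformat()}/{artifact}"

# Example:
# evidence_path("billing-api", "2.4.1", "regression-report.json", date(2024, 5, 2))
# -> "billing-api/releases/2.4.1/2024-05-02/regression-report.json"
```

A convention like this lets an assessor's spot check ("show me regression evidence for release 2.4.1") become a single lookup rather than a scavenger hunt.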
Common exam/audit questions and hangups
- “Show me your organization-defined depth and coverage.” If you can’t produce a written definition, SA-11 becomes subjective.
- “How do you know regression testing happened for this release?” Expect spot checks of specific releases.
- “Where is the security and privacy assessment plan, and how is it implemented?” A plan that exists only as a document without execution evidence will be challenged.
- “How do you handle third-party-developed components?” Assessors will ask how you “require the developer” when the developer is external.
- “Prove traceability.” They may sample a requirement/user story and ask for linked tests and results.
Frequent implementation mistakes and how to avoid them
- Mistake: Treating SA-11 as a one-time penetration test. Fix: define ongoing triggers and bake them into CI/CD and change control. (NIST Special Publication 800-53 Revision 5)
- Mistake: Vague “coverage” statements. Fix: name in-scope services, privileged paths, and security control behaviors that must be tested.
- Mistake: No third-party enforcement mechanism. Fix: put evidence deliverables in contracts and onboarding gates.
- Mistake: Test results exist, but nobody can find them. Fix: standardize a release evidence bundle and retention location.
- Mistake: Overreliance on tool screenshots. Fix: keep machine-generated logs/reports plus human-readable summaries that map to releases.
Enforcement context and risk implications
No public enforcement cases were provided in the source catalog for this requirement. Practically, SA-11 weaknesses show up as authorization delays, failed assessments, and elevated residual risk because untested changes frequently introduce access control failures, logging gaps, or privacy-impacting data handling regressions. The operational risk is compounded when third-party components ship changes without test evidence you can rely on.
Practical 30/60/90-day execution plan
First 30 days (stabilize scope and expectations)
- Publish SA-11 scope memo (repos, services, third parties).
- Draft and approve the Ongoing Security and Privacy Assessment Plan template; assign ownership.
- Define “depth and coverage” in a testing standard and agree on release gates.
- Identify evidence storage location and naming conventions by release/version.
By 60 days (make it real in delivery workflows)
- Implement or tighten CI/CD stages for unit, integration, and regression testing evidence capture.
- Establish system testing procedure for end-to-end workflows and security control behaviors.
- Integrate test evidence checks into change management approvals.
- Update third-party onboarding and contract language to require SA-11 deliverables.
By 90 days (prove repeatability and audit readiness)
- Run the process across multiple releases and assemble release evidence bundles.
- Perform an internal “exam simulation”: sample recent changes and confirm end-to-end traceability.
- Close gaps: missing evidence types, inconsistent approvals, or weak third-party submissions.
- Operationalize metrics qualitatively (pass/fail gates, exception frequency) without relying on unsupported numeric targets.
Frequently Asked Questions
Does SA-11 require a specific security testing tool (SAST/DAST, etc.)?
SA-11 requires a plan for ongoing security and privacy assessments and defined testing depth/coverage, not a named tool. Choose tools and manual methods that produce durable evidence and fit your SDLC. (NIST Special Publication 800-53 Revision 5)
What counts as “post-design stages” in practice?
Treat everything after architecture/design approval as in scope: implementation, build, integration, verification, release, patching, and ongoing changes. If a change can affect security/privacy behavior, it belongs in the SA-11 testing and evidence trail. (NIST Special Publication 800-53 Revision 5)
We buy a managed service. Are we still responsible for SA-11 testing?
Yes, if that service is in your boundary, you must require the developer (the provider) to perform the testing and provide evidence appropriate to your defined depth/coverage. Your role is to set the requirement, obtain evidence, and make release/change decisions accordingly. (NIST Special Publication 800-53 Revision 5)
How do we define “depth and coverage” without creating an impossible burden on engineering?
Start by aligning depth/coverage to change impact. Make low-risk changes mostly automated, and reserve heavier system/regression and security-focused testing for high-impact changes that touch identity, authorization, cryptography, logging, or sensitive data flows.
What evidence is strongest for auditors: screenshots or exported reports?
Prefer exported, time-stamped reports and pipeline logs that tie to a specific build and release. Use short human summaries to explain what changed, what was tested, what failed, what was fixed, and what was approved.
How should we handle emergency changes?
Define an emergency path that still produces SA-11 artifacts: abbreviated pre-release tests, explicit approval, and required post-release regression/security validation with documented results and any remediation tickets.
Operationalize this requirement
Map requirement text to controls, owners, evidence, and review workflows inside Daydream.
See Daydream