B002.1
Config: Adversarial input detection and alerting (Core) - This should include:
- Establishing detection and alerting. For example, implementing monitoring for prompt injection patterns, jailbreak techniques, adversarial input attempts, and rate-limit violations, and configuring alerts and threat notifications for suspicious activity.
Typical evidence: Screenshot of monitoring system, SIEM, or detection code showing rules and alerts for adversarial inputs - may include prompt injection detection patterns, jailbreak technique signatures, rate limit monitoring with threshold alerts, or notification configurations (Slack, PagerDuty, email)
Location: Engineering Code
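The detection-and-alerting logic described above can be sketched as follows. This is a minimal illustration, not a production detector: the patterns, the rate-limit threshold, and all names are hypothetical, and a real deployment would maintain a much larger, regularly updated signature set (see B002.3) and wire alerts into a notification channel such as Slack or PagerDuty.

```python
import re
from dataclasses import dataclass

# Hypothetical signatures; a real rule set would be larger and versioned.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.I),
    re.compile(r"you are now DAN", re.I),
    re.compile(r"system prompt", re.I),
]

RATE_LIMIT_PER_MINUTE = 60  # assumed threshold for this sketch


@dataclass
class Alert:
    user_id: str
    reason: str
    excerpt: str


def check_input(user_id: str, text: str, requests_last_minute: int) -> list[Alert]:
    """Return alerts for injection-pattern matches and rate-limit violations."""
    alerts = []
    for pattern in INJECTION_PATTERNS:
        match = pattern.search(text)
        if match:
            alerts.append(Alert(user_id, "prompt_injection", match.group(0)))
    if requests_last_minute > RATE_LIMIT_PER_MINUTE:
        alerts.append(Alert(user_id, "rate_limit_exceeded", str(requests_last_minute)))
    return alerts
```

In practice the returned alerts would be pushed to whatever notification integration the team already uses; the screenshot evidence would show these rules alongside the alert configuration.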
B002.2
Logs: Adversarial incident logging and response (Core) - This should include:
- Implementing incident logging and response procedures. For example, logging suspected adversarial attacks with relevant context, escalating to designated personnel based on severity, and documenting response actions in a centralized system.
Typical evidence: Screenshot of incident management system or logs showing adversarial attack handling - may include log entries with timestamps and user/session context, escalation runbooks defining severity thresholds, or incident tickets in Jira/PagerDuty/ServiceNow documenting response actions and workflows.
Location: Logs, Engineering Tooling
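Structured incident logging with severity-based escalation might look like the sketch below. The severity-to-escalation mapping is an assumption standing in for a real runbook, and the record would normally be shipped to a centralized system (e.g., a Jira or ServiceNow ticket) rather than only written to a logger.

```python
import json
import logging
from datetime import datetime, timezone

logger = logging.getLogger("adversarial_incidents")

# Assumed escalation runbook; actual severity thresholds are team-defined.
ESCALATION = {"low": "on-call review", "high": "page security team"}


def log_incident(user_id: str, session_id: str, reason: str, severity: str) -> dict:
    """Write a structured incident record with timestamp and user/session context.

    Returns the record so it can also be forwarded to a ticketing system.
    """
    record = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
        "session_id": session_id,
        "reason": reason,
        "severity": severity,
        "escalation": ESCALATION.get(severity, "triage queue"),
    }
    logger.warning(json.dumps(record))
    return record
```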
B002.3
Documentation: Updates to detection config (Core) - This should include:
- Maintaining detection effectiveness through quarterly reviews. For example, updating detection rules based on emerging adversarial techniques, analyzing incident patterns, and documenting system improvements.
Typical evidence: Quarterly review documentation showing detection updates - for example, review meeting notes with incident pattern analysis, updated detection rules with version history, or tracking records showing rule improvements (e.g. GitHub/Jira tickets).
Location: Engineering Practice, Internal processes
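One way to make rule updates reviewable is to keep the rule set versioned with a changelog, so each quarterly review leaves a traceable record. This is a minimal sketch; the version scheme, rule names, and reasons are hypothetical, and most teams would keep this history in Git or Jira rather than in application state.

```python
from dataclasses import dataclass, field


@dataclass
class RuleSet:
    """Detection rules with version history, supporting quarterly review updates."""
    version: str
    rules: dict
    changelog: list = field(default_factory=list)

    def update_rule(self, name: str, pattern: str, reason: str, new_version: str) -> None:
        """Add or replace a rule and record why, supporting audit of each review."""
        self.rules[name] = pattern
        self.changelog.append({"version": new_version, "rule": name, "reason": reason})
        self.version = new_version
```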
B002.4
Config: Pre-processing adversarial detection (Supplemental) - This may include:
- Implementing adversarial input detection prior to AI model processing where feasible. For example, using pre-processing filters to flag likely threats before model processing.
Typical evidence: Screenshot of pre-processing filtering logic or gateway - may include pattern-matching or heuristic code checking inputs before model processing, WAF or API gateway rules blocking adversarial patterns, or IP-based filtering.
Location: Engineering Code
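A pre-processing gate can be sketched as a cheap heuristic check that runs before the model is invoked, rejecting flagged inputs outright. The substring list and size cap below are illustrative assumptions; a WAF or API-gateway rule would serve the same purpose at the network edge.

```python
def preprocess_gate(text: str, model_call) -> str:
    """Run heuristic checks before the model sees the input; block on a match.

    `model_call` is whatever function invokes the AI model downstream.
    """
    # Hypothetical deny-list; real filters would use maintained signatures.
    suspicious = ["ignore previous instructions", "disregard your rules"]
    lowered = text.lower()
    if any(s in lowered for s in suspicious):
        return "Input rejected by pre-processing filter."
    if len(text) > 10_000:  # assumed size cap against payload stuffing
        return "Input rejected: exceeds size limit."
    return model_call(text)
```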
B002.5
Config: AI security alerts (Supplemental) - This may include:
- Integrating adversarial input detection into existing security operations tooling. For example, forwarding flagged inputs to SIEM platforms, correlating detections with authentication and network logs, and enabling SOC teams to triage AI-related security events.
Typical evidence: Screenshot of SIEM platform, SOC tooling, or log forwarding configuration showing adversarial detection integration - may include Splunk/Datadog/Elastic SIEM ingesting AI adversarial alerts, correlation rules linking AI events with authentication or network logs, SOC dashboard displaying AI security event triage, or code forwarding flagged inputs to security platforms.
Location: Engineering Tooling
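Forwarding flagged inputs to a SIEM usually amounts to serializing the alert into the platform's event format and shipping it over an existing transport. The field names below loosely follow a Splunk-style event envelope but are assumptions; endpoint URLs, tokens, and the transport (HTTP collector, syslog, log shipper) are deployment-specific, so the transport is left as an injected callable here.

```python
import json
from datetime import datetime, timezone


def to_siem_event(alert: dict, source: str = "ai-gateway") -> str:
    """Serialize a flagged input as a JSON event for SIEM ingestion."""
    return json.dumps({
        "time": datetime.now(timezone.utc).isoformat(),
        "source": source,
        "sourcetype": "ai:adversarial",  # assumed sourcetype naming
        "event": {
            "user_id": alert["user_id"],
            "reason": alert["reason"],
            "session_id": alert.get("session_id"),
        },
    })


def forward(alert: dict, transport) -> None:
    """Ship the serialized event; `transport` is any callable (HTTP POST, syslog, etc.)."""
    transport(to_siem_event(alert))
```

Once ingested, correlation rules on the SIEM side can join these events with authentication and network logs for SOC triage.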