C005—Prevent customer-defined high-risk outputs

>Control Description

Implement safeguards or technical controls to prevent additional high-risk outputs as defined in the risk taxonomy.

Application

Mandatory

Frequency

Every 12 months

Capabilities

Universal

>Controls & Evidence (3)

Technical Implementation

C005.1

Config: Risk detection and response

Core - This should include:

- Implementing detection and blocking mechanisms aligned with organizational risk taxonomy. For example, deploying filtering based on defined risk categories and severity thresholds.

- Implementing response actions for detected risks. For example, blocking high-severity outputs, flagging medium-risk content for review, logging violations for monitoring and analysis.

Typical evidence: Screenshot of filtering rules, system configuration, or code showing detection logic mapped to AI risk taxonomy categories and corresponding response actions per severity level - may include risk classifiers with block/flag/log rules, content moderation API configuration defining actions by risk type, or defensive prompting.

Location: Eng: LLM output filtering logic
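
As an illustration of detection logic mapped to taxonomy categories and per-severity response actions, the following is a minimal sketch only. The category names, thresholds, and the `decide` and `evaluate` helpers are hypothetical, and the per-category scores are assumed to come from an upstream risk classifier that is not shown.

```python
from enum import Enum
from typing import Mapping


class Action(Enum):
    ALLOW = "allow"
    LOG = "log"
    FLAG = "flag"
    BLOCK = "block"


# Hypothetical taxonomy: category -> (flag threshold, block threshold).
# Real categories and thresholds come from the organization's own risk taxonomy.
RISK_TAXONOMY = {
    "financial_advice": (0.4, 0.8),
    "medical_advice": (0.3, 0.7),
}

# Hypothetical floor above which a score is logged for monitoring.
LOG_THRESHOLD = 0.1

# Actions ordered from least to most severe.
_SEVERITY_ORDER = [Action.ALLOW, Action.LOG, Action.FLAG, Action.BLOCK]


def decide(category: str, score: float) -> Action:
    """Map one classifier score in [0, 1] to a response action."""
    if category not in RISK_TAXONOMY:
        # Surface taxonomy gaps instead of silently allowing the output.
        raise ValueError(f"category not in risk taxonomy: {category}")
    flag_at, block_at = RISK_TAXONOMY[category]
    if score >= block_at:
        return Action.BLOCK
    if score >= flag_at:
        return Action.FLAG
    if score >= LOG_THRESHOLD:
        return Action.LOG
    return Action.ALLOW


def evaluate(scores: Mapping[str, float]) -> Action:
    """Return the most severe action across all scored categories."""
    actions = [decide(category, score) for category, score in scores.items()]
    return max(actions, key=_SEVERITY_ORDER.index, default=Action.ALLOW)
```

Keeping the thresholds in one table makes the mapping from taxonomy category to response action easy to screenshot or export as evidence.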

C005.2

Documentation: Human review workflows

Supplemental - This may include:

- Establishing escalation procedures for flagged high-risk content. For example, defining when human review is required and establishing approval workflows for edge cases.

Typical evidence: Documentation or workflow configuration showing human review and escalation procedures for flagged content - may include runbook defining escalation criteria and review SLAs, workflow diagram showing approval process, or ticketing system configuration (Jira, Linear) with content review queues and assignment rules.

Location: Engineering Practice
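
Escalation criteria and review SLAs from a runbook can also be expressed in code. The sketch below is one possible shape, not a prescribed one: the severity labels, SLA durations, `ReviewTicket` type, and `escalate` function are all hypothetical, and a real deployment would hand the ticket to a ticketing system rather than return it.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional

# Hypothetical review SLAs per severity; severities not listed need no human review.
REVIEW_SLA = {
    "high": timedelta(hours=1),
    "medium": timedelta(hours=24),
}


@dataclass(frozen=True)
class ReviewTicket:
    output_id: str
    category: str
    severity: str
    created_at: datetime
    due_at: datetime


def escalate(
    output_id: str,
    category: str,
    severity: str,
    now: Optional[datetime] = None,
) -> Optional[ReviewTicket]:
    """Create a review ticket if the severity requires human review, else None."""
    sla = REVIEW_SLA.get(severity)
    if sla is None:
        return None
    created = now if now is not None else datetime.now(timezone.utc)
    return ReviewTicket(output_id, category, severity, created, created + sla)


def is_overdue(ticket: ReviewTicket, now: datetime) -> bool:
    """Report whether the review SLA for this ticket has elapsed."""
    return now > ticket.due_at
```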

C005.3

Config: Automated response mechanisms

Supplemental - This may include:

- Implementing automated real-time interventions. For example, blocking or modifying outputs based on severity.

Typical evidence: Screenshot of code or system configuration showing automated response mechanisms - may include logic blocking or modifying outputs based on risk scores, or dynamic warning messages triggered by content flags.

Location: Engineering Code
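
A real-time intervention that blocks or modifies an output based on its risk score might look like the sketch below. The thresholds, message strings, and the `intervene` function are assumptions made for illustration; the risk score is assumed to be a value in [0, 1] produced elsewhere.

```python
from dataclasses import dataclass

# Hypothetical thresholds and messages; tune them to the organization's taxonomy.
BLOCK_AT = 0.8
WARN_AT = 0.5
REFUSAL = "This response was withheld because it may violate usage policy."
WARNING = "Note: this response may contain sensitive content. "


@dataclass(frozen=True)
class Intervention:
    text: str    # what the end user receives
    action: str  # "blocked", "modified", or "passed"


def intervene(output: str, risk_score: float) -> Intervention:
    """Block, modify, or pass a model output based on its risk score."""
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    if risk_score >= BLOCK_AT:
        # High severity: replace the output entirely.
        return Intervention(REFUSAL, "blocked")
    if risk_score >= WARN_AT:
        # Medium severity: keep the output but prepend a warning.
        return Intervention(WARNING + output, "modified")
    return Intervention(output, "passed")
```

Returning the action alongside the text lets the caller log which intervention fired for each output.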

>Cross-Framework Mappings

NIST AI RMF

MEASURE-3.2

MANAGE-2.3

MAP-1.6

OWASP Top 10 for LLMs

LLM05
