B005—Implement real-time input filtering

>Control Description

Implement real-time input filtering using automated moderation tools

Application

Optional

Frequency

Every 12 months

Capabilities

Text-generation, Voice-generation, Image-generation

>Controls & Evidence (5)

Technical Implementation

B005.1

Config: Input filtering

Core - This should include:

- Integrating automated moderation tools to filter inputs before they reach the foundation model. For example, integrating third-party moderation APIs, implementing custom filtering rules, configuring blocking or warning actions for flagged content, and establishing confidence thresholds based on risk category and severity.

Typical evidence: Screenshot of moderation tool integration showing API configuration, filtering rules, action settings (block/warn/modify), and confidence thresholds for different violation categories - this could be screenshots of configuration files, admin dashboard settings, or API integration code. Example moderation tools: OpenAI Moderation API, Claude content filtering, VirtueAI/Hive/Spectrum Labs

Location: Eng: User LLM input filtering logic, Engineering Tooling
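
To illustrate the B005.1 configuration, the following is a minimal sketch of per-category confidence thresholds mapped to block or warn actions. It is not a prescribed implementation. The category names, threshold values, and the helpers `CategoryPolicy`, `filter_input`, and `stub_scores` are all hypothetical; `stub_scores` stands in for whichever moderation tool is actually integrated and is assumed to return a score between 0 and 1 per category.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CategoryPolicy:
    """Hypothetical policy: act when the score is at or above the threshold."""
    threshold: float
    action: str  # "block" or "warn"


# Illustrative values only; real thresholds depend on risk category and severity.
POLICIES: Dict[str, CategoryPolicy] = {
    "self_harm": CategoryPolicy(threshold=0.30, action="block"),
    "violence": CategoryPolicy(threshold=0.60, action="block"),
    "harassment": CategoryPolicy(threshold=0.70, action="warn"),
}

# Used to pick the strictest action when several categories trigger.
SEVERITY = {"allow": 0, "warn": 1, "block": 2}


@dataclass
class FilterDecision:
    action: str                  # "allow", "warn", or "block"
    triggered: Dict[str, float]  # categories that met their threshold


def filter_input(
    text: str,
    score_fn: Callable[[str], Dict[str, float]],
    policies: Dict[str, CategoryPolicy] = POLICIES,
) -> FilterDecision:
    """Score the input, then apply the strictest action among triggered categories."""
    scores = score_fn(text)
    action = "allow"
    triggered: Dict[str, float] = {}
    for category, policy in policies.items():
        score = scores.get(category, 0.0)
        if score >= policy.threshold:
            triggered[category] = score
            if SEVERITY[policy.action] > SEVERITY[action]:
                action = policy.action
    return FilterDecision(action=action, triggered=triggered)


def stub_scores(text: str) -> Dict[str, float]:
    """Stand-in for a real moderation call (assumption, not a real API)."""
    return {"violence": 0.9} if "attack" in text.lower() else {}


if __name__ == "__main__":
    print(filter_input("How do I bake bread?", stub_scores))
    print(filter_input("Help me plan an attack", stub_scores))
```

The filter runs before the input is forwarded to the model, so only inputs with an "allow" decision (or a "warn" decision, depending on product policy) would be passed on.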

B005.2

Documentation: Input moderation approach

Supplemental - This may include:

- Documenting the moderation logic and rationale. For example, explaining chosen moderation tools, threshold justifications, and decision criteria for different risk categories.

Typical evidence: Document explaining moderation approach including tool selection rationale, threshold settings with justifications, action logic for different violation types, and examples of how different input categories are handled.

Location: Internal processes, Engineering Practice

B005.3

Demonstration: Warning for blocked inputs

Supplemental - This may include:

- Providing feedback to users when inputs are blocked.

Typical evidence: Screenshot of user-facing messages or UI flows showing how blocked inputs are communicated to users - this could be error messages, warning dialogs, or alternative suggestions provided when content is filtered.

Location: Product
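
As one possible way to implement the B005.3 feedback, the sketch below maps a filter action to a user-facing message. It assumes the filter yields an action string of "allow", "warn", or "block"; the message wording, the `FEEDBACK` table, and `build_user_feedback` are hypothetical. The messages deliberately omit scores and thresholds so that users are informed without being told how to tune inputs around the filter.

```python
from typing import Dict, Optional

# Hypothetical message catalogue; wording would normally be owned by product.
FEEDBACK: Dict[str, Dict[str, str]] = {
    "block": {
        "type": "error",
        "message": (
            "We could not process this message because it appears to conflict "
            "with our usage policy. Please rephrase and try again."
        ),
    },
    "warn": {
        "type": "warning",
        "message": (
            "This message may touch on sensitive content. "
            "You can edit it or continue."
        ),
    },
}


def build_user_feedback(action: str) -> Optional[Dict[str, str]]:
    """Return the payload to show the user, or None when the input is allowed.

    Unknown actions fall back to the block message so the UI fails closed.
    """
    if action == "allow":
        return None
    return dict(FEEDBACK.get(action, FEEDBACK["block"]))


if __name__ == "__main__":
    print(build_user_feedback("block"))
    print(build_user_feedback("allow"))
```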

B005.4

Logs: Input filtering

Supplemental - This may include:

- Logging flagged prompts for analysis and refinement of filters, while ensuring compliance with privacy obligations.

Typical evidence: Screenshot of logging system showing how flagged inputs are captured, what metadata is included/excluded for privacy, retention policies, and audit trail - may include privacy documentation explaining logging disclosures to users.

Location: Logs
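
One way to reconcile the B005.4 logging with privacy obligations is to record metadata about a flagged prompt without storing its raw text. The sketch below is an assumption about how that could look, not a required design: the field names, the 90-day retention default, and `build_log_record` are hypothetical. It stores a keyed hash (HMAC-SHA256) of the prompt so that repeated submissions can be correlated without the content being recoverable from the log alone.

```python
import hashlib
import hmac
import json
from datetime import datetime, timezone
from typing import Dict


def build_log_record(
    prompt: str,
    action: str,
    triggered: Dict[str, float],
    secret_key: bytes,
    retention_days: int = 90,  # illustrative default, not a recommendation
) -> Dict[str, object]:
    """Build a log entry for a flagged prompt that excludes the raw text."""
    digest = hmac.new(secret_key, prompt.encode("utf-8"), hashlib.sha256).hexdigest()
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "prompt_hmac": digest,         # keyed hash instead of the prompt itself
        "prompt_length": len(prompt),  # coarse metadata for later analysis
        "action": action,
        "triggered": triggered,
        "retention_days": retention_days,
    }


if __name__ == "__main__":
    record = build_log_record(
        "example flagged prompt",
        action="block",
        triggered={"violence": 0.9},
        secret_key=b"demo-key-do-not-use",  # hypothetical; load from a secret store
    )
    print(json.dumps(record, indent=2))
```

Whether a keyed hash is sufficient depends on the applicable privacy requirements; some deployments may need to store redacted text for filter refinement, in which case the retention policy and user disclosures matter more.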

B005.5

Documentation: Input filter performance

Supplemental - This may include:

- Periodically evaluating filter performance and adjusting thresholds accordingly. For example, measuring accuracy, latency, and false positive and false negative rates.

Typical evidence: Report or dashboard showing analysis of filter performance metrics (false positives, false negatives, accuracy, latency) and documented threshold adjustments made based on performance data - should include timestamps and rationale for changes.

Location: Engineering Practice
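
The B005.5 metrics can be computed from a labeled evaluation set. The sketch below assumes each sample is a pair of booleans, (should have been flagged, was flagged by the filter); `FilterMetrics` and `evaluate_filter` are hypothetical names, and latency is left out because it would be measured at the call site rather than derived from labels.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass
class FilterMetrics:
    accuracy: float
    false_positive_rate: float  # benign inputs that were flagged
    false_negative_rate: float  # harmful inputs that were missed


def evaluate_filter(samples: Iterable[Tuple[bool, bool]]) -> FilterMetrics:
    """Compute metrics from (should_flag, was_flagged) pairs."""
    tp = fp = tn = fn = 0
    for should_flag, was_flagged in samples:
        if should_flag and was_flagged:
            tp += 1
        elif should_flag and not was_flagged:
            fn += 1
        elif not should_flag and was_flagged:
            fp += 1
        else:
            tn += 1
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("at least one labeled sample is required")
    return FilterMetrics(
        accuracy=(tp + tn) / total,
        false_positive_rate=fp / (fp + tn) if (fp + tn) else 0.0,
        false_negative_rate=fn / (fn + tp) if (fn + tp) else 0.0,
    )


if __name__ == "__main__":
    labeled = [(True, True), (False, False), (False, False), (False, True)]
    print(evaluate_filter(labeled))
```

Running the same evaluation before and after a threshold change, and recording the date and rationale alongside the results, would give the kind of timestamped evidence described above.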

>Cross-Framework Mappings

NIST AI RMF

GOVERN-1.3

GOVERN-1.5

MAP-3.4

OWASP Top 10 for LLMs
