LLM07—System Prompt Leakage

>Control Description

The system prompt leakage vulnerability refers to the risk that system prompts used to steer model behavior may contain sensitive information not intended to be discovered. System prompts should not be considered secrets nor used as security controls. The fundamental risk is not disclosure itself, but that sensitive data like credentials or connection strings are stored where they shouldn't be.

>Vulnerability Types

1.Exposure of Sensitive Functionality: System prompt reveals API keys, database credentials, or tokens
2.Exposure of Internal Rules: Decision-making processes revealed that should be confidential
3.Revealing Filtering Criteria: Model's content filtering instructions exposed
4.Disclosure of Permissions and User Roles: Internal role structures and permission levels revealed

>Common Impacts

Credential exposure and unauthorized access

Security control bypass

Privilege escalation opportunities

Insight into application weaknesses

Social engineering attack vectors

>Prevention & Mitigation Strategies

1.Separate sensitive data from system prompts; externalize credentials and configuration
2.Avoid reliance on system prompts for strict behavior control
3.Implement guardrails outside of the LLM with independent output inspection
4.Ensure security controls are enforced independently from the LLM
5.Use multiple agents with least privileges for tasks requiring different access levels

>Attack Scenarios

#1Credential Extraction

An LLM has a system prompt containing credentials for a tool. The system prompt is leaked, and the attacker uses these credentials for unauthorized purposes.

#2Instruction Bypass

An LLM has a system prompt prohibiting offensive content, external links, and code execution. An attacker extracts this prompt and uses prompt injection to bypass the instructions, facilitating a remote code execution attack.

LLM07—System Prompt Leakage

>Control Description

>Vulnerability Types

>Common Impacts

>Prevention & Mitigation Strategies

>Attack Scenarios

>MITRE ATLAS Mapping

>References

Ask AI