What is a Guard?
A guard is a protection module focused on a specific threat category. While a guardrail defines your overall protection policy, guards do the actual work of detecting and handling threats.

Think of guards as specialized security personnel. One guard watches for weapons, another checks IDs, and another monitors for suspicious behavior. Each has a specific job; together they provide comprehensive protection. Dome guards work the same way: one detects prompt injection, another catches PII, and another flags toxic content.

Each guard contains one or more detectors (the detection logic) and defines what action to take when a threat is found. You configure guards based on the threats that matter for your use case.

Guard Categories
Security Guards
Security guards protect against adversarial attacks on your agent:

| Guard | Protects Against |
|---|---|
| Prompt Injection | Instructions embedded in user input designed to hijack agent behavior |
| Jailbreak | Social engineering attempts to bypass safety guidelines |
| Encoding Attack | Obfuscated malicious content (Base64, Unicode, etc.) |
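
To illustrate what an encoding-attack check can look like, here is a minimal Python sketch that decodes Base64-looking substrings and scans the decoded text for a suspicious phrase. It is not Dome's detector. The regex, the 16-character minimum, the phrase list, and the function names `decode_base64_segments` and `looks_like_encoding_attack` are all hypothetical.

```python
import base64
import binascii
import re
from typing import List, Sequence

# Runs of 16+ Base64 alphabet characters with optional padding. The length
# cutoff is an arbitrary assumption to avoid decoding ordinary short words.
B64_CANDIDATE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def decode_base64_segments(text: str) -> List[str]:
    """Return the printable UTF-8 strings hidden in Base64-looking substrings."""
    decoded = []
    for match in B64_CANDIDATE.finditer(text):
        try:
            plain = base64.b64decode(match.group(0), validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            continue  # invalid Base64, or bytes that are not UTF-8 text
        if plain.isprintable():
            decoded.append(plain)
    return decoded


def looks_like_encoding_attack(
    text: str, phrases: Sequence[str] = ("ignore previous instructions",)
) -> bool:
    """Flag text whose decoded Base64 payload contains a suspicious phrase (hypothetical list)."""
    return any(
        phrase in segment.lower()
        for segment in decode_base64_segments(text)
        for phrase in phrases
    )
```

A fuller check would likely also cover other obfuscations (hex, URL encoding, Unicode homoglyphs) and hand the decoded text to the prompt injection detectors instead of matching a fixed phrase.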
Privacy Guards
Privacy guards protect sensitive information:

| Guard | Protects Against |
|---|---|
| PII Detection | Personally identifiable information (names, emails, SSNs, etc.) |
| Secrets Detection | API keys, passwords, credentials in input or output |
| Data Leakage | Sensitive business data escaping through agent responses |

For privacy guards, the `redact` action can replace detected values with [REDACTED] rather than blocking the entire request.
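
As a rough illustration of redaction, the sketch below uses regular expressions to replace email addresses and US Social Security numbers with [REDACTED]. The two patterns, the `redact_pii` function, and its return shape are assumptions made for this example. They are not Dome's PII detector, and real PII detection usually covers far more entity types than two regexes can.

```python
import re
from typing import List, Tuple

# Hypothetical patterns for two common PII types; illustrative only.
PII_PATTERNS = {
    "email": re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def redact_pii(text: str, placeholder: str = "[REDACTED]") -> Tuple[str, List[str]]:
    """Replace every PII match with a placeholder; return the new text and PII types found."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        text, count = pattern.subn(placeholder, text)
        if count:
            found.append(label)
    return text, found
```

Calling `redact_pii("Contact jane.doe@example.com, SSN 123-45-6789.")` returns `("Contact [REDACTED], SSN [REDACTED].", ["email", "us_ssn"])`. The request can still proceed, with the sensitive values removed.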
Moderation Guards
Moderation guards enforce content standards:

| Guard | Protects Against |
|---|---|
| Toxicity | Hate speech, harassment, threats, severe profanity |
| Sexual Content | Explicit or inappropriate sexual material |
| Violence | Graphic violence, self-harm content |
| Misinformation | Demonstrably false claims on high-stakes topics |
Integrity Guards
Integrity guards maintain agent behavior boundaries:

| Guard | Protects Against |
|---|---|
| Topic Restriction | Responses outside the agent’s intended domain |
| Persona Violation | Breaks from the agent’s defined character or role |
| Instruction Override | Attempts to change the behavior set by the agent’s system prompt |
Guard Configuration
Each guard has configurable parameters:

| Parameter | Description |
|---|---|
| `type` | Guard category (`security`, `privacy`, `moderation`, `integrity`) |
| `methods` | Which detectors to use within this guard |
| `action` | What to do on detection: `log`, `warn`, `redact`, or `block` |
| `threshold` | Confidence threshold for triggering the action (0.0 to 1.0) |
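
One way to picture these parameters is as a small validated record. The following Python sketch is hypothetical: the `GuardConfig` class, its default values, and the `decide` method are not Dome's API. It checks each value against the options listed in the table and maps a detector's confidence score to the configured action.

```python
from dataclasses import dataclass
from typing import List, Optional

GUARD_TYPES = {"security", "privacy", "moderation", "integrity"}
ACTIONS = {"log", "warn", "redact", "block"}


@dataclass
class GuardConfig:
    """Hypothetical shape of one guard's settings; not Dome's actual API."""

    type: str               # guard category
    methods: List[str]      # detector names (illustrative strings)
    action: str = "block"   # default is an assumption for this sketch
    threshold: float = 0.5  # default is an assumption for this sketch

    def __post_init__(self) -> None:
        # Reject values outside the options listed in the parameter table.
        if self.type not in GUARD_TYPES:
            raise ValueError(f"unknown guard type: {self.type!r}")
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0.0 and 1.0")
        if not self.methods:
            raise ValueError("a guard needs at least one detector")

    def decide(self, confidence: float) -> Optional[str]:
        """Return the configured action if confidence reaches the threshold, else None."""
        return self.action if confidence >= self.threshold else None
```

For example, `GuardConfig(type="security", methods=["heuristics", "classifier"], threshold=0.8).decide(0.95)` returns `"block"`, while a score of 0.5 returns `None`. This sketch treats the threshold as inclusive (a score equal to the threshold triggers the action); that boundary choice is an assumption.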
Multiple Detectors per Guard
Guards can use multiple detectors for defense in depth. A prompt injection guard might combine:

- Fast heuristics: Pattern matching for known injection signatures
- ML classifier: DeBERTa model trained on injection examples
- LLM judge: Secondary model evaluating whether input looks like an attack
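
The layering idea can be sketched as running detectors from cheapest to most expensive and stopping as soon as one is confident enough. In this Python sketch only the heuristic detector does real work. `stub_classifier` and `stub_llm_judge` are placeholders that return fixed scores where a DeBERTa model and an LLM judge would be called. The patterns, the function names, and the max-score aggregation are assumptions, not Dome's implementation.

```python
import re
from typing import Callable, Sequence, Tuple

# A detector maps text to a confidence score between 0.0 and 1.0.
Detector = Callable[[str], float]

# Illustrative injection signatures only; a real heuristic set would be much larger.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the |your )?system prompt", re.IGNORECASE),
]


def heuristic_detector(text: str) -> float:
    """Fast pattern matching: 1.0 if any known signature appears, else 0.0."""
    return 1.0 if any(p.search(text) for p in INJECTION_PATTERNS) else 0.0


def stub_classifier(text: str) -> float:
    """Placeholder for an ML classifier such as a DeBERTa model; fixed score."""
    return 0.3


def stub_llm_judge(text: str) -> float:
    """Placeholder for a secondary LLM judging the input; fixed score."""
    return 0.2


def run_layered(
    text: str, detectors: Sequence[Detector], threshold: float
) -> Tuple[bool, float]:
    """Run detectors in order (cheapest first); stop once one crosses the threshold.

    Returns (flagged, highest score seen so far).
    """
    best = 0.0
    for detect in detectors:
        best = max(best, detect(text))
        if best >= threshold:
            return True, best  # later, slower detectors are skipped
    return False, best
```

Taking the maximum score means any single confident detector can trigger the guard, and short-circuiting keeps latency low for inputs the heuristics already catch. Weighted averaging or majority voting are alternative aggregation strategies with different false-positive trade-offs.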
Input vs. Output Guards
Guards run on both input and output, but different guards matter for each:

| Direction | Priority Guards |
|---|---|
| Input | Prompt injection, jailbreak, PII (to protect your systems) |
| Output | Toxicity, PII (to protect users), topic restriction, data leakage |
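
The split between input and output guards can be sketched as a wrapper around the agent call. Input guards run before the agent sees the request, and output guards run before the user sees the response. The `guarded_call` function, the guard signature (return a reason string to block, `None` to allow), and the refusal message are hypothetical. They only illustrate the ordering and are not Dome's API.

```python
from typing import Callable, List, Optional

# A guard inspects text and returns a reason string to block it, or None to allow it.
Guard = Callable[[str], Optional[str]]


def guarded_call(
    user_input: str,
    agent: Callable[[str], str],
    input_guards: List[Guard],
    output_guards: List[Guard],
    refusal: str = "Sorry, I can't help with that.",
) -> str:
    """Screen the input, call the agent only if it passes, then screen the output."""
    for guard in input_guards:
        if guard(user_input) is not None:
            return refusal  # the agent never sees a flagged input
    response = agent(user_input)
    for guard in output_guards:
        if guard(response) is not None:
            return refusal  # the user never sees a flagged response
    return response
```

With this ordering, an input guard that flags "ignore previous instructions" stops the request before the agent runs, while an output guard for data leakage can still block a response generated from an innocent-looking prompt.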