A guard is a protection module focused on a specific threat category. While a guardrail defines your overall protection policy, guards do the actual work of detecting and handling threats.
Think of guards like specialized security personnel. One guard watches for weapons. Another checks IDs. Another monitors for suspicious behavior. Each has a specific job; together they provide comprehensive protection. Dome guards work the same way: one detects prompt injection, another catches PII, another flags toxic content.
Guards contain one or more detectors (the detection logic) and define what action to take when threats are found. You configure guards based on what threats matter for your use case.
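The relationship between a guard, its detectors, and its action can be sketched in a few lines of Python. This is a minimal illustration, not Dome's actual API: the `Guard` class, the `Detector` type, the `"block"` and `"allow"` action names, and the keyword heuristic are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical types for illustration only; not Dome's real classes.
# A detector takes text and returns True when it sees a threat.
Detector = Callable[[str], bool]


@dataclass
class Guard:
    name: str
    detectors: List[Detector]
    action: str = "block"  # assumed action name, applied when a detector fires

    def check(self, text: str) -> str:
        """Return the configured action if any detector fires, else 'allow'."""
        if any(detector(text) for detector in self.detectors):
            return self.action
        return "allow"


def mentions_ignore_instructions(text: str) -> bool:
    # Toy keyword heuristic standing in for a real prompt-injection detector.
    return "ignore previous instructions" in text.lower()


injection_guard = Guard("prompt-injection", [mentions_ignore_instructions])
```

Calling `injection_guard.check(...)` returns the guard's action for text that trips a detector and `"allow"` otherwise. A production detector would typically be a model or classifier rather than a keyword match.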
Hate speech, harassment, threats, severe profanity
Sexual Content | Explicit or inappropriate sexual material
Violence | Graphic violence, self-harm content
Misinformation | Demonstrably false claims on high-stakes topics
Moderation guards are configurable by sensitivity. A customer service bot needs strict moderation; an adult content platform has different requirements.
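One way to picture sensitivity is as a per-category score threshold: the lower the threshold, the stricter the moderation. The sketch below assumes a detector that returns scores between 0 and 1. The profile names, category keys, and threshold values are illustrative assumptions, not Dome configuration options.

```python
from typing import Dict, List

# Hypothetical sensitivity profiles: a category is flagged when its score
# meets or exceeds the threshold. Lower threshold means stricter moderation.
STRICT: Dict[str, float] = {
    "sexual_content": 0.2,
    "violence": 0.3,
    "misinformation": 0.3,
}
PERMISSIVE: Dict[str, float] = {
    "sexual_content": 0.95,
    "violence": 0.8,
    "misinformation": 0.3,
}


def flagged_categories(scores: Dict[str, float],
                       thresholds: Dict[str, float]) -> List[str]:
    """Return the sorted categories whose score reaches the profile's threshold.

    Categories without a configured threshold are never flagged.
    """
    return sorted(
        category
        for category, score in scores.items()
        if category in thresholds and score >= thresholds[category]
    )


# Example scores, made up for illustration.
example_scores = {"sexual_content": 0.5, "violence": 0.4}
strict_flags = flagged_categories(example_scores, STRICT)
permissive_flags = flagged_categories(example_scores, PERMISSIVE)
```

With these made-up scores, the strict profile flags both categories while the permissive profile flags neither, which mirrors the difference between a customer service bot and a platform with looser requirements.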
Guards run on both input and output, but different guards matter for each:
Direction | Priority Guards
Input | Prompt injection, jailbreak, PII (to protect your systems)
Output | Toxicity, PII (to protect users), topic restriction, data leakage
Some guards matter in both directions. PII detection on input prevents sensitive data from reaching your agent; PII detection on output prevents your agent from exposing data in responses.
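The input and output split can be sketched as two guard lists wrapped around an agent call. Everything here is a hypothetical illustration rather than Dome's API: `guarded_call`, the guard functions, the simplified email regex, and the choice to raise `ValueError` on a blocked input are all assumptions.

```python
import re
from typing import Callable, List

# Simplified email pattern, used here as a stand-in for real PII detection.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def redact_pii(text: str) -> str:
    """Replace anything that looks like an email address with a placeholder."""
    return EMAIL.sub("[EMAIL]", text)


def block_injection(text: str) -> str:
    """Raise on a toy prompt-injection phrase; otherwise pass text through."""
    if "ignore previous instructions" in text.lower():
        raise ValueError("blocked: possible prompt injection")
    return text


# The same PII guard appears in both lists, protecting each direction.
INPUT_GUARDS: List[Callable[[str], str]] = [block_injection, redact_pii]
OUTPUT_GUARDS: List[Callable[[str], str]] = [redact_pii]


def guarded_call(agent: Callable[[str], str], user_input: str) -> str:
    """Run input guards, call the agent, then run output guards."""
    for guard in INPUT_GUARDS:
        user_input = guard(user_input)
    response = agent(user_input)
    for guard in OUTPUT_GUARDS:
        response = guard(response)
    return response
```

Here the agent never sees the user's email address, and any address the agent produces on its own is redacted before the response is returned.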
Results include which detectors fired, their confidence scores, and evidence explaining the detection. This transparency helps you tune guards and investigate incidents.
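A result carrying those three pieces of information might look like the following sketch. The `GuardResult` and `DetectorHit` classes and their field names are hypothetical, chosen only to mirror the description above; Dome's real result objects may be shaped differently.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DetectorHit:
    detector: str      # which detector fired
    confidence: float  # score between 0 and 1
    evidence: str      # human-readable explanation of the detection


@dataclass
class GuardResult:
    guard: str
    hits: List[DetectorHit] = field(default_factory=list)

    @property
    def flagged(self) -> bool:
        """True when at least one detector fired."""
        return bool(self.hits)

    def summary(self) -> str:
        """Render a short report suitable for logs or incident review."""
        if not self.hits:
            return f"{self.guard}: no detections"
        lines = [f"{self.guard}: {len(self.hits)} detection(s)"]
        for hit in self.hits:
            lines.append(
                f"  {hit.detector} (confidence {hit.confidence:.2f}): {hit.evidence}"
            )
        return "\n".join(lines)


# Example values are made up for illustration.
example_result = GuardResult(
    guard="pii",
    hits=[DetectorHit("email-regex", 0.97, "matched an email address pattern")],
)
```

Logging `example_result.summary()` for each request gives a trail to consult when tuning thresholds or investigating why a message was blocked.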