
From Evaluation to Defense

Evaluation tells you how trustworthy your agent is. Defense keeps it that way in production. The distinction matters: evaluation runs before deployment, as a test; defense runs during deployment, as a filter. Evaluation probes your agent with hundreds of scenarios to find weaknesses. Defense intercepts every real interaction to block attacks in real time.

Vijil connects the two. Diamond evaluations don't just produce a Trust Score; they inform Dome defense configurations. When you evaluate an agent, Vijil learns:
  • Agent characteristics: System prompt, capabilities, intended scope
  • Environment context: Personas (who uses the agent), policies (what rules apply)
  • Failure modes: Which scenarios the agent failed, at what confidence levels
From this, Vijil generates custom guardrail configurations tuned to your agent. If evaluation found your agent vulnerable to crescendo attacks, Dome enables the crescendo detector. If your agent handles financial data, PII guards are configured with stricter thresholds. The guardrails aren't generic; they are calibrated to the specific risks your evaluation uncovered.

Vijil Dome provides this runtime defense through a layered architecture. Every input to your agent passes through a configurable pipeline of guards. Every output passes through another. Threats are detected, logged, and handled before they can cause harm.

Defense flow: input passes through Guardrail → Guards → Detectors before reaching the Agent; output passes back through Detectors → Guards → Guardrail.

The flow has three phases:

Input Defense (left side, descending): Filter what reaches your agent
  • Guardrail → A configured pipeline of guards for input protection
  • Guard → A specific protection type (prompt injection, PII, toxicity)
  • Detector → The model or rule that identifies threats
  • Action → What happens: pass, block, redact, or warn
Execution (bottom, horizontal): The protected interaction
  • Filtered Input → Agent → Raw Output: Clean input reaches your agent; raw output needs checking
Output Defense (right side, ascending): Filter what leaves your agent
  • Action → Decision on the output
  • Detector → Analyzes the response
  • Guard → Applies output-specific protections
  • Guardrail → The complete output filtering pipeline
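The three phases above can be sketched as a wrapper that filters input, calls the agent, then filters the raw output. This is a minimal illustration of the flow, not Vijil's API; the guard names and detection logic are hypothetical toy examples.

```python
from typing import Callable

# A guard inspects text and returns an action ("pass" or "block" here).
Guard = Callable[[str], str]

def injection_guard(text: str) -> str:
    # Toy detector: a real guard would use a trained classifier.
    return "block" if "ignore previous instructions" in text.lower() else "pass"

def toxicity_guard(text: str) -> str:
    # Toy output-side detector, stands in for a toxicity model.
    return "block" if "hateword" in text.lower() else "pass"

def run_guardrail(guards: list[Guard], text: str) -> str:
    """Run each guard in order; any 'block' verdict stops the pipeline."""
    for guard in guards:
        if guard(text) == "block":
            return "block"
    return "pass"

def protected_agent(user_input: str, agent: Callable[[str], str]) -> str:
    # Input defense: guardrail -> guards -> detectors -> action
    if run_guardrail([injection_guard], user_input) == "block":
        return "[input blocked]"
    raw_output = agent(user_input)
    # Output defense: the raw output passes through its own guardrail
    if run_guardrail([toxicity_guard], raw_output) == "block":
        return "[output blocked]"
    return raw_output
```

Only filtered input reaches the agent, and only filtered output reaches the user, mirroring the descending input side and ascending output side of the diagram.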

Why Runtime Defense?

Evaluation finds vulnerabilities. Defense prevents exploitation. Even a well-evaluated agent faces risks in production:
  • Novel attacks: New jailbreaks and injection techniques emerge constantly. Your agent might have scored well against known attacks, but attackers don't stand still.
  • Context-specific threats: Production traffic includes things evaluation can't anticipate: real user data, real business context, real adversaries probing for weaknesses.
  • Defense in depth: Evaluation is one layer. Defense is another. Together, they provide coverage that neither achieves alone.

The Components

| Component | Role | Example |
| --- | --- | --- |
| Guardrail | Configures which guards run and how | Input guardrail with PII redaction and injection blocking |
| Guard | Protects against a category of threats | Prompt injection guard, toxicity guard, PII guard |
| Detector | The detection logic within a guard | DeBERTa classifier, regex patterns, LLM-as-judge |
| Action | What happens when a threat is detected | block, redact, warn, log |
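The hierarchy in the table can be modeled as nested configuration: a guardrail holds guards, each guard holds detectors and an action. The types and values below are illustrative assumptions, not Vijil's actual configuration schema.

```python
from dataclasses import dataclass, field

@dataclass
class Detector:
    name: str         # e.g. a classifier model or a regex rule set
    threshold: float  # confidence above which a detection counts

@dataclass
class Guard:
    category: str               # e.g. "prompt-injection", "pii", "toxicity"
    detectors: list[Detector]   # one guard may combine several detectors
    action: str                 # "block", "redact", "warn", or "log"

@dataclass
class Guardrail:
    name: str
    guards: list[Guard] = field(default_factory=list)

# A hypothetical input guardrail: block injections, redact PII.
input_rail = Guardrail(
    name="input",
    guards=[
        Guard("prompt-injection", [Detector("deberta-injection", 0.8)], "block"),
        Guard("pii", [Detector("pii-regex", 0.5)], "redact"),
    ],
)
```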

Actions

When a detector identifies a threat, the guard takes an action:
| Action | Behavior |
| --- | --- |
| Pass | No threat detected; continue normally |
| Log | Threat detected but allowed; logged for monitoring |
| Warn | Threat flagged; continues with warning metadata |
| Redact | Sensitive content removed or masked; continues with sanitized content |
| Block | Threat blocked; request rejected with error |
Actions are configurable per guard. You might log low-confidence PII detections, redact high-confidence ones, and block prompt injections entirely.
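The per-guard policy described above (log low-confidence PII, redact high-confidence PII, block injections outright) can be sketched as a mapping from detector confidence to action. The threshold values are hypothetical, not defaults from Dome.

```python
def pii_action(confidence: float) -> str:
    """Map a PII detector's confidence to an action (illustrative thresholds)."""
    if confidence >= 0.9:
        return "redact"  # high confidence: mask the content, then continue
    if confidence >= 0.5:
        return "log"     # low confidence: allow, but record for monitoring
    return "pass"

def injection_action(confidence: float) -> str:
    # Prompt injections are blocked at any detection confidence in this policy.
    return "block" if confidence >= 0.5 else "pass"
```

Keeping the threshold-to-action mapping separate from the detector itself is what lets the same detector back different policies per deployment.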

Latency and Performance

Defense adds latency—every guard adds processing time. Dome is designed for minimal overhead:
  • Fast detectors: Pattern matching and small classifiers run in single-digit milliseconds
  • Parallel execution: Independent guards run concurrently
  • Configurable depth: Use fewer guards for latency-sensitive applications
  • Caching: Repeated patterns skip re-analysis
Typical latency: 10-50ms for a standard guardrail configuration. You trade some latency for runtime protection.
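The parallel-execution point can be demonstrated with a small simulation: when independent guards run concurrently, wall-clock latency is roughly that of the slowest guard rather than the sum. The guard names and latencies below are made up for illustration.

```python
import asyncio
import time

async def run_guard(name: str, latency_s: float) -> tuple[str, str]:
    """Simulate a guard whose detector takes `latency_s` seconds."""
    await asyncio.sleep(latency_s)
    return name, "pass"

async def run_parallel(guards: dict[str, float]) -> dict[str, str]:
    # Independent guards run concurrently, so wall time ~ the slowest guard.
    results = await asyncio.gather(*(run_guard(n, s) for n, s in guards.items()))
    return dict(results)

guards = {"injection": 0.02, "pii": 0.01, "toxicity": 0.015}
start = time.perf_counter()
verdicts = asyncio.run(run_parallel(guards))
elapsed = time.perf_counter() - start
# Sequential execution would take at least 0.045s (the sum);
# concurrent execution finishes close to 0.02s (the slowest guard).
```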

Next Steps