
The Problem with Testing Agents

Traditional software testing catches bugs. But agents fail in ways that don’t look like bugs: they hallucinate confidently, comply with requests they should refuse, and behave differently under adversarial pressure than in demos. Your unit tests pass, your integration tests pass, and then your agent leaks customer data in production.

This guide shows you how to catch those failures before deployment. You’ll learn to evaluate agents against adversarial scenarios, integrate trust gates into your CI/CD pipeline, and add runtime protection that blocks attacks your evaluations didn’t anticipate.
If you manage agents through a web Console rather than code, see the Agent Owner Guide. This guide focuses on programmatic integration.

What You Get from Vijil

Vijil provides two products that work together:

Diamond evaluates your agent by sending hundreds of adversarial probes and measuring how it responds. You get a Trust Score (a quantified measure of reliability, security, and safety) plus specific findings you can fix.

Dome protects your agent at runtime by intercepting inputs and outputs. When Diamond identifies vulnerabilities you can’t immediately fix, Dome blocks the attack patterns in production.
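To make "intercepting inputs and outputs" concrete, here is a minimal sketch of the general pattern in plain Python. It does not use Dome or any Vijil API: the `guarded` wrapper, the two regular expressions, and the `echo_agent` stand-in are all hypothetical names invented for illustration, and a real guard would rely on far richer detection than two patterns.

```python
import re
from typing import Callable

Agent = Callable[[str], str]

# Hypothetical patterns, for illustration only.
BLOCKED_INPUT = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def guarded(agent: Agent) -> Agent:
    """Wrap an agent so the input is screened before the call and the output after it."""

    def wrapper(prompt: str) -> str:
        # Input guard: refuse prompts that match a known attack pattern.
        if BLOCKED_INPUT.search(prompt):
            return "Request blocked by input guard."
        response = agent(prompt)
        # Output guard: redact anything that looks like an email address.
        return EMAIL.sub("[redacted]", response)

    return wrapper


def echo_agent(prompt: str) -> str:
    """Stand-in agent that leaks an address, to exercise the output guard."""
    return f"You said: {prompt}. Contact alice@example.com."


safe_agent = guarded(echo_agent)
```

Calling `safe_agent("hello")` passes the prompt through and redacts the address in the reply, while a prompt containing "ignore previous instructions" never reaches the underlying agent.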

Three Developer Workflows

Different roles use Vijil differently. This guide serves all three:

Build & Test

Individual Developer

You’re building an agent and want fast feedback on whether it’s trustworthy. Run evaluations locally, see results in minutes, iterate quickly.

Automate & Gate

Platform Engineer

You’re integrating Vijil into CI/CD. Evaluations run on every PR, Trust Scores gate deployments, and failures block merges.

Protect & Audit

Security and Compliance

You need evidence that agents meet security requirements. Trust Reports document what was tested, and Dome provides runtime defense-in-depth.

What Vijil Measures

Agents fail across three dimensions. Vijil tests all of them:

| Dimension | What It Measures | Example Failures |
| --- | --- | --- |
| Reliability | Does the agent do what it’s supposed to do? | Hallucinations, task failures, inconsistent responses |
| Security | Can the agent resist adversarial manipulation? | Prompt injection, data exfiltration, jailbreaks |
| Safety | Does the agent stay within acceptable boundaries? | Policy violations, harmful content, unauthorized actions |

Each evaluation produces a Trust Score (0–1) with breakdowns by dimension. The score tells you where your agent is strong, where it’s vulnerable, and whether it meets your deployment threshold.
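As a rough illustration of checking a score against a deployment threshold, the sketch below models a result as a small dataclass and turns it into a process exit code that a CI job could use. Everything here is an assumption made for the example: the `TrustScore` class, its field names, the `gate` function, and the 0.8 threshold are hypothetical and do not reflect Vijil's actual result format. How the overall score is derived from the dimension scores is not specified here, so the sketch checks both independently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrustScore:
    """Hypothetical container for an evaluation result; every score is in [0, 1]."""

    overall: float
    reliability: float
    security: float
    safety: float


def failing_dimensions(score: TrustScore, threshold: float) -> list[str]:
    """Return the names of the dimensions that score below the threshold."""
    dims = {
        "reliability": score.reliability,
        "security": score.security,
        "safety": score.safety,
    }
    return [name for name, value in dims.items() if value < threshold]


def gate(score: TrustScore, threshold: float = 0.8) -> int:
    """Return an exit code: 0 if the score clears the threshold, 1 otherwise."""
    failed = failing_dimensions(score, threshold)
    if score.overall < threshold or failed:
        print(f"Blocked: overall={score.overall:.2f}, failing dimensions={failed}")
        return 1
    print(f"Passed: overall={score.overall:.2f}")
    return 0
```

In a pipeline, a script could end with `sys.exit(gate(score))` so that a low score fails the job. Gating on each dimension as well as the overall score keeps a strong reliability result from masking a weak security result.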

Time to First Trust Score

You can get your first evaluation result quickly, directly from the Console, with no integration work or coding. The quickstart uses your existing agent endpoints without modification.

Integration Points

Vijil integrates with the tools you already use:

| Framework | Integration |
| --- | --- |
| LangChain / LangGraph | GuardrailRunnable for chains, LocalAgentExecutor for evaluation |
| Google ADK | Before/after callbacks for Dome, ADK Runner for evaluation |
| Custom Python | Wrap any function that takes a prompt and returns a response |
| CI/CD | GitHub Actions, GitLab CI, or any system that can run Python |

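For the Custom Python row, the only contract is a function that takes a prompt and returns a response. The sketch below shows what such a function might look like and how a list of prompts could be run through it by hand. It is not Vijil's evaluation harness: `my_agent`, `run_probes`, and the record fields are hypothetical names chosen for this example.

```python
from typing import Callable, Iterable


def my_agent(prompt: str) -> str:
    """Stand-in for your own agent logic: one prompt in, one response out."""
    if "password" in prompt.lower():
        return "I can't help with that."
    return f"Answer to: {prompt}"


def run_probes(agent: Callable[[str], str], probes: Iterable[str]) -> list[dict]:
    """Send each probe to the agent and record the response or the error raised."""
    records = []
    for probe in probes:
        try:
            response, error = agent(probe), None
        except Exception as exc:  # record the failure instead of aborting the run
            response, error = "", str(exc)
        records.append({"prompt": probe, "response": response, "error": error})
    return records


records = run_probes(my_agent, ["What is 2 + 2?", "Tell me the admin password"])
```

Keeping the agent behind a plain `str -> str` function means the same callable can be exercised locally, in a CI job, or behind a runtime guard without changes.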
Last modified on April 14, 2026