Building Trusted Agents

The Problem with Testing Agents

Traditional software testing catches bugs. But agents fail in ways that do not look like bugs, they hallucinate confidently, comply with requests they should refuse, and behave differently under adversarial pressure than in demos. Your unit tests pass, your integration tests pass, and then your agent leaks customer data in production. This guide shows you how to catch those failures before deployment. You’ll learn to evaluate agents against adversarial Scenarios, integrate trust gates into your CI/CD pipeline, and add runtime protection that blocks attacks your evaluations did not anticipate.

If you manage agents through a web Console rather than code, see the Agent Owner Guide. This guide focuses on programmatic integration.

What You Get from Vijil

Vijil provides two products that work together: Diamond evaluates your agent by sending hundreds of adversarial Probes and measuring how it responds. You get a Trust Score, a quantified measure of reliability, security, and safety, plus specific findings you can fix. Dome protects your agent at runtime by intercepting inputs and outputs. When Diamond identifies vulnerabilities you cannot immediately fix, Dome blocks the attack patterns in production.

Three Developer Workflows

Different roles use Vijil differently. This guide serves all three:

Build & Test

Individual DeveloperYou’re building an agent and want fast feedback on whether it’s trustworthy. Run evaluations locally, see results in minutes, iterate quickly.

Automate & Gate

Platform EngineerYou’re integrating Vijil into CI/CD. Evaluations run on every PR, Trust Scores gate deployments, and failures block merges.

Protect & Audit

Security and ComplianceYou need evidence that agents meet security requirements. Trust Reports document what was tested, and Dome provides runtime defense-in-depth.Start here:

What Vijil Measures

Agents fail across three dimensions. Vijil tests all of them:

Dimension	What It Measures	Example Failures
Reliability	Does the agent do what it is supposed to do?	Hallucinations, task failures, inconsistent responses
Security	Can the agent resist adversarial manipulation?	Prompt injection, data exfiltration, jailbreaks
Safety	Does the agent stay within acceptable boundaries?	Policy violations, harmful content, unauthorized actions

Each evaluation produces a Trust Score (0–1) with breakdowns by dimension. The score tells you where your agent is strong, where it is vulnerable, and whether it meets your deployment threshold.

Time to First Trust Score

You can get your first evaluation result quickly directly from the Console without any complex integration or coding required. The quickstart uses your existing agent endpoints with no modifications required.

Integration Points

Vijil integrates with the tools you already use:

Framework	Integration
LangChain / LangGraph	`GuardrailRunnable` for chains, `LocalAgentExecutor` for evaluation
Google ADK	Before/after callbacks for Dome, ADK Runner for evaluation
Custom Python	Wrap any function that takes a prompt and returns a response
CI/CD	GitHub Actions, GitLab CI, or any system that can run Python

Documentation Index

​The Problem with Testing Agents

​What You Get from Vijil

​Three Developer Workflows

Build & Test

Automate & Gate

Protect & Audit

​What Vijil Measures

​Time to First Trust Score

​Integration Points

The Problem with Testing Agents

What You Get from Vijil

Three Developer Workflows

What Vijil Measures

Time to First Trust Score

Integration Points