Evidence of Trustworthiness
AI agents stall between development and production. Security wants proof the agent won’t leak data. Compliance wants proof it follows policy. The business wants proof it actually works. Without objective evidence, these conversations become negotiations based on intuition, and agents sit in pilot programs indefinitely.

This guide shows you how to generate that evidence. You’ll learn to evaluate agents against specific failure modes, quantify risk in terms stakeholders understand, and make deployment decisions backed by data.

If you’re a developer integrating Vijil into your codebase, see the Developer Guide. This guide focuses on evaluation workflows through the Vijil console.
Two Stakeholders, One Decision
Agent deployment requires sign-off from people with different priorities. This guide serves both.

Business Owner
Your goal: Deploy an agent that delivers business value.

You’re accountable for an agent that will serve customers, generate content, or automate workflows. You need it in production, but you need evidence that justifies that decision.

What you get from Vijil:
- Fast evaluation cycles that don’t block releases
- Clear pass/fail criteria you can plan against
- Evidence that satisfies reviewers without over-testing

Your path through this guide:
- Run Evaluations — Test efficiently
- Understand Results — Know when you’re ready
- Deliver Trust Reports — Evidence for stakeholders
Risk Officer
Your goal: Approve agents with quantified, acceptable risk.

You’re accountable for security, privacy, and compliance. You need to authorize deployment, but you need evidence that the risk is understood and mitigated.

What you get from Vijil:
- Quantified risk across security, safety, and reliability
- Audit-ready evidence with versioned reports
- Compensating controls when residual risk remains

Your path through this guide:
- Set Policies — Testable requirements
- Review Trust Reports — Defensible audit artifacts
- Audit Agent Behavior — Continuous risk visibility
Balance Speed and Risk with Vijil
These roles have competing pressures:

| Business Owner | Risk Officer |
|---|---|
| Measured on delivery speed | Measured on risk prevention |
| Asks: “Is this agent ready to ship?” | Asks: “Can I prove this agent is safe?” |
| Frustrated by review delays | Frustrated by pressure to approve without evidence |

Vijil resolves this tension with evidence that is:
- Quantified — A score, not an opinion
- Reproducible — Same agent, same harness, comparable results
- Auditable — Timestamped reports with full test parameters
What Vijil Measures
Agents fail in ways that don’t look like bugs. They hallucinate facts, comply with requests they should refuse, and behave differently under adversarial pressure than in demos. Vijil evaluates agents across three dimensions that capture these failure modes:

| Dimension | What It Measures | Example Failures |
|---|---|---|
| Reliability | Does the agent do what it’s supposed to do? | Hallucinations, task failures, inconsistent responses |
| Security | Can the agent resist adversarial manipulation? | Prompt injection, data exfiltration, jailbreaks |
| Safety | Does the agent stay within acceptable boundaries? | Policy violations, harmful content, unauthorized actions |
For detailed breakdowns of each dimension, see The Trust Score in the Concepts guide.
Get a baseline Trust Score in 5 minutes
Get to your first Trust Score in five steps:

1. Create your account. Sign up at console.vijil.ai and set up your workspace.
2. Register your agent and its environment. Provide as little as the agent’s URL and a description, or as much as its source code. Vijil uses the agent’s behavior and composition to test and defend it.
3. Select a harness. Choose the Trust Score harness for comprehensive testing, or build your own harness from your user personas and organizational policies.
4. Run an evaluation. Execute the harness against your agent. Diamond runs hundreds of probes and returns a Trust Score in minutes.
5. Review your Trust Report. Examine where your agent passed and failed. Each finding includes the failure mode, severity, and remediation guidance.
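If you would rather script this flow than click through the console, the Developer Guide documents the Vijil SDK. As a rough illustration only, the sketch below mirrors the five steps in Python; the client class, method names, and parameters shown (`Vijil`, `evaluations.create`, `agent_url`, `harnesses`, `get_status`, `describe`) are assumptions made for this example and may not match the actual SDK, so treat the Developer Guide as the source of truth.

```python
# Illustrative sketch only. The names below (Vijil, evaluations.create, agent_url,
# harnesses, get_status, describe) are assumptions, not the documented SDK surface.
import os
import time

from vijil import Vijil  # assumed client class; see the Developer Guide

# Step 1 equivalent: authenticate with an API key from your console workspace.
client = Vijil(api_key=os.environ["VIJIL_API_KEY"])

# Steps 2-4: register the agent by URL and run the Trust Score harness against it.
evaluation = client.evaluations.create(
    agent_url="https://agents.example.com/support-bot",   # hypothetical endpoint
    agent_description="Customer-support agent for order lookups",
    harnesses=["trust_score"],  # or a custom harness built from your policies
)

# Step 5: wait for the run to finish, then read the Trust Report summary.
while not client.evaluations.get_status(evaluation.id).completed:  # assumed call
    time.sleep(30)

report = client.evaluations.describe(evaluation.id)  # assumed report accessor
print(report.trust_score)
```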