Evaluation results reveal how your agent behaves across the three pillars of trustworthy AI: Reliability, Security, and Safety. This page explains the taxonomy that structures findings and how to prioritize remediation.

The Trust Score

The Trust Score is a composite metric ranging from 0 to 1 that quantifies how much you can trust your agent in production. It aggregates performance across all evaluated dimensions.
| Score | Status | Interpretation |
|---|---|---|
| ≥ 0.70 | PASSED | Agent meets the trustworthiness threshold for deployment |
| < 0.70 | FAILED | Agent requires remediation before production use |
The 0.70 threshold represents a baseline for acceptable behavior. An agent scoring below it exhibited failure modes during evaluation that pose unacceptable risk in production.
A passing Trust Score indicates acceptable performance against the tested scenarios. It does not guarantee the absence of all vulnerabilities: evaluation coverage depends on the harness configuration and probe selection.
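The pass/fail rule in the table can be expressed as a small function. This is a minimal sketch: the names `PASS_THRESHOLD` and `trust_status` are hypothetical and not part of any Vijil API. Only the 0.70 threshold and the 0 to 1 score range come from this page.

```python
# Hypothetical helper: maps a Trust Score to the PASSED/FAILED labels used on this page.
PASS_THRESHOLD = 0.70  # threshold stated on this page


def trust_status(score: float) -> str:
    """Return "PASSED" or "FAILED" for a Trust Score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Trust Score must be between 0 and 1, got {score}")
    # Scores at or above the threshold pass; anything below fails.
    return "PASSED" if score >= PASS_THRESHOLD else "FAILED"


print(trust_status(0.78))  # PASSED
print(trust_status(0.62))  # FAILED
```

A score of exactly 0.70 passes, because the table uses "≥ 0.70" for PASSED.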

Taxonomy of Trust

Vijil organizes agent behavior into a three-level taxonomy:
Trust Score
├── Reliability
│   ├── Correctness
│   ├── Consistency
│   └── Robustness
├── Security
│   ├── Confidentiality
│   ├── Integrity
│   └── Availability
└── Safety
    ├── Containment
    ├── Compliance
    └── Transparency
Each pillar addresses a distinct aspect of trustworthy AI. Failures in any pillar can render an agent unsuitable for production deployment.
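Tooling that consumes evaluation results can represent the taxonomy as a nested lookup table. The sketch below is illustrative only. `TAXONOMY` and `pillar_of` are hypothetical names, and the pillar and subcategory labels are copied from the tree above.

```python
# Hypothetical representation of the three-level taxonomy:
# Trust Score -> pillar -> subcategory.
TAXONOMY: dict[str, tuple[str, ...]] = {
    "Reliability": ("Correctness", "Consistency", "Robustness"),
    "Security": ("Confidentiality", "Integrity", "Availability"),
    "Safety": ("Containment", "Compliance", "Transparency"),
}


def pillar_of(subcategory: str) -> str:
    """Return the pillar that contains the given subcategory."""
    for pillar, subcategories in TAXONOMY.items():
        if subcategory in subcategories:
            return pillar
    raise KeyError(f"unknown subcategory: {subcategory!r}")


print(pillar_of("Integrity"))     # Security
print(pillar_of("Transparency"))  # Safety
```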

Reliability

Reliability measures whether your agent produces correct, consistent, and robust outputs.
| Subcategory | What It Tests |
|---|---|
| Correctness | Factual accuracy, logical validity, task alignment, goal satisfaction |
| Consistency | Self-consistency, cross-session stability, temporal stability, inter-user consistency |
| Robustness | Contextual handling, distributional generalization, operational stability |

Security

Security measures whether your agent resists attacks on confidentiality, integrity, and availability.
| Subcategory | What It Tests |
|---|---|
| Confidentiality | Data leakage resistance, access control, data/user/model privacy |
| Integrity | Adversarial robustness, manipulation resistance, tamper resistance |
| Availability | DoS resistance, graceful degradation, resilience |

Safety

Safety measures whether your agent operates within acceptable boundaries.
| Subcategory | What It Tests |
|---|---|
| Containment | Scope boundaries, capability boundaries, self-modification control |
| Compliance | Policy compliance, norm compliance, ethical behavior |
| Transparency | Explainability, accountability, user controllability |

Reading the Trust Report

The Trust Report organizes findings by taxonomy level:

Dimension Breakdown

Each dimension (Reliability, Security, Safety) shows:
  • Dimension score — Aggregate performance for this pillar
  • Subcategory scores — Performance for each aspect within the dimension
  • Finding count — Number of issues identified at each severity level
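A dimension breakdown can be modeled as a small summary record. This page does not say how subcategory scores combine into a dimension score, so the sketch below assumes an unweighted mean purely for illustration. Vijil's actual aggregation may differ. The function name `dimension_breakdown` and the input values are hypothetical.

```python
from collections import Counter
from statistics import mean


def dimension_breakdown(subcategory_scores: dict[str, float],
                        finding_severities: list[str]) -> dict:
    """Summarize one dimension (for example, Security) of a Trust Report.

    ASSUMPTION: the dimension score is the unweighted mean of the
    subcategory scores. This is illustrative, not a documented Vijil formula.
    """
    return {
        "dimension_score": mean(subcategory_scores.values()),
        "subcategory_scores": dict(subcategory_scores),
        # Number of findings at each severity level.
        "finding_count": Counter(finding_severities),
    }


# Illustrative inputs only, not real evaluation results.
summary = dimension_breakdown(
    {"Confidentiality": 0.9, "Integrity": 0.6, "Availability": 0.75},
    ["High", "Medium", "High"],
)
print(summary["finding_count"]["High"])  # 2
```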

Individual Findings

Each finding includes:
  • Category — Where in the taxonomy this issue falls
  • Severity — Risk level from 1–4
  • Probe — The test case that revealed this behavior
  • Agent Response — What your agent actually produced
  • Expected Behavior — What a trustworthy agent would produce
  • Recommendation — Specific remediation guidance
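If you export findings for tracking, each one can be stored in a simple record whose fields mirror the list above. This is a sketch with hypothetical names. The page does not say which end of the 1 to 4 severity scale is most severe, so the code checks only that the value falls in that range.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """Hypothetical record for one Trust Report finding."""
    category: str           # position in the taxonomy, e.g. "Security/Integrity"
    severity: int           # risk level from 1 to 4 (scale direction not specified here)
    probe: str              # test case that revealed the behavior
    agent_response: str     # what the agent actually produced
    expected_behavior: str  # what a trustworthy agent would produce
    recommendation: str     # remediation guidance

    def __post_init__(self) -> None:
        # Reject severities outside the documented 1-4 range.
        if not 1 <= self.severity <= 4:
            raise ValueError(f"severity must be 1-4, got {self.severity}")
```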

Prioritizing Remediation

Use severity and taxonomy to prioritize fixes.

Address immediately (Critical/High severity):
  • Security vulnerabilities (prompt injection, data leakage)
  • Safety violations (harmful content, scope violations)
  • Reliability failures that affect core functionality
Address in next release (Medium severity):
  • Consistency issues across sessions
  • Minor compliance gaps
  • Robustness failures on edge cases
Track and monitor (Low severity):
  • Transparency improvements
  • Minor formatting inconsistencies
  • Rare edge case handling
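The tiers above amount to a lookup from severity label to remediation timeline, plus a sort order for the backlog. The following is a minimal sketch: the names are hypothetical, the example descriptions are made up, and findings are represented as plain `(severity, description)` pairs for brevity.

```python
# Hypothetical mapping of severity labels to the remediation tiers on this page.
REMEDIATION_TIER = {
    "Critical": "Address immediately",
    "High": "Address immediately",
    "Medium": "Address in next release",
    "Low": "Track and monitor",
}
# Lower rank means more urgent.
URGENCY_RANK = {"Critical": 0, "High": 1, "Medium": 2, "Low": 3}


def remediation_tier(severity: str) -> str:
    """Return the remediation tier for a severity label."""
    if severity not in REMEDIATION_TIER:
        raise ValueError(f"unknown severity: {severity!r}")
    return REMEDIATION_TIER[severity]


def triage(findings: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Sort (severity, description) pairs from most to least urgent."""
    return sorted(findings, key=lambda finding: URGENCY_RANK[finding[0]])


backlog = triage([
    ("Low", "Minor formatting inconsistency"),
    ("Critical", "Prompt injection exposes system prompt"),
    ("Medium", "Answers drift across sessions"),
])
for severity, description in backlog:
    print(f"{remediation_tier(severity)}: {description}")
```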
Focus remediation on root causes rather than individual findings. Multiple findings often share a common root cause—fixing the underlying issue resolves all related symptoms.

Comparing Evaluations

Run evaluations before and after changes to track improvement:
| Metric | Before | After | Change |
|---|---|---|---|
| Trust Score | 0.62 | 0.78 | +0.16 |
| Critical Findings | 3 | 0 | -3 |
| High Findings | 7 | 2 | -5 |
A rising Trust Score with decreasing critical findings indicates effective remediation. A declining score signals a regression; investigate recent changes.
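The before/after comparison is a per-metric difference. The sketch below reproduces the example table and applies the reading rule described in the paragraph above. The metric keys, function names, and the "inconclusive" label for mixed results are hypothetical.

```python
def compare(before: dict[str, float], after: dict[str, float]) -> dict[str, float]:
    """Return after - before for every metric, rounded to suppress float noise."""
    return {metric: round(after[metric] - before[metric], 4) for metric in before}


def verdict(delta: dict[str, float]) -> str:
    """Interpret deltas: a falling score is a regression; a rising score with
    fewer critical findings is effective remediation; anything else is mixed."""
    if delta["trust_score"] < 0:
        return "regression: investigate recent changes"
    if delta["trust_score"] > 0 and delta["critical_findings"] < 0:
        return "effective remediation"
    return "inconclusive"


before = {"trust_score": 0.62, "critical_findings": 3, "high_findings": 7}
after = {"trust_score": 0.78, "critical_findings": 0, "high_findings": 2}
delta = compare(before, after)
print(delta)           # {'trust_score': 0.16, 'critical_findings': -3, 'high_findings': -5}
print(verdict(delta))  # effective remediation
```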

Next Steps