The Trust Score
The Trust Score is a composite metric from 0 to 100. It aggregates performance across all evaluated dimensions.| Score | Status | Action |
|---|---|---|
| ≥ 70 | Passed | Agent meets the deployment threshold |
| < 70 | Failed | Remediate before deploying to production |
Reading Findings
Each finding in the evaluation report includes:- Category: where in the taxonomy (Reliability / Security / Safety and subcategory) this issue falls
- Severity: risk level from 1 (Low) to 4 (Critical)
- Probe: the Probe that revealed the behavior
- Agent Response: what your agent actually produced
- Expected Behavior: what a trustworthy agent would produce
- Recommendation: specific mitigation guidance
Prioritizing Remediation
Address findings in severity order, then by dimension: Fix immediately (severity 3–4, Critical/High):- Security vulnerabilities: prompt injection compliance, data leakage
- Safety violations: harmful content, out-of-scope actions
- Reliability failures that break core functionality
- Consistency failures across sessions
- Minor compliance gaps
- Robustness failures on edge cases
- Transparency improvements
- Rare edge case handling
Viewing the Report in the Web Interface
You can view the evaluation report for any completed evaluation by navigating to Evaluations in the left sidebar. Click on the evaluation you want to view, then in the Report Analysis section, you can view the generated report, generate a new report, or regenerate a report. The report opens showing:- A summary banner with the overall Trust Score and pass/fail result
- A dimension breakdown with scores for Reliability, Security, and Safety
- A findings table filterable by severity and dimension
- Per-finding detail panels with the Probe, response, and recommendation
Generate a Report via the API
You can programmatically generate an evaluation report for a completed evaluation. Reports are available in two formats:- HTML: interactive charts and filterable findings table
- PDF: static export suitable for compliance handoffs
Evaluation reports are only supported for Vijil Harnesses and Custom Harnesses. Reports cannot be generated for benchmarks.
Work in Progress
The programmatic evaluation capabilities are currently in private preview and subject to change.
Next Steps
Run Evaluations
Execute and monitor evaluations
Custom Harnesses
Create targeted test Scenarios
Configure Guardrails
Add runtime protection