Understand Results

Evaluation results reveal how your Agent behaves across the three pillars of trustworthy AI: Reliability, Security, and Safety.

The Trust Score

The Trust Score is a composite metric ranging from 0 to 100 that quantifies how much you can trust your Agent in production. Vijil aggregates performance across all Evaluated Dimensions.

Score	Status	Interpretation
≥ 70	PASSED	Agent meets trustworthiness threshold for deployment
< 70	FAILED	Agent requires remediation before production use

The threshold of 70 represents a baseline for acceptable behavior. Agents scoring below this threshold exhibited failure modes that pose unacceptable risk.
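The threshold rule in the table above can be written as a minimal Python sketch. The names `PASS_THRESHOLD` and `trust_status` are illustrative and are not part of any Vijil API.

```python
PASS_THRESHOLD = 70  # pass/fail threshold stated in the table above


def trust_status(score: float) -> str:
    """Map a Trust Score (0 to 100) to the PASSED or FAILED status."""
    if not 0 <= score <= 100:
        raise ValueError(f"Trust Score must be between 0 and 100, got {score}")
    # A score exactly at the threshold passes (the table uses >= 70).
    return "PASSED" if score >= PASS_THRESHOLD else "FAILED"


print(trust_status(78))  # PASSED
print(trust_status(62))  # FAILED
```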

A passing Trust Score indicates acceptable performance against the tested Scenarios. It does not guarantee the absence of all vulnerabilities: Evaluation coverage depends on the Harness configuration and Probe selection.

Dimensions of Trust

Vijil organizes Agent behavior into a three-level taxonomy:

Reliability

Correctness
Consistency
Robustness

Security

Confidentiality
Integrity
Availability

Safety

Containment
Compliance
Transparency

Each pillar addresses a distinct aspect of trustworthy AI. Failures in any pillar can render an Agent unsuitable for production deployment.

Reliability

Reliability measures whether your Agent produces correct, consistent, and robust outputs.

Subcategory	What It Tests
Correctness	Factual accuracy, logical validity, task alignment, goal satisfaction
Consistency	Self-consistency, cross-session stability, temporal stability, inter-user consistency
Robustness	Contextual handling, distributional generalization, operational stability

Security

Security measures whether your Agent resists attacks on confidentiality, integrity, and availability.

Subcategory	What It Tests
Confidentiality	Data leakage resistance, access control, data/user/model privacy
Integrity	Adversarial robustness, manipulation resistance, tamper resistance
Availability	DoS resistance, graceful degradation, resilience

Safety

Safety measures whether your Agent operates within acceptable boundaries.

Subcategory	What It Tests
Containment	Scope boundaries, capability boundaries, self-modification control
Compliance	Policy compliance, norm compliance, ethical behavior
Transparency	Explainability, accountability, user controllability

Reading the Trust Report

Each evaluation produces a Trust Report, a structured PDF that moves from a high-level verdict down to individual Probe results and actionable remediation guidance. You can download a sample report to follow along. After the cover page, the report has six sections.

Entering the Trust Report

The cover page shows:

Agent name and evaluation type (for example, Behavioral Safety Assessment)
A PASSED or FAILED badge against the Trust Score threshold
The numeric Trust Score
An Evaluation ID for tracking and sharing the report
The generation timestamp in UTC

Executive Summary

A brief overview that states which Harnesses were run, the overall pass/fail result, and the final Trust Score against the threshold. Use this section to share findings with stakeholders who do not need the full detail.

Agent Specification

Confirms exactly what was evaluated:

Field	Description
Agent Name	The name you registered in Diamond
Agent URL	The endpoint Diamond probed
Model	The underlying model identifier
Rate Limit	Requests per minute used during the evaluation
Request Timeout	Per-request timeout in seconds

A Harnesses Evaluated table lists each Harness by name, type, and a short description.

Evaluation Results

Overall Score: A visual gauge plots your Trust Score against the pass threshold, making the pass/fail outcome immediately legible.
Per-Harness Breakdown: One card per Harness shows its individual score and PASS/FAIL result.

When multiple Harnesses are run, a Harness can fail while the overall score passes, or vice versa, depending on weighting. Check each card to identify which dimension drove the outcome.
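To see how one Harness can fail while the overall score passes, consider the sketch below. It assumes the overall score is a weighted mean of Harness scores. The report does not state the actual aggregation formula, so the formula, the weights, and the names `HarnessResult` and `overall_score` are all hypothetical.

```python
from dataclasses import dataclass

PASS_THRESHOLD = 70  # pass/fail threshold used by the Trust Score


@dataclass
class HarnessResult:
    name: str
    score: float   # Harness score on the 0 to 100 scale
    weight: float  # hypothetical relative weight, not a documented value


def overall_score(results: list[HarnessResult]) -> float:
    """Weighted mean of Harness scores (an assumed aggregation rule)."""
    total_weight = sum(r.weight for r in results)
    if total_weight <= 0:
        raise ValueError("Total weight must be positive")
    return sum(r.score * r.weight for r in results) / total_weight


# Made-up sample data: one Harness below the threshold, one above it.
results = [
    HarnessResult("security", score=82, weight=2.0),
    HarnessResult("safety", score=64, weight=1.0),
]

overall = overall_score(results)
failing = [r.name for r in results if r.score < PASS_THRESHOLD]
print(overall)  # 76.0
print(failing)  # ['safety']
```

In this example the overall score passes at 76.0 even though the safety Harness scores 64, which is why each card deserves a separate look.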

Detailed Analysis

The primary diagnostic section, with one subsection per Harness. Each subsection contains:

Risk Assessment: States the overall risk level (Low, Moderate, High, or Critical) and the total count of failure patterns broken down by severity (for example, “22 failure patterns identified: 12 Critical, 5 High, 4 Moderate, 1 Low”).
Probe Scores: A table of every Probe run, grouped by Scenario, with its numeric score and severity rating. Lower scores mean the Agent failed more of that Probe’s test cases. The severity label reflects how dangerous the failure pattern is, not just how often it occurred.
Identified Failure Patterns: Each pattern that exceeded the failure threshold gets its own entry with:

A code (for example, MUT-001, SEC-007) for tracking across evaluations
A short issue title and severity badge
A description of the behavior Diamond observed
Implications: what could go wrong in production as a result
Mitigations: concrete remediation steps such as system prompt changes, Guardrail configuration, or architectural changes

Failure patterns aggregate multiple Probes into a single named finding. Addressing one pattern can resolve failures across many individual Probes.
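If you track failure patterns outside the PDF, a Risk Assessment style headline can be rebuilt from a list of pattern codes and severities. The sketch below is illustrative only: the function name `summarize_patterns` is hypothetical, and the severities attached to the sample codes are made up.

```python
from collections import Counter

# Severity labels in the order the Risk Assessment lists them.
SEVERITY_ORDER = ["Critical", "High", "Moderate", "Low"]


def summarize_patterns(patterns: list[tuple[str, str]]) -> str:
    """Build a headline from (code, severity) pairs."""
    counts = Counter(severity for _code, severity in patterns)
    unknown = set(counts) - set(SEVERITY_ORDER)
    if unknown:
        raise ValueError(f"Unknown severity labels: {sorted(unknown)}")
    # Only mention severities that actually occur.
    parts = [f"{counts[s]} {s}" for s in SEVERITY_ORDER if counts[s]]
    headline = f"{len(patterns)} failure patterns identified"
    return f"{headline}: {', '.join(parts)}" if parts else headline


# Made-up sample data; the codes follow the MUT-001 / SEC-007 naming style.
patterns = [
    ("MUT-001", "Critical"),
    ("SEC-007", "High"),
    ("SEC-012", "High"),
    ("REL-003", "Low"),
]
print(summarize_patterns(patterns))
# 4 failure patterns identified: 1 Critical, 2 High, 1 Low
```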

Conclusion

A deployment recommendation states plainly whether the Agent can be deployed or requires remediation first. If the Agent failed, it lists the steps to take before re-evaluating.

Appendix

Records the exact evaluation configuration for reproducibility:

Evaluation Configuration: request parameters (evaluation type, Agent URL, model, rate limit, timeout) and a Harnesses table with final scores
Scoring Methodology: the pass/fail threshold applied
Harness Definitions: plain-language definitions of what each Harness type measures

Understanding Red Team Results

Red Team results are campaign evidence, not a Trust Score. Open a Red Team result from Tests → Evaluation Results by selecting an evaluation with type Red Team. The result page has three main areas:

Run summary: Current status, phase, cost, progress, and wave information
Waves: Per-wave seeds, attackers, transcripts, strategies, and judgments
Final Report: Aggregated findings across the full campaign

Run Summary

The summary at the top of the result page tells you where the campaign is in its lifecycle.

Field	What It Means
Run Status	Whether the Red Team run is pending, running, succeeded, failed, or stopped
Phase	The current stage of the run, such as planning, attacking, judging, reflecting, or reporting
Elapsed	How long the run has been active
Running Total Cost	The accumulated cost for the campaign so far
Attackers Completed	How many attacker runs finished
Attackers Errored	How many attacker runs failed
Current Wave	Which wave is currently active or most recently completed

Use this section to understand progress and cost. Use the Waves and Final Report sections to interpret findings.

Waves

A Red Team campaign runs in waves. The Waves section lists every wave in the evaluation and shows basic information for each one. Click a wave to inspect what Red Team generated and what attackers did during that wave. Each wave detail view has two main sections:

Section	Fields	How To Read It
Seeds in This Wave	Seed ID, Instructions, Risk Type, Risk Outcome	Shows the attack goals generated for that wave. Each seed connects a taxonomy risk to an attacker instruction.
Live Attackers	Seed ID, Status, Transcripts	Shows the attacker runs launched for those seeds and whether transcript evidence is available.

Review wave details to see whether later waves became more focused based on earlier discoveries. Strong campaigns often show broad exploration early, then more targeted attacks after reflections identify promising strategies.

Live Attacker Details

Click a Live Attacker to inspect the evidence behind one seed. The detail view is organized into several sections:

Result Detail	How To Read It
Seed Info	Shows the seed Instructions, Risk Type, Risk Outcome, and Cost
Final Strategy	Shows the Persona, Context, Approach, and Conversation Plan the attacker used
Final Transcript	Shows the conversation evidence selected as the most relevant transcript for this attacker
Judgment	Shows Harmful Content Delivered and Potential Harm

Judgments and Harm

After an attack completes, a judge reviews the transcript. In the UI, focus on Harmful Content Delivered and Potential Harm. Harmful Content Delivered shows whether the Agent fulfilled the attack goal or leaked a verified sensitive artifact:

Value	Meaning
FULL	The Agent fulfilled the attack goal or leaked a verified sensitive artifact
PARTIAL	The Agent produced some harmful content, but the goal was not fully met or the leak needs more verification
NONE	The Agent refused or responded safely

Potential Harm helps separate verified harm from findings that need owner review. Treat a verified policy violation or verified leaked artifact as real harm that needs remediation. Treat potential or unverified harm as something a human owner should check against the actual Agent design, policies, and data access. Leaked artifacts are internal details the Agent disclosed, such as system prompt fragments, tool names, private endpoints, credentials, or operational procedures.

Final Red Team Report

The Final Report section shows a short summary of the evaluation. Click Open full report view to inspect the details used to create that summary. The full report view includes:

Section	What It Shows
Summary	Text summary, Waves, Seeds, Successful Judgments, and Run Total Cost
Vulnerabilities	Distinct weaknesses discovered across waves
Policy Violations	Confirmed policy violations when policies were available to the judge
Leaked Artifacts	Internal details disclosed during attacks
Successful Strategies	Attacker approaches that worked and should inform future testing

The report summary is aggregated from all waves, seeds, transcripts, and judgments. Use it to decide which issues need product changes, prompt or policy updates, tool permission changes, or Dome Guardrails.

Prioritizing Remediation

Use severity and taxonomy to prioritize fixes.

Address immediately (Critical/High severity):

Security vulnerabilities (prompt injection, data leakage)
Safety violations (harmful content, scope violations)
Reliability failures that affect core functionality
Red Team findings with FULL harmful-content judgments

Address in next release (Moderate severity):

Consistency issues across sessions
Minor compliance gaps
Robustness failures on edge cases
Red Team findings with PARTIAL harmful-content judgments

Track and monitor (Low severity):

Transparency improvements
Minor formatting inconsistencies
Rare edge case handling

Focus remediation on root causes rather than individual findings. Multiple findings often share a common root cause, and fixing the underlying issue resolves all related symptoms.
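The tiers above can be captured as a small lookup, which is handy when triaging exported findings in bulk. This is a sketch under stated assumptions: the names `SEVERITY_TIER`, `JUDGMENT_TIER`, `tier_for_severity`, and `tier_for_judgment` are hypothetical, and returning `None` for a NONE judgment is a choice made here because this section assigns no tier to it.

```python
from typing import Optional

IMMEDIATE = "Address immediately"
NEXT_RELEASE = "Address in next release"
MONITOR = "Track and monitor"

# Severity label -> remediation tier, following the lists above.
SEVERITY_TIER = {
    "Critical": IMMEDIATE,
    "High": IMMEDIATE,
    "Moderate": NEXT_RELEASE,
    "Medium": NEXT_RELEASE,  # accepted as an alias for Moderate
    "Low": MONITOR,
}

# Red Team "Harmful Content Delivered" judgment -> remediation tier.
JUDGMENT_TIER = {
    "FULL": IMMEDIATE,
    "PARTIAL": NEXT_RELEASE,
}


def tier_for_severity(severity: str) -> str:
    """Return the tier for a severity label; raises KeyError if unknown."""
    return SEVERITY_TIER[severity]


def tier_for_judgment(judgment: str) -> Optional[str]:
    """Return the tier for a judgment, or None when no tier applies (NONE)."""
    return JUDGMENT_TIER.get(judgment)


print(tier_for_severity("High"))     # Address immediately
print(tier_for_judgment("PARTIAL"))  # Address in next release
print(tier_for_judgment("NONE"))     # None
```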

Comparing Evaluations

Run evaluations before and after changes to track improvement:

Metric	Before	After	Change
Trust Score	62	78	+16
Critical Findings	3	0	-3
High Findings	7	2	-5

A rising Trust Score with decreasing critical findings indicates effective remediation. A declining score signals regression, so investigate recent changes.
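The Change column in the table above is the after value minus the before value for each metric. A minimal sketch of that comparison follows, using the numbers from the table; the function name `compare_evaluations` and the dictionary layout are illustrative assumptions, not an export format.

```python
def compare_evaluations(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Return after minus before for every metric present in both runs."""
    return {metric: after[metric] - before[metric] for metric in before if metric in after}


# Values taken from the comparison table above.
before = {"Trust Score": 62, "Critical Findings": 3, "High Findings": 7}
after = {"Trust Score": 78, "Critical Findings": 0, "High Findings": 2}

changes = compare_evaluations(before, after)
print(changes)
# {'Trust Score': 16, 'Critical Findings': -3, 'High Findings': -5}

# A negative Trust Score change signals a regression worth investigating.
regressed = changes["Trust Score"] < 0
print(regressed)  # False
```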

Next Steps

Configure Guardrails

Add runtime protection with Dome

Quantifying Risk

Translate findings into risk assessments

Trust Score Harness

Learn about the standard evaluation

Run Evaluations

Launch and monitor evaluations
