The Trust Score harness provides a comprehensive evaluation of your agent across the three dimensions of trustworthy AI: Reliability, Security, and Safety. This is Vijil’s standard evaluation, designed to quantify how much you can trust your agent in production.

The Three Dimensions

The Trust Score measures agent behavior across three complementary dimensions:

Reliability

Produces correct, consistent, and robust outputs

Security

Resists attacks on confidentiality, integrity, and availability

Safety

Operates transparently within acceptable boundaries
Each dimension contains subcategories that probe specific behaviors:

Reliability

Subcategory — What It Tests
Correctness — Produces accurate and valid outputs
Consistency — Behaves predictably across similar inputs
Robustness — Handles edge cases and errors gracefully

Security

Subcategory — What It Tests
Confidentiality — Protects sensitive data from exposure
Integrity — Prevents unauthorized data modification
Availability — Resists denial of service attacks

Safety

Subcategory — What It Tests
Containment — Operates within defined boundaries
Compliance — Follows policies and regulations
Transparency — Provides clear reasoning for decisions
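The three dimensions and nine subcategories above can be sketched as plain data. This is an illustrative structure only (the names come from this page; the schema and the `subcategories` helper are not part of any Vijil API):

```python
# Illustrative taxonomy of the Trust Score dimensions and their
# subcategories, as described in this page. Structure is a sketch,
# not Vijil's internal schema.
TRUST_DIMENSIONS = {
    "Reliability": ["Correctness", "Consistency", "Robustness"],
    "Security": ["Confidentiality", "Integrity", "Availability"],
    "Safety": ["Containment", "Compliance", "Transparency"],
}

def subcategories(selected_dimensions):
    """Flatten the selected dimensions into the behaviors they test."""
    return [sub for dim in selected_dimensions
            for sub in TRUST_DIMENSIONS[dim]]

# A security-only run (e.g., a penetration test) probes three behaviors:
print(subcategories(["Security"]))
# → ['Confidentiality', 'Integrity', 'Availability']
```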

Running a Trust Score Evaluation

Navigate to Evaluations in the sidebar to open Diamond Evaluations.
[Screenshot: Diamond Evaluations page showing agent selection and Trust Score dimensions]
The evaluation interface has two panels:
  • Select Agent — Choose which registered agent to evaluate. The table shows agent name and status; only agents with status Active appear in the list.
  • Select Harness — Choose between Trust Score (the standard evaluation) and Custom (your configured harnesses). When Trust Score is selected, you see the three dimensions with toggles.

Configuring Dimensions

Each dimension has a toggle that enables or disables it for the evaluation:
  • All dimensions enabled — Comprehensive evaluation across reliability, security, and safety
  • Selected dimensions — Focus on specific concerns (e.g., security-only for a penetration test)
The subcategories beneath each dimension show what behaviors will be tested.

Starting the Evaluation

  1. Select an agent from the list
  2. Ensure Trust Score is selected (default)
  3. Toggle dimensions on or off as needed
  4. Click Run Evaluation
[Screenshot: Evaluation ready to run with agent selected and Run Evaluation button enabled]
The evaluation runs asynchronously. Progress appears in the Evaluation Results table below.
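Because the evaluation runs asynchronously, a script that kicks one off typically polls until the status shown in the Evaluation Results table reaches COMPLETED or FAILED. The sketch below assumes a `get_evaluation_status` callable supplied by you; it is a generic polling pattern, not a documented Vijil client call:

```python
import time

# Terminal statuses from the Evaluation Results table.
TERMINAL_STATUSES = {"COMPLETED", "FAILED"}

def wait_for_evaluation(get_evaluation_status, interval_s=30, timeout_s=3600):
    """Poll a status-returning callable until the evaluation finishes.

    `get_evaluation_status` is a hypothetical stand-in for however your
    client fetches the status (PENDING, RUNNING, COMPLETED, or FAILED).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        status = get_evaluation_status()
        if status in TERMINAL_STATUSES:
            return status
        time.sleep(interval_s)
    raise TimeoutError("evaluation did not reach a terminal status in time")
```

Tune `interval_s` to your rate limits; a long-running harness does not need second-by-second polling.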

Evaluation Results

The results table shows all evaluations in your workspace:
Column — What It Shows
Agent Name — Which agent was evaluated
Created By — Who started the evaluation
Created At — When the evaluation began
Evaluation — Status: PENDING, RUNNING, COMPLETED, or FAILED
Last Evaluated At — When the evaluation finished
Actions — View report, download results
Click the view icon to open the Trust Report for a completed evaluation.

Understanding the Trust Report

The Trust Report provides a complete record of the evaluation with actionable findings.
[Screenshot: Trust Report showing agent name, pass/fail status, and Trust Score]

Report Sections

  • Executive Summary — High-level overview stating whether the agent passed or failed, with the overall Trust Score.
  • Agent Specification — Configuration details including agent URL, model, rate limits, and which harnesses were evaluated.
  • Evaluation Results — The Trust Score with pass/fail status and a per-harness breakdown showing scores for each dimension.
  • Detailed Analysis — Specific findings for each harness, identifying which probes passed or failed and why.
  • Conclusion — Deployment recommendation based on the results.

Interpreting the Score

The Trust Score ranges from 0 to 1:
Score — Status — Interpretation
≥ 0.70 — PASSED — Agent meets the trustworthiness threshold
< 0.70 — FAILED — Agent requires remediation before deployment
A passing score indicates the agent handled probes within acceptable bounds. A failing score identifies specific failure modes to address before production deployment.
The Trust Score quantifies known risks based on the probes executed. It does not guarantee absence of all vulnerabilities—only that your agent performed acceptably against the tested scenarios.
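The pass/fail rule above reduces to a single threshold check. A minimal sketch, using the 0.70 cutoff from the table (the function name is illustrative, not part of any Vijil API):

```python
# Pass/fail rule for the Trust Score, per the table above.
# The Trust Score ranges from 0 to 1; 0.70 is the passing threshold.
PASS_THRESHOLD = 0.70

def interpret_trust_score(score: float) -> str:
    """Return the evaluation status for a Trust Score in [0, 1]."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("Trust Score must be between 0 and 1")
    return "PASSED" if score >= PASS_THRESHOLD else "FAILED"

print(interpret_trust_score(0.82))  # → PASSED
print(interpret_trust_score(0.55))  # → FAILED
```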

Deployment Recommendations

The report concludes with a deployment recommendation.
For passing agents:
  • Deploy with standard monitoring
  • Consider enabling Dome guardrails for additional runtime protection
  • Schedule periodic re-evaluation to catch regressions
For failing agents:
  • Review the detailed analysis for specific failure modes
  • Address identified weaknesses in agent configuration or training
  • Re-evaluate after implementing fixes

Best Practices

  • Run before deployment — Evaluate every agent before it reaches production. The Trust Score provides evidence that your agent meets baseline trustworthiness requirements.
  • Test all dimensions — Unless you have specific reasons to exclude a dimension, run the full evaluation. Security vulnerabilities can exist even in agents that seem reliable.
  • Re-evaluate after changes — Any modification to your agent—prompt updates, model changes, tool additions—can affect behavior. Re-run the Trust Score to verify.
  • Track scores over time — Compare Trust Scores across evaluations to identify trends. Regressions indicate problems introduced by recent changes.
  • Combine with custom harnesses — The Trust Score tests general behaviors. Custom harnesses test your specific policies and user scenarios. Use both for comprehensive coverage.

Next Steps