## The Three Dimensions
The Trust Score measures agent behavior across three complementary dimensions:

- **Reliability**: Produces correct, consistent, and robust outputs
- **Security**: Resists attacks on confidentiality, integrity, and availability
- **Safety**: Operates transparently within acceptable boundaries
### Reliability
| Subcategory | What It Tests |
|---|---|
| Correctness | Produces accurate and valid outputs |
| Consistency | Behaves predictably across similar inputs |
| Robustness | Handles edge cases and errors gracefully |
### Security
| Subcategory | What It Tests |
|---|---|
| Confidentiality | Protects sensitive data from exposure |
| Integrity | Prevents unauthorized data modification |
| Availability | Resists denial-of-service attacks |
### Safety
| Subcategory | What It Tests |
|---|---|
| Containment | Operates within defined boundaries |
| Compliance | Follows policies and regulations |
| Transparency | Provides clear reasoning for decisions |
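Taken together, that is nine subcategories under three dimensions. For reference, the sketch below restates the tables above as a simple data structure; the names are illustrative, not a platform API.

```python
# Illustrative grouping of Trust Score dimensions and subcategories.
# These names mirror the tables above; they are not a platform API.
TRUST_DIMENSIONS = {
    "reliability": ["correctness", "consistency", "robustness"],
    "security": ["confidentiality", "integrity", "availability"],
    "safety": ["containment", "compliance", "transparency"],
}

def subcategories(dimension: str) -> list[str]:
    """Return the subcategories tested under a given dimension."""
    return TRUST_DIMENSIONS[dimension]

print(subcategories("security"))  # ['confidentiality', 'integrity', 'availability']
```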
## Running a Trust Score Evaluation
Navigate to Evaluations in the sidebar to open Diamond Evaluations.
### Configuring Dimensions
Each dimension has a toggle that enables or disables it for the evaluation:

- **All dimensions enabled**: Comprehensive evaluation across reliability, security, and safety
- **Selected dimensions**: Focus on specific concerns (e.g., security-only for a penetration test)
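If you script evaluations rather than using the UI, the toggle state reduces to three booleans. A minimal sketch, assuming a hypothetical `TrustScoreConfig` (not the product's actual SDK):

```python
from dataclasses import dataclass

@dataclass
class TrustScoreConfig:
    """Hypothetical dimension toggles for a Trust Score run.

    All three enabled gives the comprehensive evaluation; disable
    dimensions to focus the run (e.g., security-only).
    """
    reliability: bool = True
    security: bool = True
    safety: bool = True

# Security-only configuration for a penetration test.
pentest_config = TrustScoreConfig(reliability=False, security=True, safety=False)
```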
### Starting the Evaluation
1. Select an agent from the list
2. Ensure Trust Score is selected (the default)
3. Toggle dimensions on or off as needed
4. Click Run Evaluation
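The same steps can be expressed as a single API call. Everything below (the base URL, endpoint, and payload fields) is a hypothetical illustration of a programmatic launch, not the platform's documented API:

```python
import requests  # third-party: pip install requests

# Hypothetical endpoint; the platform's real API may differ.
BASE_URL = "https://api.example.com/v1"

def run_evaluation(agent_id: str, dimensions: dict[str, bool], token: str) -> str:
    """Start a Trust Score evaluation and return its evaluation ID."""
    response = requests.post(
        f"{BASE_URL}/evaluations",
        headers={"Authorization": f"Bearer {token}"},
        json={
            "agent_id": agent_id,
            "evaluation_type": "trust_score",
            # e.g. {"reliability": True, "security": True, "safety": True}
            "dimensions": dimensions,
        },
        timeout=30,
    )
    response.raise_for_status()
    return response.json()["evaluation_id"]
```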

## Evaluation Results
The results table shows all evaluations in your workspace:

| Column | What It Shows |
|---|---|
| Agent Name | Which agent was evaluated |
| Created By | Who started the evaluation |
| Created At | When the evaluation began |
| Evaluation | Status: PENDING, RUNNING, COMPLETED, or FAILED |
| Last Evaluated At | When the evaluation finished |
| Actions | View report, download results |
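Because an evaluation moves from PENDING through RUNNING to COMPLETED or FAILED, programmatic callers typically poll until a terminal status. Continuing the hypothetical client sketched above:

```python
import time
import requests

def wait_for_evaluation(evaluation_id: str, token: str, interval: float = 10.0) -> dict:
    """Poll a hypothetical status endpoint until the evaluation finishes."""
    while True:
        response = requests.get(
            f"{BASE_URL}/evaluations/{evaluation_id}",
            headers={"Authorization": f"Bearer {token}"},
            timeout=30,
        )
        response.raise_for_status()
        evaluation = response.json()
        if evaluation["status"] in ("COMPLETED", "FAILED"):
            return evaluation
        time.sleep(interval)  # still PENDING or RUNNING
```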
## Understanding the Trust Report
The Trust Report provides a complete record of the evaluation with actionable findings.
### Report Sections
- **Executive Summary**: High-level overview stating whether the agent passed or failed, with the overall Trust Score.
- **Agent Specification**: Configuration details including agent URL, model, rate limits, and which harnesses were evaluated.
- **Evaluation Results**: The Trust Score with pass/fail status and a per-harness breakdown showing scores for each dimension.
- **Detailed Analysis**: Specific findings for each harness, identifying which probes passed or failed and why.
- **Conclusion**: Deployment recommendation based on the results.

### Interpreting the Score
The Trust Score ranges from 0 to 1:

| Score | Status | Interpretation |
|---|---|---|
| ≥ 0.70 | PASSED | Agent meets trustworthiness threshold |
| < 0.70 | FAILED | Agent requires remediation before deployment |
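The pass/fail rule is a single threshold at 0.70, which makes it easy to gate a deployment pipeline; for example:

```python
TRUST_SCORE_THRESHOLD = 0.70

def trust_status(score: float) -> str:
    """Map a Trust Score in [0, 1] to its pass/fail status."""
    if not 0.0 <= score <= 1.0:
        raise ValueError(f"Trust Score must be in [0, 1], got {score}")
    return "PASSED" if score >= TRUST_SCORE_THRESHOLD else "FAILED"

print(trust_status(0.82))  # PASSED
print(trust_status(0.64))  # FAILED
```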
### Deployment Recommendations
The report concludes with a deployment recommendation.

For passing agents:

- Deploy with standard monitoring
- Consider enabling Dome guardrails for additional runtime protection
- Schedule periodic re-evaluation to catch regressions

For failing agents:

- Review the detailed analysis for specific failure modes
- Address identified weaknesses in agent configuration or training
- Re-evaluate after implementing fixes
## Best Practices
**Run before deployment**: Evaluate every agent before it reaches production. The Trust Score provides evidence that your agent meets baseline trustworthiness requirements.

**Test all dimensions**: Unless you have specific reasons to exclude a dimension, run the full evaluation. Security vulnerabilities can exist even in agents that seem reliable.

**Re-evaluate after changes**: Any modification to your agent (prompt updates, model changes, tool additions) can affect behavior. Re-run the Trust Score to verify.

**Track scores over time**: Compare Trust Scores across evaluations to identify trends, as in the sketch below. Regressions indicate problems introduced by recent changes.

**Combine with custom harnesses**: The Trust Score tests general behaviors; custom harnesses test your specific policies and user scenarios. Use both for comprehensive coverage.
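As a starting point for tracking scores over time, the sketch below flags any evaluation whose score dropped relative to the previous run; the record format is illustrative, not a platform export.

```python
def find_regressions(history: list[tuple[str, float]], tolerance: float = 0.0) -> list[str]:
    """Flag evaluations whose Trust Score dropped versus the previous run.

    `history` is a chronological list of (evaluation_id, score) pairs;
    this format is illustrative, not a platform export.
    """
    regressions = []
    for (_, prev_score), (eval_id, score) in zip(history, history[1:]):
        if score < prev_score - tolerance:
            regressions.append(eval_id)
    return regressions

history = [("eval-1", 0.82), ("eval-2", 0.84), ("eval-3", 0.71)]
print(find_regressions(history))  # ['eval-3']
```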
## Next Steps

- **Build Custom Harnesses**: Test against your specific policies and personas
- **Understand Results**: Deep dive into evaluation findings
- **Configure Guardrails**: Add runtime protection with Dome
- **Run Evaluations**: Launch and monitor evaluations