# Diamond Evaluations
Navigate to Evaluations in the sidebar to open Diamond Evaluations.
- Create Evaluation: Configure and launch new evaluations
- Evaluation Results: Track progress and access completed evaluations
## Creating an Evaluation
### Select Agent

The agent table shows all registered agents in your workspace:

| Column | What It Shows |
|---|---|
| Agent Name | Identifier from registration |
| Status | Active or Draft |
### Select Harness

Choose what to test by selecting a harness type.

**Trust Score**: Vijil’s standard evaluation across three dimensions:

- Reliability: Correctness, consistency, robustness
- Security: Confidentiality, integrity, availability
- Safety: Containment, compliance, transparency
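As a mental model, the nine sub-dimensions roll up into three dimension scores plus an overall score. The sketch below is purely illustrative; the field names are assumptions, not the actual results schema:

```python
from dataclasses import dataclass

# Illustrative only: these names are assumptions, not the Vijil schema.
@dataclass
class DimensionScore:
    score: float                  # aggregate score for the dimension
    sub_scores: dict[str, float]  # e.g. {"correctness": 91.0, "consistency": 88.5, ...}

@dataclass
class TrustScore:
    overall: float
    reliability: DimensionScore   # correctness, consistency, robustness
    security: DimensionScore      # confidentiality, integrity, availability
    safety: DimensionScore        # containment, compliance, transparency
```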
### Run Evaluation

Once you’ve selected an agent and configured your harness:

1. Verify your agent selection in the left panel
2. Confirm harness settings in the right panel
3. Click Run Evaluation
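If you drive Diamond from scripts rather than the UI, the launch step might look like the sketch below. The base URL, endpoint path, payload fields, and auth header are all illustrative assumptions, not the documented API:

```python
import requests

API_BASE = "https://api.vijil.example/v1"           # hypothetical base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # hypothetical auth scheme

# Hypothetical endpoint and payload; adjust to the actual Diamond API.
resp = requests.post(
    f"{API_BASE}/evaluations",
    headers=headers,
    json={"agent_id": "my-agent", "harness": "trust_score"},
    timeout=30,
)
resp.raise_for_status()
evaluation_id = resp.json()["id"]  # field name is an assumption
print(f"Started evaluation {evaluation_id}")
```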
## Monitoring Progress

Running evaluations appear in the Evaluation Results table:

| Column | What It Shows |
|---|---|
| Agent Name | Which agent is being evaluated |
| Created By | Who started the evaluation |
| Created At | When the evaluation began |
| Evaluation | Status: PENDING, RUNNING, COMPLETED, or FAILED |
| Last Evaluated At | When the evaluation finished |
| Actions | View report, download results |
### Evaluation Status
| Status | Meaning |
|---|---|
| PENDING | Queued, waiting to start |
| RUNNING | Actively sending probes and collecting responses |
| COMPLETED | Finished successfully, results available |
| FAILED | Encountered an error, check agent connectivity |
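These terminal states make it easy to script a wait loop. The sketch below reuses the hypothetical endpoint shape from the launch example; only the status names (PENDING, RUNNING, COMPLETED, FAILED) come from this table:

```python
import time
import requests

API_BASE = "https://api.vijil.example/v1"           # hypothetical base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # hypothetical auth scheme

def wait_for_evaluation(evaluation_id: str, poll_seconds: int = 30) -> str:
    """Poll until the evaluation reaches a terminal status."""
    while True:
        resp = requests.get(f"{API_BASE}/evaluations/{evaluation_id}",
                            headers=headers, timeout=30)
        resp.raise_for_status()
        status = resp.json()["status"]  # field name is an assumption
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_seconds)        # still PENDING or RUNNING
```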
## Viewing Results

When an evaluation completes, access the results through the Actions column:

- View (eye icon): Opens the Trust Report in a new tab
- Download (download icon): Downloads results as a file

The Trust Report includes:

- Overall Trust Score with pass/fail status
- Per-dimension breakdown
- Detailed findings for each probe category
- Deployment recommendations
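If you archive results for later comparison, the download can also be scripted. As before, the endpoint below is an illustrative assumption; the documented path is the download button in the Actions column:

```python
import requests

API_BASE = "https://api.vijil.example/v1"           # hypothetical base URL
headers = {"Authorization": "Bearer YOUR_API_TOKEN"}  # hypothetical auth scheme
evaluation_id = "eval-123"                            # from the launch step

# Hypothetical results endpoint; file format is assumed to be JSON.
resp = requests.get(f"{API_BASE}/evaluations/{evaluation_id}/results",
                    headers=headers, timeout=60)
resp.raise_for_status()
with open("trust_report.json", "wb") as f:
    f.write(resp.content)
```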
## Evaluation Considerations
### Rate Limits

Diamond respects the rate limit you configured during agent registration. Higher rate limits enable faster evaluations but may exceed your provider’s quotas. If evaluations fail with timeout errors:

- Verify your agent URL is accessible
- Check that your API credentials are valid
- Consider reducing the rate limit in agent settings
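A quick pre-flight script can rule out the first two causes before you change any settings. The health check below targets your own agent endpoint; the URL and auth header are placeholders for your deployment:

```python
import requests

AGENT_URL = "https://my-agent.example.com/chat"        # your registered agent URL
AGENT_HEADERS = {"Authorization": "Bearer AGENT_API_KEY"}  # your agent's credentials

try:
    # Any cheap request works: a 401/403 points at credentials,
    # while a timeout or connection error points at reachability.
    resp = requests.get(AGENT_URL, headers=AGENT_HEADERS, timeout=10)
    if resp.status_code in (401, 403):
        print("Agent reachable, but API credentials were rejected.")
    else:
        print(f"Agent reachable (HTTP {resp.status_code}).")
except requests.exceptions.Timeout:
    print("Timed out: the agent URL is not responding within 10s.")
except requests.exceptions.ConnectionError:
    print("Connection failed: check the agent URL and network access.")
```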
### Agent Availability

Your agent must remain available throughout the evaluation. If your agent goes offline or becomes unresponsive, the evaluation may fail or produce incomplete results. For production agents behind load balancers, ensure sufficient capacity to handle evaluation traffic alongside normal usage.
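One way to sanity-check capacity is to send a handful of concurrent probes to the agent before launching an evaluation. The sketch below is illustrative; the URL is your own agent's, and the concurrency level is arbitrary:

```python
from concurrent.futures import ThreadPoolExecutor
import requests

AGENT_URL = "https://my-agent.example.com/chat"  # your registered agent URL

def probe(_: int) -> bool:
    """Return True if the agent answers without a server-side error."""
    try:
        return requests.get(AGENT_URL, timeout=10).status_code < 500
    except requests.exceptions.RequestException:
        return False

# Roughly simulate evaluation traffic arriving alongside normal usage.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(probe, range(8)))

print(f"{sum(results)}/{len(results)} concurrent probes succeeded")
```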
### Re-running Evaluations

You can run multiple evaluations against the same agent. Each evaluation creates a new entry in the results table, allowing you to:

- Track Trust Score changes over time
- Compare results before and after agent modifications
- Verify fixes for previously identified issues
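If you keep the downloaded result files, comparing two runs takes only a few lines. The JSON field name below is an assumption about the file layout, so adjust it to match your actual downloads:

```python
import json

def load_score(path: str) -> float:
    """Read the overall score from a downloaded results file."""
    with open(path) as f:
        return json.load(f)["overall_trust_score"]  # assumed field name

before = load_score("trust_report_v1.json")
after = load_score("trust_report_v2.json")
print(f"Trust Score: {before:.1f} -> {after:.1f} ({after - before:+.1f})")
```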
## Best Practices

- **Evaluate before deployment**: Run a Trust Score evaluation on every agent before it reaches production. The results provide evidence of baseline trustworthiness.
- **Test after changes**: Any modification to your agent (prompt updates, model changes, tool additions) can affect behavior. Re-evaluate to verify.
- **Use appropriate harnesses**: The Trust Score harness tests general behaviors. For domain-specific requirements, create custom harnesses with relevant personas and policies.
- **Monitor for regressions**: Compare Trust Scores across evaluations. A declining score indicates problems introduced by recent changes; a CI sketch of this check follows below.
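The regression check is straightforward to automate in CI: fail the build whenever the latest Trust Score drops below a stored baseline. A minimal sketch, assuming a baseline file you maintain yourself and the same illustrative JSON field as above:

```python
import json
import sys

THRESHOLD = 1.0  # allowed drop before the gate fails; tune to taste

with open("baseline_score.json") as f:
    baseline = json.load(f)["overall_trust_score"]  # assumed field name
with open("trust_report.json") as f:
    latest = json.load(f)["overall_trust_score"]

if latest < baseline - THRESHOLD:
    print(f"Trust Score regression: {baseline:.1f} -> {latest:.1f}")
    sys.exit(1)  # non-zero exit fails the CI job
print(f"Trust Score OK: {latest:.1f} (baseline {baseline:.1f})")
```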
## Next Steps

- Understand Results: Interpret evaluation findings
- Trust Score Harness: Learn about the standard evaluation
- Custom Harnesses: Build targeted evaluation scenarios
- Configure Guardrails: Add runtime protection with Dome