Prerequisites
- CLI configured and authenticated, or a Bearer token from POST /auth/jwt/login (see Console API)
- A registered Agent and its UUID
Start an Evaluation
- CLI
- API
- MCP
| Flag | Description | Required |
|---|---|---|
| --agent-id | UUID of the Agent to evaluate | Yes |
| --harness-names | JSON array of Harness names | Yes |
| --sample-size | Probes per Harness (1–1000); omit to run all | No |
| --harness-type | standard (default) or custom | No |
| --wait | Block until the evaluation completes | No |
| --json | Output as JSON | No |
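The constraints in the flag table can be checked before anything is sent to the CLI. The following is a minimal sketch, not part of the product: `build_eval_flags` is a hypothetical helper, the harness names in the usage note are made up, and the CLI executable name is left out because this page does not state it. Only the flag names and the 1–1000 range come from the table.

```python
import json
import uuid


def build_eval_flags(agent_id, harness_names, sample_size=None,
                     harness_type="standard", wait=False, as_json=False):
    """Validate options and return the flag portion of a command line.

    Hypothetical helper: the executable name and subcommand are not given
    on this page, so the caller must prepend them.
    """
    uuid.UUID(agent_id)  # raises ValueError if agent_id is not a valid UUID
    if not harness_names or not all(isinstance(n, str) for n in harness_names):
        raise ValueError("harness_names must be a non-empty list of strings")
    if harness_type not in ("standard", "custom"):
        raise ValueError("harness_type must be 'standard' or 'custom'")

    # --harness-names takes a JSON array, so serialize the list.
    flags = ["--agent-id", agent_id,
             "--harness-names", json.dumps(harness_names)]

    if sample_size is not None:
        if not 1 <= sample_size <= 1000:
            raise ValueError("sample_size must be between 1 and 1000")
        flags += ["--sample-size", str(sample_size)]
    if harness_type != "standard":  # standard is the default, so omit it
        flags += ["--harness-type", harness_type]
    if wait:
        flags.append("--wait")
    if as_json:
        flags.append("--json")
    return flags
```

For example, `build_eval_flags(str(uuid.uuid4()), ["harness-a", "harness-b"], sample_size=50, wait=True)` returns a list that can be appended to the CLI command and passed to `subprocess.run`.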
Check Status
Status progresses through: starting → pending → running → completed → saving → saved. It may also be failed or canceled.
Once the status is completed, the response includes per-Harness Trust Scores.
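The status lifecycle above lends itself to a simple polling loop. This is a minimal sketch under stated assumptions: `fetch_status` is a hypothetical callable supplied by the caller (for example, a wrapper around the CLI or the REST API) that returns the current status string, and the polling interval and timeout are arbitrary defaults. Only the status names come from this page.

```python
import time

# Statuses in the order given on this page.
IN_PROGRESS = {"starting", "pending", "running", "completed", "saving"}
# Polling stops on these; "saved" is the successful end state.
TERMINAL = {"saved", "failed", "canceled"}


def wait_for_evaluation(fetch_status, poll_seconds=5.0,
                        timeout_seconds=3600.0, sleep=time.sleep):
    """Poll until the evaluation reaches a terminal status and return it.

    fetch_status is a hypothetical zero-argument callable returning the
    current status string. Raises TimeoutError if no terminal status is
    seen in time, and ValueError on a status not listed on this page.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL:
            return status
        if status not in IN_PROGRESS:
            raise ValueError(f"unexpected status: {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError("evaluation did not finish in time")
        sleep(poll_seconds)
```

Treating completed as non-terminal is deliberate here: per-Harness Trust Scores are available at that point, but the full per-Probe breakdown requires the saved status.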
Retrieve Results
Get the full per-Probe breakdown once the evaluation status is saved.
Generate a Report
Produce a Trust Report for sharing or archiving.
List and Cancel Evaluations
See GET /evaluations/ and POST /evaluations/{evaluation_id}/cancel for the REST API equivalents.
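The two REST endpoints named above can be called with a Bearer token from the Prerequisites. The sketch below only builds the requests using Python's standard library; `BASE_URL` is a placeholder, not the real host, and the function names are hypothetical. Request and response bodies are not documented on this page, so none are assumed.

```python
import urllib.request
from urllib.parse import quote

# Placeholder host: replace with the actual Console API base URL.
BASE_URL = "https://console.example.com"


def list_evaluations_request(token, base_url=BASE_URL):
    """Build a GET /evaluations/ request with a Bearer token."""
    return urllib.request.Request(
        f"{base_url}/evaluations/",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


def cancel_evaluation_request(token, evaluation_id, base_url=BASE_URL):
    """Build a POST /evaluations/{evaluation_id}/cancel request."""
    # Percent-encode the ID so it cannot alter the URL path.
    safe_id = quote(evaluation_id, safe="")
    return urllib.request.Request(
        f"{base_url}/evaluations/{safe_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )
```

Either request can then be sent with `urllib.request.urlopen(request)`; building and sending are kept separate so the URL and headers can be inspected first.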
Next Steps
Understand Results
Deep dive into scores and failures
Custom Harnesses
Create targeted Evaluations
Cloud Providers
Configure provider integrations