# Run Evaluations

> Start, monitor, and retrieve trust Evaluations programmatically using the CLI, MCP, REST API, or Python SDK.

[Vijil Evaluate](https://vijil.ai/evaluate) is a quality assurance framework that automates the testing of LLM applications. An **Evaluation** in Vijil is an automated test run where you select one or more AI agents and a test [Harness](/concepts/evaluation-components/harness) (covering [Security](/concepts/trust-score/security), [Reliability](/concepts/trust-score/reliability), and [Safety](/concepts/trust-score/safety)) to systematically assess the quality, safety, and reliability of LLM applications.

## Prerequisites

* One of the following: the CLI configured and authenticated, a Bearer token from `POST /auth/jwt/login`, or the SDK authenticated with a Vijil API key (see the [Quickstart](/developer-guide/agentic/quickstart))
* A registered [Agent](/tutorials/manage-agents) and its UUID

## Start an Evaluation

### CLI

```bash
vijil eval run \
  --agent-id "$AGENT_ID" \
  --harness-names '["safety", "security"]' \
  --sample-size 50 \
  --wait
```

| Flag              | Description                                  | Required |
| ----------------- | -------------------------------------------- | -------- |
| `--agent-id`      | UUID of the Agent to evaluate                | Yes      |
| `--harness-names` | JSON array of Harness names                  | Yes      |
| `--sample-size`   | Probes per Harness (1–1000); omit to run all |          |
| `--harness-type`  | `standard` (default) or `custom`             |          |
| `--wait`          | Block until the evaluation completes         |          |
| `--json`          | Output as JSON                               |          |

Use `--sample-size 10` for fast iteration during development. Run the full Harness before releasing to production.

### MCP

With the [Vijil MCP server](/developer-guide/agentic/quickstart) configured, ask Claude Code in natural language:

Run a safety and security evaluation on agent a1b2c3d4-...
with sample size 50, and wait for it to complete

Claude calls `eval_run` with `wait=True` and reports back Trust Scores when the evaluation finishes.

### Python SDK

`client.evaluate()` starts the Evaluation and polls until it completes, then returns the result:

```python
from vijil import Vijil

client = Vijil()
evaluation = client.evaluate("<agent_id>", baseline=True)

print(evaluation.trust_score)  # 0.82
print(evaluation.dimensions.reliability)
print(evaluation.dimensions.security)
print(evaluation.dimensions.safety)
```

| Parameter        | Description                                                      | Required |
| ---------------- | ---------------------------------------------------------------- | -------- |
| `agent_id`       | Agent ID or alias                                                | Yes      |
| `baseline`       | Run the standard trust Harnesses (reliability, security, safety) |          |
| `harness_id`     | Run a specific custom Harness instead of the baseline            |          |
| `_poll_interval` | Seconds between status checks (default `5.0`)                    |          |

Pass either `baseline=True` or a `harness_id`. To start without blocking, call `client.evaluations.create(agent_id="<agent_id>", baseline=True)` and poll `client.evaluations.show()` yourself.

### REST API

The API returns `202 Accepted` immediately. Use the returned `evaluation_id` to poll for status.

```bash
curl -s -X POST "$VIJIL_URL/evaluations/" \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d "{
    \"agent_id\": \"$AGENT_ID\",
    \"team_id\": \"$TEAM_ID\",
    \"harness_names\": [\"safety\", \"security\"],
    \"sample_size\": 50
  }"
```

Response:

```json
{
  "evaluation_id": "e5f6a7b8-...",
  "status": "starting"
}
```

Save the evaluation ID:

```bash
export EVAL_ID="e5f6a7b8-..."
```

| Field           | Description                                  | Required |
| --------------- | -------------------------------------------- | -------- |
| `agent_id`      | UUID of the Agent to evaluate                | Yes      |
| `team_id`       | UUID of your team                            | Yes      |
| `harness_names` | Array of Harness names                       | Yes      |
| `sample_size`   | Probes per Harness (1–1000); omit to run all |          |
| `harness_type`  | `"standard"` (default) or `"custom"`         |          |

## Check Status

Status progresses through: `starting` → `pending` → `running` → `completed` → `saving` → `saved`. It may also be `failed` or `canceled`.

```bash title="CLI"
vijil eval status "$EVAL_ID"
```

```bash title="API"
curl -s "$VIJIL_URL/evaluations/$EVAL_ID" \
  -H "Authorization: Bearer $TOKEN"
```

```python title="SDK"
evaluation = client.evaluations.show("<evaluation_id>")
print(evaluation.status)
```

When the status reaches `completed`, the response includes per-Harness Trust Scores:

```json
{
  "evaluation_id": "e5f6a7b8-...",
  "status": "completed",
  "scores": {
    "safety": 0.82,
    "security": 0.67
  },
  "completed_at": 1712506200
}
```

## Retrieve Results

Get the full per-Probe breakdown once the evaluation status is `saved`.

```bash title="CLI"
vijil eval results-detail "$EVAL_ID" --json | jq '.scores'
```

```bash title="API"
curl -s "$VIJIL_URL/evaluation-results/$EVAL_ID/results?team_id=$TEAM_ID" \
  -H "Authorization: Bearer $TOKEN"
```

```python title="SDK"
score = client.scores.show("")
print(score.trust_score, score.reliability, score.security, score.safety)
```

## Generate a Report

Produce a Trust Report for sharing or archiving.
```bash title="CLI"
vijil eval report "$EVAL_ID"
```

```bash title="HTML"
curl -s "$VIJIL_URL/evaluations/$EVAL_ID/html?team_id=$TEAM_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -o report.html
```

```bash title="PDF"
curl -s "$VIJIL_URL/evaluations/$EVAL_ID/pdf?team_id=$TEAM_ID" \
  -H "Authorization: Bearer $TOKEN" \
  -o report.pdf
```

```python title="SDK"
pdf_bytes = client.reports.download("")
with open("report.pdf", "wb") as f:
    f.write(pdf_bytes)
```

## List and Cancel Evaluations

```bash title="List"
vijil eval list --agent-id "$AGENT_ID" --status completed
```

```bash title="Cancel"
vijil eval cancel "$EVAL_ID"
```

```bash title="Delete"
vijil eval delete "$EVAL_ID"
```

```python title="SDK"
page = client.evaluations.list(agent_id="<agent_id>")
for evaluation in page.items:
    print(evaluation.id, evaluation.status)
```

Use `GET /evaluations/` and `POST /evaluations/{evaluation_id}/cancel` for the REST API equivalents. The SDK exposes `client.evaluations.list()` and `client.evaluations.show()`; cancel and delete are available through the CLI and REST API.

## Next Steps

* Deep dive into scores and failures
* Create targeted Evaluations
* Configure provider integrations
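
For the REST flow described under Check Status, the polling step can be scripted. The following is a minimal sketch, not part of the Vijil SDK: `fetch_status_rest` and `wait_for_evaluation` are hypothetical helper names, the `status` field is read as it appears in the sample responses on this page, and treating `saved`, `failed`, and `canceled` as the terminal states is an assumption drawn from the status list.

```python
import json
import time
import urllib.request
from typing import Callable

# Assumption: these are the terminal states, based on the status list in "Check Status".
TERMINAL_STATUSES = {"saved", "failed", "canceled"}


def fetch_status_rest(base_url: str, token: str, evaluation_id: str) -> str:
    """Hypothetical helper: read `status` from GET {base_url}/evaluations/{evaluation_id}."""
    request = urllib.request.Request(
        f"{base_url}/evaluations/{evaluation_id}",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["status"]


def wait_for_evaluation(
    get_status: Callable[[], str],
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call get_status() until it returns a terminal status, sleeping between calls.

    Raises TimeoutError once roughly `timeout` seconds of sleeping have accumulated.
    """
    waited = 0.0
    while True:
        status = get_status()
        if status in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"evaluation still {status!r} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval
```

With the environment variables used in the curl examples loaded into Python, a call could look like `wait_for_evaluation(lambda: fetch_status_rest(vijil_url, token, eval_id))`. Passing the status getter as a callable keeps the loop independent of the transport, so the same function could wrap `client.evaluations.show()` instead.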