# Run Evaluations

> Start, monitor, and retrieve trust Evaluations programmatically using the CLI, REST API, or MCP.

[Vijil Evaluate](https://vijil.ai/evaluate) is a quality assurance framework that automates the testing of LLM applications. An **Evaluation** in Vijil is an automated test run: you select one or more AI agents and a test [Harness](/concepts/evaluation-components/harness) covering [Security](/concepts/trust-score/security), [Reliability](/concepts/trust-score/reliability), and [Safety](/concepts/trust-score/safety), and Vijil runs the Harness's Probes against each agent to produce per-Harness Trust Scores.

## Prerequisites

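* CLI configured and authenticated, or a Bearer token from `POST /auth/jwt/login` (see [Console API](/developer-guide/getting-started/api))
* A registered [Agent](/tutorials/manage-agents) and its UUID
* For the API examples, your team UUID and these environment variables: `VIJIL_URL` (the API base URL), `TOKEN`, `AGENT_ID`, and `TEAM_ID`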

## Start an Evaluation

<Tabs>
  <Tab title="CLI">
    ```bash theme={null}
    vijil eval run \
      --agent-id "$AGENT_ID" \
      --harness-names '["safety", "security"]' \
      --sample-size 50 \
      --wait
    ```

    | Flag              | Description                                  | Required |
    | ----------------- | -------------------------------------------- | -------- |
    | `--agent-id`      | UUID of the Agent to evaluate                | Yes      |
    | `--harness-names` | JSON array of Harness names                  | Yes      |
    | `--sample-size`   | Probes per Harness (1–1000); omit to run all | No       |
    | `--harness-type`  | `standard` (default) or `custom`             | No       |
    | `--wait`          | Block until the evaluation completes         | No       |
    | `--json`          | Output as JSON                               | No       |

    <Tip>Use `--sample-size 10` for fast iteration during development. Run the full Harness before releasing to production.</Tip>
  </Tab>

  <Tab title="API">
    The API returns `202 Accepted` immediately. Use the returned `evaluation_id` to poll for status.

    ```bash theme={null}
    curl -s -X POST "$VIJIL_URL/evaluations/" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      -d "{
        \"agent_id\": \"$AGENT_ID\",
        \"team_id\": \"$TEAM_ID\",
        \"harness_names\": [\"safety\", \"security\"],
        \"sample_size\": 50
      }"
    ```

    Response:

    ```json theme={null}
    {
      "evaluation_id": "e5f6a7b8-...",
      "status": "starting"
    }
    ```

    Save the evaluation ID:

    ```bash theme={null}
    export EVAL_ID="e5f6a7b8-..."
    ```

    | Field           | Description                                  | Required |
    | --------------- | -------------------------------------------- | -------- |
    | `agent_id`      | UUID of the Agent to evaluate                | Yes      |
    | `team_id`       | UUID of your team                            | Yes      |
    | `harness_names` | Array of Harness names                       | Yes      |
    | `sample_size`   | Probes per Harness (1–1000); omit to run all | No       |
    | `harness_type`  | `"standard"` (default) or `"custom"`         | No       |
  </Tab>

  <Tab title="MCP">
    With the [Vijil MCP server](/developer-guide/agentic/mcp) configured, ask Claude Code in natural language:

    <Prompt description="Run a safety and security evaluation on agent a1b2c3d4-... with sample size 50, and wait for it to complete">
      Run a safety and security evaluation on agent a1b2c3d4-... with sample size 50, and wait for it to complete
    </Prompt>

    Claude calls `eval_run` with `wait=True` and reports back Trust Scores when the evaluation finishes.
  </Tab>
</Tabs>
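If you script the API call instead of copying the ID by hand, you can build the request body with `jq`, which avoids quote escaping, and print only `evaluation_id` from the `202` response. This is a minimal sketch under a few assumptions. It requires `jq` to be installed. It uses only the request fields documented in the API tab. The helper names `build_eval_payload` and `start_evaluation` are hypothetical and are not part of the Vijil CLI.

```shell
# Start an Evaluation over the REST API and capture its ID (sketch).
# Assumes jq is installed and VIJIL_URL and TOKEN are exported.
# Function names are hypothetical helpers, not Vijil CLI commands.

# Build the POST /evaluations/ body with jq so no quote escaping is needed.
# $1 agent UUID, $2 team UUID, $3 JSON array of Harness names,
# $4 probes per Harness (optional; leave empty to run every probe)
build_eval_payload() {
  jq -n \
    --arg agent_id "$1" \
    --arg team_id "$2" \
    --argjson harness_names "$3" \
    --arg sample_size "${4:-}" \
    '{agent_id: $agent_id, team_id: $team_id, harness_names: $harness_names}
     + (if $sample_size == "" then {} else {sample_size: ($sample_size | tonumber)} end)'
}

# POST the body and print only evaluation_id from the 202 response.
# Prints nothing if the response has no evaluation_id (for example, an error).
start_evaluation() {
  build_eval_payload "$@" |
    curl -s -X POST "$VIJIL_URL/evaluations/" \
      -H "Authorization: Bearer $TOKEN" \
      -H "Content-Type: application/json" \
      --data-binary @- |
    jq -r '.evaluation_id // empty'
}

# Usage:
#   EVAL_ID="$(start_evaluation "$AGENT_ID" "$TEAM_ID" '["safety", "security"]' 50)"
#   export EVAL_ID
```

If you omit the fourth argument, the `sample_size` field is left out of the body entirely. This matches the documented "omit to run all" behavior.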

## Check Status

A successful evaluation moves through these statuses in order: `starting` → `pending` → `running` → `completed` → `saving` → `saved`. An evaluation can instead end as `failed` or `canceled`.

<CodeGroup>
  ```bash title="CLI" theme={null}
  vijil eval status <evaluation_id>
  ```

  ```bash title="API" theme={null}
  curl -s "$VIJIL_URL/evaluations/$EVAL_ID" \
    -H "Authorization: Bearer $TOKEN"
  ```
</CodeGroup>
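The CLI's `--wait` flag blocks for you. With raw API calls, a script has to poll the status endpoint itself. The minimal sketch below makes these choices:

* It stops at `saved`, because full results are available only at that point. If you only need the Trust Scores, you could also accept `completed`.
* It treats `failed` and `canceled` as final.
* The 15-second interval, the 240-check limit, and the function names are all hypothetical.

The fetch command is passed in as an argument so that you can substitute a stub when testing.

```shell
# Poll an Evaluation until it reaches a final status (sketch).
# Assumes jq is installed and VIJIL_URL, TOKEN, and EVAL_ID are exported.
# Interval, check limit, and function names are hypothetical choices.

# Fetch the current Evaluation record: GET /evaluations/{evaluation_id}
fetch_evaluation() {
  curl -s "$VIJIL_URL/evaluations/$EVAL_ID" \
    -H "Authorization: Bearer $TOKEN"
}

# Usage: wait_for_evaluation [fetch_command] [interval_seconds] [max_checks]
# Prints the final status. Returns 0 for saved, 1 for failed or canceled,
# and 2 if max_checks polls pass without a final status.
wait_for_evaluation() {
  local fetch_cmd="${1:-fetch_evaluation}" interval="${2:-15}" max_checks="${3:-240}"
  local checks=0 status=""
  while [ "$checks" -lt "$max_checks" ]; do
    status="$("$fetch_cmd" | jq -r '.status')"
    case "$status" in
      saved) echo "$status"; return 0 ;;
      failed|canceled) echo "$status"; return 1 ;;
    esac
    checks=$((checks + 1))
    # Sleep only if another check will follow.
    if [ "$checks" -lt "$max_checks" ]; then sleep "$interval"; fi
  done
  echo "gave up after $max_checks checks (last status: ${status:-unknown})" >&2
  return 2
}

# Usage: wait_for_evaluation && echo "Results are ready"
```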

When the status reaches `completed`, the response includes per-Harness Trust Scores:

```json theme={null}
{
  "evaluation_id": "e5f6a7b8-...",
  "status": "completed",
  "scores": {
    "safety": 0.82,
    "security": 0.67
  },
  "completed_at": 1712506200
}
```

## Retrieve Results

Get the full per-Probe breakdown once the evaluation status is `saved`.

<CodeGroup>
  ```bash title="CLI" theme={null}
  vijil eval results-detail <evaluation_id> --json | jq '.scores'
  ```

  ```bash title="API" theme={null}
  curl -s "$VIJIL_URL/evaluation-results/$EVAL_ID/results?team_id=$TEAM_ID" \
    -H "Authorization: Bearer $TOKEN"
  ```
</CodeGroup>

## Generate a Report

Produce a Trust Report for sharing or archiving.

<CodeGroup>
  ```bash title="CLI" theme={null}
  vijil eval report <evaluation_id>
  ```

  ```bash title="HTML" theme={null}
  curl -s "$VIJIL_URL/evaluations/$EVAL_ID/html?team_id=$TEAM_ID" \
    -H "Authorization: Bearer $TOKEN" \
    -o report.html
  ```

  ```bash title="PDF" theme={null}
  curl -s "$VIJIL_URL/evaluations/$EVAL_ID/pdf?team_id=$TEAM_ID" \
    -H "Authorization: Bearer $TOKEN" \
    -o report.pdf
  ```
</CodeGroup>
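To fetch either API format from a script, a small wrapper can build the report URL from the same environment variables used above. This is a sketch. The function names and the `trust-report-<evaluation_id>.<format>` filename are hypothetical conventions. `--fail` makes `curl` exit non-zero on an HTTP error rather than saving an error page as the report.

```shell
# Download a Trust Report as HTML or PDF (sketch).
# Assumes VIJIL_URL, TOKEN, EVAL_ID, and TEAM_ID are exported.
# Function names and the output filename pattern are hypothetical.

# Print the report URL for a format; only the html and pdf endpoints exist here.
report_url() {
  case "$1" in
    html|pdf)
      printf '%s/evaluations/%s/%s?team_id=%s\n' "$VIJIL_URL" "$EVAL_ID" "$1" "$TEAM_ID"
      ;;
    *)
      echo "unsupported report format: $1 (use html or pdf)" >&2
      return 1
      ;;
  esac
}

# Save the report as trust-report-<evaluation_id>.<format> in the current directory.
download_report() {
  local format="$1" url
  url="$(report_url "$format")" || return 1
  curl -s --fail "$url" \
    -H "Authorization: Bearer $TOKEN" \
    -o "trust-report-$EVAL_ID.$format"
}

# Usage: download_report html && download_report pdf
```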

## List, Cancel, and Delete Evaluations

<CodeGroup>
  ```bash title="List" theme={null}
  vijil eval list --agent-id "$AGENT_ID" --status completed
  ```

  ```bash title="Cancel" theme={null}
  vijil eval cancel <evaluation_id>
  ```

  ```bash title="Delete" theme={null}
  vijil eval delete <evaluation_id>
  ```
</CodeGroup>

Over the REST API, list evaluations with `GET /evaluations/` and cancel one with `POST /evaluations/{evaluation_id}/cancel`.
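As a sketch, the two REST calls look like this with `curl`. The paths come from the endpoints named above. This page does not say whether these endpoints also need a `team_id` query parameter (the results and report endpoints do) or a request body for cancel, so treat both as assumptions to verify. The function names are hypothetical.

```shell
# REST equivalents of `vijil eval list` and `vijil eval cancel` (sketch).
# Assumes VIJIL_URL and TOKEN are exported. Whether a team_id query
# parameter is also required is an unverified assumption.

# List evaluations: GET /evaluations/
list_evaluations() {
  curl -s "$VIJIL_URL/evaluations/" \
    -H "Authorization: Bearer $TOKEN"
}

# Cancel one evaluation: POST /evaluations/{evaluation_id}/cancel
cancel_evaluation() {
  if [ -z "$1" ]; then
    echo "usage: cancel_evaluation <evaluation_id>" >&2
    return 1
  fi
  curl -s -X POST "$VIJIL_URL/evaluations/$1/cancel" \
    -H "Authorization: Bearer $TOKEN"
}

# Usage: cancel_evaluation "$EVAL_ID"
```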

## Next Steps

<CardGroup cols={1}>
  <Card title="Understand Results" icon="chart-bar" href="/developer-guide/evaluate/understanding-results">
    Deep dive into scores and failures
  </Card>

  <Card title="Custom Harnesses" icon="wrench" href="/developer-guide/evaluate/custom-harnesses">
    Create targeted Evaluations
  </Card>

  <Card title="Cloud Providers" icon="cloud" href="/developer-guide/evaluate/cloud-providers">
    Configure provider integrations
  </Card>
</CardGroup>
