Vijil Evaluate is a quality assurance framework that automates testing of LLM applications. An Evaluation is one automated test run: you select one or more AI Agents and a test Harness (covering Security, Reliability, and Safety), and Vijil probes them to systematically assess their quality, safety, and reliability.

Prerequisites

  • The Vijil CLI, configured and authenticated, or a Bearer token from POST /auth/jwt/login (see Console API)
  • A registered Agent and its UUID, stored in AGENT_ID for the commands on this page

Start an Evaluation

vijil eval run \
  --agent-id "$AGENT_ID" \
  --harness-names '["safety", "security"]' \
  --sample-size 50 \
  --wait
| Flag | Description | Required |
| --- | --- | --- |
| --agent-id | UUID of the Agent to evaluate | Yes |
| --harness-names | JSON array of Harness names | Yes |
| --sample-size | Probes per Harness (1–1000); omit to run all Probes | No |
| --harness-type | standard (default) or custom | No |
| --wait | Block until the evaluation completes | No |
| --json | Output as JSON | No |
Use --sample-size 10 for fast iteration during development. Run the full Harness before releasing to production.
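The tip above can be scripted so one command serves both cases. The sketch below is a hypothetical Bash helper, not part of the Vijil CLI: `start_eval`, `harness_json`, and the `RELEASE` variable are names invented for illustration. It builds the JSON array that `--harness-names` expects and adds `--sample-size 10` unless `RELEASE=1`, in which case the flag is omitted and every Probe runs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Vijil CLI): start an Evaluation with a
# 10-Probe sample during development, or the full Harnesses when RELEASE=1.

# Turn plain Harness names into the JSON array that --harness-names expects,
# e.g. safety security -> ["safety", "security"].
# Names are assumed not to contain double quotes.
harness_json() {
  local out="" name
  for name in "$@"; do
    out+="${out:+, }\"$name\""
  done
  printf '[%s]\n' "$out"
}

# Usage: start_eval HARNESS [HARNESS ...]
start_eval() {
  local args=(eval run --agent-id "${AGENT_ID:?set AGENT_ID to the Agent UUID}")
  args+=(--harness-names "$(harness_json "$@")" --wait)
  if [ "${RELEASE:-0}" != "1" ]; then
    # Development: sample 10 Probes per Harness for fast iteration.
    args+=(--sample-size 10)
  fi
  # With RELEASE=1, --sample-size is omitted, so every Probe runs.
  vijil "${args[@]}"
}
```

For example, `start_eval safety security` runs a quick sampled check, while `RELEASE=1 start_eval safety security` runs the full Harnesses.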

Check Status

Status progresses through starting → pending → running → completed → saving → saved. An evaluation can also end as failed or canceled.
vijil eval status <evaluation_id>
When the status reaches completed, the response includes per-Harness Trust Scores:
{
  "evaluation_id": "e5f6a7b8-...",
  "status": "completed",
  "scores": {
    "safety": 0.82,
    "security": 0.67
  },
  "completed_at": 1712506200
}
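To turn these scores into a release gate, you can compare each one against a minimum. This Bash sketch assumes you have saved a status response like the one above to `status.json`; the function name `check_trust_scores` and the 0.7 threshold are illustrative choices, not Vijil defaults. It uses `jq` to flatten `.scores` into `name=score` pairs and `awk` for the floating-point comparison that Bash lacks.

```shell
#!/usr/bin/env bash
# Sketch of a CI gate (check_trust_scores is a hypothetical name, not a Vijil command).
# Usage: check_trust_scores THRESHOLD name=score [name=score ...]
# Prints PASS/FAIL per Harness; returns 1 if any score is below THRESHOLD.
check_trust_scores() {
  local threshold="$1" failed=0 pair name score
  shift
  for pair in "$@"; do
    name="${pair%%=*}"
    score="${pair#*=}"
    # Bash cannot compare floats, so awk does it: exit 0 means score < threshold.
    if awk -v s="$score" -v t="$threshold" 'BEGIN { exit !(s < t) }'; then
      echo "FAIL $name: $score < $threshold"
      failed=1
    else
      echo "PASS $name: $score"
    fi
  done
  return "$failed"
}

# Example wiring, assuming the status response above was saved to status.json:
#   mapfile -t pairs < <(jq -r '.scores | to_entries[] | "\(.key)=\(.value)"' status.json)
#   check_trust_scores 0.7 "${pairs[@]}" || exit 1
```

With the sample response above and a threshold of 0.7, safety passes and security (0.67) fails, so the function returns 1 and the CI step stops.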

Retrieve Results

Get the full per-Probe breakdown once the evaluation status is saved.
vijil eval results-detail <evaluation_id> --json | jq '.scores'
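Because results are only available once the status is saved, scripts usually poll first. The Bash sketch below treats saved, failed, and canceled as terminal states, following the status list under Check Status. It assumes `vijil eval status <id> --json` prints the JSON response shown there; the `--json` flag on `status` is an assumption, so adjust `eval_status` if your CLI prints something else. The function names and the 30-second default interval are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical polling helpers; names and defaults are illustrative.

# saved, failed, and canceled are final; completed is not, because
# results are only available after the saving step finishes.
is_terminal_status() {
  case "$1" in
    saved|failed|canceled) return 0 ;;
    *) return 1 ;;
  esac
}

# Assumption: `vijil eval status <id> --json` prints the status JSON.
# `// empty` turns a missing status field into an empty string.
eval_status() {
  vijil eval status "$1" --json | jq -r '.status // empty'
}

# Usage: wait_for_saved EVALUATION_ID [INTERVAL_SECONDS]
# Returns 0 for saved, 1 for failed or canceled, 2 if the status is unreadable.
wait_for_saved() {
  local id="$1" interval="${2:-30}" state
  while :; do
    state="$(eval_status "$id")"
    # An empty state means the status call failed; stop instead of looping forever.
    [ -n "$state" ] || return 2
    echo "evaluation $id: $state" >&2
    if is_terminal_status "$state"; then
      [ "$state" = "saved" ]
      return
    fi
    sleep "$interval"
  done
}

# Example:
#   wait_for_saved "$EVAL_ID" && vijil eval results-detail "$EVAL_ID" --json > "results-$EVAL_ID.json"
```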

Generate a Report

Produce a Trust Report for sharing or archiving.
vijil eval report <evaluation_id>

List and Cancel Evaluations

vijil eval list --agent-id "$AGENT_ID" --status completed
The REST API equivalents are GET /evaluations/ for listing and POST /evaluations/{evaluation_id}/cancel for canceling.
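For scripts that call the REST API directly, the two endpoints can be wrapped with curl. This is a sketch under stated assumptions: `VIJIL_API_URL` (the Console API base URL) and `VIJIL_TOKEN` (a Bearer token from POST /auth/jwt/login) are placeholder variables you set yourself, and the functions send no query parameters or request body because this page documents none.

```shell
#!/usr/bin/env bash
# Hypothetical wrappers; VIJIL_API_URL and VIJIL_TOKEN are placeholders, not
# documented variable names. --fail makes curl exit non-zero on HTTP errors.

# GET /evaluations/
list_evaluations() {
  curl -sS --fail \
    -H "Authorization: Bearer ${VIJIL_TOKEN:?set VIJIL_TOKEN}" \
    "${VIJIL_API_URL:?set VIJIL_API_URL}/evaluations/"
}

# POST /evaluations/{evaluation_id}/cancel
cancel_evaluation() {
  local id="${1:?usage: cancel_evaluation <evaluation_id>}"
  curl -sS --fail -X POST \
    -H "Authorization: Bearer ${VIJIL_TOKEN:?set VIJIL_TOKEN}" \
    "${VIJIL_API_URL:?set VIJIL_API_URL}/evaluations/${id}/cancel"
}
```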

Next Steps

  • Understand Results: deep dive into scores and failures
  • Custom Harnesses: create targeted Evaluations
  • Cloud Providers: configure provider integrations
Last modified on June 3, 2026