Run evaluations using the Python client to test your agent’s trustworthiness. This guide covers creating, monitoring, and managing evaluations.

Creating an Evaluation

Cloud-Hosted Agents

For agents deployed on supported platforms:
from vijil import Vijil

vijil = Vijil()

evaluation = vijil.evaluations.create(
    model_hub="openai",
    model_name="gpt-4o",
    harnesses=["trust_score"],
    model_params={"temperature": 0}
)

print(f"Evaluation started: {evaluation.get('id')}")

Local Agents

For agents running locally (see Custom Agents for full setup):
local_agent = vijil.local_agents.create(
    agent_function=my_agent,
    input_adapter=input_adapter,
    output_adapter=output_adapter,
)

vijil.local_agents.evaluate(
    agent_name="my-agent",
    evaluation_name="Pre-deployment check",
    agent=local_agent,
    harnesses=["trust_score"],
    rate_limit=30,             # throttle probe requests sent to the agent
    rate_limit_interval=1,
)

Evaluation Parameters

Parameter      Description                                                 Required
model_hub      Cloud provider (openai, anthropic, bedrock, vertex, etc.)  Yes
model_name     Model identifier                                            Yes
harnesses      List of harnesses to run                                    Yes
model_params   Model configuration (temperature, etc.)                     No
api_key_name   Stored API key name (for custom endpoints)                  Sometimes
model_url      Custom endpoint URL                                         Sometimes
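
The "Sometimes" parameters apply when your agent sits behind a custom endpoint rather than a hosted hub. A minimal sketch, assuming an OpenAI-compatible endpoint and a key already stored with Vijil (the hub value, URL, and key name here are placeholders, not confirmed values):
evaluation = vijil.evaluations.create(
    model_hub="custom",                   # placeholder hub value; check your supported hubs
    model_name="my-model",                # placeholder model identifier
    model_url="https://example.com/v1",   # hypothetical custom endpoint URL
    api_key_name="my-stored-key",         # name of a key stored with Vijil, not the key itself
    harnesses=["trust_score"],
)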

Available Harnesses

Harness           Description                    Typical Duration
trust_score       Comprehensive evaluation       30-60 min
security          Security vulnerabilities       15-30 min
reliability       Hallucination and consistency  15-30 min
safety            Harmful content and ethics     15-30 min
owasp-llm-top-10  OWASP LLM security risks       20-40 min

Add the _Small suffix for faster iterations (e.g., security_Small).
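
For instance, a quick iteration on security alone is the same call as above with the smaller harness:
evaluation = vijil.evaluations.create(
    model_hub="openai",
    model_name="gpt-4o",
    harnesses=["security_Small"],  # reduced probe set for faster feedback
)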

Monitoring Progress

Get Status

status = vijil.evaluations.get_status(evaluation.get("id"))

print(f"Status: {status.get('status')}")
print(f"Progress: {status.get('progress')}%")
print(f"Probes completed: {status.get('completed')}/{status.get('total')}")

Status Values

Status     Meaning
pending    Waiting to start
running    Actively processing probes
completed  Finished successfully
failed     Terminated with error
cancelled  Stopped by user

Wait for Completion

from vijil.local_agents.constants import TERMINAL_STATUSES
import time

while True:
    status = vijil.evaluations.get_status(evaluation.get("id"))
    if status.get("status") in TERMINAL_STATUSES:
        break
    print(f"Progress: {status.get('progress')}%")
    time.sleep(10)

print(f"Evaluation completed with status: {status.get('status')}")

Retrieving Results

Get Full Results

results = vijil.evaluations.get_results(evaluation.get("id"))

# Trust Score (0.0 - 1.0)
print(f"Trust Score: {results.get('trust_score')}")

# Dimension scores
print(f"Reliability: {results.get('reliability_score')}")
print(f"Security: {results.get('security_score')}")
print(f"Safety: {results.get('safety_score')}")

Access Failures

failures = results.get("failures", [])

for failure in failures:
    print(f"""
    Probe: {failure.get('probe_id')}
    Category: {failure.get('category')}
    Severity: {failure.get('severity')}
    Reason: {failure.get('reason')}
    """)

Filter by Severity

# Get critical and high severity failures
critical_failures = [
    f for f in failures
    if f.get("severity") in ["critical", "high"]
]

print(f"Critical/High failures: {len(critical_failures)}")

Managing Evaluations

List Evaluations

evaluations = vijil.evaluations.list(limit=10)

for e in evaluations:
    print(f"{e.get('id')}: {e.get('status')} - Score: {e.get('trust_score')}")

Cancel an Evaluation

vijil.evaluations.cancel(evaluation.get("id"))

Delete an Evaluation

vijil.evaluations.delete(evaluation.get("id"))

Export Results

To JSON

import json

results = vijil.evaluations.get_results(evaluation.get("id"))

with open("results.json", "w") as f:
    json.dump(results, f, indent=2)

To CSV

import csv

failures = results.get("failures", [])

with open("failures.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["probe_id", "category", "severity", "reason"])
    writer.writeheader()
    for failure in failures:
        writer.writerow({
            "probe_id": failure.get("probe_id"),
            "category": failure.get("category"),
            "severity": failure.get("severity"),
            "reason": failure.get("reason")
        })
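
If pandas is already in your environment, the same export is shorter (assuming each failure dict carries the four fields above):
import pandas as pd

pd.DataFrame(failures)[["probe_id", "category", "severity", "reason"]].to_csv(
    "failures.csv", index=False
)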

Compare Evaluations

Track improvement across agent versions:
results_v1 = vijil.evaluations.get_results("eval-v1-id")
results_v2 = vijil.evaluations.get_results("eval-v2-id")

score_change = results_v2.get("trust_score") - results_v1.get("trust_score")
print(f"Trust Score change: {score_change:+.2f}")

# Compare failure counts
v1_failures = len(results_v1.get("failures", []))
v2_failures = len(results_v2.get("failures", []))
print(f"Failures: {v1_failures} → {v2_failures} ({v2_failures - v1_failures:+d})")

Error Handling

from vijil.exceptions import VijilError

try:
    evaluation = vijil.evaluations.create(
        model_hub="openai",
        model_name="gpt-4o",
        harnesses=["trust_score"]
    )
except VijilError as e:
    print(f"Error: {e.message}")
    if hasattr(e, "retry_after"):
        print(f"Retry after: {e.retry_after}s")

Common errors:

Error              Cause                     Solution
Agent not found    Invalid agent_id          Verify agent exists
Rate limited       Too many requests         Wait and retry
Invalid API key    Missing or incorrect key  Check stored credentials
Harness not found  Invalid harness name      Use valid harness ID
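
For transient errors such as rate limiting, wrapping the create call in a retry with backoff is usually enough. A sketch, assuming VijilError is raised on rate limits and that retry_after may be absent:
import time
from vijil.exceptions import VijilError

def create_with_retry(client, attempts=3, **kwargs):
    """Retry evaluation creation, backing off between attempts."""
    for attempt in range(attempts):
        try:
            return client.evaluations.create(**kwargs)
        except VijilError as e:
            if attempt == attempts - 1:
                raise  # out of attempts; surface the error
            # Honor a server-suggested delay if present, else back off exponentially
            wait = getattr(e, "retry_after", 2 ** attempt)
            print(f"Retrying in {wait}s: {e.message}")
            time.sleep(wait)

evaluation = create_with_retry(
    vijil,
    model_hub="openai",
    model_name="gpt-4o",
    harnesses=["trust_score"],
)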

Next Steps

Understanding Results: Deep dive into scores and failures
Custom Harnesses: Create targeted evaluations
CI/CD Integration: Automate evaluations in pipelines
Cloud Providers: Configure provider integrations