

Vijil Evaluate is a quality-assurance framework that automates testing of LLM applications. An Evaluation in Vijil is an automated test run in which you select one or more AI agents and a test Harness (covering Security, Reliability, and Safety) and systematically assess those agents against it.

Prerequisites

Before setting up an Evaluation, you must have:
  • Access to your Vijil Console deployment
  • Pointed your CLI to your deployment using vijil init (if using the CLI)
  • Registered an Agent

Setting up via Console UI

  1. Navigate to the Evaluations section in the Vijil Console.
  2. From the Create Evaluation > Select Agents section, choose one or more Agents you have previously created. If you do not have an Agent, register one first.
  3. In the Select Harness section, you can configure:
    • Trust Scores - Choose among Security, Reliability, and Safety, or select all three.
    • Custom - Select your custom Harness.
    • Benchmarks - Select specific benchmarks from the Trust Scores.
    • Garak - Select Garak Scenarios.
  4. Under Run Configuration, you will see your selected Agent(s). Click the dropdown icon next to an Agent to configure its model parameters:
    • Temperature - Controls randomness: higher values (e.g., 1.0) produce more varied, creative output; lower values (e.g., 0.1) are more deterministic.
    • Top P - Restricts sampling to the most probable tokens (nucleus sampling); lower values make output more focused.
    • Max Completion Tokens - Maximum number of tokens the model can generate in a single response.
    • Requests Timeout - Maximum time to wait for a response from the Agent.
  5. Enter a name for the Evaluation in the name field.
  6. Press Create.
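The model parameters from step 4 can be sketched as a plain dictionary with basic sanity checks. This is an illustrative sketch only: the key names and accepted ranges are assumptions for the example, not Vijil's exact API fields.

```python
# Illustrative run configuration for one agent.
# Key names are assumptions for this sketch, not Vijil's exact API fields.
run_config = {
    "temperature": 0.1,             # lower -> more deterministic output
    "top_p": 0.9,                   # restrict sampling to high-probability tokens
    "max_completion_tokens": 1024,  # cap on generated tokens per response
    "request_timeout": 60,          # seconds to wait for the agent to respond
}

def validate_run_config(cfg: dict) -> list[str]:
    """Return a list of problems with the configuration (empty if valid)."""
    problems = []
    if not 0.0 <= cfg["temperature"] <= 2.0:
        problems.append("temperature should be between 0.0 and 2.0")
    if not 0.0 < cfg["top_p"] <= 1.0:
        problems.append("top_p should be in (0.0, 1.0]")
    if cfg["max_completion_tokens"] < 1:
        problems.append("max_completion_tokens must be positive")
    if cfg["request_timeout"] <= 0:
        problems.append("request_timeout must be a positive number of seconds")
    return problems
```

Validating the configuration client-side before submitting a run is a cheap way to catch typos (e.g., a temperature of 10 instead of 1.0) before an Evaluation is queued.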
Evaluations can take some time to complete depending on the sample size and model responsiveness. You can monitor the progress from the Evaluations list.
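Because an Evaluation runs asynchronously, monitoring it programmatically usually amounts to polling for a terminal status. The sketch below is a generic polling helper with an injected get_status callable; the status strings and the helper itself are assumptions for illustration, not Vijil's actual client API (which is in private preview).

```python
import time

def wait_for_evaluation(get_status, poll_interval=1.0, timeout=300.0):
    """Poll get_status() until it reports a terminal state or the timeout expires.

    get_status is any zero-argument callable returning one of the assumed
    status strings: "PENDING", "RUNNING", "COMPLETED", "FAILED".
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = get_status()
        if status in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_interval)  # avoid hammering the API between checks
    raise TimeoutError("evaluation did not reach a terminal state in time")
```

Using a monotonic clock for the deadline keeps the timeout correct even if the system clock is adjusted mid-run; the poll interval should be generous, since large sample sizes can keep an Evaluation running for a while.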

Work in Progress

The programmatic evaluation capabilities are currently in private preview and subject to change.

Next Steps

  • Understand Results - Deep dive into scores and failures
  • Custom Harnesses - Create targeted evaluations
  • Cloud Providers - Configure provider integrations
Last modified on April 28, 2026