Vijil Evaluate is a quality assurance framework that automates the testing of LLM applications. An Evaluation is an automated test run in which you select one or more AI agents and a test Harness (covering Security, Reliability, and Safety) to systematically assess those agents.

Prerequisites

Before setting up an Evaluation, you should have at least one Agent registered in Vijil Evaluate. If you have not registered an Agent yet, you can also do so during setup (see step 2 below).

Setting up via Dashboard

  1. Navigate to the Evaluations section in the Vijil Evaluate dashboard.
  2. From the Create Evaluation > Select Agents section, choose one or more Agents you have previously created. If you do not have an Agent, press Register Agent.
  3. In the Select Harness section, you can configure:
    • Trust Scores - Choose between Security, Reliability, and Safety, or select all.
    • Custom - Select your custom Harness.
    • Benchmarks - Select specific benchmarks from the Trust Scores.
    • Garak - Select Garak scenarios.
  4. Under Run Configuration, you will see your selected Agent(s). By pressing the dropdown icon, you can configure the following parameters (illustrated in the sketch after these steps):
    • Temperature - Controls the randomness of the model’s output. A higher value (e.g., 1.0) produces more varied and creative responses, while a lower value (e.g., 0.1) makes responses more deterministic and focused.
    • Top P - A nucleus sampling parameter that limits token selection to the smallest set of tokens whose cumulative probability meets the specified threshold. Lower values restrict output to higher-probability tokens, while higher values allow more diversity.
    • Max Completion Tokens - Sets the maximum number of tokens the model can generate in a single response. Use this to control response length and manage resource consumption.
    • Requests Timeout - Specifies the maximum amount of time (in seconds) to wait for a response from the Agent before the request is considered failed and terminated.
  5. Enter an Evaluation name in the Enter a name... field.
  6. Press Create.
Evaluations can take a few minutes to complete. To view details of a pending Evaluation, select it from the List Evaluations section.
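The Run Configuration parameters correspond to the standard sampling and request options of an OpenAI-compatible chat-completions call. The sketch below shows where each one appears when calling an agent directly; the endpoint, API key, and model name are placeholders rather than values from Vijil Evaluate.

```python
# Minimal sketch: how the Run Configuration fields map onto an
# OpenAI-compatible chat-completion request. Endpoint, key, and model
# name are placeholders for illustration only.
from openai import OpenAI

client = OpenAI(
    base_url="https://your-agent-endpoint.example.com/v1",  # hypothetical agent endpoint
    api_key="YOUR_API_KEY",                                  # placeholder credential
)

response = client.chat.completions.create(
    model="your-model-name",  # placeholder model identifier
    messages=[{"role": "user", "content": "Hello!"}],
    temperature=0.1,            # lower = more deterministic, focused output
    top_p=0.9,                  # nucleus sampling threshold
    max_completion_tokens=512,  # cap on generated tokens (some servers expect max_tokens instead)
    timeout=30.0,               # seconds to wait before the request is treated as failed
)
print(response.choices[0].message.content)
```

Lowering Temperature and Top P together makes repeated evaluation runs more reproducible, at the cost of response diversity.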

Work in Progress

The programmatic evaluation capabilities are currently in private preview and subject to change.
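As an illustration only, a programmatic run might look like the sketch below. Because the API is in private preview, the client class, method names, and parameters shown here are assumptions and may not match the released interface.

```python
# Hypothetical sketch only: the programmatic evaluation API is in private
# preview and subject to change. The client class, method names, and
# argument names below are assumptions for illustration.
from vijil import Vijil  # assumed package and client name

client = Vijil(api_key="YOUR_VIJIL_API_KEY")  # placeholder credential

evaluation = client.evaluations.create(        # assumed method
    name="my-first-evaluation",
    agent_ids=["agent-123"],                   # previously registered Agent(s)
    harnesses=["security", "reliability", "safety"],  # Trust Score harnesses
    model_params={
        "temperature": 0.1,
        "top_p": 0.9,
        "max_completion_tokens": 512,
        "request_timeout": 30,
    },
)
print(evaluation.id, evaluation.status)
```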

Next Steps

  • Understand Results - Deep dive into scores and failures.
  • Custom Harnesses - Create targeted evaluations.
  • Cloud Providers - Configure provider integrations.