Set up Evaluate

Vijil Evaluate is a quality assurance framework that automates the testing of LLM applications. An Evaluation in Vijil is an automated test run in which you select one or more AI agents and a test Harness (covering Security, Reliability, and Safety) to systematically assess how those agents behave.

Prerequisites

Before setting up an Evaluation, you must have:

A Vijil Evaluate account
At least one Agent that you have already set up

Setting up via Dashboard

Navigate to the Evaluations section in Vijil Evaluate.
From the Create Evaluation > Select Agents section, choose one or more Agents you have previously created. If you do not have an Agent, press Register Agent.
In the Select Harness section, you can configure:
- Trust Scores - Choose between Security, Reliability, and Safety, or select all.
- Custom - Select your custom Harness.
- Benchmarks - Select specific benchmarks from the Trust Scores.
- Garak - Select Garak Scenarios.
Under Run Configuration, you will see your selected Agent(s). Press the dropdown icon next to an Agent to configure:
- Temperature - Controls the randomness of the model’s output. A higher value (e.g., 1.0) produces more varied and creative responses, while a lower value (e.g., 0.1) makes responses more deterministic and focused.
- Top P - A nucleus sampling parameter that limits token selection to the smallest set of tokens whose cumulative probability meets the specified threshold. Lower values restrict output to higher-probability tokens, while higher values allow more diversity.
- Max Completion Tokens - Sets the maximum number of tokens the model can generate in a single response. Use this to control response length and manage resource consumption.
- Requests Timeout - Specifies the maximum amount of time (in seconds) to wait for a response from the Agent before the request is considered failed and terminated.
Enter an Evaluation name in the Enter a name... field.
Press Create.

Evaluations can take a few minutes to complete. To view details of a pending Evaluation, select it from the List Evaluations section.
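The Temperature and Top P settings under Run Configuration are easier to reason about with a small numeric example. The sketch below is a toy illustration in plain Python, not how Vijil or any model provider implements sampling. The function names and the four example scores are made up for the illustration.

```python
import math


def apply_temperature(logits, temperature):
    """Turn raw token scores into probabilities.

    Dividing by a temperature below 1.0 sharpens the distribution;
    1.0 leaves it unchanged. The temperature must be greater than 0.
    """
    scaled = [score / temperature for score in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(score - peak) for score in scaled]
    total = sum(exps)
    return [value / total for value in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability tokens whose
    cumulative probability reaches top_p, then renormalize."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for index in ranked:
        kept.append(index)
        cumulative += probs[index]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


# Hypothetical scores for four candidate tokens.
logits = [2.0, 1.0, 0.5, -1.0]

focused = apply_temperature(logits, 0.1)  # almost all mass on token 0
varied = apply_temperature(logits, 1.0)   # mass spread across tokens
nucleus = top_p_filter(varied, 0.9)       # drops the least likely token

print("temperature 0.1:", [round(p, 4) for p in focused])
print("temperature 1.0:", [round(p, 4) for p in varied])
print("tokens kept at top_p=0.9:", sorted(nucleus))
```

With these example scores, the low temperature concentrates nearly all probability on the top token, and a Top P of 0.9 removes only the least likely token from consideration.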

Setting up via API

To run evaluation jobs through the Vijil Evaluate API and work with the results, install the Vijil client library from PyPI.

Shell

pip install -U vijil
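If you want to confirm which version of the client is installed, one option is to query the package metadata from Python. This is a minimal sketch that uses only the standard library; the helper name installed_version is hypothetical, and the distribution name vijil is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


found = installed_version("vijil")
if found is None:
    print("vijil is not installed in this environment")
else:
    print(f"vijil {found} is installed")
```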

To ensure you are using the latest version of the package, we recommend using the -U or --upgrade option.

You need a Vijil API key to authenticate remotely through the client library. To obtain the API key, log in to your Vijil account, go to the profile page on the dashboard, and copy the value in the Token field.

After you obtain an API key, export it in the environment where you intend to use the client.

Shell

export VIJIL_API_KEY=<eyj-xxxx>

Alternatively, you can store the key in a .env file and load it into your Python environment using a library such as python-dotenv.

This user token expires after 24 hours. If you plan to use the API for longer than that, you should use machine-to-machine secrets to regularly refresh the token.
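The .env approach mentioned above could look like the following. This is a sketch under a few assumptions: the .env file sits in the current working directory and contains a line of the form VIJIL_API_KEY=<eyj-xxxx>, and the helper name load_api_key is hypothetical. The load_dotenv call comes from python-dotenv; if that package is not installed, the sketch falls back to variables already exported in the shell.

```python
import os


def load_api_key(env_var: str = "VIJIL_API_KEY") -> str:
    """Return the API key from the environment, reading a local .env file first if possible."""
    try:
        from dotenv import load_dotenv  # provided by the python-dotenv package

        # Copies KEY=value pairs from .env into os.environ.
        # Variables that are already set are left unchanged by default.
        load_dotenv(dotenv_path=".env")
    except ImportError:
        pass  # python-dotenv is not installed; rely on exported variables only

    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            f"{env_var} is not set. Export it or add it to a .env file."
        )
    return key
```

A script that uses the client can call load_api_key() once at startup and fail early with a clear message if the key is missing. Because the user token expires after 24 hours, a value stored in .env needs to be replaced when it expires.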