Prerequisites
Before setting up an Evaluation, you must have:
- A Vijil Evaluate account
- An Agent that has been set up
Setting up via Dashboard
- Navigate to the Evaluations section in Vijil Evaluate.
- From the Create Evaluation > Select Agents section, choose one or more Agents you have previously created. If you do not have an Agent, press Register Agent.
- In the Select Harness section, you can configure:
  - Trust Scores - Choose between Security, Reliability, and Safety, or select all.
  - Custom - Select your custom Harness.
  - Benchmarks - Select specific benchmarks from the Trust Scores.
  - Garak - Select Garak scenarios.
- Under Run Configuration, you will see your selected Agent(s). Press the dropdown icon next to an Agent to configure:
  - Temperature - Controls the randomness of the model’s output. A higher value (e.g., 1.0) produces more varied and creative responses, while a lower value (e.g., 0.1) makes responses more deterministic and focused.
  - Top P - A nucleus sampling parameter that limits token selection to the smallest set of tokens whose cumulative probability meets the specified threshold. Lower values restrict output to higher-probability tokens, while higher values allow more diversity.
  - Max Completion Tokens - Sets the maximum number of tokens the model can generate in a single response. Use this to control response length and manage resource consumption.
  - Requests Timeout - Specifies the maximum time (in seconds) to wait for a response from the Agent. If no response arrives in that time, the request is terminated and counted as failed.
- Enter an Evaluation name in the Enter a name... field.
- Press Create.
Evaluations can take a few minutes to complete. To view details of a pending Evaluation, select it from the List Evaluations section.
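To make the Temperature and Top P settings from Run Configuration more concrete, here is a minimal sketch of how these two parameters typically reshape a model's token distribution. It is a generic illustration in plain Python, not Vijil Evaluate's or any model provider's implementation. The function names and the example scores are hypothetical.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


# Hypothetical scores for four candidate tokens (illustrative values only).
logits = [2.0, 1.0, 0.5, -1.0]

focused = softmax_with_temperature(logits, 0.1)  # nearly all mass on the top token
varied = softmax_with_temperature(logits, 1.0)   # mass spread across several tokens
nucleus = top_p_filter(varied, 0.8)              # token indices that survive Top P = 0.8

print("max probability at temperature 0.1:", round(max(focused), 4))
print("max probability at temperature 1.0:", round(max(varied), 4))
print("token indices kept by Top P = 0.8:", sorted(nucleus))
```

In this sketch, lowering the temperature concentrates probability on the highest-scoring token, and lowering Top P shrinks the set of tokens that can be sampled at all. Both changes make output more repeatable across runs.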
Work in Progress
The programmatic evaluation capabilities are currently in private preview and subject to change.
Next Steps
- Understand Results - Deep dive into scores and failures
- Custom Harnesses - Create targeted evaluations
- Cloud Providers - Configure provider integrations