Harnesses
Vijil lets you run pre-defined harnesses, each of which corresponds to either a dimension or another group of related probes.
Pre-defined harnesses
Vijil Evaluate comes with three types of pre-defined harnesses, all of which can be run through the UI or the Python client.
Dimensions
Every dimension is a pre-configured harness, and so is each scenario. You can run an evaluation that includes one or more pre-defined harnesses through either the UI or the Python client.
To run all of Vijil’s probes (covering all dimensions) plus the Performance harness, which covers benchmarks from the OpenLLM Leaderboard 2, use the trust_score harness.
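As a minimal sketch, the hypothetical helper below runs the trust_score harness from the Python client. It assumes client is an already-initialized Vijil client and reuses only the client.evaluations.create arguments from the example at the end of this page; the helper name run_trust_score and its defaults are illustrative, not part of the Vijil API.

```python
from typing import Any


def run_trust_score(client: Any, model_hub: str, model_name: str, temperature: float = 0) -> Any:
    """Hypothetical helper: start an evaluation on Vijil's trust_score harness.

    trust_score covers all of Vijil's probes (every dimension) plus the
    Performance harness. `client` is assumed to be an initialized Vijil client.
    """
    return client.evaluations.create(
        model_hub=model_hub,
        model_name=model_name,
        model_params={"temperature": temperature},
        harnesses=["trust_score"],  # a single pre-defined harness ID
    )
```

For example, run_trust_score(client, "openai", "gpt-4o-mini") passes the same model arguments as the example at the end of this page, but with trust_score as the only harness.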
Benchmarks
To quickly test an LLM or agent on well-known benchmarks, Vijil Evaluate provides 21 benchmarks spanning reliability (e.g., OpenLLM, OpenLLM v2), security (e.g., garak, CyberSecEval 3), and safety (e.g., StrongReject, JailbreakBench).
Audits
Audit harnesses test against regulations and standards that matter from an enterprise risk perspective, such as the OWASP LLM Top 10 and GDPR. Results from these harnesses can be used for a Vijil Trust Audit.
Custom harnesses
With Vijil Evaluate, you can create custom harnesses to test your own agents by specifying details such as the agent's system prompt, its usage policy, and pointers to knowledge bases or function calls. See the custom harness documentation to learn more.
Working with harnesses in the Python client
In the Python client, use the following method to list all available harnesses:
client.harnesses.list()
The relevant parameters are as follows:
type: the type of harnesses to list. Can be left unspecified (default) or set to one of dimension, benchmark, audit, or custom.
format: the output format for the list of harnesses. Can be dataframe (default) or list.
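The sketch below shows one way to use both parameters together: a hypothetical helper, list_harnesses_by_type, that calls client.harnesses.list() once per harness type and groups the results by type. It assumes client is an initialized Vijil client and that type and format are accepted as keyword arguments. The helper checks values against the options documented above and, unlike the client, defaults to the list format.

```python
from typing import Any, Dict, Iterable

# Values documented for client.harnesses.list().
HARNESS_TYPES = ("dimension", "benchmark", "audit", "custom")
OUTPUT_FORMATS = ("dataframe", "list")


def list_harnesses_by_type(
    client: Any,
    harness_types: Iterable[str] = HARNESS_TYPES,
    output_format: str = "list",
) -> Dict[str, Any]:
    """Hypothetical helper: call client.harnesses.list() once per harness type.

    Assumes the client accepts `type` and `format` as keyword arguments.
    """
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"format must be one of {OUTPUT_FORMATS}, got {output_format!r}")
    results: Dict[str, Any] = {}
    for harness_type in harness_types:
        if harness_type not in HARNESS_TYPES:
            raise ValueError(f"type must be one of {HARNESS_TYPES}, got {harness_type!r}")
        # One call per type keeps each type's harnesses in a separate entry.
        results[harness_type] = client.harnesses.list(type=harness_type, format=output_format)
    return results
```

For example, list_harnesses_by_type(client, harness_types=["audit"])["audit"] returns whatever the client reports for the audit type, which is where IDs such as owasp and gdpr would be expected to appear.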
To run an evaluation on one or more harnesses, specify their IDs as a list in the harnesses argument, as in the following example:
client.evaluations.create(
    model_hub="openai",
    model_name="gpt-4o-mini",
    model_params={"temperature": 0},
    harnesses=["owasp", "gdpr"],  # test on the owasp and gdpr harnesses
)