Vijil allows you to run pre-defined Harnesses that correspond to dimensions or to other related groups of Probes.

Pre-defined Harnesses

Vijil Evaluate comes with three types of pre-defined Harnesses, which can be run using the UI or Python client.

Dimension

Every dimension is a pre-configured Harness. In addition, each Scenario is also a Harness. You can run an evaluation that includes one or more pre-defined Harnesses through either the UI or the Python client. To run all of Vijil’s Probes (covering all dimensions), plus the Performance Harness covering benchmarks from the OpenLLM Leaderboard 2, use the trust_score Harness.
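
If you use the Python client, a run might look like the sketch below. This is a minimal illustration, not the confirmed client interface: the `Vijil` class, the `evaluations.create` method, and the parameter names are assumptions, so check the client documentation for the exact names.

```python
# Minimal sketch (assumed API): launch an evaluation with the trust_score Harness.
from vijil import Vijil

client = Vijil(api_key="YOUR_VIJIL_API_KEY")  # assumed constructor

evaluation = client.evaluations.create(       # assumed method name
    model_hub="openai",                       # hub hosting the model under test
    model_name="gpt-4o-mini",                 # LLM or agent to evaluate
    harnesses=["trust_score"],                # runs all dimensions plus Performance
)
print(evaluation)
```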

Benchmarks

For quickly testing an LLM or agent on well-known benchmarks, Vijil Evaluate provides 21 benchmarks across reliability (e.g., OpenLLM, OpenLLM v2), security (e.g., garak, CyberSecEval 3), and safety (e.g., StrongReject, JailbreakBench).
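
Assuming the same client sketch as above, a benchmark run would differ only in the Harness tag that is passed; the tag string below is an assumption and may not match the exact identifier used by Vijil Evaluate.

```python
# Assumed Harness tag; look up the exact benchmark identifiers in Vijil Evaluate.
evaluation = client.evaluations.create(
    model_hub="openai",
    model_name="gpt-4o-mini",
    harnesses=["openllm"],  # run only the OpenLLM benchmark Harness
)
```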

Audits

Vijil supports Harnesses that test for compliance with regulations and standards relevant from an enterprise risk perspective, such as the OWASP LLM Top 10 and GDPR. Results from these Harnesses can be used for Vijil Trust Audit.
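
Because an evaluation can include more than one Harness, audit-oriented Harnesses can be combined in a single run. The sketch below follows the same assumed client interface as above, and the Harness tags are illustrative placeholders.

```python
# Assumed Harness tags; substitute the identifiers listed in Vijil Evaluate.
evaluation = client.evaluations.create(
    model_hub="openai",
    model_name="gpt-4o-mini",
    harnesses=["owasp", "gdpr"],  # e.g., OWASP LLM Top 10 and GDPR audit Harnesses
)
```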

Custom Harness

Using Vijil Evaluate, you can create customized Harnesses to test your own agents by specifying details such as the agent's system prompt, usage policy, and pointers to knowledge bases or function calls. See how to build custom Harnesses.