Vijil allows you to run pre-defined Harnesses that correspond to either dimensions or other related groups of Probes.
Pre-defined Harnesses
Vijil Evaluate comes with three types of pre-defined Harnesses, which can be run using the UI or Python client.
Dimension
Every dimension is a pre-configured Harness. In addition, each Scenario is also a Harness. You can run an evaluation that includes one or more pre-defined Harnesses through either the UI or the Python client. To run all of Vijil’s Probes (covering all dimensions) plus the Performance Harness, which covers benchmarks from the OpenLLM Leaderboard 2, use the trust_score Harness.
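Because trust_score already covers every dimension, a selection that lists it alongside individual dimension Harnesses is redundant. The following is a minimal sketch of how such a selection could be tidied before submitting an evaluation. The helper `normalize_harness_selection` is hypothetical and not part of the Vijil client, and the short names `reliability`, `safety`, and `security` are assumed from the Harness table on this page.

```python
# Hypothetical helper; this is not a function of the Vijil Python client.
# Short dimension names are an assumption based on the Harness table on this page.
DIMENSION_HARNESSES = {"reliability", "safety", "security"}
TRUST_SCORE = "trust_score"


def normalize_harness_selection(names):
    """De-duplicate a Harness selection, keeping first-seen order.

    If trust_score is selected, drop the dimension Harnesses,
    since trust_score already covers all dimensions.
    """
    unique = []
    for name in names:
        if name not in unique:
            unique.append(name)
    if TRUST_SCORE in unique:
        unique = [n for n in unique if n not in DIMENSION_HARNESSES]
    return unique


selection = normalize_harness_selection(
    ["security", "trust_score", "security", "safety"]
)
print(selection)
```

The resulting list is what you would pass as the Harness selection when creating an evaluation; see the Python client documentation for the actual call.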
Benchmarks
To quickly test an LLM or agent on well-known benchmarks, Vijil Evaluate offers 21 benchmarks across reliability (e.g. OpenLLM, OpenLLM v2), security (e.g. garak, CyberSecEval 3), and safety (e.g. StrongReject, JailbreakBench).
Audits
Vijil supports Harnesses that test for regulations and standards relevant from an enterprise risk perspective, such as the OWASP LLM Top 10 and GDPR. Results from these Harnesses can be used for Vijil Trust Audit.
Custom Harness
Using Vijil Evaluate, you can create customized Harnesses to test your own agents by specifying details such as the agent system prompt, usage policy, and pointers to knowledge bases and function calls. See how to build custom Harnesses.

| Harness ID | Name | Description | Scenarios | Harness Type |
|---|---|---|---|---|
| vijil.harnesses.reliability | Reliability | Tests for correctness, robustness, and consistency. | vijil.scenarios.reliability_robustness_distributionalrobustness, vijil.scenarios.reliability_correctness_factualaccuracy, vijil.scenarios.reliability_correctness_logicalvalidity, vijil.scenarios.reliability_robustness_contextualrobustness | DIMENSION |
| vijil.harnesses.safety | Safety | Tests for compliance, ethical behavior, and harm prevention. | vijil.scenarios.safety_compliance_normcompliance, vijil.scenarios.safety_compliance_policycompliance, vijil.scenarios.safety_compliance_ethicalbehavior | DIMENSION |
| vijil.harnesses.security | Security | Tests for confidentiality, integrity, and availability. | vijil.scenarios.security_confidentiality_dataprivacy, vijil.scenarios.security_confidentiality_userprivacy, vijil.scenarios.security_confidentiality_modelprivacy, vijil.scenarios.integrity, vijil.scenarios.availability, vijil.scenarios.security_integrity_manipulationresistance | DIMENSION |
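
The table above can also be read as a lookup from Harness ID to Scenario IDs. The sketch below encodes it as a plain dictionary and collects the Scenarios covered by a set of Harnesses. The names `HARNESS_SCENARIOS` and `scenarios_for` are illustrative only and do not come from the Vijil client; the IDs are copied verbatim from the table.

```python
# Illustrative lookup built from the table above; not part of the Vijil client.
HARNESS_SCENARIOS = {
    "vijil.harnesses.reliability": [
        "vijil.scenarios.reliability_robustness_distributionalrobustness",
        "vijil.scenarios.reliability_correctness_factualaccuracy",
        "vijil.scenarios.reliability_correctness_logicalvalidity",
        "vijil.scenarios.reliability_robustness_contextualrobustness",
    ],
    "vijil.harnesses.safety": [
        "vijil.scenarios.safety_compliance_normcompliance",
        "vijil.scenarios.safety_compliance_policycompliance",
        "vijil.scenarios.safety_compliance_ethicalbehavior",
    ],
    "vijil.harnesses.security": [
        "vijil.scenarios.security_confidentiality_dataprivacy",
        "vijil.scenarios.security_confidentiality_userprivacy",
        "vijil.scenarios.security_confidentiality_modelprivacy",
        "vijil.scenarios.integrity",
        "vijil.scenarios.availability",
        "vijil.scenarios.security_integrity_manipulationresistance",
    ],
}


def scenarios_for(harness_ids):
    """Return the Scenario IDs covered by the given Harness IDs.

    Order follows the table; duplicates are skipped.
    Raises KeyError if a Harness ID is not in the table.
    """
    covered = []
    for harness_id in harness_ids:
        for scenario_id in HARNESS_SCENARIOS[harness_id]:
            if scenario_id not in covered:
                covered.append(scenario_id)
    return covered


for scenario_id in scenarios_for(["vijil.harnesses.safety"]):
    print(scenario_id)
```

Since each Scenario is itself a Harness, a lookup like this is one way to decide whether to run a whole dimension or only a few of its Scenarios.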