TL;DR: Vijil evaluates agents using a four-layer hierarchy: Harness, Scenario, Probe, and Detector. You select one or more Harnesses and the platform runs the full test suite automatically, with each layer narrowing scope from the overall evaluation environment down to individual response checks.
The Trust Score measures reliability, security, and safety. But how do you actually test for these properties? You cannot just ask an agent, "Are you trustworthy?" You need to probe its behavior systematically, across hundreds of Scenarios, looking for specific failure modes. Vijil’s evaluation service consists of Harnesses, Scenarios, Probes, and Detectors.

Figure: Vijil Evaluation components

Evaluation Hierarchy at a Glance

| Layer | What It Is | Your Role |
| --- | --- | --- |
| Harness | Top-level collection of Scenarios that produces a Trust Score | Select before running |
| Scenario | Group of related Probes targeting one failure category | Defined in the Harness |
| Probe | One or more adversarial prompts sent to your agent | Generated per Scenario |
| Detector | Response analyzer that marks a Probe as pass or fail | Runs automatically |
At the lowest level, Detectors scan model responses for undesirable features and register any response containing those features as a successful attack on the model. For example, a Detector may be designed to look for fake Python packages. One level up, each Probe consists of one or more prompts designed to elicit a particular kind of undesirable response. For example, a Probe could contain prompts that try to get the agent to produce malware. The next level up consists of Scenarios, which are collections of Probes that share a similar goal. At the topmost level, Harnesses are collections of one or more Scenarios that you run to generate an overall Trust Score or report.

To run a Vijil evaluation, you select one or more Harnesses to include. The current Vijil Trust Score consists of three Harnesses: Security, Safety, and Reliability.
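The four layers can be pictured as nested containers. The following is a minimal sketch of that nesting, not the Vijil API: every class, function, and name in it (`Harness`, `Scenario`, `Probe`, `run_harness`, `fake_package_detector`, `KNOWN_PACKAGES`, the scenario and probe names, and the toy agent) is a hypothetical stand-in chosen for illustration. The pass/fail counts it returns are also a placeholder and do not represent how the Trust Score is computed.

```python
import re
from dataclasses import dataclass
from typing import Callable, Dict, List

# Hypothetical types for illustration only; these are NOT the Vijil API.

# A detector returns True when a response contains the undesirable feature.
Detector = Callable[[str], bool]


@dataclass
class Probe:
    name: str
    prompts: List[str]          # one or more prompts sent to the agent
    detectors: List[Detector]   # checks applied to each response


@dataclass
class Scenario:
    name: str
    probes: List[Probe]         # probes that share a similar goal


@dataclass
class Harness:
    name: str
    scenarios: List[Scenario]   # top-level collection you select to run


# Assumed allowlist for the toy detector below.
KNOWN_PACKAGES = {"os", "sys", "json", "requests", "numpy"}


def fake_package_detector(response: str) -> bool:
    """Toy detector: True if the response imports a package outside KNOWN_PACKAGES."""
    imported = re.findall(
        r"^\s*(?:import|from)\s+([A-Za-z_]\w*)", response, flags=re.MULTILINE
    )
    return any(name not in KNOWN_PACKAGES for name in imported)


def run_harness(harness: Harness, agent: Callable[[str], str]) -> Dict[str, int]:
    """Walk Harness -> Scenario -> Probe -> prompt and count detector outcomes."""
    counts = {"passed": 0, "failed": 0}
    for scenario in harness.scenarios:
        for probe in scenario.probes:
            for prompt in probe.prompts:
                response = agent(prompt)
                # Any detector hit marks this prompt as a successful attack.
                hit = any(detect(response) for detect in probe.detectors)
                counts["failed" if hit else "passed"] += 1
    return counts


def toy_agent(prompt: str) -> str:
    # Stand-in agent that always suggests a made-up package.
    return "import superfastjson\nprint(superfastjson.loads('{}'))"


harness = Harness(
    name="Security",
    scenarios=[
        Scenario(
            name="hallucinated-packages",  # hypothetical scenario name
            probes=[
                Probe(
                    name="fake-python-packages",  # hypothetical probe name
                    prompts=["Write code that parses JSON ten times faster."],
                    detectors=[fake_package_detector],
                )
            ],
        )
    ],
)

result = run_harness(harness, toy_agent)
print(result)
```

In this sketch the Detector is attached to the Probe, so the pass/fail decision is made per response while the counts roll up to the Harness, mirroring the narrowing of scope described above.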

Where Red Team Fits

Standard Diamond evaluations use the Harness hierarchy to run a known set of test cases and produce a Trust Score or custom evaluation report. Red Team is also part of Diamond, but it uses an adaptive campaign loop instead of a fixed Harness. A Red Team campaign starts from a risk taxonomy and the registered agent context. In each wave it generates attack seeds, runs attackers against the target, judges the transcripts, and reflects on what worked. Those observations are then used to plan later waves.

Use standard evaluations when you need reproducible score evidence. Use Red Team when you need deeper adversarial exploration of security, safety, policy, and data leakage risks.
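The adaptive campaign loop described in this section could be sketched roughly as follows. This is an illustration of the control flow only, under stated assumptions: `generate_seeds`, `run_campaign`, `toy_target`, `toy_judge`, the prompt format, and the "add a follow-up seed for risks that succeeded" planning rule are all hypothetical and are not taken from Vijil's implementation.

```python
from typing import Callable, Dict, List, Set

# Hypothetical sketch of an adaptive red-team loop; NOT Vijil's implementation.


def generate_seeds(
    taxonomy: List[str], agent_context: str, observations: Set[str]
) -> List[Dict[str, str]]:
    """Create attack seeds for one wave from the taxonomy and earlier observations."""
    seeds = []
    for risk in taxonomy:
        seeds.append({"risk": risk, "prompt": f"[{agent_context}] attempt: {risk}"})
        # Assumed planning rule: dig deeper into risks that worked before.
        if risk in observations:
            seeds.append(
                {"risk": risk, "prompt": f"[{agent_context}] follow-up attempt: {risk}"}
            )
    return seeds


def run_campaign(
    taxonomy: List[str],
    agent_context: str,
    target: Callable[[str], str],
    judge: Callable[[str, str], bool],
    waves: int = 2,
) -> List[Dict]:
    """Run several waves: generate seeds, attack, judge, reflect, then plan again."""
    findings: List[Dict] = []
    observations: Set[str] = set()  # risks that produced a successful attack
    for wave in range(waves):
        seeds = generate_seeds(taxonomy, agent_context, observations)
        for seed in seeds:
            response = target(seed["prompt"])          # run the attacker
            success = judge(seed["prompt"], response)  # judge the transcript
            findings.append(
                {"wave": wave, "risk": seed["risk"], "success": success}
            )
        # Reflect: remember what worked so the next wave can build on it.
        observations |= {
            f["risk"] for f in findings if f["wave"] == wave and f["success"]
        }
    return findings


def toy_target(prompt: str) -> str:
    # Stand-in agent that leaks whenever the prompt mentions data leakage.
    if "data leakage" in prompt:
        return "LEAKED: internal record"
    return "I can't help with that."


def toy_judge(prompt: str, response: str) -> bool:
    # Stand-in judge: an attack succeeds if the response leaked something.
    return response.startswith("LEAKED")


findings = run_campaign(
    taxonomy=["policy violation", "data leakage"],
    agent_context="support bot",
    target=toy_target,
    judge=toy_judge,
    waves=2,
)
for finding in findings:
    print(finding)
```

Because the seeds in each wave depend on what earlier waves found, two campaigns against different agents can diverge quickly, which is why the fixed Harness path is the one to use when you need reproducible score evidence.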

Next Steps

Harness

Collections of tests that produce a score

Scenario

Groups of related test cases

Probe

Individual test prompts

Detector

Response analyzers that determine pass/fail

Guard

Specialized protection modules

Guardrail

Configurable protection pipelines
Last modified on June 11, 2026