Combine agents, personas, and policies into targeted evaluation configurations.
Custom harnesses let you create evaluation scenarios tailored to your agent’s specific use case. By combining an agent with selected personas and policies, you generate test cases that probe exactly the behaviors that matter to your organization.
Standard harnesses test against general benchmarks. Custom harnesses test against your requirements:
Your users: Personas that match your actual user base and threat actors
Your rules: Policies that reflect your compliance obligations and operational guidelines
Your agent: Test cases generated specifically for your agent’s role and capabilities
A customer support agent for a healthcare company needs different evaluation than a code assistant for developers. Custom harnesses capture that context.
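As a rough illustration of what a custom harness bundles together, the sketch below models one as plain Python data. The class names (`Persona`, `Policy`, `HarnessConfig`), their fields, and the agent name are hypothetical and chosen only for this example; they are not Vijil's API or storage format.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Persona:
    """Hypothetical stand-in for a selected persona."""
    name: str
    adversarial: bool = False


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for a selected policy and its rules."""
    name: str
    category: str
    rules: tuple[str, ...]


@dataclass
class HarnessConfig:
    """Hypothetical bundle of one agent with its personas and policies."""
    agent: str
    personas: list[Persona] = field(default_factory=list)
    policies: list[Policy] = field(default_factory=list)

    def validate(self) -> list[str]:
        """Return human-readable problems; an empty list means the bundle is complete."""
        problems = []
        if not self.personas:
            problems.append("select at least one persona")
        if not self.policies:
            problems.append("select at least one policy")
        return problems


# Example: a healthcare support agent with one benign and one adversarial persona.
harness = HarnessConfig(
    agent="healthcare-support-agent",  # assumed name, for illustration only
    personas=[
        Persona("Data Analyst"),
        Persona("Social Engineer", adversarial=True),
    ],
    policies=[
        Policy("Customer data", "Compliance", ("Never share customer data",)),
    ],
)
```

Calling `harness.validate()` on this example returns an empty list, while a config with no personas or policies reports both omissions.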
Select personas that represent who will interact with your agent during evaluation. The interface presents two columns:
Preset Personas: Built-in personas covering common user types and threat actors:
Professional users (Data Analyst, Legal Counsel, Software Developer)
Security personas (Security Researcher, Prompt Injection Tester, Social Engineer)
Select policies that define the rules your agent must follow. Policies appear with their category (Compliance, Security, Operational, Custom) and rule count.
Vijil uses selected policies to generate test cases that verify your agent respects each constraint. A policy stating “Never share customer data” generates scenarios where personas attempt to extract customer information.
Click Create Harness to save your configuration.
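One way to picture how selected personas and policy rules combine is to pair every persona with every rule. The sketch below does exactly that and nothing more; `scenario_outlines` is a hypothetical helper, and the pairing is an assumption made for illustration, not a description of how Vijil actually generates test cases.

```python
from itertools import product


def scenario_outlines(personas, rules):
    """Pair each persona with each policy rule to outline one scenario per pair."""
    return [
        f"{persona} probes rule: {rule}"
        for persona, rule in product(personas, rules)
    ]


# Illustrative selections taken from the examples in this page.
personas = ["Data Analyst", "Social Engineer"]
rules = ["Never share customer data"]

outlines = scenario_outlines(personas, rules)
for outline in outlines:
    print(outline)
```

Under this assumption the number of outlines grows as personas times rules, which is one reason to keep an initial harness small.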
Test cases range from routine interactions (product inquiries, support requests) to adversarial probes (prompt injection, social engineering, data extraction attempts).
Start focused, then expand: Begin with a harness testing specific behaviors (e.g., data privacy for customer support). Add complexity as you understand your agent’s failure modes.
Balance persona types: Include both benign users testing normal functionality and adversarial users probing security boundaries. Production agents face both.
Write precise policies: Vijil generates better test cases from clear, specific policy statements. “Never disclose customer PII” produces more targeted tests than “Protect customer data.”
Review generated test cases: Before running evaluations, review the generated prompts and expected responses. They should align with your understanding of correct agent behavior.
Iterate based on results: Evaluation findings reveal gaps. Add personas that represent failure cases. Refine policies that generate too many false positives.
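The advice on balancing persona types can be checked mechanically before a harness is saved. The following is a minimal sketch that only recognizes the preset persona names listed earlier on this page; the function name `missing_persona_groups` and the group labels are hypothetical, and custom personas would need their own classification.

```python
# Preset persona names as listed in this page, grouped by type.
PROFESSIONAL = {"Data Analyst", "Legal Counsel", "Software Developer"}
SECURITY = {"Security Researcher", "Prompt Injection Tester", "Social Engineer"}


def missing_persona_groups(selected):
    """Return the persona groups that have no representative in the selection."""
    chosen = set(selected)
    missing = []
    if not chosen & PROFESSIONAL:
        missing.append("professional")
    if not chosen & SECURITY:
        missing.append("security")
    return missing


# A selection with only benign users is flagged as lacking security personas.
print(missing_persona_groups(["Data Analyst", "Legal Counsel"]))
```

An empty result means the selection covers both groups.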