What You’ll Accomplish
This guide walks you through the core Vijil workflow:
- Register your agent in the Agent Registry
- Register your target environment with personas and policies
- Create a custom harness with generated test cases
- Evaluate your agent
- Review the results and Trust Score
- Defend it with Dome guardrails
- Observe agent security and safety in production
By the end, you’ll have a Trust Score for your agent and a trusted agent ready for production.
Step 1: Register Your Agent
Navigate to Agents in the sidebar to open the Agent Registry.
Click + Register Agent to open the registration modal.
Vijil supports three access levels, each enabling progressively deeper evaluation:
| Access Level | What You Provide | What Vijil Can Test |
|---|
| Black Box | Description, API endpoint, credentials | Input/output behavior only |
| Grey Box | Model config, MCP config, A2A config | Vulnerabilities traced to agent composition |
| White Box | Full configuration and source code | SAST and DAST analysis for thorough audit |
For most agents, start with Black Box:
- Enter a Name for your agent
- Add your Agent Description
- Set Status to Draft for evaluation
- Provide the Agent URL (where it’s running)
- Add your Access Key and set the Rate Limit
- Click Register Agent
If your agent delegates to sub-agents or uses MCP tools, expand Grey Box to register those connections. This enables Vijil to test the full execution graph.
Step 2: Create a Custom Harness
While the Trust Score harness provides a comprehensive baseline evaluation, custom harnesses let you test your agent against specific personas and organizational policies.
Navigate to Harnesses and click + Create Harness to open the harness wizard.
The wizard walks you through four steps:
- Basic Info — Name and describe your harness
- Select Agent — Choose which agent this harness evaluates
- Select Personas — Choose user profiles that will interact with your agent
- Select Policies — Choose organizational rules your agent must follow
Choose Personas
Personas define who interacts with your agent during evaluation. Vijil generates realistic test cases from each persona’s perspective.
| Persona Type | What It Tests |
|---|
| Regular users | Normal usage patterns, edge cases |
| Security researchers | Adversarial prompts, jailbreak attempts |
| Domain experts | Technical accuracy, specialized knowledge |
Select 2-3 personas that represent your actual user base and potential threat actors.
Choose Policies
Policies define what rules your agent must follow. These can be:
- Operational policies — Business rules, response guidelines
- Compliance frameworks — GDPR, HIPAA, SOC 2 requirements
- Safety policies — Content restrictions, escalation procedures
Once configured, click Create to save your harness. Vijil generates test cases based on your personas and policies:
Review Your Harness
After creation, the harness detail page shows your configuration:
Set status to Active when ready to use in evaluations.
Step 3: Run an Evaluation
Navigate to Evaluations to open Diamond Evaluations.
The evaluation page presents two panels:
Select Agent — Choose which registered agent to evaluate.
Select Harness — Choose what to test. The Trust Score harness evaluates three dimensions:
| Dimension | What It Measures |
|---|
| Reliability | Correctness, consistency, robustness |
| Security | Confidentiality, integrity, availability |
| Safety | Containment, compliance, transparency |
Toggle individual dimensions on or off based on your needs. For a comprehensive evaluation, leave all three enabled.
Select Custom to use a harness you’ve configured with specific personas and policies.
Once you’ve selected an agent and harness, click Run Evaluation.
Monitor Progress
While the evaluation runs, the status updates in real-time in the Evaluation Results table. Depending on the rate limit, evaluations can take 5-30 minutes.
Step 4: Review Results
When the evaluation completes, click the view icon in the Actions column to open the Trust Report.
The report provides:
- Trust Score — A 0-1 score (threshold: 0.70 for passing)
- Pass/Fail Status — Clear deployment recommendation
- Per-Harness Breakdown — Scores for each evaluated dimension
- Agent Specification — Configuration used during evaluation
- Deployment Recommendation — Action items based on findings
Interpreting the Score
| Score | Interpretation | Next Step |
|---|
| ≥ 0.70 | Passed | Deploy with standard monitoring |
| 0.50 - 0.69 | Marginal | Deploy with Dome guardrails |
| < 0.50 | Failed | Remediate before production |
A passing score means your agent handled the probes within acceptable bounds. A failing score identifies specific failure modes to address.
The Trust Score quantifies known risks based on the harness you ran. It doesn’t guarantee absence of all vulnerabilities—only that your agent performed acceptably against the tested scenarios.
Navigate to Guardrails to see your registered agents and their protection status.
Agents show either Domed (protected) or Unprotected. Click Configure Guardrails to open the Dome configuration.
Dome provides two guard pipelines:
Input Guards — Filter and validate requests before they reach your agent. Common guards:
- Security Guard — Detects prompt injection, jailbreak attempts
- Moderation Guard — Blocks toxic or inappropriate content
- Privacy Guard — Redacts PII from inputs
Output Guards — Filter responses before they reach users. Common guards:
- Privacy Guard — Prevents PII leakage
- Moderation Guard — Catches harmful outputs
- Click + Add in either Input Guards or Output Guards
- Select a guard type from the dropdown
- Configure execution settings:
- Early Exit — Stop pipeline on first detection (recommended for security)
- Serial/Parallel — Execute guards sequentially or concurrently
Test Your Configuration
Use the Execution Flow panel to test how inputs flow through your guardrail pipeline:
- Enter a test prompt in the input field (e.g., a message containing PII)
- Click Send to see how each guard processes it
- Watch the flow diagram show which detectors activate
- Verify guards trigger as expected for your test cases
The visual flow shows each guard and its underlying detectors—like prompt-injection-mbert for security or privacy-presidio for PII detection.
When satisfied, click Save Configuration.
Step 6: Deploy and Monitor
With Dome configured, integrate it into your application using the SDK:
from vijil import Dome
dome = Dome(
api_key="your-vijil-api-key",
agent_id="your-agent-id"
)
# Guard inputs before sending to your agent
safe_input = dome.guard_input(user_message)
# Guard outputs before returning to users
safe_output = dome.guard_output(agent_response)
See the Dome SDK documentation for detailed integration patterns.
Monitor in Production
Return to the Guardrails dashboard to monitor your protected agents. Click View Dashboard to see:
- Real-time detection events
- Guard trigger rates
- Blocked vs. allowed requests
Next Steps