Install Vijil, register an Agent, run a trust Evaluation, and retrieve results, through the CLI, MCP, or REST API.
Vijil exposes three programmatic interfaces. This quickstart takes you from a fresh setup to a completed trust Evaluation. The workflow is the same for all three, so pick the tab for the interface you prefer at each step.
You will be prompted for your email and password. The CLI stores your token in ~/.vijil/config.yaml. If you belong to multiple teams, select the one you want to work with:
vijil team list
vijil team use <team_id>
Every subsequent command uses the active team automatically, so you do not need to pass a team ID manually.
vijil-mcp reads the same credentials as the CLI. Configure and log in once:
vijil auth init --url https://console-api.example.com
vijil auth login
vijil team use <team_id> # if you belong to multiple teams
Then create a .mcp.json file in your project root so Claude Code launches the server:
Claude calls agent_create and shows you the new Agent, including its ID. Note the ID, because you will use it in the next steps. To see all registered Agents at any time:
List my agents
Create an Agent configuration pointing at the AI model you want to evaluate. The team is derived from your JWT, so no team_id parameter is needed:
The response (HTTP 201) includes the new Agent’s id. Save it:
export AGENT_ID="a1b2c3d4-..."
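The registration call described above could be sketched in Python roughly as follows. This is a minimal illustration, not the documented Vijil API: the `/agents` path, the `Bearer` authorization scheme, and the fields inside `agent_config` are all assumptions made for the example. Only the 201 status, the `id` field in the response, and the absence of a `team_id` parameter come from the text above.

```python
import json
import urllib.request


def build_agent_request(base_url, token, agent_config):
    """Build (but do not send) a POST request that registers an Agent.

    Assumptions: the '/agents' path and the Bearer scheme are hypothetical.
    No team_id is included because the team is derived from the JWT.
    """
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/agents",  # hypothetical endpoint path
        data=json.dumps(agent_config).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def extract_agent_id(status, body):
    """Return the new Agent's id from a 201 response body (a JSON string)."""
    if status != 201:
        raise RuntimeError(f"Agent creation failed with HTTP {status}")
    return json.loads(body)["id"]
```

To send the request, pass it to `urllib.request.urlopen`, read the status and body from the response, and hand both to `extract_agent_id`. The returned value is what you would export as AGENT_ID.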
4. Choose a Harness
A Harness is a test suite that covers a specific trust dimension. The standard Harnesses include safety, security, reliability, privacy, toxicity, and ethics. For this quickstart you will run safety and security.
--sample-size 50 runs 50 Probes per Harness, enough for a meaningful score in a few minutes. Omit it to run the full Harness (~1,250 Probes for security). The CLI polls every 5 seconds and prints the evaluation ID when complete. Save it:
export EVAL_ID="e5f6a7b8-..."
Start a trust Evaluation and wait for it to finish:
Run a safety and security evaluation on agent a1b2c3d4-… with a sample size of 50, and wait for it to complete
Claude calls eval_run with wait=True, polls every 5 seconds, and reports back when the Evaluation finishes, including the per-Harness scores.
Ask for a sample size of 10 for fast iteration during development. Run the full Harness before releasing to production.
Start a trust Evaluation. Evaluations run asynchronously, so the API returns immediately with a 202 Accepted status and an evaluation_id:
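Because the API returns before the Evaluation finishes, a REST client has to poll for completion itself. The sketch below shows one way to do that in Python, mirroring the 5-second interval the CLI uses. The status names (`completed`, `failed`) and the `fetch_status` callable are hypothetical. `fetch_status` stands in for whatever request returns the current state for an evaluation_id, which this quickstart does not specify.

```python
import time

# Hypothetical terminal status names; check the API reference for real values.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_evaluation(fetch_status, evaluation_id, interval=5.0,
                        timeout=600.0, sleep=time.sleep):
    """Poll until the Evaluation reaches a terminal state or the timeout passes.

    fetch_status: callable taking an evaluation_id and returning a status string.
    interval:     seconds to wait between polls.
    timeout:      maximum total seconds to wait before giving up.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(evaluation_id)
        if status in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Evaluation {evaluation_id} did not finish in time")
        sleep(interval)
```

Passing `sleep` as a parameter keeps the loop easy to test without real delays. In normal use you would leave it at the default.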