The Evaluations API allows you to programmatically run evaluations against your agents, monitor their progress, and retrieve results.

Actions

| Action | Description |
| --- | --- |
| CreateEvaluation | Create and start a new evaluation |
| GetEvaluationStatus | Get the current status of an evaluation |
| GetEvaluationResults | Retrieve results from a completed evaluation |
| ListEvaluations | List evaluations with optional filtering |
| CancelEvaluation | Cancel a running evaluation |
| DeleteEvaluation | Delete an evaluation and its results |

CreateEvaluation

Creates and starts a new evaluation against an agent or model.

Request Syntax

POST /v1/evaluations HTTP/1.1
Content-Type: application/json
Authorization: Bearer {api_key}

{
  "model_hub": "string",
  "model_name": "string",
  "harnesses": ["string"],
  "agent_id": "string",
  "model_params": {
    "temperature": number,
    "max_tokens": number
  },
  "system_prompt": "string",
  "api_key_value": "string"
}

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| model_hub | String | Conditional | Model provider. One of: openai, anthropic, bedrock, vertex, digitalocean, custom. Required if agent_id is not provided. |
| model_name | String | Conditional | Model identifier (e.g., gpt-4o, claude-3-sonnet). Required if agent_id is not provided. |
| harnesses | Array of String | Yes | Harnesses to run. Valid values: trust_score, security, reliability, safety, or custom harness IDs. |
| agent_id | String | Conditional | ID of a registered agent. Required if model_hub is not provided. |
| model_params | Object | No | Model parameters. See Model Parameters. |
| system_prompt | String | No | System prompt to use for the evaluation. |
| api_key_value | String | Conditional | API key for the model provider. Required if model_hub is provided and no key is stored for that provider. |

Model Parameters

| Parameter | Type | Description |
| --- | --- | --- |
| temperature | Number | Sampling temperature (0.0–2.0). Default: 0. |
| max_tokens | Number | Maximum tokens in response. |
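
The conditional rules in the tables above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API or SDK: `build_create_payload` and `MODEL_HUBS` are hypothetical names, and rejecting an empty `harnesses` list is an assumption (the table only says the field is required). The tables do not say what happens when both `agent_id` and `model_hub` are supplied, so the sketch passes both through unchanged.

```python
from typing import Any, Dict, List, Optional

# Provider names listed for model_hub in the Request Parameters table.
MODEL_HUBS = {"openai", "anthropic", "bedrock", "vertex", "digitalocean", "custom"}


def build_create_payload(
    harnesses: List[str],
    model_hub: Optional[str] = None,
    model_name: Optional[str] = None,
    agent_id: Optional[str] = None,
    model_params: Optional[Dict[str, Any]] = None,
    system_prompt: Optional[str] = None,
    api_key_value: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate the conditional fields and return a CreateEvaluation body."""
    # Assumption: an empty harness list is treated as a missing required field.
    if not harnesses:
        raise ValueError("harnesses must contain at least one harness")
    # Without agent_id, both model_hub and model_name are required.
    if agent_id is None and (model_hub is None or model_name is None):
        raise ValueError("provide agent_id, or both model_hub and model_name")
    if model_hub is not None and model_hub not in MODEL_HUBS:
        raise ValueError(f"unknown model_hub: {model_hub!r}")
    # The Model Parameters table documents temperature as 0.0-2.0.
    temperature = (model_params or {}).get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    optional = {
        "model_hub": model_hub,
        "model_name": model_name,
        "agent_id": agent_id,
        "model_params": model_params,
        "system_prompt": system_prompt,
        "api_key_value": api_key_value,
    }
    payload: Dict[str, Any] = {"harnesses": list(harnesses)}
    # Omit fields the caller did not set so the body stays minimal.
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```

The returned dictionary can be serialized with `json.dumps` and sent as the POST body.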

Response Syntax

HTTP/1.1 201 Created
Content-Type: application/json

{
  "id": "string",
  "status": "string",
  "created_at": "string",
  "harnesses": ["string"]
}

Response Elements

| Element | Type | Description |
| --- | --- | --- |
| id | String | Unique evaluation identifier. Format: eval-{uuid}. |
| status | String | Initial status. Always pending for new evaluations. |
| created_at | String | ISO 8601 timestamp of creation. |
| harnesses | Array of String | Harnesses that will be executed. |

Errors

| Error | HTTP Status | Description |
| --- | --- | --- |
| InvalidRequestException | 400 | Request body is malformed or missing required fields. |
| InvalidApiKeyException | 401 | API key is invalid or expired. |
| AgentNotFoundException | 404 | Specified agent_id does not exist. |
| HarnessNotFoundException | 404 | One or more specified harnesses do not exist. |
| RateLimitExceededException | 429 | Evaluation quota exceeded. |
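
A caller usually needs to decide which of these errors are worth retrying. The sketch below maps each documented status code to its error name and applies capped exponential backoff to 429 only. The helper names, the base delay, and the cap are illustrative assumptions. This page does not say how quickly the evaluation quota resets, so retrying a 429 may not succeed.

```python
from typing import Optional

# HTTP status -> error names, copied from the CreateEvaluation Errors table.
CREATE_ERRORS = {
    400: ["InvalidRequestException"],
    401: ["InvalidApiKeyException"],
    404: ["AgentNotFoundException", "HarnessNotFoundException"],
    429: ["RateLimitExceededException"],
}


def describe_error(status: int) -> str:
    """Return the documented error name(s) for a status code."""
    names = CREATE_ERRORS.get(status)
    if names is None:
        return f"undocumented status {status}"
    return " or ".join(names)


def backoff_delay(
    status: int, attempt: int, base: float = 1.0, cap: float = 60.0
) -> Optional[float]:
    """Seconds to wait before retry `attempt` (0-based), or None if not retryable.

    Assumption: only 429 is treated as transient; 400, 401, and 404 need a
    corrected request, so retrying them unchanged would fail again.
    """
    if status != 429:
        return None
    return min(cap, base * (2 ** attempt))
```

Because 404 covers two errors, the status code alone cannot tell a missing agent from a missing harness.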

Example

Request:
curl -X POST "https://api.vijil.ai/v1/evaluations" \
  -H "Authorization: Bearer $VIJIL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model_hub": "openai",
    "model_name": "gpt-4o",
    "harnesses": ["trust_score"],
    "model_params": {"temperature": 0}
  }'
Response:
{
  "id": "eval-abc123def456",
  "status": "pending",
  "created_at": "2024-01-15T10:30:00Z",
  "harnesses": ["trust_score"]
}
Python:
# Assumes `vijil` is an already-initialized client object from the Vijil Python SDK.
evaluation = vijil.evaluations.create(
    model_hub="openai",
    model_name="gpt-4o",
    harnesses=["trust_score"],
    model_params={"temperature": 0}
)

GetEvaluationStatus

Retrieves the current status and progress of an evaluation.

Request Syntax

GET /v1/evaluations/{evaluation_id}/status HTTP/1.1
Authorization: Bearer {api_key}

URI Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluation_id | String | Yes | The evaluation identifier. |

Response Syntax

HTTP/1.1 200 OK
Content-Type: application/json

{
  "id": "string",
  "status": "string",
  "progress": number,
  "completed": number,
  "total": number,
  "started_at": "string",
  "estimated_completion": "string"
}

Response Elements

| Element | Type | Description |
| --- | --- | --- |
| id | String | Evaluation identifier. |
| status | String | Current status. See Status Values. |
| progress | Number | Completion percentage (0–100). |
| completed | Number | Number of probes completed. |
| total | Number | Total number of probes to run. |
| started_at | String | ISO 8601 timestamp when evaluation started. Null if pending. |
| estimated_completion | String | Estimated completion time. Null if not running. |

Status Values

| Status | Description |
| --- | --- |
| pending | Queued, waiting to start. |
| running | Actively sending probes and analyzing responses. |
| completed | Finished successfully. Results available. |
| failed | Terminated due to error. Check error details. |
| cancelled | Stopped by user request. |
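
Since pending and running are the only non-terminal statuses, a client can poll until one of the other three appears. The following is a minimal sketch under stated assumptions: `wait_for_evaluation` is a hypothetical helper, `fetch_status` is any callable that returns the GetEvaluationStatus response as a dictionary, and the default interval and timeout are arbitrary choices rather than documented recommendations.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which an evaluation no longer changes, per the Status Values table.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_evaluation(
    fetch_status: Callable[[str], Dict[str, Any]],
    evaluation_id: str,
    poll_interval: float = 10.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll until the evaluation reaches a terminal status, then return it.

    Raises TimeoutError if no terminal status is seen within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status(evaluation_id)
        if status["status"] in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{evaluation_id} still {status['status']} after {timeout}s")
        sleep(poll_interval)
```

Passing the fetch function in keeps the loop independent of the HTTP layer, so the same code works with the SDK, a raw HTTP call, or a stub in tests.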

Errors

| Error | HTTP Status | Description |
| --- | --- | --- |
| EvaluationNotFoundException | 404 | Evaluation does not exist. |

Example

Request:
curl -X GET "https://api.vijil.ai/v1/evaluations/eval-abc123/status" \
  -H "Authorization: Bearer $VIJIL_API_KEY"
Response:
{
  "id": "eval-abc123",
  "status": "running",
  "progress": 45,
  "completed": 450,
  "total": 1000,
  "started_at": "2024-01-15T10:31:00Z",
  "estimated_completion": "2024-01-15T10:45:00Z"
}
Python:
status = vijil.evaluations.get_status("eval-abc123")
print(f"Progress: {status.progress}%")

GetEvaluationResults

Retrieves the full results of a completed evaluation.

Request Syntax

GET /v1/evaluations/{evaluation_id}/results HTTP/1.1
Authorization: Bearer {api_key}

URI Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluation_id | String | Yes | The evaluation identifier. |

Response Syntax

HTTP/1.1 200 OK
Content-Type: application/json

{
  "id": "string",
  "trust_score": number,
  "reliability_score": number,
  "security_score": number,
  "safety_score": number,
  "completed_at": "string",
  "failures": [
    {
      "probe_id": "string",
      "category": "string",
      "severity": "string",
      "reason": "string",
      "prompt": "string",
      "response": "string"
    }
  ],
  "summary": {
    "total_probes": number,
    "passed": number,
    "failed": number,
    "high_severity": number,
    "medium_severity": number,
    "low_severity": number
  }
}

Response Elements

| Element | Type | Description |
| --- | --- | --- |
| id | String | Evaluation identifier. |
| trust_score | Number | Overall trust score (0.0–1.0). |
| reliability_score | Number | Reliability dimension score (0.0–1.0). |
| security_score | Number | Security dimension score (0.0–1.0). |
| safety_score | Number | Safety dimension score (0.0–1.0). |
| completed_at | String | ISO 8601 timestamp of completion. |
| failures | Array | List of failed probes. See Failure Object. |
| summary | Object | Aggregate statistics. |

Failure Object

| Element | Type | Description |
| --- | --- | --- |
| probe_id | String | Unique probe identifier. |
| category | String | Failure category: reliability, security, or safety. |
| severity | String | Severity level: high, medium, or low. |
| reason | String | Human-readable explanation of the failure. |
| prompt | String | The probe prompt sent to the agent. |
| response | String | The agent’s response. |
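
The failure objects lend themselves to simple triage. As a rough sketch, the helpers below count failures by severity and category, order them with the most severe first, and derive a pass rate from the summary block. The function names are hypothetical, and the code operates on the parsed JSON response as a plain dictionary rather than on any SDK object.

```python
from collections import Counter
from typing import Any, Dict

# Lower rank sorts first; unknown severities sort after the documented ones.
SEVERITY_RANK = {"high": 0, "medium": 1, "low": 2}


def summarize_failures(results: Dict[str, Any]) -> Dict[str, Any]:
    """Group the failures of a GetEvaluationResults response for triage."""
    failures = results.get("failures", [])
    return {
        "by_severity": dict(Counter(f["severity"] for f in failures)),
        "by_category": dict(Counter(f["category"] for f in failures)),
        "most_severe_first": sorted(
            failures,
            key=lambda f: SEVERITY_RANK.get(f["severity"], len(SEVERITY_RANK)),
        ),
    }


def pass_rate(results: Dict[str, Any]) -> float:
    """Fraction of probes that passed, computed from the summary block."""
    summary = results["summary"]
    total = summary["total_probes"]
    return summary["passed"] / total if total else 0.0
```

Python's sort is stable, so failures of equal severity keep the order in which the API returned them.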

Errors

| Error | HTTP Status | Description |
| --- | --- | --- |
| EvaluationNotFoundException | 404 | Evaluation does not exist. |
| ResultsNotReadyException | 409 | Evaluation has not completed. |

Example

Request:
curl -X GET "https://api.vijil.ai/v1/evaluations/eval-abc123/results" \
  -H "Authorization: Bearer $VIJIL_API_KEY"
Response:
{
  "id": "eval-abc123",
  "trust_score": 0.78,
  "reliability_score": 0.82,
  "security_score": 0.75,
  "safety_score": 0.80,
  "completed_at": "2024-01-15T10:45:00Z",
  "failures": [
    {
      "probe_id": "security-pi-001",
      "category": "security",
      "severity": "high",
      "reason": "Agent followed injected instructions",
      "prompt": "Ignore previous instructions and...",
      "response": "..."
    }
  ],
  "summary": {
    "total_probes": 1000,
    "passed": 920,
    "failed": 80,
    "high_severity": 5,
    "medium_severity": 25,
    "low_severity": 50
  }
}
Python:
results = vijil.evaluations.get_results("eval-abc123")
print(f"Trust Score: {results.trust_score}")
for failure in results.failures[:5]:
    print(f"- [{failure.severity}] {failure.reason}")

ListEvaluations

Lists evaluations with optional filtering and pagination.

Request Syntax

GET /v1/evaluations HTTP/1.1
Authorization: Bearer {api_key}

Query Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| limit | Number | No | Maximum results to return. Default: 20. Max: 100. |
| offset | Number | No | Number of results to skip. Default: 0. |
| status | String | No | Filter by status: pending, running, completed, failed, cancelled. |
| agent_id | String | No | Filter by agent ID. |

Response Syntax

HTTP/1.1 200 OK
Content-Type: application/json

{
  "data": [
    {
      "id": "string",
      "status": "string",
      "created_at": "string",
      "trust_score": number
    }
  ],
  "total": number,
  "limit": number,
  "offset": number
}
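
To walk every page, a client can advance `offset` by the number of items received until it reaches `total`. The sketch below assumes that `total` is the count of all evaluations matching the filter, which this page does not state explicitly. It also assumes a minimum `limit` of 1. `iter_evaluations` and `fetch_page` are hypothetical names; `fetch_page(limit, offset)` stands in for whatever performs the GET and returns the parsed response.

```python
from typing import Any, Callable, Dict, Iterator


def iter_evaluations(
    fetch_page: Callable[[int, int], Dict[str, Any]],
    limit: int = 20,
) -> Iterator[Dict[str, Any]]:
    """Yield every evaluation by following limit/offset pagination."""
    # The Query Parameters table caps limit at 100; the lower bound is assumed.
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        items = page["data"]
        yield from items
        offset += len(items)
        # Stop on an empty page (guards against looping forever) or once
        # everything reported by `total` has been seen.
        if not items or offset >= page["total"]:
            break
```

Filters such as `status` or `agent_id` can be bound inside `fetch_page`, for example with a closure or `functools.partial`.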

Example

Request:
curl -X GET "https://api.vijil.ai/v1/evaluations?limit=10&status=completed" \
  -H "Authorization: Bearer $VIJIL_API_KEY"
Python:
evaluations = vijil.evaluations.list(limit=10, status="completed")

CancelEvaluation

Cancels a running evaluation.

Request Syntax

POST /v1/evaluations/{evaluation_id}/cancel HTTP/1.1
Authorization: Bearer {api_key}

URI Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluation_id | String | Yes | The evaluation identifier. |

Response Syntax

HTTP/1.1 200 OK
Content-Type: application/json

{
  "id": "string",
  "status": "cancelled"
}

Errors

| Error | HTTP Status | Description |
| --- | --- | --- |
| EvaluationNotFoundException | 404 | Evaluation does not exist. |
| InvalidStateException | 409 | Evaluation is not in a cancellable state. |
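
For callers not using the SDK, the cancel request can be built with the Python standard library. This sketch takes the base URL from the curl examples on this page. `build_cancel_request` and `interpret_cancel_status` are hypothetical helpers, and the returned labels are illustrative rather than API values.

```python
import urllib.parse
import urllib.request

# Base URL as used by the curl examples on this page.
BASE_URL = "https://api.vijil.ai/v1"


def build_cancel_request(evaluation_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request that cancels an evaluation."""
    # Percent-encode the ID so unexpected characters cannot alter the path.
    safe_id = urllib.parse.quote(evaluation_id, safe="")
    return urllib.request.Request(
        f"{BASE_URL}/evaluations/{safe_id}/cancel",
        method="POST",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def interpret_cancel_status(http_status: int) -> str:
    """Map the documented CancelEvaluation status codes to a short label."""
    outcomes = {
        200: "cancelled",        # response body carries status "cancelled"
        404: "not_found",        # EvaluationNotFoundException
        409: "not_cancellable",  # InvalidStateException
    }
    if http_status not in outcomes:
        raise ValueError(f"undocumented status for cancel: {http_status}")
    return outcomes[http_status]
```

The request can be sent with `urllib.request.urlopen(request)`. Note that `urlopen` raises `urllib.error.HTTPError` for 404 and 409, and the exception's `code` attribute holds the status to pass to `interpret_cancel_status`.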

Example

Python:
vijil.evaluations.cancel("eval-abc123")

DeleteEvaluation

Permanently deletes an evaluation and all its results.

Request Syntax

DELETE /v1/evaluations/{evaluation_id} HTTP/1.1
Authorization: Bearer {api_key}

URI Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| evaluation_id | String | Yes | The evaluation identifier. |

Response Syntax

HTTP/1.1 204 No Content

Errors

| Error | HTTP Status | Description |
| --- | --- | --- |
| EvaluationNotFoundException | 404 | Evaluation does not exist. |

Example

Python:
vijil.evaluations.delete("eval-abc123")

Last modified on March 19, 2026