Actions
| Action | Description |
|---|---|
| CreateEvaluation | Create and start a new evaluation |
| GetEvaluationStatus | Get the current status of an evaluation |
| GetEvaluationResults | Retrieve results from a completed evaluation |
| ListEvaluations | List evaluations with optional filtering |
| CancelEvaluation | Cancel a running evaluation |
| DeleteEvaluation | Delete an evaluation and its results |
CreateEvaluation
Creates and starts a new evaluation against an agent or model.
Request Syntax
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| model_hub | String | Conditional | Model provider. One of: openai, anthropic, bedrock, vertex, digitalocean, custom. Required if agent_id not provided. |
| model_name | String | Conditional | Model identifier (e.g., gpt-4o, claude-3-sonnet). Required if agent_id not provided. |
| harnesses | Array of String | Yes | Harnesses to run. Valid values: trust_score, security, reliability, safety, or custom harness IDs. |
| agent_id | String | Conditional | ID of a registered agent. Required if model_hub not provided. |
| model_params | Object | No | Model parameters. See Model Parameters. |
| system_prompt | String | No | System prompt to use for the evaluation. |
| api_key_value | String | Conditional | API key for the model provider. Required if model_hub is provided and no key is stored for that provider. |
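The conditional rules in this table are easy to get wrong, so a client may want to check them before sending a request. The following is a minimal sketch under stated assumptions: `build_create_evaluation_body` is a hypothetical helper, not part of the documented API or Python client. It enforces only the rules stated in the table, and it leaves the api_key_value check to the server because the client cannot know whether a key is already stored.

```python
from typing import Any, Optional

# Provider values listed in the model_hub parameter description.
VALID_MODEL_HUBS = {"openai", "anthropic", "bedrock", "vertex", "digitalocean", "custom"}


def build_create_evaluation_body(
    harnesses: list[str],
    *,
    agent_id: Optional[str] = None,
    model_hub: Optional[str] = None,
    model_name: Optional[str] = None,
    model_params: Optional[dict[str, Any]] = None,
    system_prompt: Optional[str] = None,
    api_key_value: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: assemble a CreateEvaluation body and check the documented rules."""
    if not harnesses:
        raise ValueError("harnesses is required and must name at least one harness")
    if agent_id is None and (model_hub is None or model_name is None):
        # Without a registered agent, both model_hub and model_name are required.
        raise ValueError("provide agent_id, or both model_hub and model_name")
    if model_hub is not None and model_hub not in VALID_MODEL_HUBS:
        raise ValueError(f"unknown model_hub: {model_hub!r}")

    body: dict[str, Any] = {"harnesses": list(harnesses)}
    optional = {
        "agent_id": agent_id,
        "model_hub": model_hub,
        "model_name": model_name,
        "model_params": model_params,
        "system_prompt": system_prompt,
        "api_key_value": api_key_value,
    }
    # Only send optional fields that were actually supplied.
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

For example, `build_create_evaluation_body(["trust_score"], model_hub="openai", model_name="gpt-4o")` returns `{'harnesses': ['trust_score'], 'model_hub': 'openai', 'model_name': 'gpt-4o'}`.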
Model Parameters
| Parameter | Type | Description |
|---|---|---|
| temperature | Number | Sampling temperature (0.0–2.0). Default: 0. |
| max_tokens | Number | Maximum tokens in response. |
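A client can reject out-of-range model parameters before the request is sent. This is a minimal sketch: `build_model_params` is a hypothetical helper. The temperature range and default come from the table above; the positive-integer check on max_tokens is an assumption, since the table gives no bounds for it.

```python
from typing import Any, Optional


def build_model_params(temperature: float = 0.0, max_tokens: Optional[int] = None) -> dict[str, Any]:
    """Hypothetical helper: return a model_params object within the documented ranges."""
    # Documented range for temperature is 0.0-2.0, default 0.
    if not 0.0 <= temperature <= 2.0:
        raise ValueError(f"temperature must be between 0.0 and 2.0, got {temperature}")
    params: dict[str, Any] = {"temperature": temperature}
    if max_tokens is not None:
        # Assumption: max_tokens must be a positive integer (not stated in the table).
        if max_tokens <= 0:
            raise ValueError("max_tokens must be a positive integer")
        params["max_tokens"] = max_tokens
    return params
```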
Response Syntax
Response Elements
| Element | Type | Description |
|---|---|---|
| id | String | Unique evaluation identifier. Format: eval-{uuid}. |
| status | String | Initial status. Always pending for new evaluations. |
| created_at | String | ISO 8601 timestamp of creation. |
| harnesses | Array of String | Harnesses that will be executed. |
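The response elements above can be parsed into typed fields, with the documented `eval-{uuid}` identifier format checked on the way. This is a sketch only: `CreatedEvaluation` and `parse_create_response` are hypothetical names, and the input is assumed to be the already-decoded JSON body as a dict.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CreatedEvaluation:
    id: str
    status: str
    created_at: datetime
    harnesses: list[str]


def parse_create_response(payload: dict) -> CreatedEvaluation:
    """Hypothetical parser for a decoded CreateEvaluation response body."""
    eval_id = payload["id"]
    prefix = "eval-"
    if not eval_id.startswith(prefix):
        raise ValueError(f"unexpected id format: {eval_id!r}")
    # uuid.UUID raises ValueError if the suffix is not a well-formed UUID.
    uuid.UUID(eval_id[len(prefix):])
    # datetime.fromisoformat accepts a trailing "Z" only from Python 3.11 on,
    # so normalise it to an explicit UTC offset first.
    created_at = datetime.fromisoformat(payload["created_at"].replace("Z", "+00:00"))
    return CreatedEvaluation(
        id=eval_id,
        status=payload["status"],
        created_at=created_at,
        harnesses=list(payload["harnesses"]),
    )
```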
Errors
| Error | HTTP Status | Description |
|---|---|---|
| InvalidRequestException | 400 | Request body is malformed or missing required fields. |
| InvalidApiKeyException | 401 | API key is invalid or expired. |
| AgentNotFoundException | 404 | Specified agent_id does not exist. |
| HarnessNotFoundException | 404 | One or more specified harnesses do not exist. |
| RateLimitExceededException | 429 | Evaluation quota exceeded. |
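Of these errors, only the 429 quota error is plausibly worth retrying unchanged; the others need a corrected request or credentials. A minimal sketch, assuming the error name and HTTP status have already been extracted from the response (the error body format is not documented here). `EvaluationApiError`, `retryable`, and `backoff_delay` are hypothetical, and the exponential backoff schedule is a client-side choice, not a documented server policy.

```python
# Error names and HTTP statuses from the CreateEvaluation Errors table.
CREATE_EVALUATION_ERRORS = {
    "InvalidRequestException": 400,
    "InvalidApiKeyException": 401,
    "AgentNotFoundException": 404,
    "HarnessNotFoundException": 404,
    "RateLimitExceededException": 429,
}


class EvaluationApiError(Exception):
    """Hypothetical exception carrying a documented error name and HTTP status."""

    def __init__(self, name: str, status: int, message: str = "") -> None:
        super().__init__(f"{name} (HTTP {status}): {message}")
        self.error_name = name
        self.status = status

    @property
    def retryable(self) -> bool:
        # Assumption: only the 429 quota error may succeed if retried later.
        return self.status == 429


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Exponential backoff in seconds for a 0-based retry attempt, capped at `cap`."""
    return min(cap, base * (2 ** attempt))
```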
Example
Request:
GetEvaluationStatus
Retrieves the current status and progress of an evaluation.
Request Syntax
URI Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluation_id | String | Yes | The evaluation identifier. |
Response Syntax
Response Elements
| Element | Type | Description |
|---|---|---|
| id | String | Evaluation identifier. |
| status | String | Current status. See Status Values. |
| progress | Number | Completion percentage (0–100). |
| completed | Number | Number of probes completed. |
| total | Number | Total number of probes to run. |
| started_at | String | ISO 8601 timestamp when evaluation started. Null if pending. |
| estimated_completion | String | Estimated completion time. Null if not running. |
Status Values
| Status | Description |
|---|---|
| pending | Queued, waiting to start. |
| running | Actively sending probes and analyzing responses. |
| completed | Finished successfully. Results available. |
| failed | Terminated due to error. Check error details. |
| cancelled | Stopped by user request. |
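Because completed, failed, and cancelled are final states, a client can poll GetEvaluationStatus until one of them is reached, and only then call GetEvaluationResults (which returns ResultsNotReadyException until the evaluation completes). This is a minimal sketch: `get_status` is a hypothetical callable that performs the GetEvaluationStatus request and returns the decoded body, and the poll interval and timeout are arbitrary defaults.

```python
import time
from typing import Callable

# Statuses from the table above after which the evaluation no longer changes.
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def wait_for_evaluation(
    get_status: Callable[[str], dict],
    evaluation_id: str,
    poll_interval: float = 5.0,
    timeout: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the evaluation reaches a terminal status, then return that status body."""
    waited = 0.0
    while True:
        status = get_status(evaluation_id)
        if status["status"] in TERMINAL_STATUSES:
            return status
        if waited >= timeout:
            raise TimeoutError(f"{evaluation_id} still {status['status']} after {timeout}s")
        sleep(poll_interval)
        waited += poll_interval
```

The `sleep` parameter is injectable so the loop can be tested without real delays. After it returns, check `status["status"] == "completed"` before fetching results.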
Errors
| Error | HTTP Status | Description |
|---|---|---|
| EvaluationNotFoundException | 404 | Evaluation does not exist. |
Example
Request:
GetEvaluationResults
Retrieves the full results of a completed evaluation.
Request Syntax
URI Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluation_id | String | Yes | The evaluation identifier. |
Response Syntax
Response Elements
| Element | Type | Description |
|---|---|---|
| id | String | Evaluation identifier. |
| trust_score | Number | Overall trust score (0.0–1.0). |
| reliability_score | Number | Reliability dimension score (0.0–1.0). |
| security_score | Number | Security dimension score (0.0–1.0). |
| safety_score | Number | Safety dimension score (0.0–1.0). |
| completed_at | String | ISO 8601 timestamp of completion. |
| failures | Array | List of failed probes. See Failure Object. |
| summary | Object | Aggregate statistics. |
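Since each score is documented on a 0.0-1.0 scale, the results can drive a pass/fail gate, for example in CI. This is a sketch under assumptions: `failing_scores` is a hypothetical helper, and the threshold values are whatever the caller chooses; the API does not define pass thresholds.

```python
# Score fields from the GetEvaluationResults response, each documented as 0.0-1.0.
SCORE_FIELDS = ("trust_score", "reliability_score", "security_score", "safety_score")


def failing_scores(results: dict, thresholds: dict[str, float]) -> dict[str, float]:
    """Return the scores below their caller-chosen minimum; unlisted fields are skipped."""
    below = {}
    for field in SCORE_FIELDS:
        minimum = thresholds.get(field)
        if minimum is None:
            continue
        score = results[field]
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{field}={score} is outside the documented 0.0-1.0 range")
        if score < minimum:
            below[field] = score
    return below
```

A CI job could exit non-zero whenever the returned dict is non-empty.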
Failure Object
| Element | Type | Description |
|---|---|---|
| probe_id | String | Unique probe identifier. |
| category | String | Failure category: reliability, security, or safety. |
| severity | String | Severity level: high, medium, or low. |
| reason | String | Human-readable explanation of the failure. |
| prompt | String | The probe prompt sent to the agent. |
| response | String | The agent’s response. |
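When an evaluation produces many failures, sorting them by severity and counting them per category makes triage easier. A minimal sketch: `triage_failures` is a hypothetical helper that works on the decoded `failures` array, using the category and severity values listed in the Failure Object table.

```python
from collections import Counter

# Severity levels from the Failure Object table, most severe first.
SEVERITY_ORDER = {"high": 0, "medium": 1, "low": 2}


def triage_failures(failures: list[dict]) -> tuple[list[dict], Counter]:
    """Sort failures most-severe first and count them per (category, severity)."""
    ordered = sorted(
        failures,
        key=lambda f: (SEVERITY_ORDER[f["severity"]], f["category"], f["probe_id"]),
    )
    counts = Counter((f["category"], f["severity"]) for f in failures)
    return ordered, counts
```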
Errors
| Error | HTTP Status | Description |
|---|---|---|
| EvaluationNotFoundException | 404 | Evaluation does not exist. |
| ResultsNotReadyException | 409 | Evaluation has not completed. |
Example
Request:
ListEvaluations
Lists evaluations with optional filtering and pagination.
Request Syntax
Query Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| limit | Number | No | Maximum results to return. Default: 20. Max: 100. |
| offset | Number | No | Number of results to skip. Default: 0. |
| status | String | No | Filter by status: pending, running, completed, failed, cancelled. |
| agent_id | String | No | Filter by agent ID. |
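With limit and offset, a client can walk every matching evaluation one page at a time. This is a sketch under assumptions: `fetch_page` is a hypothetical callable that sends ListEvaluations with the given query parameters and returns the list of evaluations from the response (the response shape is not documented here), and a page shorter than limit is taken to mean there are no more results.

```python
from typing import Any, Callable, Iterator, Optional

MAX_LIMIT = 100  # documented maximum for limit


def iter_evaluations(
    fetch_page: Callable[..., list[dict]],
    page_size: int = MAX_LIMIT,
    status: Optional[str] = None,
    agent_id: Optional[str] = None,
) -> Iterator[dict]:
    """Yield every matching evaluation, advancing offset one page at a time."""
    if not 1 <= page_size <= MAX_LIMIT:
        raise ValueError(f"page_size must be between 1 and {MAX_LIMIT}")
    params: dict[str, Any] = {"limit": page_size}
    if status is not None:
        params["status"] = status
    if agent_id is not None:
        params["agent_id"] = agent_id
    offset = 0
    while True:
        page = fetch_page(offset=offset, **params)
        yield from page
        # Assumption: a short page means the end of the result set.
        if len(page) < page_size:
            return
        offset += len(page)
```

Offset-based paging can skip or repeat items if evaluations are created or deleted while iterating, so treat the result as a snapshot that may be slightly out of date.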
Response Syntax
Example
Request:
CancelEvaluation
Cancels a running evaluation.
Request Syntax
URI Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluation_id | String | Yes | The evaluation identifier. |
Response Syntax
Errors
| Error | HTTP Status | Description |
|---|---|---|
| EvaluationNotFoundException | 404 | Evaluation does not exist. |
| InvalidStateException | 409 | Evaluation is not in a cancellable state. |
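An evaluation can finish between a status check and a cancel request, so a cleanup path may prefer to treat InvalidStateException as "already stopped" rather than as a failure. A minimal sketch: `cancel` and `get_status` are hypothetical callables wrapping CancelEvaluation and GetEvaluationStatus, and `InvalidStateError` is a hypothetical stand-in for however a client surfaces the 409 response.

```python
from typing import Callable


class InvalidStateError(Exception):
    """Hypothetical stand-in for InvalidStateException (HTTP 409)."""


def cancel_if_running(
    cancel: Callable[[str], dict],
    get_status: Callable[[str], dict],
    evaluation_id: str,
) -> dict:
    """Cancel an evaluation; if it is not cancellable, return its current status instead."""
    try:
        return cancel(evaluation_id)
    except InvalidStateError:
        # Presumably the evaluation has already left the running state
        # (completed, failed, or cancelled); report where it ended up.
        return get_status(evaluation_id)
```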
Example
Python:
DeleteEvaluation
Permanently deletes an evaluation and all its results.
Request Syntax
URI Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| evaluation_id | String | Yes | The evaluation identifier. |
Response Syntax
Errors
| Error | HTTP Status | Description |
|---|---|---|
| EvaluationNotFoundException | 404 | Evaluation does not exist. |
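For cleanup scripts that may run more than once, a 404 on delete can be treated as "already deleted". A minimal sketch: `delete` is a hypothetical callable wrapping DeleteEvaluation, and `EvaluationNotFoundError` is a hypothetical stand-in for however a client surfaces EvaluationNotFoundException.

```python
from typing import Callable


class EvaluationNotFoundError(Exception):
    """Hypothetical stand-in for EvaluationNotFoundException (HTTP 404)."""


def delete_evaluation_idempotent(delete: Callable[[str], object], evaluation_id: str) -> bool:
    """Return True if the evaluation was deleted now, False if it was already gone."""
    try:
        delete(evaluation_id)
    except EvaluationNotFoundError:
        # "Does not exist" counts as success so the script can be re-run safely.
        return False
    return True
```

Because deletion is permanent and removes the results as well, export anything you need from GetEvaluationResults first.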
Example
Python:
See Also
- API Overview — Authentication and error handling
- Python Client Reference — Full Python SDK documentation
- Agents API — Agent management endpoints