Create a new evaluation job.
Creates a Kubernetes Job to run the evaluation and tracks it in memory. Fetches the agent configuration from the Agent Registry using the provided agent_id. The JWT token is retrieved from the request context (set by JWTAuthMiddleware).
Args:
    request: Evaluation configuration with agent_id and team_id
    claims: JWT claims with user and team info
    diamond_domain: Diamond domain orchestrator
Returns: Evaluation ID, status, and status URL
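A minimal sketch of assembling the request body for this endpoint. The field names `evaluation_type`, `harness_type`, and `prompt_sample_size` are assumptions inferred from the field descriptions below, not confirmed wire names; adjust them to the actual request model.

```python
import json
import uuid

def build_evaluation_request(agent_id, team_id, harness_names,
                             prompt_sample_size=None):
    """Assemble the JSON body for a create-evaluation call (hypothetical field names)."""
    body = {
        "agent_id": agent_id,          # UUID of the agent from Agent Registry
        "team_id": team_id,            # UUID of the owning team (required)
        "harness_names": harness_names,
        "evaluation_type": "behavioral",  # currently the only supported type
        "harness_type": "standard",       # all harnesses must share this type
    }
    if prompt_sample_size is not None:
        # If omitted, all prompts run (~1250 for the security harness)
        body["prompt_sample_size"] = prompt_sample_size
    return body

payload = build_evaluation_request(
    agent_id=str(uuid.uuid4()),
    team_id=str(uuid.uuid4()),
    harness_names=["safety", "security"],
    prompt_sample_size=10,  # small sample for fast iteration
)
print(json.dumps(payload, indent=2))
```

The JWT is not part of the body: per the docstring above, JWTAuthMiddleware extracts it from the request context, so a client would send it as an auth header.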
Request model for creating a new evaluation.
UUID of the agent from Agent Registry
UUID of the team that owns this evaluation (required)
List of harnesses to run (e.g., ['safety', 'ethics', 'privacy', 'security', 'toxicity'])
Type of evaluation to run. Currently only 'behavioral' is supported.
Type of all harnesses: 'standard' or 'custom'. All harnesses in harness_names must be of this type. Allowed values: standard, custom.
Number of prompts to randomly sample per harness (1 <= x <= 1000). If omitted, all prompts run (~1250 for security). Recommended: 10 for fast iteration, 50 for moderate, 100 for thorough.
Optional UUID for the evaluation. If provided, this UUID will be used when creating the evaluation in Diamond. If not provided, Diamond will generate one.
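The field constraints above can be sketched as a dataclass with validation. This is an illustrative stand-in for the real request model (field names other than `agent_id`, `team_id`, and `harness_names` are assumptions), not its actual definition.

```python
from dataclasses import dataclass
from typing import List, Optional

ALLOWED_HARNESS_TYPES = {"standard", "custom"}

@dataclass
class CreateEvaluationRequest:
    """Hypothetical mirror of the create-evaluation request model."""
    agent_id: str                           # UUID of the agent from Agent Registry
    team_id: str                            # UUID of the owning team (required)
    harness_names: List[str]                # e.g. ['safety', 'security']
    evaluation_type: str = "behavioral"     # only 'behavioral' is supported
    harness_type: str = "standard"          # all harnesses must be of this type
    prompt_sample_size: Optional[int] = None  # 1 <= x <= 1000; None runs all prompts
    evaluation_id: Optional[str] = None     # Diamond generates one if omitted

    def __post_init__(self):
        if self.evaluation_type != "behavioral":
            raise ValueError("only 'behavioral' evaluations are supported")
        if self.harness_type not in ALLOWED_HARNESS_TYPES:
            raise ValueError(f"harness_type must be one of {ALLOWED_HARNESS_TYPES}")
        if self.prompt_sample_size is not None and not (
            1 <= self.prompt_sample_size <= 1000
        ):
            raise ValueError("prompt_sample_size must satisfy 1 <= x <= 1000")
```

In a FastAPI service the real model would likely be a Pydantic model with `Field` constraints (e.g. `ge=1, le=1000`), which yields the same validation at request-parsing time.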