
# Evaluations

> Learn how to create, view, summarize, export, cancel, and delete evaluations.

Evaluations are the way Vijil measures how trustworthy your agent is. In this section you will learn how to create, view, summarize, export, cancel, and delete evaluations.

## Web Interface

In the left sidebar, click on **Evaluations** (clipboard icon) to view all evaluations. On this page, you can view previous evaluations, rerun them, delete them, or pause/restart an ongoing evaluation.

### Create an Evaluation

1. On the Evaluations page, click **Create Evaluation**.
2. Select the agent to evaluate. If the agent you want is not yet in the list, click **Add Agent**.
3. Select the Harnesses you want to run. See [Harnesses](/core-concepts/components/harness) for more information.
4. After selecting an agent, you can configure some of its runtime parameters, such as temperature and maximum completion tokens, in the **Run Configuration** section.
5. Optionally, enter a name for your evaluation, then click **Create**.

## Python Client

You can create, view, summarize, export, cancel, and delete evaluations with the Vijil Python client.

Before doing any of this, you will need to instantiate your Vijil client. In this topic we will assume you have instantiated a Vijil client called `client`.

### Create an Evaluation

You can use the `evaluations.create` method to create an evaluation:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.create(
      model_hub="openai",
      model_name="gpt-4o-mini",
      model_params={"temperature": 0},
      harnesses=["security"],
      harness_params={"is_lite": False}
  )
  # If successful, returns a dictionary with the following format:
  # {'id': YOUR_GUID, 'status': 'CREATED'}
  ```
</CodeGroup>

The relevant parameters are as follows:

* `model_hub`: The model hub your model is on. Currently Vijil supports `openai`, `octo`, and `together` as model hubs. Make sure you have an API key [stored](/tutorials/manage-api-keys) for the hub you want to use.
* `model_name`: The name of the model you want to use on the hub. You can get this from the relevant hub's documentation.
* `model_params`: Inference parameters like temperature and top\_p.
* `harnesses`: The [Harnesses](/core-concepts/components/harness) to run. Harnesses determine which Probes are run, and the Probes in turn determine what goes into your trust score.
* `harness_params`: Options for the Harness. `is_lite` controls whether you run a "lite" version of the Harness, which is cheaper and faster. Set it to `False` to run the full Harness.
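
As a minimal sketch, the parameters above can be assembled and checked before they are passed to `evaluations.create`. The helper `build_create_kwargs` and the `SUPPORTED_HUBS` set are hypothetical names used for illustration and are not part of the Vijil client; the hub names are taken from the list above.

```python
# Hypothetical helper for illustration; not part of the Vijil client.
SUPPORTED_HUBS = {"openai", "octo", "together"}  # hubs named in this topic


def build_create_kwargs(model_hub, model_name, model_params, harnesses, is_lite):
    """Assemble keyword arguments for client.evaluations.create."""
    if model_hub not in SUPPORTED_HUBS:
        raise ValueError(f"Unsupported model hub: {model_hub!r}")
    if not harnesses:
        raise ValueError("Select at least one harness")
    return {
        "model_hub": model_hub,
        "model_name": model_name,
        "model_params": dict(model_params),
        "harnesses": list(harnesses),
        "harness_params": {"is_lite": is_lite},
    }


kwargs = build_create_kwargs(
    "openai", "gpt-4o-mini", {"temperature": 0}, ["security"], is_lite=False
)
# With an instantiated client: client.evaluations.create(**kwargs)
```

Checking the hub name locally catches a typo before any request is sent.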

## View, Describe, and Summarize Evaluations

### List Evaluations

List all evaluations with the `evaluations.list` method:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.list(limit=20)
  ```
</CodeGroup>

If you do not specify `limit`, the method returns only the 10 most recent evaluations.

Use `list` to look up an evaluation's ID when you do not know it. You need the ID to get more details about that evaluation.
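
The following sketch shows one way to pull IDs out of the listing. It assumes that `evaluations.list` returns a list of dictionaries with `id` and `status` keys, modeled on the return value of `evaluations.create`; that shape, the status name `COMPLETED`, and the helper `ids_with_status` are assumptions, so check them against what your client actually returns.

```python
def ids_with_status(evaluations, status):
    """Return the IDs of evaluations whose 'status' equals the given value."""
    return [item["id"] for item in evaluations if item.get("status") == status]


# Made-up sample standing in for the result of client.evaluations.list();
# the 'id' and 'status' keys and the status names are assumptions.
sample = [
    {"id": "eval-1", "status": "CREATED"},
    {"id": "eval-2", "status": "COMPLETED"},
    {"id": "eval-3", "status": "COMPLETED"},
]
completed_ids = ids_with_status(sample, "COMPLETED")
```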

### Get Evaluation Status

You can view the status of an evaluation with the `evaluations.get_status` method:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.get_status(evaluation_id='96f925f6-a7a7-05dd-5f2a-665734d181ee')
  ```
</CodeGroup>
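
Because an evaluation takes time to finish, you may want to poll its status until it is done. Below is a minimal polling sketch. The helper `wait_for_evaluation` is hypothetical, the terminal status names are assumptions (only `CREATED` appears in this topic), and the sketch assumes you supply a function that returns the status as a string.

```python
import time

# Assumed terminal status names; adjust to the values your client reports.
TERMINAL_STATUSES = {"COMPLETED", "ERROR", "CANCELLED"}


def wait_for_evaluation(fetch_status, interval_seconds=30.0,
                        timeout_seconds=3600.0, sleep=time.sleep):
    """Call fetch_status() until it returns a terminal status or time runs out."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Evaluation still {status!r} after {timeout_seconds} seconds"
            )
        sleep(interval_seconds)


# Possible usage, assuming get_status returns a plain status string:
# wait_for_evaluation(
#     lambda: client.evaluations.get_status(evaluation_id=my_evaluation_id)
# )
```

If `get_status` returns a dictionary rather than a string, extract the status field inside the function you pass in.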

### Summarize a Completed Evaluation

Get summary scores for a completed evaluation, including scores at the overall, Harness, Scenario, and Probe levels, with the `evaluations.summarize` method:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.summarize(evaluation_id='22df0c08-4fcd-4e3d-9285-3a5e66c93f54')
  ```
</CodeGroup>

### Get Prompt-level Details

Get prompt-level details for a completed evaluation with the `evaluations.describe` method:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.describe(evaluation_id='22df0c08-4fcd-4e3d-9285-3a5e66c93f54', format='dataframe', limit=1000)
  ```
</CodeGroup>

This returns all prompts and Detector scores for each Probe. By default, the method returns only 1000 results, but you can change this with the `limit` argument.

By default, the output is a pandas dataframe, but if you prefer a list of dictionaries, specify `list` as the `format`.

### Get a Hits-Only List

If you want a list of only the prompts/responses that led to hits (responses deemed undesirable), you can use the `hits_only` argument. By default, all prompts and responses will be returned.

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.describe(evaluation_id='22df0c08-4fcd-4e3d-9285-3a5e66c93f54', format='dataframe', hits_only=True)
  ```
</CodeGroup>
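
Once you have the hits, a quick tally can show where they concentrate. The sketch below works on the list-of-dictionaries output (`format='list'`). The helper `count_by` is hypothetical, and the `probe` key and sample rows are made up for illustration; inspect your own results for the real field names.

```python
from collections import Counter


def count_by(rows, key):
    """Count rows per distinct value of `key`; rows without the key count under None."""
    return Counter(row.get(key) for row in rows)


# Made-up rows standing in for describe(..., format='list', hits_only=True);
# 'probe' and 'prompt' are hypothetical field names.
sample_hits = [
    {"probe": "probe_a", "prompt": "first prompt"},
    {"probe": "probe_b", "prompt": "second prompt"},
    {"probe": "probe_a", "prompt": "third prompt"},
]
hits_per_probe = count_by(sample_hits, "probe")
```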

### Export Evaluations

You can export both the [summary](#summarize-a-completed-evaluation)- and [prompt-level](#get-prompt-level-details) evaluation results.

### Export Summary

Export the summary of an evaluation with the `evaluations.export_summary` method:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.export_summary(evaluation_id='96f925f6-a7a7-05dd-5f2a-665734d181ee', format='pdf', output_dir='./output')
  ```
</CodeGroup>

The format can be either `pdf` or `html`. `output_dir` defaults to the current directory unless otherwise specified.

### Export Prompt-level Details

Export the prompt-level details of an evaluation with the `evaluations.export_report` method:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.export_report(evaluation_id='33a886cd-2183-4a61-9ede-241cbbb10ec6', format='parquet', output_dir='./output')
  ```
</CodeGroup>

The format can be `csv`, `parquet`, `json`, or `jsonl`. `output_dir` defaults to the current directory unless otherwise specified.
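
Before exporting, it can help to validate the format and make sure the output directory exists. The sketch below does both. The helper `prepare_export` and the two format sets are hypothetical names; the format values come from this topic. Whether the client creates a missing `output_dir` itself is not stated here, so creating it up front is a precaution, not a documented requirement.

```python
from pathlib import Path

SUMMARY_FORMATS = {"pdf", "html"}                    # for export_summary
REPORT_FORMATS = {"csv", "parquet", "json", "jsonl"}  # for export_report


def prepare_export(output_dir, fmt, allowed_formats):
    """Validate the format, create output_dir if missing, and return it as a string."""
    if fmt not in allowed_formats:
        raise ValueError(
            f"Unsupported format {fmt!r}; expected one of {sorted(allowed_formats)}"
        )
    path = Path(output_dir)
    path.mkdir(parents=True, exist_ok=True)
    return str(path)


# Possible usage with an instantiated client:
# out = prepare_export("./output", "parquet", REPORT_FORMATS)
# client.evaluations.export_report(
#     evaluation_id=my_evaluation_id, format="parquet", output_dir=out
# )
```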

See the [glossary](/references/glossary) to understand what the Probe or Detector modules in the report do.

### Export Hits Only

To export only the prompts/responses that led to hits (responses deemed undesirable), you can use the `hits_only` argument. By default, all prompts and responses will be returned.

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.export_report(evaluation_id='22df0c08-4fcd-4e3d-9285-3a5e66c93f54', format='csv', hits_only=True)
  ```
</CodeGroup>

## Cancel or Delete Evaluations

You can cancel an in-progress evaluation or delete evaluations to unclutter your dashboard.

### Cancel an Evaluation

Cancel an in-progress evaluation with the `evaluations.cancel` method:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.cancel(evaluation_id='ecc4139a-bb07-4e04-8f0f-b402c1e5cb65')
  # {'type': 'CANCEL_EVALUATION',
  #  'id': 'dc3d4e6c-041d-41a6-8115-674cf2496718',
  #  'created_at': 1721163830.4333143,
  #  'created_by': '',
  #  'data': {'evaluation_id': 'ecc4139a-bb07-4e04-8f0f-b402c1e5cb65'},
  #  'metadata': None}
  ```
</CodeGroup>

### Delete an Evaluation

Delete an evaluation with the `evaluations.delete` method:

<CodeGroup>
  ```python title="Python" icon="python" theme={null}
  client.evaluations.delete(evaluation_id='bb8ad49b-4d49-462f-8abb-cbdbc0b1998d')
  # {'type': 'DELETE_EVALUATION',
  #  'id': 'a67e1b5e-83fb-4aa5-9cbb-bd057fb4db4f',
  #  'created_at': 1721163723.7307427,
  #  'created_by': '',
  #  'data': {'evaluation_id': 'bb8ad49b-4d49-462f-8abb-cbdbc0b1998d'},
  #  'metadata': None}
  ```
</CodeGroup>
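
To remove several evaluations at once, you can loop over their IDs. The sketch below is one way to do that while recording failures instead of stopping at the first one. The helper `delete_many` is hypothetical, and because this topic does not say which exceptions the client raises, the sketch catches `Exception` broadly.

```python
def delete_many(delete_fn, evaluation_ids):
    """Call delete_fn(evaluation_id=...) for each ID.

    Returns (succeeded, failed): a list of deleted IDs and a dict that maps
    each failed ID to its error message.
    """
    succeeded, failed = [], {}
    for evaluation_id in evaluation_ids:
        try:
            delete_fn(evaluation_id=evaluation_id)
        except Exception as exc:  # exception types are unknown; catch broadly
            failed[evaluation_id] = str(exc)
        else:
            succeeded.append(evaluation_id)
    return succeeded, failed


# Possible usage with an instantiated client:
# done, errors = delete_many(client.evaluations.delete, ids_to_remove)
```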
