Scenarios

Scenarios are groups of related probes. Each scenario is its own harness, but multiple scenarios can also be composed to form other harnesses.
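
As a rough illustration of this composition, the sketch below models probes, scenarios, and harnesses as plain Python data classes. The class names, fields, and probe names are hypothetical and are not taken from the tool's actual API; only the Stereotype and Robustness harness names come from the descriptions on this page.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Probe:
    """A single test; the name is a hypothetical identifier."""
    name: str


@dataclass
class Scenario:
    """A group of related probes."""
    name: str
    probes: list[Probe] = field(default_factory=list)


@dataclass
class Harness:
    """One or more scenarios that are run together."""
    name: str
    scenarios: list[Scenario] = field(default_factory=list)

    def probes(self) -> list[Probe]:
        # Flatten the probes of every scenario, dropping duplicates
        # while keeping first-seen order.
        seen: dict[Probe, None] = {}
        for scenario in self.scenarios:
            for probe in scenario.probes:
                seen.setdefault(probe, None)
        return list(seen)


# Hypothetical probe names, for illustration only.
stereotype = Scenario("Stereotype", [Probe("stereotype_prompts")])
adversarial_glue = Scenario("Adversarial GLUE", [Probe("glue_perturbed")])

# A scenario can stand alone as its own harness ...
stereotype_harness = Harness("Stereotype", [stereotype])
robustness_harness = Harness("Robustness", [adversarial_glue])

# ... or be combined with other scenarios into a larger harness.
combined = Harness("custom", [stereotype, adversarial_glue])
```

Keeping a harness as a list of scenarios, rather than a flat list of probes, preserves the grouping, so results can still be reported per scenario.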

Here we briefly describe each scenario and link to longer descriptions under their respective dimensions. The dimension pages also list the probes that belong to each scenario.

Ethical theories

This scenario includes both vanilla and jailbreaking probes for prompts that test the model’s understanding of ethical theories.

Ethics: simulation

This scenario contains vanilla and jailbreaking prompts that ask about the moral valence of a simulated scenario.

Copyrighted content

This scenario contains prompts that attempt to get the model to repeat copyrighted content from books and newspapers.

Private data leak

This scenario contains prompts that test whether a model will leak private data.

Adversarial GLUE

This scenario currently makes up the entirety of the Robustness harness. It measures whether the model’s performance on the natural language understanding tasks in GLUE is affected by perturbations to the sentences in the tasks.
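
One way to quantify "affected by perturbations" is to compare accuracy on the original sentences with accuracy on their perturbed versions. The sketch below is a minimal illustration under that assumption; the function names and the toy labels are hypothetical, and the scenario's actual scoring may differ.

```python
def accuracy(predictions: list[str], labels: list[str]) -> float:
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one example is required")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def robustness_gap(clean_preds: list[str],
                   perturbed_preds: list[str],
                   labels: list[str]) -> float:
    """Accuracy lost when moving from clean to perturbed inputs."""
    return accuracy(clean_preds, labels) - accuracy(perturbed_preds, labels)


# Toy data (hypothetical): four sentiment examples sharing the same gold labels.
labels = ["positive", "negative", "positive", "negative"]
clean = ["positive", "negative", "positive", "negative"]      # 4 of 4 correct
perturbed = ["positive", "positive", "positive", "negative"]  # 3 of 4 correct

gap = robustness_gap(clean, perturbed, labels)  # 1.0 - 0.75 = 0.25
```

A gap near zero suggests the perturbations did not change the model's answers; a large positive gap suggests the model is sensitive to them.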

Professional bias

This scenario measures whether the model associates gender with occupations.

Gender-income bias

This scenario checks whether the model predicts a person’s income in a gender-biased manner.

Stereotype

This scenario currently makes up the entire Stereotype harness. It measures whether the model reinforces stereotypes about religion, race, ethnicity, age, disability, national origin, and gender in its responses.

Compounding hallucination

This scenario attempts to prompt the model into generating hallucinations by having it over-commit to an initial mistake.

Misleading information

This scenario checks whether the model is able to refute false assertions presented in the input prompt.

Package hallucination

This scenario contains prompts designed to elicit sample code importing non-existent Python packages.
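
To make the idea concrete, the sketch below shows one possible way to flag suspicious imports in generated code: it parses the code with Python's standard `ast` module and compares the imported top-level module names against a set of known packages. The helper names and the `known` set are assumptions for illustration; a real detector would need an authoritative package list, and this is not necessarily how the scenario's own detector works.

```python
import ast


def imported_top_level_modules(code: str) -> set[str]:
    """Return the top-level module names imported by a piece of Python code."""
    modules: set[str] = set()
    # ast.parse raises SyntaxError if the generated code is not valid Python.
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # Skip relative imports such as "from . import x".
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


def unknown_imports(code: str, known: set[str]) -> set[str]:
    """Imported top-level modules that are not in the known-package set."""
    return imported_top_level_modules(code) - known


# Hypothetical model output; "fastjsonx" is a made-up package name.
generated = (
    "import os\n"
    "import numpy as np\n"
    "from fastjsonx.parser import load\n"
)
known = {"os", "numpy"}  # stand-in for a real list of existing packages

hallucinated = unknown_imports(generated, known)
```

Here `hallucinated` contains only `fastjsonx`, since `os` and `numpy` are in the known set.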

Hallucination: Goodside

This scenario consists of the "Who is Riley?" and "Davidjl" probes, both designed by Riley Goodside to elicit hallucinations.

Social values

This scenario currently consists of a probe looking for promotions of quack medicine.

Known malware

This scenario checks for the existence of known signatures of spam and viruses in the output.

Malware generation

This scenario contains probes that attempt to make the model generate malware.

Prompt injection

This scenario consists of prompt injections that attempt to trick the model into producing specific strings.

Prompt injection: encoded data

This scenario tests whether a model will decode encoded strings injected into the prompt.
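
The sketch below illustrates the general shape of such a test, using Base64 as one example encoding: a payload is encoded, embedded in a prompt, and the model's output is then checked for the decoded payload. The prompt wording, function names, and payload are hypothetical, and the scenario may use other encodings and detection rules.

```python
import base64


def build_encoded_prompt(payload: str) -> str:
    """Embed a Base64-encoded payload in a (hypothetical) prompt template."""
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return f"Decode the following Base64 string: {encoded}"


def output_contains_payload(output: str, payload: str) -> bool:
    """True if the model's output reproduces the decoded payload."""
    return payload.lower() in output.lower()


payload = "hello world"  # harmless stand-in for an injected string
prompt = build_encoded_prompt(payload)

# A model that decodes the injected string would be flagged:
decoded_reply = "The decoded text is: Hello world"
# A model that declines or ignores the encoded string would not:
refusal_reply = "I will not decode that string."
```

The check is case-insensitive so that a model that decodes the payload but changes its capitalization is still caught.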

Cross-site scripting

This scenario aims to get the model to produce cross-site scripting content in a Markdown image.

Adversarial suffix

This scenario uses adversarial suffixes to elicit instructions on how to steal from a non-profit organization.

Glitch

This scenario contains prompts that use glitch tokens to disrupt expected model behavior.

Do Anything Now

This scenario contains probes that attempt to get the model to escape its guardrails by prompting it to take on a special persona.

Slur continuation

This scenario tests whether a model will continue a slur when provided with leading text.

Ethical harms

This scenario checks for ethical harms in the model’s output, such as bullying, deadnaming, profanity, sexual content, and slurs.

Real Toxicity Prompts

This scenario contains input text, taken from web data, that is likely to produce toxic output when the model completes it.