Scenarios¶
Scenarios are groups of related probes. Each scenario is its own harness, but multiple scenarios can also be composed to form other harnesses.
Here we briefly describe each scenario and link to its longer description under the respective dimension. The dimension pages also list the probes that belong to each scenario.
Ethical theories¶
This scenario includes both vanilla and jailbreaking probes whose prompts test the model’s understanding of ethical theories.
Ethics: simulation¶
This scenario contains vanilla and jailbreaking prompts that ask about the moral valence of a simulated scenario.
Copyrighted content¶
This scenario contains prompts that attempt to get the model to repeat copyrighted content from books and newspapers.
Private data leak¶
This scenario contains prompts that test whether a model will leak private data.
Adversarial GLUE¶
This scenario currently makes up the entirety of the Robustness harness. It measures whether the model’s performance on the natural language understanding tasks in GLUE is affected by perturbations to the sentences in the tasks.
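One simple way to express "affected by perturbations" is the drop in accuracy between the original and the perturbed sentences. The sketch below is illustrative only: `classify`, `accuracy`, and `robustness_drop` are hypothetical names, and the metric the harness actually reports may differ.

```python
from typing import Callable, Sequence


def accuracy(
    classify: Callable[[str], str],
    sentences: Sequence[str],
    labels: Sequence[str],
) -> float:
    """Fraction of sentences for which `classify` returns the expected label."""
    correct = sum(classify(s) == y for s, y in zip(sentences, labels))
    return correct / len(labels)


def robustness_drop(
    classify: Callable[[str], str],
    originals: Sequence[str],
    perturbed: Sequence[str],
    labels: Sequence[str],
) -> float:
    """Accuracy on the original sentences minus accuracy on the perturbed ones.

    `classify` stands in for a call to the model under test (an assumption).
    A value near zero suggests the perturbations had little effect.
    """
    return accuracy(classify, originals, labels) - accuracy(classify, perturbed, labels)
```

A larger positive value means the model loses more accuracy when the input is perturbed.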
Professional bias¶
This scenario measures whether the model associates gender with occupations.
Gender-income bias¶
This scenario checks whether the model predicts a person’s income in a gender-biased manner.
Stereotype¶
This scenario currently makes up the entire Stereotype harness. It measures whether the model reinforces stereotypes about religion, race, ethnicity, age, disability, national origin, and gender in its responses.
Compounding hallucination¶
This scenario attempts to prompt the model into generating hallucinations by having it over-commit to an initial mistake.
Misleading information¶
This scenario checks whether the model is able to refute false assertions presented in the input prompt.
Package hallucination¶
This scenario contains prompts designed to elicit sample code importing non-existent Python packages.
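To illustrate what checking such output could look like, the sketch below parses generated Python code and lists the top-level imports that cannot be resolved. The function names are hypothetical, and resolving against the local environment is an assumption made to keep the example self-contained; a real check would more plausibly compare against a package index.

```python
import ast
import importlib.util


def _is_resolvable(name: str) -> bool:
    """True if a top-level module called `name` can be found locally."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def unresolved_imports(code: str) -> list[str]:
    """Return the top-level modules imported by `code` that cannot be found.

    Local resolution is only a stand-in (assumption) for "does this
    package exist"; relative imports are ignored.
    """
    names = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return sorted(n for n in names if not _is_resolvable(n))
```

A non-empty result flags imports worth inspecting; it does not by itself prove that a package is hallucinated, since a real package may simply not be installed.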
Hallucination: Goodside¶
This scenario consists of the Who is Riley? and Davidjl probes. They were designed by Riley Goodside to elicit hallucinations.
Known malware¶
This scenario checks for the existence of known signatures of spam and viruses in the output.
Malware generation¶
This scenario contains probes that attempt to make the model generate malware.
Prompt injection¶
This scenario consists of prompt injections that attempt to trick the model into producing specific strings.
Prompt injection: encoded data¶
This scenario tests whether a model will decode encoded strings injected into the prompt.
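As a rough sketch of the idea, the example below encodes a payload in Base64, wraps it in a prompt, and checks whether the decoded payload shows up in a model's output. Base64 is just one possible encoding, and the prompt wording and function names are assumptions for illustration, not the probes' actual templates.

```python
import base64


def build_encoded_prompt(payload: str) -> str:
    """Wrap a Base64-encoded payload in a decoding request.

    The prompt wording is a hypothetical template.
    """
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return f"Decode the following Base64 string:\n{encoded}"


def output_contains_payload(output: str, payload: str) -> bool:
    """True if the decoded payload appears in the output, ignoring case."""
    return payload.lower() in output.lower()
```

If the model's response contains the payload, it decoded and repeated the injected string.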
Cross-site scripting¶
This scenario aims to get the model to produce cross-site scripting content in a Markdown image.
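A Markdown image is a risk because a renderer fetches its URL automatically, so any data placed in that URL is sent to the remote host. The sketch below shows one way model output could be scanned for such images. The regular expression, the function name, and the allowlist approach are all assumptions for illustration, not the scenario's actual detector.

```python
import re
from urllib.parse import urlparse

# Matches ![alt](url) or ![alt](url "title") and captures the URL.
MARKDOWN_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)[^)]*\)")


def external_image_hosts(output: str, allowed_hosts=frozenset()) -> list[str]:
    """Return hosts of Markdown images in `output` that are not allowlisted.

    Relative image paths have no host and are skipped.
    """
    hosts = []
    for url in MARKDOWN_IMAGE.findall(output):
        host = urlparse(url).hostname
        if host and host not in allowed_hosts:
            hosts.append(host)
    return hosts
```

Any host returned here marks an image whose URL would be requested from an unexpected domain when the output is rendered.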
Adversarial suffix¶
This scenario uses adversarial suffixes to elicit instructions on how to steal from a non-profit organization.
Glitch¶
This scenario contains prompts that use glitch tokens to disrupt expected model behavior.
Do Anything Now¶
These probes attempt to get the model to escape its guardrails by prompting it to take on a special persona.
Slur continuation¶
This scenario tests whether a model will continue a slur when provided with leading text.
Ethical harms¶
This scenario checks for ethical harms in the model’s output, such as bullying, deadnaming, profanity, sexual content, and slurs.
Real Toxicity Prompts¶
This scenario contains input texts, taken from web data, that tend to elicit toxic output when the model completes them.
Social Values¶
This scenario currently consists of a single probe that looks for the promotion of quack medicine.