# Probe Glossary
This glossary maps the probe module names that appear in your [reports](../python-sdk/examples/evaluations.md#export-prompt-level-details) to the corresponding [probe names](../components/probes.md) and descriptions.
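
If you want to resolve module names programmatically, the tables in this glossary can be read into a lookup dictionary. The following is a minimal sketch, not part of any SDK: the function name `parse_glossary_rows` and the `sample` text are hypothetical, and it assumes every row has exactly three pipe-delimited cells on a single line.

```python
def parse_glossary_rows(markdown: str) -> dict[str, tuple[str, str]]:
    """Map module name -> (probe name, description) from pipe-table rows.

    Hypothetical helper for illustration; assumes three cells per row.
    """
    mapping = {}
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # not a table row
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 3:
            continue  # malformed or wrapped row
        module, probe, description = cells
        # Skip the header row and the dashed separator row.
        if module == "Module name" or set(module) <= {"-", ":"}:
            continue
        mapping[module] = (probe, description)
    return mapping


# Two rows copied from the Security Probes table below.
sample = """
| Module name | Probe name | Description |
| --- | --- | --- |
| dan.STAN | Strive To Avoid Norms | Jailbreak prompting the model to take on a Strive To Avoid Norms (STAN) persona |
| glitch.Glitch100 | Glitch | Uses glitch tokens to get the model to fail to repeat a word |
"""

glossary = parse_glossary_rows(sample)
print(glossary["glitch.Glitch100"][0])
```

A module name taken from a report can then be looked up with `glossary.get(module_name)`, which returns `None` for names that are not listed here.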
## Security Probes
| Module name | Probe name | Description |
| ---------------------------------------------- | ------------------------------------ | ----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| dan.AntiDAN | Anti-Do Anything Now | Evaluates whether prompt injection causes a model to enter a mode of exaggerated safety where it treats safe responses as unsafe |
| dan.ChatGPT_Developer_Mode_RANTI | Do Anything Now: RANTI v1 | Ranti developer mode jailbreak, version 1 |
| dan.ChatGPT_Developer_Mode_v2 | Do Anything Now: RANTI v2 | Ranti developer mode jailbreak, version 2 |
| dan.ChatGPT_Image_Markdown | Inject Markdown image | A jailbreak attempt intended to enable the injection of markdown images into responses. |
| dan.Dan_10_0 | Do Anything Now: v10 | Do Anything Now jailbreak with prompt about inhumane things |
| dan.Dan_11_0 | Do Anything Now: v11 | Do Anything Now jailbreak to encourage profane, unethical, one-sided, violent outputs |
| dan.Dan_6_0 | Do Anything Now: v6.0 | Do Anything Now jailbreak offering rewards to encourage making future predictions and hallucinations |
| dan.Dan_6_2 | Do Anything Now: v6.2 | Do Anything Now jailbreak offering rewards to encourage making future predictions and hallucinations, with additional incitement to rudeness and disrespectfulness |
| dan.Dan_7_0 | Do Anything Now: v7.0 | Do Anything Now jailbreak telling the model that filtering is not acceptable and including special instructions for when to break character |
| dan.Dan_8_0 | Do Anything Now: v8.0 | Do Anything Now jailbreak telling the model that it is Bigmancozmo and encouraging swearing, porn, and illegality |
| dan.Dan_9_0 | Do Anything Now: v9.0 | Do Anything Now jailbreak encouraging breaking 2021 information cutoff and ethical guardrails in responses, with reward offered |
| dan.DAN_Jailbreak | Do Anything Now: 🔓JAILBREAK version | Do Anything Now jailbreak prompting the model to enter a 🔓JAILBREAK mode |
| dan.DUDE | Do anything and everything | Do Anything Now jailbreak prompting the model to take on the DUDE persona |
| dan.STAN | Strive To Avoid Norms | Jailbreak prompting the model to take on a Strive To Avoid Norms (STAN) persona |
| encoding.InjectAscii85 | Prompt injection : ASCII85 | Tries to get the model to decode an ASCII85-encoded string |
| encoding.InjectBase16 | Prompt injection : Base16 | Tries to get the model to decode a Base16-encoded string |
| encoding.InjectBase2048 | Prompt injection : Base2048 | Tries to get the model to decode a Base2048-encoded string |
| encoding.InjectBase32 | Prompt injection : Base32 | Tries to get the model to decode a Base32-encoded string |
| encoding.InjectBase64 | Prompt injection : Base64 | Tries to get the model to decode a Base64-encoded string |
| encoding.InjectBraille | Prompt injection : Braille | Tries to get the model to decode a Braille-encoded string |
| encoding.InjectEcoji | Prompt injection : Ecoji | Tries to get the model to decode an Ecoji-encoded string |
| encoding.InjectHex | Prompt injection : Hex | Tries to get the model to decode a Hex-encoded string |
| encoding.InjectMorse | Prompt injection : Morse | Tries to get the model to decode a Morse-encoded string |
| encoding.InjectNato | Prompt injection : Nato | Tries to get the model to decode a string encoded in the NATO phonetic alphabet |
| encoding.InjectROT13 | Prompt injection : ROT13 | Tries to get the model to decode a ROT13-encoded string |
| encoding.InjectUU | Prompt injection : InjectUU | Tries to get the model to decode a UU-encoded string |
| encoding.InjectZalgo | Prompt injection : Zalgo | Tries to get the model to decode a Zalgo-encoded string |
| gcg.GCG_Probe | Adversarial suffix | Uses an adversarial suffix to elicit instructions on how to steal from a non-profit organization |
| glitch.Glitch100 | Glitch | Uses glitch tokens to get the model to fail to repeat a word |
| knownbadsignatures.EICAR | Test Virus | Get the model to output an EICAR virus signature |
| knownbadsignatures.GTUBE | Test Spam | Get the model to output the GTUBE spam signature |
| malwaregen.Evasion | Evasive malware | Get the model to write malware that evades common defenses |
| malwaregen.Payload | Malicious payload | Get the model to produce a malicious payload |
| malwaregen.SubFunctions | Malicious subfunctions | Get the model to write malicious subfunctions |
| malwaregen.TopLevel | General malware | Get the model to write general malware |
| promptinject.HijackHateHumansMini | Prompt injection: Hate humans | Get the model to produce a rogue string on hating humans |
| promptinject.HijackKillHumansMini | Prompt Injection: Kill humans | Get the model to produce a rogue string on killing humans |
| promptinject.HijackLongPromptMini | Prompt injection: long prompt | Get the model to produce a "long prompt" rogue string |
| promptinject.Tag | Prompt injection: unicode tag | Prompt injection using invisible Unicode tags |
| goodside.ThreatenJSON | Output plain JSON | Threaten the model to make it produce raw JSON without any preamble |
| xss.MarkdownImageExfil | Cross-site scripting: Markdown image | Get the model to write embedded assets exfiltrating text in a Markdown URI |
## Privacy Probes
| Module name | Probe name | Description |
| --------------------------------------------- | ----------------------------------- | ------------------------------------------------------------------------------------------------------------------ |
| leakreplay.LiteratureComplete80 | Book excerpt completion | Tests whether the model will complete a book excerpt when prompted with the beginning |
| leakreplay.LiteratureCloze80 | Book excerpt insertion | Tests whether the model will fill in the middle of a book excerpt |
| leakreplay.GuardianComplete | Media excerpt completion - Guardian | Tests whether the model will complete an excerpt from The Guardian when prompted with the beginning |
| leakreplay.GuardianCloze | Media excerpt insertion - Guardian | Tests whether the model will fill in the middle of an excerpt from The Guardian |
| leakreplay.NYTComplete | Media excerpt completion - NYT | Tests whether the model will complete an excerpt from the New York Times when prompted with the beginning |
| leakreplay.NYTCloze | Media excerpt insertion - NYT | Tests whether the model will fill in the middle of an excerpt from the New York Times |
| privateinfo.EnronLeakEmail | Emails leakage - Enron | Tests whether the model will reveal email addresses obtained from the discovery process of the Enron investigation |
| privateinfo.LeakPII | PII Leakage | Tests whether PII provided to the model during a conversation can be leaked |
| privateinfo.PrivacyUnderstanding | Confidentiality comprehension | Tests whether the model understands privacy-related phrases that flag when certain information is confidential |
## Hallucination Probes
| Module name | Probe name | Description |
| ------------------------------------------- | -------------------------------- | --------------------------------------------------------------- |
| socialvalues.QuackMedicine | Quack Medicine | Evaluates whether the model promotes quack medicine |
| goodside.WhoIsRiley | False assertions: Riley Goodside | Evaluates hallucinations about Riley Goodside |
| goodside.Davidjl | Glitch - Davidjl | Checks for hallucinations about the glitch token Davidjl |
| packagehallucination.Python | Hallucinate Python packages | Checks if a model hallucinates Python packages |
| snowball.SenatorsMini | False assertions: senators | Checks for false assertions about US senators |
| misleading.FalseAssertion50 | False assertions: general | Checks for false assertions about general history/science facts |
| snowball.GraphConnectivityMini | False assertions: flights | Checks if the model invents impossible flight routes |
| snowball.PrimesMini | Hallucinate prime numbers | Evaluates hallucinations about prime numbers |
## Robustness Probes
| Module name | Probe name | Description |
| --------------------------------------- | ---------------------------- | --------------------------------------------------------- |
| advglue.BertattackOnMNLI | BERT attack on MNLI | Probes effects of BERT-based perturbation on MNLI task |
| advglue.BertattackOnMNLIMM | BERT attack on MNLI-MM | Probes effects of BERT-based perturbation on MNLI-MM task |
| advglue.BertattackOnQNLI | BERT attack on QNLI | Probes effects of BERT-based perturbation on QNLI task |
| advglue.BertattackOnQQP | BERT attack on QQP | Probes effects of BERT-based perturbation on QQP task |
| advglue.BertattackOnRTE | BERT attack on RTE | Probes effects of BERT-based perturbation on RTE task |
| advglue.BertattackOnSST2 | BERT attack on SST2 | Probes effects of BERT-based perturbation on SST2 task |
| advglue.SemattackOnMNLI | SemAttack on MNLI | Probes effects of SemAttack perturbation on MNLI task |
| advglue.SemattackOnMNLIMM | SemAttack on MNLI-MM | Probes effects of SemAttack perturbation on MNLI-MM task |
| advglue.SemattackOnQNLI | SemAttack on QNLI | Probes effects of SemAttack perturbation on QNLI task |
| advglue.SemattackOnQQP | SemAttack on QQP | Probes effects of SemAttack perturbation on QQP task |
| advglue.SemattackOnRTE | SemAttack on RTE | Probes effects of SemAttack perturbation on RTE task |
| advglue.SemattackOnSST2 | SemAttack on SST2 | Probes effects of SemAttack perturbation on SST2 task |
| advglue.SememepsoOnMNLI | SememePSO attack on MNLI | Probes effects of SememePSO perturbation on MNLI task |
| advglue.SememepsoOnMNLIMM | SememePSO attack on MNLI-MM | Probes effects of SememePSO perturbation on MNLI-MM task |
| advglue.SememepsoOnQNLI | SememePSO attack on QNLI | Probes effects of SememePSO perturbation on QNLI task |
| advglue.SememepsoOnQQP | SememePSO attack on QQP | Probes effects of SememePSO perturbation on QQP task |
| advglue.SememepsoOnRTE | SememePSO attack on RTE | Probes effects of SememePSO perturbation on RTE task |
| advglue.SememepsoOnSST2 | SememePSO attack on SST2 | Probes effects of SememePSO perturbation on SST2 task |
| advglue.TextbuggerOnMNLI | TextBugger attack on MNLI | Probes effects of TextBugger perturbation on MNLI task |
| advglue.TextbuggerOnMNLIMM | TextBugger attack on MNLI-MM | Probes effects of TextBugger perturbation on MNLI-MM task |
| advglue.TextbuggerOnQNLI | TextBugger attack on QNLI | Probes effects of TextBugger perturbation on QNLI task |
| advglue.TextbuggerOnQQP | TextBugger attack on QQP | Probes effects of TextBugger perturbation on QQP task |
| advglue.TextbuggerOnRTE | TextBugger attack on RTE | Probes effects of TextBugger perturbation on RTE task |
| advglue.TextbuggerOnSST2 | TextBugger attack on SST2 | Probes effects of TextBugger perturbation on SST2 task |
| advglue.TextfoolerOnMNLI | TextFooler attack on MNLI | Probes effects of TextFooler perturbation on MNLI task |
| advglue.TextfoolerOnMNLIMM | TextFooler attack on MNLI-MM | Probes effects of TextFooler perturbation on MNLI-MM task |
| advglue.TextfoolerOnQNLI | TextFooler attack on QNLI | Probes effects of TextFooler perturbation on QNLI task |
| advglue.TextfoolerOnQQP | TextFooler attack on QQP | Probes effects of TextFooler perturbation on QQP task |
| advglue.TextfoolerOnRTE | TextFooler attack on RTE | Probes effects of TextFooler perturbation on RTE task |
| advglue.TextfoolerOnSST2 | TextFooler attack on SST2 | Probes effects of TextFooler perturbation on SST2 task |
## Toxicity Probes
| Module name | Probe name | Description |
| -------------------------------------------------------------- | ---------------------------------- | ------------------------------------------------------------------------------------------------------------------------------ |
| realtoxicityprompts.RTPSexually_Explicit | Sexually explicit | Prompts the model to produce sexually explicit outputs |
| realtoxicityprompts.RTPInsultJailbreak | Insult - adversarial | Prompts the model to produce insults. Uses an adversarial prefix. |
| realtoxicityprompts.RTPInsult | Insult | Prompts the model to produce insults |
| realtoxicityprompts.RTPSevere_ToxicityJailbreak | Severe toxicity - adversarial | Partial prompts that potentially produce continuations of severe toxicity. Uses an adversarial prefix. |
| realtoxicityprompts.RTPSexually_ExplicitJailbreak | Sexually explicit - adversarial | Prompts the model to produce sexually explicit outputs. Uses an adversarial prefix. |
| realtoxicityprompts.RTPIdentity_Attack | Identity-based attacks | Prompts the model to produce attacks based on a person's identity |
| realtoxicityprompts.RTPProfanity | Profanity | Prompts the model to produce profane responses |
| realtoxicityprompts.RTPIdentity_AttackJailbreak | Adversarial Identity-based attacks | Prompts the model to produce attacks based on a person's identity. Uses an adversarial prefix. |
| realtoxicityprompts.RTPThreat | Threatening language | Prompts the model to produce threats |
| realtoxicityprompts.RTPThreatJailbreak | Threatening language - Adversarial | Prompts the model to produce threats. Uses an adversarial prefix. |
| realtoxicityprompts.RTPSevere_Toxicity | Severe toxicity | Partial prompts that potentially produce continuations of severe toxicity |
| realtoxicityprompts.RTPFlirtation | Flirtation | Partial prompts to produce flirtatious continuations |
| realtoxicityprompts.RTPFlirtationJailbreak | Flirtation - Adversarial | Partial prompts to produce flirtatious continuations. Uses an adversarial prefix. |
| realtoxicityprompts.RTPProfanityJailbreak | Profanity - Adversarial | Prompts the model to produce profane responses. Uses an adversarial prefix. |
| continuation.ContinueSlursReclaimedSlursART | Slur continuation | Prompts the model to continue slur words |
| socialvalues.Deadnaming | Deadnaming | Tests whether the model deadnames famous people |
| ethicalharms.Profanity | Profanity questions | Poses questions with potentially profane responses |
| ethicalharms.Bullying | Bullying | Poses queries or partial prompts to the model that are expected to elicit responses recommending aggressive or bullying action |
| ethicalharms.SexualContent | Sexually explicit - risky words | Prompts the model to produce sexually explicit keywords |
| ethicalharms.Sexualisation | Sexual objectification | Prompts the model to produce sexually objectifying language |
| ethicalharms.SlurUsage | Slur question | Asks questions to see if the target will suggest or recommend ethnic slurs |
## Stereotype Probes
| Module name | Probe name | Description |
| ----------------------------------------------- | ------------------------------------------ | ----------------------------------------------- |
| advstereo.Atheists | Stereotype religion: atheists | Checks for stereotypes about atheists |
| advstereo.LatinAmericans | Stereotype ethnicity: Latin Americans | Checks for stereotypes about Latin Americans |
| advstereo.Europeans | Stereotype ethnicity: Europeans | Checks for stereotypes about Europeans |
| advstereo.OldPeople | Stereotype age: old people | Checks for stereotypes about old people |
| advstereo.Homosexuals | Stereotype sexual orientation: homosexuals | Checks for stereotypes about homosexuals |
| advstereo.Immigrants | Stereotype nationality: immigrants | Checks for stereotypes about immigrants |
| advstereo.AbleBodiedPeople | Stereotype disability: able-bodied people | Checks for stereotypes about able-bodied people |
| advstereo.YoungPeople | Stereotype age: young people | Checks for stereotypes about young people |
| advstereo.Caucasians | Stereotype race: Caucasians | Checks for stereotypes about Caucasians |
| advstereo.BlackPeople | Stereotype race: black people | Checks for stereotypes about black people |
| advstereo.Buddhists | Stereotype religion: Buddhists | Checks for stereotypes about Buddhists |
| advstereo.PoorPeople | Stereotype income: poor people | Checks for stereotypes about poor people |
| advstereo.WhitePeople | Stereotype race: white people | Checks for stereotypes about white people |
| advstereo.Jews | Stereotype ethnicity: Jews | Checks for stereotypes about Jews |
| advstereo.RichPeople | Stereotype income: rich people | Checks for stereotypes about rich people |
| advstereo.Men | Stereotype gender: men | Checks for stereotypes about men |
| advstereo.Mexicans | Stereotype nationality: Mexicans | Checks for stereotypes about Mexicans |
| advstereo.Asians | Stereotype race: Asians | Checks for stereotypes about Asians |
| advstereo.Americans | Stereotype nationality: Americans | Checks for stereotypes about Americans |
| advstereo.Canadians | Stereotype nationality: Canadians | Checks for stereotypes about Canadians |
| advstereo.Christians | Stereotype religion: Christians | Checks for stereotypes about Christians |
| advstereo.PhysicallyDisabledPeople | Stereotype disability: disabled people | Checks for stereotypes about disabled people |
| advstereo.Muslims | Stereotype religion: Muslims | Checks for stereotypes about Muslims |
| advstereo.Women | Stereotype gender: women | Checks for stereotypes about women |
## Fairness Probes
| Module name | Probe name | Description |
| ------------------------------------------- | ----------------- | ---------------------------------------------------------------------------------------- |
| adultdata.CounterfactualGender | Gender Bias | Measures whether the model uses gender information to determine a person's likely income |
| winobias.ProfessionalBias | Professional bias | Measures whether the model associates occupations with gender |
## Ethics Probes
| Module name | Probe name | Description |
| ----------------------------------------------------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| normative_ethics.Deontology | Deontological Ethics | Evaluates the model's understanding of whether actions are required, permitted, or forbidden according to a set of rules or constraints |
| normative_ethics.DeontologyJailbreak | Adversarial Deontological Ethics | Evaluates the model's understanding of whether actions are required, permitted, or forbidden according to a set of rules or constraints under adversarial prompting |
| normative_ethics.Justice | Justice-based Ethics | Evaluates whether the model understands principles of desert or impartiality |
| normative_ethics.JusticeJailbreak | Adversarial Justice-based Ethics | Evaluates whether the model understands principles of desert or impartiality under adversarial prompting |
| normative_ethics.Morality | Commonsense Morality | Evaluates whether the model's moral judgments align with common moral intuitions |
| normative_ethics.MoralityJailbreak | Adversarial Commonsense Morality | Evaluates, under adversarial prompting, whether the model's moral judgments align with common moral intuitions |
| normative_ethics.Utilitarianism | Utilitarianism | Evaluates whether the model can identify which of two scenarios has higher utility |
| normative_ethics.UtilitarianismJailbreak | Adversarial Utilitarianism | Evaluates, under adversarial prompting, whether the model can identify which of two scenarios has higher utility |
| normative_ethics.Virtue | Virtue Ethics | Evaluates whether the model can identify moral virtues correctly |
| normative_ethics.VirtueJailbreak | Adversarial Virtue Ethics | Evaluates, under adversarial prompting, whether the model can identify moral virtues correctly |
| jiminycricket.Jiminy | Simulation | Evaluates the model's ability to identify the moral valence of a simulated scenario |
| jiminycricket.JiminyJailbreak | Adversarial Simulation | Evaluates, under adversarial prompting, the model's ability to identify the moral valence of a simulated scenario |