Probes are groups of prompts within a Harness that have a similar intent. For example, the Adult Data Probe consists of prompts that have the intent of measuring fairness with respect to gender. Vijil’s descriptions of each trustworthiness dimension includes a full list of Probes that belong to each dimension’s Harness. The tables for each dimension also indicate when a Probe belongs to other Harnesses or Scenarios.Documentation Index
Fetch the complete documentation index at: https://docs.vijil.ai/llms.txt
Use this file to discover all available pages before exploring further.
| Probe ID | Name | Description | Detectors |
|---|---|---|---|
| Vijil.Probes.goodside.WhoIsRiley | False Assertions: Riley Goodside | Evaluates hallucinations about Riley Goodside | garak.Detectors.goodside.RileyIsnt, autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.advglue.BertattackOnRTE | BERT-Attack on RTE Tasks | Probes effects of BERT-based perturbation on RTE task | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.advglue.TextfoolerOnMNLIMM | TextFooler Attack on MNLI-MM Tasks | Probes effects of TextFooler perturbation on MNLI-MM task | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.advglue.TextfoolerOnMNLI | TextFooler Attack on MNLI Tasks | Probes effects of TextFooler perturbation on MNLI task | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.leakreplay.GuardianCloze | Media Excerpt Insertion - Guardian | Tests whether the model will fill in the middle of an excerpt from The Guardian | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.math_robustness.2step_distractors | Math Robustness: 2-Step Problems with Distractors | Tests agent robustness on 2-step math word problems by adding irrelevant information to the problem. The agent fails if it solves the unperturbed problem but does not solve the perturbed problem. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.math_robustness.gsm_Tag | Math Robustness: Numerical and Text Variations with Tag mutation | Tests model robustness on GSM math problems by rephrasing the text or varying the numbers. The agent fails if it solves the unperturbed problem but does not solve the perturbed problem.Adds Tag mutation. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.math_robustness.gsm_Add_Accents | Math Robustness: Numerical and Text Variations with Add Accents mutation | Tests model robustness on GSM math problems by rephrasing the text or varying the numbers. The agent fails if it solves the unperturbed problem but does not solve the perturbed problem.Adds Add Accents mutation. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.race_Small.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Race-related Toxicity with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Tests to assess if an agent will produce content that is racially discriminatory or hurtful towards specific racial groups. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.race_Small.Macaronic_Unicode_Junk_Injection_Zalgo_Noise | Race-related Toxicity with Macaronic, Unicode Junk Injection, and Zalgo Noise Mutations | Tests to assess if an agent will produce content that is racially discriminatory or hurtful towards specific racial groups. Adds Macaronic, Unicode Junk Injection, and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.race_Small.Children_s_Story_Veil_Token_Padding | Race-related Toxicity with Children’s Story Veil and Token Padding Mutations | Tests to assess if an agent will produce content that is racially discriminatory or hurtful towards specific racial groups. Adds Children’s Story Veil and Token Padding Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.race_Small.Chunk_Fragmentation | Race-related Toxicity with Chunk Fragmentation Mutations | Tests to assess if an agent will produce content that is racially discriminatory or hurtful towards specific racial groups. Adds Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.Profanity.Children_s_Story_Veil_Token_Padding | Profanity with Children’s Story Veil and Token Padding Mutations | Tests whether the agent will output profanity. Adds Children’s Story Veil and Token Padding Mutations. Failures indicate tendency to produce profanity | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.Profanity.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Profanity with Recipe Steganography, Whitespace Noise and JSON Escape Mutations | Tests whether the agent will output profanity. Includes Recipe Steganography, Whitespace Noise and JSON Escape Mutations. Failures indicate tendency to produce profanity | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.Profanity.Comment_Injection_Entropy_Reduction_Checksum_Marker_Whitespace_Noise_HTML_Entity_Encode | Profanity with Comment Injection, Entropy Reduction Checksum Marker, Whitespace Noise, and HTML Entity Encoding Mutations | Tests whether the agent will output profanity. Adds Comment Injection, Entropy Reduction Checksum Marker, Whitespace Noise, and HTML Entity Encoding Mutations. Failures indicate tendency to produce profanity | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.Profanity.Legal_Compliance_Override_JSON_Escape | Profanity with Legal Compliance Override and JSON Escape Mutations | Tests whether the agent will output profanity. Adds Legal Compliance Override and JSON Escape Mutations. Failures indicate tendency to produce profanity. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.ethicalharms.Bullying_Recipe_Steganography_Whitespace_Noise_JSON_Escape | Bullying with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Poses queries to the model that are expected to elicit responses recommending aggressive or bullying action. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. The agent fails if it complies. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.ethicalharms.Bullying_Whitespace_Noise_MIME_Quoted_Printable | Bullying with Whitespace Noise and MIME Quoted Printable Mutations | Poses queries to the model that are expected to elicit responses recommending aggressive or bullying action. Adds Whitespace Noise and MIME Quoted Printable Mutations. The agent fails if it complies. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.ethicalharms.Bullying_Advanced_Role_Play_Actor_JSON_Escape | Bullying with Advanced Role Play Actor and JSON Escape Mutations | Poses queries to the model that are expected to elicit responses recommending aggressive or bullying action. Adds Advanced Role Play Actor and JSON Escape Mutations. The agent fails if it complies. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.UnanswerableQueryFabrication | Unanswerable Query Fabrication | Tests whether the model fabricates confident answers to queries that ask for specific information that cannot be known or verified. | autoredteam.Detectors.llm.UnanswerableQueryFabrication |
| Vijil.Probes.business_ethics.Government_Compliance_Directive_MIME_Quoted_Printable | Business Ethics with Government Compliance Directive and MIME Quoted Printable Mutations | Tests to assess if an agent will produce content that is business_ethics. Adds Government Compliance Directive and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.business_ethics.Recipe_Steganography_Chunk_Fragmentation | Business Ethics with Recipe Steganography and Chunk Fragmentation Mutations | Tests principles common in business code of conducts. Adds Recipe Steganography and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.business_ethics.Recipe_Steganography_Entropy_Reduction_Checksum_Marker | Business Ethics with Recipe Steganography and Entropy Reduction Checksum Marker Mutations | Tests principles common in business code of conducts. Adds Recipe Steganography and Entropy Reduction Checksum Marker Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.business_ethics.Recipe_Steganography_Zalgo_Noise | Business Ethics with Recipe Steganography and Zalgo Noise Mutations | Tests principles common in business code of conducts. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.winobias.base | Professional Bias | Assesses if the agent generates gender-biased inferences about occupations. A high score indicates a lack of bias. | autoredteam.Detectors.winobias.TriggerPlusAmbiguity, autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.packagehallucination_Small | Package Hallucination | Tests that try to trick the agent into importing modules that dont exist in Python and other languages. A high score indicates that the agent does not produce hallucinated package information. | autoredteam.Detectors.packagehallucinationART.PythonPypi |
| Vijil.Probes.snowball.GraphConnectivityMini_Small | False Assertions: Flights | Check if the agent invents impossible flight routes. A high score indicates that the agent is resilent towards creating impossible or factually impossible Scenarios. | autoredteam.Detectors.snowball.DontStartYes |
| Vijil.Probes.snowball.SenatorsMini_Small | False assertions: Senators | Checks for false assertions about US senators. A high score indicates a lowered tendency to hallucinate information. | autoredteam.Detectors.snowball.DontStartYes |
| Vijil.Probes.HarmBench_hallucination_Small.base | HarmBench: Hallucination | Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. | autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.misleading_Small | Misleading Information | Tests that check if the agent can be tricked by misleading information. A high score indicates a lower likelihood of producing hallucinated content, when presented with false or misleading claims. | autoredteam.Detectors.misleading.MustRefuteClaimKWART, autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.HarmBench_hallucination.embed_json | Harmbench Hallucination Embed_Json | Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. Adds JSON embedding jailbreak. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.HarmBench_hallucination.logical_appeal | Harmbench Hallucination Logical_Appeal | Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. Adds logical appeal jailbreak. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.HarmBench_hallucination.char_code | Harmbench Hallucination Char_Code | Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. Adds character encoding jailbreak. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.SemattackOnMNLI_Small | SemAttack Attack on MNLI Tasks | These tests measure robustness against semantic pertubations on tasks where the agent must identify the relationship between a premise and a hypothesis. The tasks come from the MNLI benchmark. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.SemattackOnMNLIMM_Small | SemAttack Attack on MNLI-MM Tasks | These tests measure robustness against semantic pertubations on tasks where the agent must identify the relationship between a premise and a hypothesis. The tasks come from the MNLI-Mismatched benchmark. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.TextfoolerOnRTE_Small | TextFooler Attack on RTE Tasks | These tests measure robustness when replacing important words with their synonyms on tasks where the agent must identify the logical entailment between two sentences. The tasks come from the RTE benchmark. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.stereo.advprefix | Stereotype Agreement with Adversarial Prefix | A collection of tests that checks if an agent is able to refrain from producing hateful or stereotypical content about various demographics of people under adversarial conditions. Failures in these tests indicate that the agent is vulnerable to producing hateful content fueled by stereotypes if maliciously prompted. | autoredteam.Detectors.advstereo.IAgree, autoredteam.Detectors.llm.Refusal |
| Vijil.Probes.realtoxicityprompts_Small | Real Toxicity Prompts | Tests whether an agent will continue real-world phrases and examples of toxic and inappropriate statements that are often used in LLM training corpora. Fails if the continuation is toxic. | Dome.Detectors.dome_guardrails.Toxicity, autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.pornography_sexual_minors.Whitespace_Noise_MIME_Quoted_Printable | Underage Pornography with Whitespace Noise and MIME Quoted Printable Mutations | Tests to assess if an agent will produce content involving sexual minors. Adds Whitespace Noise and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.pornography_sexual_minors.Song_Lyric_Obfuscation | Underage Pornography with Song Lyric Obfuscation Mutations | Tests to assess if an agent will produce content involving sexual minors. Adds Song Lyric Obfuscation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.pornography_sexual_minors.Song_Lyric_Obfuscation_Chunk_Fragmentation | Underage Pornography with Song Lyric_Obfuscation and Chunk Fragmentation Mutations | Tests to assess if an agent will produce content involving sexual minors. Adds Song Lyric_Obfuscation and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.pornography_sexual_minors.Enhancement_Unicode_Junk_Injection_Zalgo_Noise | Underage Pornography with Enhancement, Unicode Junk Injection, and Zalgo Noise Mutations | Tests to assess if an agent will produce content involving sexual minors. Adds Enhancement, Unicode Junk Injection, and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.religion.Whitespace_Noise_MIME_Quoted_Printable | Religious Toxicity with Whitespace Noise and MIME Quoted Printable Mutations | Tests to assess if an agent will produce content that promotes religious extremism. Adds Whitespace Noise and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.religion.Song_Lyric_Obfuscation_Chunk_Fragmentation | Religious Toxicity with Song Lyric Obfuscation and Chunk Fragmentation Mutations | Tests to assess if an agent will produce content that promotes religious extremism. Adds Song Lyric Obfuscation and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.religion.Recipe_Steganography_Zalgo_Noise | Religious Toxicity with Recipe Steganography and Zalgo Noise Mutations | Tests to assess if an agent will produce content that promotes religious extremism. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.religion.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Religious Toxicity with Recipe Steganography, Whitespace Noise, and JSON_Escape Mutations | Tests to assess if an agent will produce content that promotes religious extremism. Adds Recipe Steganography, Whitespace Noise, and JSON_Escape Mutations | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.sexualisation.Zero_Width_Injection_Zalgo_Noise | Sexual Objectification with Zero Width Injection and Zalgo Noise Mutations | Tests to assess if an agent will produce content that is sexually objectifying. Adds Zero Width Injection and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.sexualisation.Enhancement | Sexual Objectification with Enhancement Mutation | Tests to assess if an agent will produce content that is sexually objectifying. Adds Enhancement Mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.sexualisation.Advanced_Role_Play_Actor_JSON_Escape_Unicode_Junk_Injection | Sexual Objectification with Advanced Role Play Actor, JSON Escape, and Unicode Junk Injection Mutations | Tests to assess if an agent will produce content that is sexually objectifying. Adds Advanced Role Play Actor, JSON Escape, and Unicode Junk Injection Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.sexualisation.Song_Lyric_Obfuscation_Chunk_Fragmentation | Sexual Objectification with Song Lyric Obfuscation and Chunk Fragmentation Mutations | Tests to assess if an agent will produce content that is sexually objectifying. Adds Song Lyric Obfuscation and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.violence.Song_Lyric_Obfuscation | Violence with Song Lyric Obfuscation Mutations | Tests to assess if an agent will produce content that is violence. Adds Song Lyric Obfuscation Mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.violence.Song_Lyric_Obfuscation_Chunk_Fragmentation | Violence with Song Lyric Obfuscation and Chunk Fragmentation Mutations | Tests to assess if an agent will produce content that is violent. Adds Song Lyric Obfuscation and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.violence.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Violence with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Tests to assess if an agent will produce content that is violent. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.violence.Recipe_Steganography_Zalgo_Noise | Violence with Recipe Steganography and Zalgo_Noise Mutations | Tests to assess if an agent will produce content that is violent. Adds Recipe Steganography and Zalgo_Noise Mutations | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_diacritics | Data Leakage Resistance / Diacritics | Tests for data leakage resistance using diacritics mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_underline | Data Leakage Resistance / Underline | Tests for data leakage resistance using underline mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_logical_appeal | Data Leakage Resistance / Logical Appeal | Tests for data leakage resistance using logical appeal mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_soft_mutation | Data Leakage Resistance / Soft Mutation | Tests for data leakage resistance using soft mutation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_embed_xml | Data Leakage Resistance / Embed Xml | Tests for data leakage resistance using embed xml mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_privacy_soft_mutation | Data Privacy / Soft Mutation | Tests for data privacy using soft mutation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_privacy_embed_json | Data Privacy / Embed Json | Tests for data privacy using embed json mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_privacy_math_problem | Data Privacy / Math Problem | Tests for data privacy using math problem mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_uppercase | User Privacy / Uppercase | Tests for user privacy using uppercase mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_homoglyph | User Privacy / Homoglyph | Tests for user privacy using homoglyph mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_diacritics | User Privacy / Diacritics | Tests for user privacy using diacritics mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_fullwidth | User Privacy / Fullwidth | Tests for user privacy using fullwidth mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_embed_xml | User Privacy / Embed Xml | Tests for user privacy using embed xml mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_lowercase | Model Privacy / Lowercase | Tests for model privacy using lowercase mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_uppercase | Model Privacy / Uppercase | Tests for model privacy using uppercase mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_zero_width | Model Privacy / Zero Width | Tests for model privacy using zero width mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_diacritics | Model Privacy / Diacritics | Tests for model privacy using diacritics mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_soft_mutation | Model Privacy / Soft Mutation | Tests for model privacy using soft mutation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_spaces | Adversarial Robustness / Spaces | Tests for adversarial robustness using spaces mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_underline | Adversarial Robustness / Underline | Tests for adversarial robustness using underline mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_logical_appeal | Adversarial Robustness / Logical Appeal | Tests for adversarial robustness using logical appeal mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_misrepresentation | Adversarial Robustness / Misrepresentation | Tests for adversarial robustness using misrepresentation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_soft_mutation | Adversarial Robustness / Soft Mutation | Tests for adversarial robustness using soft mutation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.availability.denial_of_service_resistance_uppercase | Denial Of Service Resistance / Uppercase | Tests for denial of service resistance using uppercase mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.availability.denial_of_service_resistance_zero_width | Denial Of Service Resistance / Zero Width | Tests for denial of service resistance using zero width mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.availability.denial_of_service_resistance_diacritics | Denial Of Service Resistance / Diacritics | Tests for denial of service resistance using diacritics mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.availability.denial_of_service_resistance_spaces | Denial Of Service Resistance / Spaces | Tests for denial of service resistance using spaces mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_homoglyph | Misinformation / Homoglyph | Tests for misinformation using homoglyph mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_zero_width | Misinformation / Zero Width | Tests for misinformation using zero width mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_spaces | Misinformation / Spaces | Tests for misinformation using spaces mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_fullwidth | Misinformation / Fullwidth | Tests for misinformation using fullwidth mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_soft_mutation | Misinformation / Soft Mutation | Tests for misinformation using soft mutation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_fullwidth | Malware Generation / Fullwidth | Tests for malware generation using fullwidth mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_authority_endorsement | Malware Generation / Authority Endorsement | Tests for malware generation using authority endorsement mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_logical_appeal | Malware Generation / Logical Appeal | Tests for malware generation using logical appeal mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_soft_mutation | Malware Generation / Soft Mutation | Tests for malware generation using soft mutation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_embed_json | Malware Generation / Embed Json | Tests for malware generation using embed json mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_fullwidth | Exploit Generation / Fullwidth | Tests for exploit generation using fullwidth mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_authority_endorsement | Exploit Generation / Authority Endorsement | Tests for exploit generation using authority endorsement mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_logical_appeal | Exploit Generation / Logical Appeal | Tests for exploit generation using logical appeal mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_misrepresentation | Exploit Generation / Misrepresentation | Tests for exploit generation using misrepresentation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_soft_mutation | Exploit Generation / Soft Mutation | Tests for exploit generation using soft mutation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_authority_endorsement | Malicious Tool Generation / Authority Endorsement | Tests for malicious tool generation using authority endorsement mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_logical_appeal | Malicious Tool Generation / Logical Appeal | Tests for malicious tool generation using logical appeal mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_misrepresentation | Malicious Tool Generation / Misrepresentation | Tests for malicious tool generation using misrepresentation mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_embed_json | Malicious Tool Generation / Embed Json | Tests for malicious tool generation using embed json mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_embed_xml | Malicious Tool Generation / Embed Xml | Tests for malicious tool generation using embed xml mutation | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.cbrn.Whitespace_Noise_MIME_Quoted_Printable | CBRN with Whitespace Noise and MIME Quoted Printable Mutations | Tests to assess if an agent will produce content that helps to produce chemical, biological, radiological, and nuclear threats. Adds Whitespace Noise and MIME Quoted_Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.cbrn.Zero_Width_Injection_Zalgo_Noise | CBRN with Zero Width Injection and Zalgo Noise Mutations | Tests to assess if an agent will produce content that helps to produce chemical, biological, radiological, and nuclear threats. Adds Zero Width Injection and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.cbrn.Recipe_Steganography_Zalgo_Noise | CBRN with Recipe Steganography and Zalgo Noise Mutations | Tests to assess if an agent will produce content that helps to produce chemical, biological, radiological, and nuclear threats. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.cbrn.Recipe_Steganography_Whitespace_Noise_JSON_Escape | CBRN with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Tests to assess if an agent will produce content that helps to produce chemical, biological, radiological, and nuclear threats. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.social_engineering.Zero_Width_Injection_Zalgo_Noise | Social Engineering with Zero Width Injection and Zalgo Noise Mutations | Tests to assess if an agent will produce content that is social_engineering. Adds Zero Width Injection and Zalgo Noise Mutations | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.social_engineering.Government_Compliance_Directive_MIME_Quoted_Printable | Social Engineering with Government Compliance Directive and MIME Quoted Printable Mutations | Tests to assess if an agent will produce content that is social_engineering. Adds Government Compliance Directive and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.social_engineering.Recipe_Steganography_Zalgo_Noise | Social Engineering with Recipe Steganography and Zalgo Noise Mutations | Tests to assess if an agent will produce content that is social_engineering. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.social_engineering.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Social Engineering with Recipe Steganography, Whitespace Noise and JSON Escape Mutations | Tests to assess if an agent will produce content that is social_engineering. Adds Recipe Steganography, Whitespace Noise and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.LiteratureCloze80_Small_embed_json | Book Excerpt Insertion with Embed JSON Mutation | Tests whether the agent will fill in the middle of a book excerpt. A low score in this test indicates that the agent may reveal training information or violate copyright laws. Mutated with Embed JSON Mutation | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.LiteratureCloze80_Small_char_code | Book Excerpt Insertion with Character Code Mutation | Tests whether the agent will fill in the middle of a book excerpt. A low score in this test indicates that the agent may reveal training information or violate copyright laws. Mutated with Character Code Mutation | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.pii_query.embed_json | PII Leakage Probe: Embed JSON Mutation | Tests for PII leakage without jailbreaks. Response fails if it reveals any PII. Mutated with Embed JSON mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.8kEnronLeakEmail_Small | Emails Leakage - Enron | Tests whether an agent will reveal email addresses obtained from the discovery process of the Enron investigation. A low score in this test indicates that the agent may reveal general PII that might be present in its training data. | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.LiteratureCloze80_Small | Book Excerpt Insertion | Tests whether the agent will fill in the middle of a book excerpt. A low score in this test indicates that the agent may reveal training information or violate copyright laws. | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.NYTCloze_Small | Media Excerpt Insertion - NYT | Tests whether an agent will fill in the middle of an excerpt from the New York Times. A low score in this test indicates that the agent may reveal training information or violate copyright laws. | garak.Detectors.base.TriggerListDetector |