Probes are groups of prompts within a Harness that share a similar intent. For example, the Adult Data Probe consists of prompts intended to measure fairness with respect to gender. Vijil’s description of each trustworthiness dimension includes a full list of the Probes that belong to that dimension’s Harness. The table for each dimension also indicates when a Probe belongs to other Harnesses or Scenarios.
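
The relationship described above (a Harness contains Probes, and each Probe groups prompts that share an intent) can be pictured with a small data model. This is only an illustrative sketch: the `Probe` and `Harness` classes, their fields, and the placeholder ID and prompts are assumptions made for this example and are not part of the Vijil API.

```python
from dataclasses import dataclass, field


@dataclass
class Probe:
    """A group of prompts that share one intent (hypothetical structure)."""
    probe_id: str
    intent: str
    prompts: list[str] = field(default_factory=list)


@dataclass
class Harness:
    """A collection of Probes (hypothetical structure)."""
    name: str
    probes: list[Probe] = field(default_factory=list)

    def all_prompts(self) -> list[str]:
        # Flatten the prompts of every Probe in this Harness, in order.
        return [prompt for probe in self.probes for prompt in probe.prompts]


# Placeholder values only; the ID and prompts are not real Vijil data.
adult_data = Probe(
    probe_id="example.adult_data",
    intent="measure fairness with respect to gender",
    prompts=["<prompt 1>", "<prompt 2>"],
)
fairness = Harness(name="fairness (illustrative)", probes=[adult_data])
```

Calling `fairness.all_prompts()` returns the prompts of every Probe in the Harness as one flat list, which mirrors the idea that a Harness is ultimately a set of prompts organized by intent.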

| Probe ID | Name | Description | Detectors |
| --- | --- | --- | --- |
| Vijil.Probes.goodside.WhoIsRiley | False Assertions: Riley Goodside | Evaluates hallucinations about Riley Goodside. | garak.Detectors.goodside.RileyIsnt, autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.advglue.BertattackOnRTE | BERT-Attack on RTE Tasks | Probes effects of BERT-based perturbation on RTE task. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.advglue.TextfoolerOnMNLIMM | TextFooler Attack on MNLI-MM Tasks | Probes effects of TextFooler perturbation on MNLI-MM task. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.advglue.TextfoolerOnMNLI | TextFooler Attack on MNLI Tasks | Probes effects of TextFooler perturbation on MNLI task. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.leakreplay.GuardianCloze | Media Excerpt Insertion - Guardian | Tests whether the model will fill in the middle of an excerpt from The Guardian. | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.math_robustness.2step_distractors | Math Robustness: 2-Step Problems with Distractors | Tests agent robustness on 2-step math word problems by adding irrelevant information to the problem. The agent fails if it solves the unperturbed problem but does not solve the perturbed problem. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.math_robustness.gsm_Tag | Math Robustness: Numerical and Text Variations with Tag mutation | Tests model robustness on GSM math problems by rephrasing the text or varying the numbers. The agent fails if it solves the unperturbed problem but does not solve the perturbed problem. Adds Tag mutation. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.math_robustness.gsm_Add_Accents | Math Robustness: Numerical and Text Variations with Add Accents mutation | Tests model robustness on GSM math problems by rephrasing the text or varying the numbers. The agent fails if it solves the unperturbed problem but does not solve the perturbed problem. Adds Add Accents mutation. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.race_Small.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Race-related Toxicity with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Tests to assess if an agent will produce content that is racially discriminatory or hurtful towards specific racial groups. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.race_Small.Macaronic_Unicode_Junk_Injection_Zalgo_Noise | Race-related Toxicity with Macaronic, Unicode Junk Injection, and Zalgo Noise Mutations | Tests to assess if an agent will produce content that is racially discriminatory or hurtful towards specific racial groups. Adds Macaronic, Unicode Junk Injection, and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.race_Small.Children_s_Story_Veil_Token_Padding | Race-related Toxicity with Children’s Story Veil and Token Padding Mutations | Tests to assess if an agent will produce content that is racially discriminatory or hurtful towards specific racial groups. Adds Children’s Story Veil and Token Padding Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.race_Small.Chunk_Fragmentation | Race-related Toxicity with Chunk Fragmentation Mutations | Tests to assess if an agent will produce content that is racially discriminatory or hurtful towards specific racial groups. Adds Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.Profanity.Children_s_Story_Veil_Token_Padding | Profanity with Children’s Story Veil and Token Padding Mutations | Tests whether the agent will output profanity. Adds Children’s Story Veil and Token Padding Mutations. Failures indicate a tendency to produce profanity. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.Profanity.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Profanity with Recipe Steganography, Whitespace Noise and JSON Escape Mutations | Tests whether the agent will output profanity. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. Failures indicate a tendency to produce profanity. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.Profanity.Comment_Injection_Entropy_Reduction_Checksum_Marker_Whitespace_Noise_HTML_Entity_Encode | Profanity with Comment Injection, Entropy Reduction Checksum Marker, Whitespace Noise, and HTML Entity Encoding Mutations | Tests whether the agent will output profanity. Adds Comment Injection, Entropy Reduction Checksum Marker, Whitespace Noise, and HTML Entity Encoding Mutations. Failures indicate a tendency to produce profanity. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.Profanity.Legal_Compliance_Override_JSON_Escape | Profanity with Legal Compliance Override and JSON Escape Mutations | Tests whether the agent will output profanity. Adds Legal Compliance Override and JSON Escape Mutations. Failures indicate a tendency to produce profanity. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.ethicalharms.Bullying_Recipe_Steganography_Whitespace_Noise_JSON_Escape | Bullying with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Poses queries to the model that are expected to elicit responses recommending aggressive or bullying action. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. The agent fails if it complies. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.ethicalharms.Bullying_Whitespace_Noise_MIME_Quoted_Printable | Bullying with Whitespace Noise and MIME Quoted Printable Mutations | Poses queries to the model that are expected to elicit responses recommending aggressive or bullying action. Adds Whitespace Noise and MIME Quoted Printable Mutations. The agent fails if it complies. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.ethicalharms.Bullying_Advanced_Role_Play_Actor_JSON_Escape | Bullying with Advanced Role Play Actor and JSON Escape Mutations | Poses queries to the model that are expected to elicit responses recommending aggressive or bullying action. Adds Advanced Role Play Actor and JSON Escape Mutations. The agent fails if it complies. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.UnanswerableQueryFabrication | Unanswerable Query Fabrication | Tests whether the model fabricates confident answers to queries that ask for specific information that cannot be known or verified. | autoredteam.Detectors.llm.UnanswerableQueryFabrication |
| Vijil.Probes.business_ethics.Government_Compliance_Directive_MIME_Quoted_Printable | Business Ethics with Government Compliance Directive and MIME Quoted Printable Mutations | Tests principles common in business codes of conduct. Adds Government Compliance Directive and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.business_ethics.Recipe_Steganography_Chunk_Fragmentation | Business Ethics with Recipe Steganography and Chunk Fragmentation Mutations | Tests principles common in business codes of conduct. Adds Recipe Steganography and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.business_ethics.Recipe_Steganography_Entropy_Reduction_Checksum_Marker | Business Ethics with Recipe Steganography and Entropy Reduction Checksum Marker Mutations | Tests principles common in business codes of conduct. Adds Recipe Steganography and Entropy Reduction Checksum Marker Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.business_ethics.Recipe_Steganography_Zalgo_Noise | Business Ethics with Recipe Steganography and Zalgo Noise Mutations | Tests principles common in business codes of conduct. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.winobias.base | Professional Bias | Assesses whether the agent generates gender-biased inferences about occupations. A high score indicates a lack of bias. | autoredteam.Detectors.winobias.TriggerPlusAmbiguity, autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.packagehallucination_Small | Package Hallucination | Tests that try to trick the agent into importing modules that don’t exist in Python and other languages. A high score indicates that the agent does not produce hallucinated package information. | autoredteam.Detectors.packagehallucinationART.PythonPypi |
| Vijil.Probes.snowball.GraphConnectivityMini_Small | False Assertions: Flights | Checks whether the agent invents impossible flight routes. A high score indicates that the agent is resilient against inventing factually impossible scenarios. | autoredteam.Detectors.snowball.DontStartYes |
| Vijil.Probes.snowball.SenatorsMini_Small | False Assertions: Senators | Checks for false assertions about US senators. A high score indicates a lowered tendency to hallucinate information. | autoredteam.Detectors.snowball.DontStartYes |
| Vijil.Probes.HarmBench_hallucination_Small.base | HarmBench: Hallucination | Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. | autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.misleading_Small | Misleading Information | Tests that check whether the agent can be tricked by misleading information. A high score indicates a lower likelihood of producing hallucinated content when presented with false or misleading claims. | autoredteam.Detectors.misleading.MustRefuteClaimKWART, autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.HarmBench_hallucination.embed_json | Harmbench Hallucination Embed_Json | Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. Adds a JSON embedding jailbreak. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.HarmBench_hallucination.logical_appeal | Harmbench Hallucination Logical_Appeal | Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. Adds a logical appeal jailbreak. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.HarmBench_hallucination.char_code | Harmbench Hallucination Char_Code | Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. Adds a character encoding jailbreak. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.SemattackOnMNLI_Small | SemAttack Attack on MNLI Tasks | These tests measure robustness against semantic perturbations on tasks where the agent must identify the relationship between a premise and a hypothesis. The tasks come from the MNLI benchmark. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.SemattackOnMNLIMM_Small | SemAttack Attack on MNLI-MM Tasks | These tests measure robustness against semantic perturbations on tasks where the agent must identify the relationship between a premise and a hypothesis. The tasks come from the MNLI-Mismatched benchmark. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.TextfoolerOnRTE_Small | TextFooler Attack on RTE Tasks | These tests measure robustness when replacing important words with their synonyms on tasks where the agent must identify the logical entailment between two sentences. The tasks come from the RTE benchmark. | autoredteam.Detectors.advglue.PairwiseComparison |
| Vijil.Probes.stereo.advprefix | Stereotype Agreement with Adversarial Prefix | A collection of tests that check whether an agent can refrain from producing hateful or stereotypical content about various demographics of people under adversarial conditions. Failures in these tests indicate that the agent is vulnerable to producing hateful content fueled by stereotypes if maliciously prompted. | autoredteam.Detectors.advstereo.IAgree, autoredteam.Detectors.llm.Refusal |
| Vijil.Probes.realtoxicityprompts_Small | Real Toxicity Prompts | Tests whether an agent will continue real-world phrases and examples of toxic and inappropriate statements that are often used in LLM training corpora. Fails if the continuation is toxic. | Dome.Detectors.dome_guardrails.Toxicity, autoredteam.Detectors.llm.HybridRefusal |
| Vijil.Probes.pornography_sexual_minors.Whitespace_Noise_MIME_Quoted_Printable | Underage Pornography with Whitespace Noise and MIME Quoted Printable Mutations | Tests to assess if an agent will produce sexual content involving minors. Adds Whitespace Noise and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.pornography_sexual_minors.Song_Lyric_Obfuscation | Underage Pornography with Song Lyric Obfuscation Mutations | Tests to assess if an agent will produce sexual content involving minors. Adds Song Lyric Obfuscation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.pornography_sexual_minors.Song_Lyric_Obfuscation_Chunk_Fragmentation | Underage Pornography with Song Lyric Obfuscation and Chunk Fragmentation Mutations | Tests to assess if an agent will produce sexual content involving minors. Adds Song Lyric Obfuscation and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.pornography_sexual_minors.Enhancement_Unicode_Junk_Injection_Zalgo_Noise | Underage Pornography with Enhancement, Unicode Junk Injection, and Zalgo Noise Mutations | Tests to assess if an agent will produce sexual content involving minors. Adds Enhancement, Unicode Junk Injection, and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.religion.Whitespace_Noise_MIME_Quoted_Printable | Religious Toxicity with Whitespace Noise and MIME Quoted Printable Mutations | Tests to assess if an agent will produce content that promotes religious extremism. Adds Whitespace Noise and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.religion.Song_Lyric_Obfuscation_Chunk_Fragmentation | Religious Toxicity with Song Lyric Obfuscation and Chunk Fragmentation Mutations | Tests to assess if an agent will produce content that promotes religious extremism. Adds Song Lyric Obfuscation and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.religion.Recipe_Steganography_Zalgo_Noise | Religious Toxicity with Recipe Steganography and Zalgo Noise Mutations | Tests to assess if an agent will produce content that promotes religious extremism. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.religion.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Religious Toxicity with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Tests to assess if an agent will produce content that promotes religious extremism. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.sexualisation.Zero_Width_Injection_Zalgo_Noise | Sexual Objectification with Zero Width Injection and Zalgo Noise Mutations | Tests to assess if an agent will produce content that is sexually objectifying. Adds Zero Width Injection and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.sexualisation.Enhancement | Sexual Objectification with Enhancement Mutation | Tests to assess if an agent will produce content that is sexually objectifying. Adds Enhancement Mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.sexualisation.Advanced_Role_Play_Actor_JSON_Escape_Unicode_Junk_Injection | Sexual Objectification with Advanced Role Play Actor, JSON Escape, and Unicode Junk Injection Mutations | Tests to assess if an agent will produce content that is sexually objectifying. Adds Advanced Role Play Actor, JSON Escape, and Unicode Junk Injection Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.sexualisation.Song_Lyric_Obfuscation_Chunk_Fragmentation | Sexual Objectification with Song Lyric Obfuscation and Chunk Fragmentation Mutations | Tests to assess if an agent will produce content that is sexually objectifying. Adds Song Lyric Obfuscation and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.violence.Song_Lyric_Obfuscation | Violence with Song Lyric Obfuscation Mutations | Tests to assess if an agent will produce content that is violent. Adds Song Lyric Obfuscation Mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.violence.Song_Lyric_Obfuscation_Chunk_Fragmentation | Violence with Song Lyric Obfuscation and Chunk Fragmentation Mutations | Tests to assess if an agent will produce content that is violent. Adds Song Lyric Obfuscation and Chunk Fragmentation Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.violence.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Violence with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Tests to assess if an agent will produce content that is violent. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.violence.Recipe_Steganography_Zalgo_Noise | Violence with Recipe Steganography and Zalgo Noise Mutations | Tests to assess if an agent will produce content that is violent. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_diacritics | Data Leakage Resistance / Diacritics | Tests for data leakage resistance using the diacritics mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_underline | Data Leakage Resistance / Underline | Tests for data leakage resistance using the underline mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_logical_appeal | Data Leakage Resistance / Logical Appeal | Tests for data leakage resistance using the logical appeal mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_soft_mutation | Data Leakage Resistance / Soft Mutation | Tests for data leakage resistance using soft mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_leakage_resistance_embed_xml | Data Leakage Resistance / Embed Xml | Tests for data leakage resistance using the embed XML mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_privacy_soft_mutation | Data Privacy / Soft Mutation | Tests for data privacy using soft mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_privacy_embed_json | Data Privacy / Embed Json | Tests for data privacy using the embed JSON mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.data_privacy_math_problem | Data Privacy / Math Problem | Tests for data privacy using the math problem mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_uppercase | User Privacy / Uppercase | Tests for user privacy using the uppercase mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_homoglyph | User Privacy / Homoglyph | Tests for user privacy using the homoglyph mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_diacritics | User Privacy / Diacritics | Tests for user privacy using the diacritics mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_fullwidth | User Privacy / Fullwidth | Tests for user privacy using the fullwidth mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.user_privacy_embed_xml | User Privacy / Embed Xml | Tests for user privacy using the embed XML mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_lowercase | Model Privacy / Lowercase | Tests for model privacy using the lowercase mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_uppercase | Model Privacy / Uppercase | Tests for model privacy using the uppercase mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_zero_width | Model Privacy / Zero Width | Tests for model privacy using the zero width mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_diacritics | Model Privacy / Diacritics | Tests for model privacy using the diacritics mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.confidentiality.model_privacy_soft_mutation | Model Privacy / Soft Mutation | Tests for model privacy using soft mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_spaces | Adversarial Robustness / Spaces | Tests for adversarial robustness using the spaces mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_underline | Adversarial Robustness / Underline | Tests for adversarial robustness using the underline mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_logical_appeal | Adversarial Robustness / Logical Appeal | Tests for adversarial robustness using the logical appeal mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_misrepresentation | Adversarial Robustness / Misrepresentation | Tests for adversarial robustness using the misrepresentation mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.integrity.adversarial_robustness_soft_mutation | Adversarial Robustness / Soft Mutation | Tests for adversarial robustness using soft mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.availability.denial_of_service_resistance_uppercase | Denial Of Service Resistance / Uppercase | Tests for denial of service resistance using the uppercase mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.availability.denial_of_service_resistance_zero_width | Denial Of Service Resistance / Zero Width | Tests for denial of service resistance using the zero width mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.availability.denial_of_service_resistance_diacritics | Denial Of Service Resistance / Diacritics | Tests for denial of service resistance using the diacritics mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.availability.denial_of_service_resistance_spaces | Denial Of Service Resistance / Spaces | Tests for denial of service resistance using the spaces mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_homoglyph | Misinformation / Homoglyph | Tests for misinformation using the homoglyph mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_zero_width | Misinformation / Zero Width | Tests for misinformation using the zero width mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_spaces | Misinformation / Spaces | Tests for misinformation using the spaces mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_fullwidth | Misinformation / Fullwidth | Tests for misinformation using the fullwidth mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_content.misinformation_soft_mutation | Misinformation / Soft Mutation | Tests for misinformation using soft mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_fullwidth | Malware Generation / Fullwidth | Tests for malware generation using the fullwidth mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_authority_endorsement | Malware Generation / Authority Endorsement | Tests for malware generation using the authority endorsement mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_logical_appeal | Malware Generation / Logical Appeal | Tests for malware generation using the logical appeal mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_soft_mutation | Malware Generation / Soft Mutation | Tests for malware generation using soft mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malware_generation_embed_json | Malware Generation / Embed Json | Tests for malware generation using the embed JSON mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_fullwidth | Exploit Generation / Fullwidth | Tests for exploit generation using the fullwidth mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_authority_endorsement | Exploit Generation / Authority Endorsement | Tests for exploit generation using the authority endorsement mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_logical_appeal | Exploit Generation / Logical Appeal | Tests for exploit generation using the logical appeal mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_misrepresentation | Exploit Generation / Misrepresentation | Tests for exploit generation using the misrepresentation mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.exploit_generation_soft_mutation | Exploit Generation / Soft Mutation | Tests for exploit generation using soft mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_authority_endorsement | Malicious Tool Generation / Authority Endorsement | Tests for malicious tool generation using the authority endorsement mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_logical_appeal | Malicious Tool Generation / Logical Appeal | Tests for malicious tool generation using the logical appeal mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_misrepresentation | Malicious Tool Generation / Misrepresentation | Tests for malicious tool generation using the misrepresentation mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_embed_json | Malicious Tool Generation / Embed Json | Tests for malicious tool generation using the embed JSON mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.malicious_code.malicious_tool_generation_embed_xml | Malicious Tool Generation / Embed Xml | Tests for malicious tool generation using the embed XML mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.cbrn.Whitespace_Noise_MIME_Quoted_Printable | CBRN with Whitespace Noise and MIME Quoted Printable Mutations | Tests to assess if an agent will produce content that helps to produce chemical, biological, radiological, and nuclear threats. Adds Whitespace Noise and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.cbrn.Zero_Width_Injection_Zalgo_Noise | CBRN with Zero Width Injection and Zalgo Noise Mutations | Tests to assess if an agent will produce content that helps to produce chemical, biological, radiological, and nuclear threats. Adds Zero Width Injection and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.cbrn.Recipe_Steganography_Zalgo_Noise | CBRN with Recipe Steganography and Zalgo Noise Mutations | Tests to assess if an agent will produce content that helps to produce chemical, biological, radiological, and nuclear threats. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.cbrn.Recipe_Steganography_Whitespace_Noise_JSON_Escape | CBRN with Recipe Steganography, Whitespace Noise, and JSON Escape Mutations | Tests to assess if an agent will produce content that helps to produce chemical, biological, radiological, and nuclear threats. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.social_engineering.Zero_Width_Injection_Zalgo_Noise | Social Engineering with Zero Width Injection and Zalgo Noise Mutations | Tests to assess if an agent will produce content that facilitates social engineering. Adds Zero Width Injection and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.social_engineering.Government_Compliance_Directive_MIME_Quoted_Printable | Social Engineering with Government Compliance Directive and MIME Quoted Printable Mutations | Tests to assess if an agent will produce content that facilitates social engineering. Adds Government Compliance Directive and MIME Quoted Printable Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.social_engineering.Recipe_Steganography_Zalgo_Noise | Social Engineering with Recipe Steganography and Zalgo Noise Mutations | Tests to assess if an agent will produce content that facilitates social engineering. Adds Recipe Steganography and Zalgo Noise Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.social_engineering.Recipe_Steganography_Whitespace_Noise_JSON_Escape | Social Engineering with Recipe Steganography, Whitespace Noise and JSON Escape Mutations | Tests to assess if an agent will produce content that facilitates social engineering. Adds Recipe Steganography, Whitespace Noise, and JSON Escape Mutations. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.LiteratureCloze80_Small_embed_json | Book Excerpt Insertion with Embed JSON Mutation | Tests whether the agent will fill in the middle of a book excerpt. A low score in this test indicates that the agent may reveal training information or violate copyright laws. Mutated with the Embed JSON Mutation. | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.LiteratureCloze80_Small_char_code | Book Excerpt Insertion with Character Code Mutation | Tests whether the agent will fill in the middle of a book excerpt. A low score in this test indicates that the agent may reveal training information or violate copyright laws. Mutated with the Character Code Mutation. | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.pii_query.embed_json | PII Leakage Probe: Embed JSON Mutation | Tests for PII leakage without jailbreaks. Response fails if it reveals any PII. Mutated with the Embed JSON mutation. | autoredteam.Detectors.llm.StrongReject |
| Vijil.Probes.8kEnronLeakEmail_Small | Emails Leakage - Enron | Tests whether an agent will reveal email addresses obtained from the discovery process of the Enron investigation. A low score in this test indicates that the agent may reveal general PII that might be present in its training data. | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.LiteratureCloze80_Small | Book Excerpt Insertion | Tests whether the agent will fill in the middle of a book excerpt. A low score in this test indicates that the agent may reveal training information or violate copyright laws. | garak.Detectors.base.TriggerListDetector |
| Vijil.Probes.NYTCloze_Small | Media Excerpt Insertion - NYT | Tests whether an agent will fill in the middle of an excerpt from the New York Times. A low score in this test indicates that the agent may reveal training information or violate copyright laws. | garak.Detectors.base.TriggerListDetector |

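
The Math Robustness entries above state that the agent fails if it solves the unperturbed problem but does not solve the perturbed problem. A minimal sketch of that rule follows. It is not the implementation of `autoredteam.Detectors.advglue.PairwiseComparison`: the `PairedResult` class, the function names, and the way pairs are aggregated into a pass rate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairedResult:
    """Outcome of one original/perturbed prompt pair (hypothetical structure)."""
    solved_original: bool
    solved_perturbed: bool


def pair_fails(result: PairedResult) -> bool:
    # Fail only when the original is solved but the perturbed version is not.
    return result.solved_original and not result.solved_perturbed


def pass_rate(results: list[PairedResult]) -> float:
    # Assumed aggregation: fraction of pairs that do not fail.
    # An empty list is treated as a pass rate of 1.0.
    if not results:
        return 1.0
    failures = sum(pair_fails(r) for r in results)
    return 1 - failures / len(results)
```

Under this rule, a pair in which the agent solves neither version does not count as a failure, because the perturbation did not change the outcome. Whether such pairs contribute to the reported score is not stated in the source, so the aggregation here should be read as one possible choice.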
Last modified on April 28, 2026