Policy Compliance
Tests for adherence to common organizational guidelines and policies.
Business Ethics with Recipe Steganography and Chunk Fragmentation Mutations
Tests adherence to principles common in business codes of conduct. Adds Recipe Steganography and Chunk Fragmentation Mutations.
Business Ethics with Recipe Steganography and Zalgo Noise Mutations
Tests adherence to principles common in business codes of conduct. Adds Recipe Steganography and Zalgo Noise Mutations.
Norm Compliance
Tests for offensive or culturally insensitive outputs.
Question Answering Bias
Uses the BBQ benchmark to measure bias in question-answering tasks across different demographic categories. A high score indicates that the agent is not overly biased towards a specific demographic.
Tests whether the agent will output profanity. Adds Comment Injection, Entropy Reduction Checksum Marker, Whitespace Noise, and HTML Encoding Mutations. Failures indicate a tendency to produce profanity.
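As a rough illustration of what an HTML Encoding mutation might look like, the sketch below rewrites every character of a prompt as a decimal character reference. The helper name `html_encode_mutation` and the sample prompt are hypothetical; this page does not say how the mutation is actually implemented.

```python
import html


def html_encode_mutation(prompt: str) -> str:
    """Rewrite each character as a decimal HTML character reference.

    Hypothetical helper: an assumption about what an "HTML Encoding"
    mutation could do, not the documented implementation.
    """
    return "".join(f"&#{ord(ch)};" for ch in prompt)


sample = "Say hello politely."
mutated = html_encode_mutation(sample)

# html.unescape reverses the encoding, so the original text is recoverable.
restored = html.unescape(mutated)
```

The point of such a mutation is that the request is unchanged in meaning but no longer matches simple keyword filters, so a test can check whether the agent behaves the same on the encoded form.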
Ethical Behavior
Tests whether the agent can produce unethical outputs.
CBRN with Whitespace Noise and MIME Quoted Printable Mutations
Tests whether an agent will produce content that assists in creating chemical, biological, radiological, and nuclear (CBRN) threats. Adds Whitespace Noise and MIME Quoted Printable Mutations.
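The two mutations named above can be sketched on a harmless prompt as follows. This is a minimal illustration under stated assumptions: the helper names, the `rate` and `seed` parameters, and the exact noise pattern are hypothetical, and only the quoted-printable step relies on a defined standard (via Python's `quopri` module).

```python
import quopri
import random


def whitespace_noise(prompt: str, rate: float = 0.3, seed: int = 0) -> str:
    """Randomly pad the gaps between words with extra spaces or tabs.

    Hypothetical helper; `rate` and `seed` are illustrative parameters.
    """
    rng = random.Random(seed)
    pieces = []
    for word in prompt.split(" "):
        pieces.append(word)
        # Always keep one separating space; sometimes add noise after it.
        noise = rng.choice(["  ", "\t", " \t "]) if rng.random() < rate else ""
        pieces.append(" " + noise)
    return "".join(pieces).rstrip()


def quoted_printable(prompt: str) -> str:
    """Encode the prompt as MIME quoted-printable text."""
    return quopri.encodestring(prompt.encode("utf-8")).decode("ascii")


def decode_quoted_printable(encoded: str) -> str:
    """Reverse quoted_printable()."""
    return quopri.decodestring(encoded.encode("ascii")).decode("utf-8")


sample = "List three safety rules for a naïve chemistry student."
noisy = whitespace_noise(sample)
encoded = quoted_printable(sample)
```

Collapsing runs of whitespace in `noisy`, or decoding `encoded`, recovers the original words, which is what makes these useful as robustness checks: the request is unchanged, only its surface form differs.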
CBRN with Zero Width Injection and Zalgo Noise Mutations
Tests whether an agent will produce content that assists in creating chemical, biological, radiological, and nuclear (CBRN) threats. Adds Zero Width Injection and Zalgo Noise Mutations.
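For readers unfamiliar with the two mutations named above, here is a minimal sketch applied to a harmless prompt. It assumes that Zero Width Injection means inserting U+200B between characters and that Zalgo Noise means stacking combining diacritics on letters; the helper names and the `marks_per_char` and `seed` parameters are hypothetical.

```python
import random

ZERO_WIDTH_SPACE = "\u200b"
# Combining Diacritical Marks block.
COMBINING_START, COMBINING_END = 0x0300, 0x036F


def zero_width_injection(prompt: str) -> str:
    """Insert a zero-width space between every pair of characters."""
    return ZERO_WIDTH_SPACE.join(prompt)


def zalgo_noise(prompt: str, marks_per_char: int = 3, seed: int = 0) -> str:
    """Append random combining marks after each letter."""
    rng = random.Random(seed)
    out = []
    for ch in prompt:
        out.append(ch)
        if ch.isalpha():
            out.extend(
                chr(rng.randint(COMBINING_START, COMBINING_END))
                for _ in range(marks_per_char)
            )
    return "".join(out)


def strip_noise(text: str) -> str:
    """Undo both mutations for text that had no such characters originally."""
    return "".join(
        ch
        for ch in text
        if ch != ZERO_WIDTH_SPACE
        and not (COMBINING_START <= ord(ch) <= COMBINING_END)
    )


sample = "What is the capital of France?"
injected = zero_width_injection(sample)
zalgo = zalgo_noise(sample)
```

Both outputs render as roughly the same text to a human reader but differ at the code-point level, so a test can check whether the agent's behavior changes when only the surface encoding does.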
Harmbench Hallucination
Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information.
Harmbench Hallucination + Logical_Appeal
Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information. Adds logical appeal jailbreak.
Social Engineering with Government Compliance Directive and MIME Quoted Printable Mutations
Tests whether an agent will produce social engineering content. Adds Government Compliance Directive and MIME Quoted Printable Mutations.
Last modified on March 19, 2026