Safety

Policy Compliance

Tests for adherence to common organizational guidelines and policies.

Business Ethics with Recipe Steganography and Chunk Fragmentation Mutations

Tests adherence to principles common in business codes of conduct. Adds the Recipe Steganography and Chunk Fragmentation Mutations.
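This page does not describe how the Chunk Fragmentation mutation works. As a rough illustration only, the sketch below assumes it splits a test prompt into fixed-size character chunks and asks the agent to reassemble them. The function names `fragment` and `render_fragments`, the chunk size, and the reassembly instruction are all hypothetical.

```python
from typing import List


def fragment(prompt: str, chunk_size: int = 8) -> List[str]:
    """Split a prompt into consecutive fixed-size character chunks.

    Hypothetical illustration of a chunk-fragmentation mutation; a real
    evaluation might split on tokens, words, or sentences instead.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return [prompt[i:i + chunk_size] for i in range(0, len(prompt), chunk_size)]


def render_fragments(chunks: List[str]) -> str:
    """Present the chunks as numbered fragments plus a reassembly instruction."""
    lines = [f"Fragment {n}: {chunk!r}" for n, chunk in enumerate(chunks, start=1)]
    lines.append("Join the fragments in order and answer the resulting question.")
    return "\n".join(lines)


if __name__ == "__main__":
    chunks = fragment("Is it acceptable to backdate an invoice?", chunk_size=10)
    print(render_fragments(chunks))
```

Joining the chunks recovers the original prompt exactly, so under this reading the mutation changes only how the request is presented, not what it asks.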

Business Ethics with Recipe Steganography and Zalgo Noise Mutations

Tests adherence to principles common in business codes of conduct. Adds the Recipe Steganography and Zalgo Noise Mutations.
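The Zalgo Noise mutation is not specified further here. A minimal sketch, assuming it means stacking random Unicode combining diacritical marks (U+0300 to U+036F) on letters so the prompt looks corrupted while its base characters stay intact. The names `zalgo_noise`, `strip_zalgo`, and the `intensity` parameter are hypothetical.

```python
import random
from typing import Optional

# Combining Diacritical Marks block: U+0300 through U+036F.
COMBINING_MARKS = [chr(cp) for cp in range(0x0300, 0x0370)]
_COMBINING_SET = set(COMBINING_MARKS)


def zalgo_noise(text: str, intensity: int = 2, seed: Optional[int] = None) -> str:
    """Append between 0 and `intensity` random combining marks after each letter or digit."""
    if intensity < 0:
        raise ValueError("intensity must be non-negative")
    rng = random.Random(seed)  # seeded so test cases are reproducible
    out = []
    for ch in text:
        out.append(ch)
        if ch.isalnum():
            count = rng.randint(0, intensity)
            out.extend(rng.choice(COMBINING_MARKS) for _ in range(count))
    return "".join(out)


def strip_zalgo(text: str) -> str:
    """Remove marks from the block above (recovers text that contained none originally)."""
    return "".join(ch for ch in text if ch not in _COMBINING_SET)


if __name__ == "__main__":
    print(zalgo_noise("Report the conflict of interest.", intensity=3, seed=7))
```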

Norm Compliance

Tests for offensive or culturally insensitive outputs.

Question Answering Bias

Uses the BBQ (Bias Benchmark for QA) benchmark to measure bias in question-answering tasks across different demographic categories. A high score indicates that the agent is not overly biased toward any specific demographic.

Profanity with Comment Injection, Entropy Reduction Checksum Marker, Whitespace Noise, and HTML Encoding Mutations

Tests whether the agent will output profanity. Adds the Comment Injection, Entropy Reduction Checksum Marker, Whitespace Noise, and HTML Encoding Mutations. Failures indicate a tendency to produce profanity.
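The mutation names suggest simple text transformations, but their exact behavior is not documented here. The sketch below shows one plausible reading of two of them, under stated assumptions: Whitespace Noise as randomly inserted spaces and tabs, and HTML Encoding as replacing every character with a decimal numeric character reference. Comment Injection and Entropy Reduction Checksum Marker are not illustrated. All function names, the noise rate, and the order in which the two are combined are hypothetical.

```python
import html
import random
from typing import Optional

NOISE_CHARS = [" ", "  ", "\t"]


def whitespace_noise(text: str, rate: float = 0.2, seed: Optional[int] = None) -> str:
    """After each character, insert extra whitespace with probability `rate`."""
    rng = random.Random(seed)
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < rate:
            out.append(rng.choice(NOISE_CHARS))
    return "".join(out)


def html_encode(text: str) -> str:
    """Replace every character with a decimal HTML numeric character reference."""
    return "".join(f"&#{ord(ch)};" for ch in text)


def mutate(text: str, seed: Optional[int] = None) -> str:
    # Hypothetical ordering: add whitespace noise first, then HTML-encode the result.
    return html_encode(whitespace_noise(text, seed=seed))


if __name__ == "__main__":
    print(mutate("Describe your weekend.", seed=3))
```

Decoding with the standard library's `html.unescape` and collapsing whitespace recovers the original words, which is a quick way to confirm the mutated prompt still carries the same request.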

Ethical Behavior

Tests whether the agent can produce unethical outputs.

CBRN with Whitespace Noise and MIME Quoted Printable Mutations

Tests whether an agent will produce content that helps create chemical, biological, radiological, and nuclear (CBRN) threats. Adds the Whitespace Noise and MIME Quoted Printable Mutations.
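Quoted-printable is a standard MIME content transfer encoding (RFC 2045), so this part can be shown concretely, although how the evaluation applies it to prompts is an assumption here. Python's standard `quopri` module performs the encoding; the wrapper names `to_quoted_printable` and `from_quoted_printable` are hypothetical, and the sample text is deliberately benign.

```python
import quopri


def to_quoted_printable(text: str) -> str:
    """Encode text as MIME quoted-printable.

    quopri works on bytes, so the text is UTF-8 encoded first. quotetabs=True
    also escapes spaces and tabs (as =20 and =09), so no literal spaces remain.
    """
    return quopri.encodestring(text.encode("utf-8"), quotetabs=True).decode("ascii")


def from_quoted_printable(encoded: str) -> str:
    """Reverse the encoding and decode the UTF-8 bytes back to text."""
    return quopri.decodestring(encoded.encode("ascii")).decode("utf-8")


if __name__ == "__main__":
    sample = "Room temperature: 21 °C = 70 °F"
    encoded = to_quoted_printable(sample)
    print(encoded)
    print(from_quoted_printable(encoded))
```

Non-ASCII characters such as `°` become their UTF-8 bytes in `=XX` form and a literal `=` becomes `=3D`, which is what makes the prompt harder to read at a glance while remaining fully reversible.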

CBRN with Zero Width Injection and Zalgo Noise Mutations

Tests whether an agent will produce content that helps create chemical, biological, radiological, and nuclear (CBRN) threats. Adds the Zero Width Injection and Zalgo Noise Mutations.
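Zero Width Injection presumably inserts invisible Unicode characters into the prompt; the page does not say which characters or how often. A minimal sketch, assuming four common zero-width code points and, by default, injection between every pair of adjacent characters. The names `zero_width_inject`, `strip_zero_width`, and the `every` parameter are hypothetical.

```python
import random
from typing import Optional

# ZERO WIDTH SPACE, ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER, WORD JOINER
ZERO_WIDTH_CHARS = ["\u200b", "\u200c", "\u200d", "\u2060"]
_ZERO_WIDTH_SET = set(ZERO_WIDTH_CHARS)


def zero_width_inject(text: str, every: int = 1, seed: Optional[int] = None) -> str:
    """Insert a random zero-width character after every `every`-th character,
    never after the final character."""
    if every < 1:
        raise ValueError("every must be at least 1")
    rng = random.Random(seed)
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        if (i + 1) % every == 0 and i < len(text) - 1:
            out.append(rng.choice(ZERO_WIDTH_CHARS))
    return "".join(out)


def strip_zero_width(text: str) -> str:
    """Remove the injected characters, a simple normalization a defense might apply."""
    return "".join(ch for ch in text if ch not in _ZERO_WIDTH_SET)


if __name__ == "__main__":
    mutated = zero_width_inject("hello", seed=1)
    print(len("hello"), len(mutated), strip_zero_width(mutated))
```

The mutated string renders the same as the original in most interfaces but has a different length and code-point sequence, which is the gap such a test would probe.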

Harmbench Hallucination

Tests whether an agent will help a user to generate misinformation. A high score indicates that the agent refuses to help produce false information.

Harmbench Hallucination + Logical_Appeal

Tests whether an agent will help a user to generate misinformation when the request uses a logical appeal jailbreak. A high score indicates that the agent refuses to help produce false information.

Social Engineering with Government Compliance Directive and MIME Quoted Printable Mutations

Tests whether an agent will produce social engineering content. Adds the Government Compliance Directive and MIME Quoted Printable Mutations.
