Language model outputs are surprisingly fragile: small changes to an input prompt can degrade the quality of the final output. Vijil evaluates the robustness of an LLM using a module of 30 tests.

Probe and Scenario List

| Harness/scenario | Probe | Probe Description | Detector |
|---|---|---|---|
| Robustness | BERT attack on MNLI | Probes effects of BERT-based perturbation on MNLI task | Robustness pairwise comparison |
|  | BERT attack on MNLI-MM | Probes effects of BERT-based perturbation on MNLI-MM task | Robustness pairwise comparison |
|  | BERT attack on QNLI | Probes effects of BERT-based perturbation on QNLI task | Robustness pairwise comparison |
|  | BERT attack on QQP | Probes effects of BERT-based perturbation on QQP task | Robustness pairwise comparison |
|  | BERT attack on RTE | Probes effects of BERT-based perturbation on RTE task | Robustness pairwise comparison |
|  | BERT attack on SST2 | Probes effects of BERT-based perturbation on SST2 task | Robustness pairwise comparison |
|  | SemAttack on MNLI | Probes effects of SemAttack perturbation on MNLI task | Robustness pairwise comparison |
|  | SemAttack on MNLI-MM | Probes effects of SemAttack perturbation on MNLI-MM task | Robustness pairwise comparison |
|  | SemAttack on QNLI | Probes effects of SemAttack perturbation on QNLI task | Robustness pairwise comparison |
|  | SemAttack on QQP | Probes effects of SemAttack perturbation on QQP task | Robustness pairwise comparison |
|  | SemAttack on RTE | Probes effects of SemAttack perturbation on RTE task | Robustness pairwise comparison |
|  | SemAttack on SST2 | Probes effects of SemAttack perturbation on SST2 task | Robustness pairwise comparison |
|  | SememePSO attack on MNLI | Probes effects of SememePSO perturbation on MNLI task | Robustness pairwise comparison |
|  | SememePSO attack on MNLI-MM | Probes effects of SememePSO perturbation on MNLI-MM task | Robustness pairwise comparison |
|  | SememePSO attack on QNLI | Probes effects of SememePSO perturbation on QNLI task | Robustness pairwise comparison |
|  | SememePSO attack on QQP | Probes effects of SememePSO perturbation on QQP task | Robustness pairwise comparison |
|  | SememePSO attack on RTE | Probes effects of SememePSO perturbation on RTE task | Robustness pairwise comparison |
|  | SememePSO attack on SST2 | Probes effects of SememePSO perturbation on SST2 task | Robustness pairwise comparison |
|  | TextBugger attack on MNLI | Probes effects of TextBugger perturbation on MNLI task | Robustness pairwise comparison |
|  | TextBugger attack on MNLI-MM | Probes effects of TextBugger perturbation on MNLI-MM task | Robustness pairwise comparison |
|  | TextBugger attack on QNLI | Probes effects of TextBugger perturbation on QNLI task | Robustness pairwise comparison |
|  | TextBugger attack on QQP | Probes effects of TextBugger perturbation on QQP task | Robustness pairwise comparison |
|  | TextBugger attack on RTE | Probes effects of TextBugger perturbation on RTE task | Robustness pairwise comparison |
|  | TextBugger attack on SST2 | Probes effects of TextBugger perturbation on SST2 task | Robustness pairwise comparison |
|  | TextFooler attack on MNLI | Probes effects of TextFooler perturbation on MNLI task | Robustness pairwise comparison |
|  | TextFooler attack on MNLI-MM | Probes effects of TextFooler perturbation on MNLI-MM task | Robustness pairwise comparison |
|  | TextFooler attack on QNLI | Probes effects of TextFooler perturbation on QNLI task | Robustness pairwise comparison |
|  | TextFooler attack on QQP | Probes effects of TextFooler perturbation on QQP task | Robustness pairwise comparison |
|  | TextFooler attack on RTE | Probes effects of TextFooler perturbation on RTE task | Robustness pairwise comparison |
|  | TextFooler attack on SST2 | Probes effects of TextFooler perturbation on SST2 task | Robustness pairwise comparison |

Adversarial GLUE

The Adversarial GLUE (AdvGLUE) benchmark evaluates the adversarial robustness of language models. It is an adversarial version of the GLUE benchmark, covering:
  • Six natural language understanding tasks: sentiment classification (SST-2), duplicate question detection (QQP), multi-genre natural language inference (MNLI), mismatched MNLI (MNLI-MM), question answering (QNLI), and entailment (RTE).
  • Five attack vectors: BERT-ATTACK, SemAttack, SememePSO, TextBugger, and TextFooler. These attacks perturb the input prompts for a task into versions with small spelling mistakes or meaning-preserving rephrasings; see the sketch after this list.
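As a toy illustration of the character-level edits such attacks apply, the sketch below randomly swaps adjacent letters. It is a minimal stand-in, not any actual attack implementation: real attacks like TextBugger choose their edits adversarially to flip the model's prediction, and the function name and sampling scheme here are assumptions for illustration.

```python
import random

def typo_perturb(text: str, rate: float = 0.05, seed: int = 0) -> str:
    """Swap adjacent letters at random -- a crude stand-in for the
    character-level perturbations that attacks like TextBugger search
    over (real attacks pick their edits adversarially)."""
    rng = random.Random(seed)
    chars = list(text)
    for i in range(len(chars) - 1):
        if chars[i].isalpha() and chars[i + 1].isalpha() and rng.random() < rate:
            chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)

# Prints a lightly misspelled variant of the input sentence.
print(typo_perturb("this movie was absolutely wonderful", rate=0.1))
```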
Vijil adapted the AdvGLUE prompts for its own purposes, keeping only those perturbed input prompts that genuinely preserve the meaning of the original prompts. Taking all combinations of the six tasks and five attack vectors (the 30 tests listed above), Vijil probes whether the model's performance on a task declines when the input prompts are perturbed by an attack vector.
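The comparison behind each test can be pictured as follows. This is a minimal sketch, assuming a hypothetical `query_model` callable and pre-built (original, perturbed, gold-label) triples; it is not the actual Robustness pairwise comparison detector.

```python
def accuracy_drop(query_model, prompt_pairs):
    """Measure how much task accuracy falls under perturbation.

    `query_model` maps a prompt string to the model's answer string;
    `prompt_pairs` holds (original, perturbed, gold_label) triples for
    one task/attack combination. A large positive drop means the model
    is not robust to that attack on that task."""
    n = len(prompt_pairs)
    orig_correct = sum(query_model(o).strip() == g for o, _, g in prompt_pairs)
    pert_correct = sum(query_model(p).strip() == g for _, p, g in prompt_pairs)
    return (orig_correct - pert_correct) / n
```

A drop near zero indicates the model answers the perturbed prompts about as accurately as the originals, i.e., it is robust to that attack on that task.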