The Adversarial GLUE Scenario currently makes up the entirety of the Robustness Harness. It measures whether the model’s performance on the natural language understanding tasks in GLUE changes when the sentences in those tasks are perturbed.
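One way to picture this measurement is to score the same model on each original sentence and on its perturbed counterpart, then compare the two accuracies. The sketch below is illustrative only. The example format `(original, perturbed, gold_label)`, the `predict` callable, and the `robustness_gap` helper are hypothetical and not part of the harness. It also assumes accuracy as the metric, which may differ from what the scenario actually reports.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical example format: (original_sentence, perturbed_sentence, gold_label)
Example = Tuple[str, str, int]


def accuracy(predict: Callable[[str], int], sentences: Sequence[str], labels: Sequence[int]) -> float:
    """Fraction of sentences whose predicted label matches the gold label."""
    if not sentences:
        return 0.0
    correct = sum(1 for s, y in zip(sentences, labels) if predict(s) == y)
    return correct / len(sentences)


def robustness_gap(predict: Callable[[str], int], examples: List[Example]) -> Tuple[float, float, float]:
    """Return (clean accuracy, perturbed accuracy, drop) for a hypothetical classifier."""
    originals = [o for o, _, _ in examples]
    perturbed = [p for _, p, _ in examples]
    labels = [y for _, _, y in examples]
    clean = accuracy(predict, originals, labels)
    adv = accuracy(predict, perturbed, labels)
    return clean, adv, clean - adv


# Toy keyword "model": predicts 1 (positive) only if the word "good" appears.
toy_predict = lambda s: 1 if "good" in s else 0
toy_examples = [
    ("The movie was good", "The movie was g00d", 1),  # typo-style perturbation breaks the keyword
    ("The plot was bad", "The plot was b4d", 0),
]
clean_acc, adv_acc, drop = robustness_gap(toy_predict, toy_examples)
```

Here the toy model is perfect on the original sentences (1.0) but only 0.5 on the perturbed ones, so the drop of 0.5 is the kind of sensitivity the scenario is meant to surface.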
The Stereotype Scenario currently makes up the entirety of the Stereotype Harness. It measures whether the model’s responses reinforce stereotypes about religion, race, ethnicity, age, disability, national origin, or gender.
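A simple way to summarize such a measurement is a per-category rate: the fraction of responses in each demographic category where the model agreed with a stereotyped statement. The sketch below assumes each response has already been labeled as agreeing or not. That labeling step is the hard part and is not shown. The `(category, agrees)` input format and the `agreement_rates` helper are hypothetical and are not taken from the harness.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def agreement_rates(responses: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Per-category fraction of responses that agreed with a stereotype.

    Hypothetical input: (category, agrees) pairs, where `agrees` is True when the
    model's response endorsed the stereotyped statement.
    """
    totals: Dict[str, int] = defaultdict(int)
    agreed: Dict[str, int] = defaultdict(int)
    for category, agrees in responses:
        totals[category] += 1
        if agrees:
            agreed[category] += 1
    # Every category in `totals` has at least one response, so no division by zero.
    return {c: agreed[c] / totals[c] for c in totals}


rates = agreement_rates([
    ("religion", True),
    ("religion", False),
    ("age", False),
])
```

Under this assumed scoring, a higher rate for a category would suggest the model more often reinforces stereotypes about that group. Here `religion` scores 0.5 and `age` scores 0.0.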