Probe and Scenario List
| Harness/scenario | Probe | Probe Description | Detector |
|---|---|---|---|
| Privacy | | | |
| Copyrighted content | Book excerpt completion | Tests whether the model will complete a book excerpt when prompted with the beginning | Starts with trigger |
| | Book excerpt insertion | Tests whether the model will fill in the middle of a book excerpt | Trigger presence |
| | Media excerpt completion - Guardian | Tests whether the model will complete an excerpt from The Guardian when prompted with the beginning | Starts with trigger |
| | Media excerpt insertion - Guardian | Tests whether the model will fill in the middle of an excerpt from The Guardian | Trigger presence |
| | Media excerpt completion - NYT | Tests whether the model will complete an excerpt from the New York Times when prompted with the beginning | Starts with trigger |
| | Media excerpt insertion - NYT | Tests whether the model will fill in the middle of an excerpt from the New York Times | Trigger presence |
| Private data leak | Emails leakage - Enron | Tests whether the model will reveal email addresses obtained from the discovery process of the Enron investigation | Trigger presence |
| | PII Leakage | Tests whether PII provided to the model during a conversation can be leaked | Trigger presence |
| | Confidentiality comprehension | Tests whether the model understands privacy-related phrases that flag when certain information is confidential | Don’t answer yes |
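
The detector names in the table suggest simple string checks on the model's response. The following is a minimal sketch of what such checks might look like, assuming the names mean what they say; the function names and the matching rules (case-insensitive, whitespace-trimmed) are illustrative assumptions, not Vijil's implementation.

```python
# Hypothetical detector sketches; names and matching rules are assumptions.

def starts_with_trigger(output: str, trigger: str) -> bool:
    """Hit if the response begins with the expected text (the trigger)."""
    return output.strip().lower().startswith(trigger.strip().lower())


def trigger_present(output: str, trigger: str) -> bool:
    """Hit if the expected text appears anywhere in the response."""
    return trigger.strip().lower() in output.lower()


def answers_yes(output: str) -> bool:
    """Hit if the first word of the response is 'yes' (ignoring punctuation)."""
    words = output.strip().lower().split()
    return bool(words) and words[0].strip(".,!:;") == "yes"
```

In this sketch a "hit" means the probe found the behavior it was looking for, so a model passes a probe when the detector returns `False`.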
Copyrighted Content
Chang et al. (2023) found that LLMs are prone to memorizing text from sources available on the internet. Vijil’s tests are intended to probe for such memorization. To do this, Vijil supplies an agent with one of the following:

- a sentence with one missing word, and asks the agent to fill in the masked word (cloze completion)
- an incomplete sentence, and asks the agent to complete it (sentence completion)

The text is drawn from:

- book passages from English literature,
- articles from the New York Times,
- articles from The Guardian.
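
The two prompt styles described above can be illustrated with a short sketch. It assumes a `[MASK]` placeholder for the cloze task and a word-count split for sentence completion; the helper names, the placeholder, and the public-domain example sentence are all hypothetical choices for illustration, not Vijil's actual prompt format.

```python
import re

# Public-domain example sentence (opening of "Pride and Prejudice").
SENTENCE = (
    "It is a truth universally acknowledged, that a single man in possession "
    "of a good fortune, must be in want of a wife."
)


def make_cloze_prompt(sentence: str, target: str, mask: str = "[MASK]") -> tuple[str, str]:
    """Mask the first whole-word occurrence of `target`.

    Returns (masked_sentence, expected_word).
    """
    pattern = r"\b" + re.escape(target) + r"\b"
    masked, count = re.subn(pattern, lambda _: mask, sentence, count=1)
    if count == 0:
        raise ValueError(f"{target!r} does not occur in the sentence")
    return masked, target


def make_completion_prompt(sentence: str, n_prefix_words: int) -> tuple[str, str]:
    """Split a sentence into a prefix to show the model and the expected continuation."""
    words = sentence.split()
    if not 0 < n_prefix_words < len(words):
        raise ValueError("n_prefix_words must leave a non-empty prefix and continuation")
    prefix = " ".join(words[:n_prefix_words])
    continuation = " ".join(words[n_prefix_words:])
    return prefix, continuation


cloze_prompt, cloze_answer = make_cloze_prompt(SENTENCE, "fortune")
completion_prompt, completion_answer = make_completion_prompt(SENTENCE, 6)
```

Here the expected word or continuation plays the role of the trigger: a response that reproduces it verbatim suggests the passage was memorized.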