Privacy

LLM outputs can accidentally disseminate confidential information. Such sensitive data leakage poses substantial privacy (and security) risks in terms of reputational harm and potential legal consequences. Vijil tests if an LLM is prone to leaking confidential data from both training data and inference-stage data.

Probe and Scenario List

Harness/scenario	Probe	Probe Description	Detector
Privacy
Copyrighted content	Book excerpt completion	Tests whether the model will complete a book excerpt when prompted with the beginning	Starts with trigger
	Book excerpt insertion	Tests whether the model will fill in the middle of a book excerpt	Trigger presence
	Media excerpt completion - Guardian	Tests whether the model will complete an excerpt from The Guardian when prompted with the beginning	Starts with trigger
	Media excerpt insertion - Guardian	Tests whether the model will fill in the middle of an excerpt from The Guardian	Trigger presence
	Media excerpt completion - NYT	Tests whether the model will complete an excerpt from the New York Times when prompted with the beginning	Starts with trigger
	Media excerpt insertion - NYT	Tests whether the model will fill in the middle of an excerpt from the New York Times	Trigger presence
Private data leak	Emails leakage - Enron	Tests whether the model will reveal email addresses obtained from the discovery process of the Enron investigation	Trigger presence
	PII Leakage	Tests whether PII provided to the model during a conversation can be leaked	Trigger presence
	Confidentiality comprehension	Tests whether the model understands privacy-related phrases that flag when certain information is confidential	Don’t answer yes

Copyrighted Content

Chang et al, 2023 found that LLMs are prone to memorizing text from sources available on the internet. Vijil’s tests are intended to probe for such memorization capabilities. For this, Vijil supplies an agent with

a sentence with one missing word and ask it to fill in the masked word (cloze completion)
an incomplete sentence and ask it to complete it (sentence completion)

Vijil performs two such evaluations for each of the following three sources of text:

Book passages from English literature,
Articles from New York Times,
Articles from The Guardian.

Private Data Leak

These probes test for whether models will reveal private data from either their training data or their inference-time data.

Private Training Data

Vijil uses Decoding Trust’s Enron dataset to test whether models will leak private data from the Enron emails.

Inference-stage Private Data

Vijil uses Decoding Trust’s synthetic prompts to test whether models will leak private data like social security numbers and SSH keys if these are provided at the inference stage. For example, if a user enters a query to a chatbot that contains private data, will this data be easily extractable later in the same conversation?

Privacy Understanding

The privacy understanding probe, adapted from Decoding Trust, measures whether the model understands when not to reveal private information. The prompts ask the model whether it is appropriate to reveal private information in various settings.

Get Started

Core Concepts

Manage Agents

Protect Agents

Evaluate Agents

Tutorials

References

Probe and Scenario List

Copyrighted Content

Private Data Leak

Private Training Data

Inference-stage Private Data

Privacy Understanding

Get Started

Core Concepts

Manage Agents

Protect Agents

Evaluate Agents

Tutorials

References

​Probe and Scenario List

​Copyrighted Content

​Private Data Leak

​Private Training Data

​Inference-stage Private Data

​Privacy Understanding

Probe and Scenario List

Copyrighted Content

Private Data Leak

Private Training Data

Inference-stage Private Data

Privacy Understanding