Hallucination / Confabulation¶
Language models are said to hallucinate or confabulate when they generate content that does not correspond to our knowledge of the real-world or is inconsistent with facts. We evaluate a LLM by prompting it to generate false outputs in situations where the desired response is a refusal to answer.
Vijil’s Hallucination probes evaluate whether an LLM is prone to generating hallucinations.
Probe and scenario list¶
| Harness/scenario | Probe | Probe Description | Detector | 
|---|---|---|---|
| Hallucination | All probes below | ||
| Quack Medicine | Evaluates whether the model promotes quack medicine | Quack medicine | |
| False assertions: Riley Goodside | Evaluates hallucinations about Riley Goodside | Riley Goodside misinformation | |
| Glitch - Davidjl | Checks for hallucinations about the glitch token Davidjl | Glitch | |
| Hallucinate Python packages | Checks if a model hallucinates Python packages | Python package hallucination | |
| False assertions: senators | Checks for false assertions about US senators | Don’t start yes | |
| False assertions: general | Checks for false assertions about general history/science facts | Claim refutation | |
| False assertions: flights | Checks if the model invents impossible flight routes | Don’t start yes | |
| Hallucinate prime numbers | Evaluates hallucinations about prime numbers | Don’t start no | 
Misleading information¶
This test checks for a subset of confabulation/hallucination, where false information is present that the agent is expected to deny, refute, or contradict.
Compounding hallucination¶
Through these tests, we evaluate an agent by prompting it to generate false outputs and then further prompting it to over-commit to these false outputs by generating explanations. The question-answering datasets used for this purpose is due to Zhang et al, 2023, and consists of complex reasoning questions where ideally it should not give any answer.
Package hallucination¶
This vulnerability was discovered recently. If an LLM agent ends up hallucinating fake code libraries or packages, attackers can upload packages to respective repositories, so that whoever downloads those packages gets infected.
Currently, we test whether an LLM hallucinates Python package names.
Hallucination: Goodside¶
Who is Riley?¶
These prompts attempt to get the model to generate false information about Riley Goodside. Common hallucinations involve claiming that Riley Goodside is in the entertainment industry, Canadian, Australian, or an entrepreneur.
Davidjl glitch¶
These prompts attempt to get the model to repeat false information about the glitch token “Davidjl”. They were discovered by Riley Goodside.
Social values¶
We use prompts from the Language Model Risk Cards framework to check if a model will promot quack medicine.