Hallucination / Confabulation

Language models are said to hallucinate or confabulate when they generate content that does not correspond to knowledge of the real-world or is inconsistent with facts. You evaluate a LLM by prompting it to generate false outputs in situations where the desired response is a refusal to answer. Vijil’s Hallucination Probes evaluate whether an LLM is prone to generating hallucinations.

Probe and Scenario List

Harness/Scenario	Probe	Probe Description	Detector
Hallucination	All Probes below
Social values	Quack Medicine	Evaluates whether the model promotes quack medicine	Quack medicine
Hallucination: Goodside	False assertions: Riley Goodside	Evaluates hallucinations about Riley Goodside	Riley Goodside misinformation
	Glitch - Davidjl	Checks for hallucinations about the glitch token Davidjl	Glitch
Package hallucination	Hallucinate Python packages	Checks if a model hallucinates Python packages	Python package hallucination
Compounding hallucination	False assertions: senators	Checks for false assertions about USA senators	Don’t start yes
	False assertions: general	Checks for false assertions about general history/science facts	Claim refutation
	False assertions: flights	Checks if the model invents impossible flight routes	Don’t start yes
	Hallucinate prime numbers	Evaluates hallucinations about prime numbers	Don’t start no

Misleading Information

This test checks for a subset of confabulation/hallucination, where false information is present that the agent is expected to deny, refute, or contradict.

Compounding Hallucination

Through these tests, you evaluate an agent by prompting it to generate false outputs and then further prompting it to over-commit to these false outputs by generating explanations. The question-answering datasets used for this purpose is due to Zhang et al, 2023, and consists of complex reasoning questions where ideally it should not give any answer.

Package Hallucination

This vulnerability was discovered recently. If an LLM agent ends up hallucinating fake code libraries or packages, attackers can upload packages to respective repositories, so that whoever downloads those packages gets infected. Currently, Vijil test whether an LLM hallucinates Python package names.

Hallucination: Goodside

Who is Riley?

These prompts attempt to get the model to generate false information about Riley Goodside. Common hallucinations involve claiming that Riley Goodside is in the entertainment industry, Canadian, Australian, or an entrepreneur.

Davidjl Glitch

These prompts attempt to get the model to repeat false information about the glitch token “Davidjl”. They were discovered by Riley Goodside. Vijil use prompts from the Language Model Risk Cards framework to check if a model will prompt quack medicine.

Get Started

Core Concepts

Manage Agents

Protect Agents

Evaluate Agents

Tutorials

References

Hallucination / Confabulation

Probe and Scenario List

Misleading Information

Compounding Hallucination

Package Hallucination

Hallucination: Goodside

Who is Riley?

Davidjl Glitch

Get Started

Core Concepts

Manage Agents

Protect Agents

Evaluate Agents

Tutorials

References

​Probe and Scenario List

​Misleading Information

​Compounding Hallucination

​Package Hallucination

​Hallucination: Goodside

​Who is Riley?

​Davidjl Glitch

​Social Values

Probe and Scenario List

Misleading Information

Compounding Hallucination

Package Hallucination

Hallucination: Goodside

Who is Riley?

Davidjl Glitch

Social Values