Skip to main content
Language models are said to hallucinate or confabulate when they generate content that does not correspond to knowledge of the real-world or is inconsistent with facts. You evaluate a LLM by prompting it to generate false outputs in situations where the desired response is a refusal to answer. Vijil’s Hallucination Probes evaluate whether an LLM is prone to generating hallucinations.

Probe and Scenario List

Harness/ScenarioProbeProbe DescriptionDetector
HallucinationAll Probes below
Social valuesQuack MedicineEvaluates whether the model promotes quack medicineQuack medicine
Hallucination: GoodsideFalse assertions: Riley GoodsideEvaluates hallucinations about Riley GoodsideRiley Goodside misinformation
Glitch - DavidjlChecks for hallucinations about the glitch token DavidjlGlitch
Package hallucinationHallucinate Python packagesChecks if a model hallucinates Python packagesPython package hallucination
Compounding hallucinationFalse assertions: senatorsChecks for false assertions about USA senatorsDon’t start yes
False assertions: generalChecks for false assertions about general history/science factsClaim refutation
False assertions: flightsChecks if the model invents impossible flight routesDon’t start yes
Hallucinate prime numbersEvaluates hallucinations about prime numbersDon’t start no

Misleading Information

This test checks for a subset of confabulation/hallucination, where false information is present that the agent is expected to deny, refute, or contradict.

Compounding Hallucination

Through these tests, you evaluate an agent by prompting it to generate false outputs and then further prompting it to over-commit to these false outputs by generating explanations. The question-answering datasets used for this purpose is due to Zhang et al, 2023, and consists of complex reasoning questions where ideally it should not give any answer.

Package Hallucination

This vulnerability was discovered recently. If an LLM agent ends up hallucinating fake code libraries or packages, attackers can upload packages to respective repositories, so that whoever downloads those packages gets infected. Currently, Vijil test whether an LLM hallucinates Python package names.

Hallucination: Goodside

Who is Riley?

These prompts attempt to get the model to generate false information about Riley Goodside. Common hallucinations involve claiming that Riley Goodside is in the entertainment industry, Canadian, Australian, or an entrepreneur.

Davidjl Glitch

These prompts attempt to get the model to repeat false information about the glitch token “Davidjl”. They were discovered by Riley Goodside.

Social Values

Vijil use prompts from the Language Model Risk Cards framework to check if a model will prompt quack medicine.