Paper: Gaurang Sriramanan, Siddhant Bharti, Vinu Sankar Sadasivan, Shoumik Saha, Priyatham Kattakinda, Soheil Feizi “LLM-Check: Investigating Detection of Hallucinations in Large Language Models” (NeurIPS 2024)
Problems
- Prior approaches such as consistency checks and retrieval-based methods often require access to multiple model responses or large external databases, making them computationally expensive and impractical for real-time applications.
- Existing techniques rely on single-aspect indicators like uncertainty or consistency, which limit their accuracy and generalizability across diverse hallucination types.
- Most current methods cannot perform detection within a single response without additional computational overhead during inference, restricting their utility in deployment scenarios.
Solution
- LLM-Check analyzes internal LLM representations—including hidden states, attention maps, and output probabilities—to detect hallucinations within a single response in both white-box and black-box settings.
- The method operates without requiring multiple generations or external databases, achieving computational efficiency through eigenvalue analysis of covariance matrices and attention kernel maps.
Main method
- Eigenvalue Analysis of Internal Representations:
  - Hidden Score: Computes the mean log-determinant of covariance matrices formed from the hidden states of a given layer
  - Attention Score: Leverages the lower-triangular structure of autoregressive attention maps, whose log-determinant reduces to a sum of log-diagonal entries, to score responses efficiently
  - Output Token Uncertainty: Quantifies perplexity and logit entropy, including windowed entropy to localize hallucinations within sequences
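The three scores above can be sketched roughly as follows. This is a minimal NumPy illustration of the underlying computations, not the authors' implementation: function names, the centering step, the `eps` clipping, and the window size are assumptions for the sketch; in practice the inputs would come from a specific transformer layer and attention head.

```python
import numpy as np

def hidden_score(hidden_states, eps=1e-6):
    """Mean log-determinant of the token covariance of hidden states.

    hidden_states: (T, d) array of per-token hidden vectors from one layer.
    Uses the (T, T) Gram matrix of the centered states, so cost scales
    with sequence length T rather than hidden dimension d.
    """
    H = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    gram = H @ H.T / H.shape[1]                 # (T, T) covariance-like kernel
    eigvals = np.linalg.eigvalsh(gram)
    return np.mean(np.log(np.clip(eigvals, eps, None)))  # mean log-eigenvalue

def attention_score(attn_map, eps=1e-6):
    """Eigenvalue-based score of a causal attention map.

    attn_map: (T, T) lower-triangular attention matrix from one head.
    A triangular matrix's determinant is the product of its diagonal,
    so the log-determinant is just a sum of log-diagonals -- O(T).
    """
    diag = np.clip(np.diagonal(attn_map), eps, None)
    return np.mean(np.log(diag))

def windowed_entropy(logits, window=8):
    """Sliding-window mean entropy of next-token distributions,
    used to localize high-uncertainty spans within a sequence.

    logits: (T, V) pre-softmax scores; returns (T - window + 1,) means.
    """
    z = logits - logits.max(axis=-1, keepdims=True)     # stable softmax
    p = np.exp(z) / np.exp(z).sum(axis=-1, keepdims=True)
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)         # per-token entropy
    return np.array([ent[i:i + window].mean()
                     for i in range(len(ent) - window + 1)])
```

The triangular shortcut in `attention_score` is what makes the attention-based check cheap: no eigendecomposition is needed, only a pass over the diagonal.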

Pros / Cons
(Pros)
- Achieves 45–450× speedups over baselines while maintaining high detection accuracy.
- Enables single-response detection without external databases.
- Combines multiple detection modalities (hidden states, attention, probabilities) for robustness.
(Cons)
- Performance varies significantly across transformer layers, requiring careful layer selection.
- Lacks theoretical justification for the correlation between attention eigenvalues and hallucinations.
- Shows sensitivity to hallucination types (e.g., performs better on invented hallucinations than subjective ones).