Research/Natural Language Processing

LLM-Check: Investigating Detection of Hallucinations in Large Language Models (NeurIPS 2024)

서히! 2025. 5. 25. 23:23

Paper: Gaurang Sriramanan, Siddhant Bharti, Vinu Sankar Sadasivan, Shoumik Saha, Priyatham Kattakinda, and Soheil Feizi, “LLM-Check: Investigating Detection of Hallucinations in Large Language Models” (NeurIPS 2024)

 

Problems

  • Prior approaches such as consistency checks and retrieval-based methods often require access to multiple model responses or large external databases, making them computationally expensive and impractical for real-time applications.
  • Existing techniques rely on single-aspect indicators like uncertainty or consistency, which limit their accuracy and generalizability across diverse hallucination types.
  • Most current methods cannot detect hallucinations within a single response without extra inference-time overhead, restricting their utility in deployment scenarios.

Solution

  • LLM-Check analyzes internal LLM representations (hidden states, attention maps, and output probabilities) to detect hallucinations within a single response, in both white-box and black-box settings.
  • The method requires neither multiple generations nor external databases, achieving computational efficiency through eigenvalue analysis of covariance matrices and attention kernel maps.

Main method

  • Eigenvalue Analysis of Internal Representations:
    • Hidden Score: Computes the mean log-determinant of covariance matrices from hidden states
    • Attention Score: Leverages the lower-triangular structure of autoregressive attention maps, whose eigenvalues are exactly their diagonal entries, to compute log-determinants cheaply
  • Output Token Uncertainty: Quantifies perplexity and logit entropy, including windowed entropy to localize hallucinations within sequences
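The two eigenvalue-based scores can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: function names, the ridge term `eps`, and the choice of a token-by-token Gram matrix are my assumptions for a self-contained example.

```python
import numpy as np

def hidden_score(hidden_states: np.ndarray, eps: float = 1e-3) -> float:
    """Mean log-eigenvalue (log-determinant) of the covariance of hidden states.

    hidden_states: (seq_len, hidden_dim) activations from one transformer layer.
    eps is a small ridge term for numerical stability (my addition, not from the paper).
    """
    H = hidden_states - hidden_states.mean(axis=0, keepdims=True)
    # Token-by-token Gram matrix keeps the matrix small: (seq_len, seq_len)
    cov = H @ H.T / H.shape[1]
    eigvals = np.linalg.eigvalsh(cov + eps * np.eye(cov.shape[0]))
    return float(np.mean(np.log(eigvals)))

def attention_score(attn_map: np.ndarray, eps: float = 1e-6) -> float:
    """Mean log-eigenvalue of a causal (lower-triangular) attention map.

    A triangular matrix's eigenvalues are exactly its diagonal entries,
    so no eigendecomposition is needed -- this is what makes the
    attention score cheap to compute.
    """
    diag = np.diagonal(attn_map)
    return float(np.mean(np.log(diag + eps)))
```

The attention score's efficiency comes entirely from the triangular structure: the log-determinant reduces to a sum of logs of the diagonal, which is linear in sequence length.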
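The output-uncertainty signals can likewise be sketched from raw logits. Again a hedged illustration: the function names, the default window size, and the use of a sliding mean over per-token entropy are my assumptions about how "windowed entropy" localizes uncertain spans.

```python
import numpy as np

def token_logprobs(logits: np.ndarray, token_ids: np.ndarray) -> np.ndarray:
    """Log-probabilities of the realized tokens under the model's logits."""
    logits = logits - logits.max(axis=-1, keepdims=True)  # stabilized softmax
    logZ = np.log(np.exp(logits).sum(axis=-1))
    return logits[np.arange(len(token_ids)), token_ids] - logZ

def perplexity(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Exponentiated mean negative log-likelihood of the response tokens."""
    return float(np.exp(-token_logprobs(logits, token_ids).mean()))

def windowed_entropy(logits: np.ndarray, window: int = 5) -> np.ndarray:
    """Mean predictive entropy over a sliding window, to localize
    high-uncertainty (potentially hallucinated) spans in the sequence."""
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)
    ent = -(p * np.log(p + 1e-12)).sum(axis=-1)  # per-token entropy
    kernel = np.ones(window) / window
    return np.convolve(ent, kernel, mode="valid")
```

A spike in the windowed entropy curve flags a local span of tokens the model was unsure about, which is what lets the method point at *where* in a response a hallucination may sit rather than scoring the response as a whole.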

Pros / Cons
(Pros)

  • Achieves 45–450× speedups over baselines while maintaining high detection accuracy.
  • Enables single-response detection without external databases.
  • Combines multiple detection modalities (hidden states, attention, probabilities) for robustness.

(Cons)

  • Performance varies significantly across transformer layers, requiring careful layer selection.
  • Lacks theoretical justification for the correlation between attention eigenvalues and hallucinations.
  • Shows sensitivity to hallucination type (e.g., performs better on invented hallucinations than on subjective ones).