Large Language Models for Information Retrieval
This survey investigates how Large Language Models (LLMs) are transforming the field of Information Retrieval (IR), moving beyond traditional keyword-based search to enable semantic understanding, conversational interactions, and end-to-end answer generation.
LLMs can be integrated into each stage of the IR pipeline, enhancing its capabilities while also introducing new challenges.
Introduction
Search is no longer just about retrieving links. While classical IR focused on ranking documents, modern systems powered by LLMs understand, summarize, and directly answer user queries. Tools like Bing, Brave, Perplexity, and Gemini showcase this shift, providing instant summaries alongside search results.
Why This Survey Matters
LLMs not only generate text but also transform how information is accessed and retrieved. Classical IR—built on indexing and keyword matching—struggles to grasp user intent. In contrast, LLMs offer:
- Semantic understanding
- Conversational context
- End-to-end answer generation
From IR Pipeline to LLM-Enhanced Modules
The survey examines how LLMs augment each module of the traditional IR pipeline:
- Query Rewriter
- Retriever
- Reranker
- Reader
Each module benefits from LLM capabilities but also inherits challenges like hallucinations and efficiency trade-offs.
Background
Traditional IR uses:
- Keyword matching (e.g., BM25)
- Vector space models (e.g., cosine similarity)
- Statistical language models
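As a concrete illustration of the keyword-matching baseline, here is a minimal Okapi BM25 scorer. This is a sketch for intuition only: real systems compute these scores over an inverted index, and the toy corpus below is invented for the example.

```python
import math
from collections import Counter

def bm25_score(query_terms, doc_terms, corpus, k1=1.5, b=0.75):
    """Score one tokenized document against a query over a small in-memory corpus."""
    N = len(corpus)
    avgdl = sum(len(d) for d in corpus) / N  # average document length
    tf = Counter(doc_terms)
    score = 0.0
    for term in query_terms:
        df = sum(1 for d in corpus if term in d)  # document frequency
        if df == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        f = tf[term]  # term frequency in this document
        score += idf * (f * (k1 + 1)) / (f + k1 * (1 - b + b * len(doc_terms) / avgdl))
    return score

corpus = [
    "large language models for retrieval".split(),
    "classical keyword search with bm25".split(),
    "neural networks for image recognition".split(),
]
query = "keyword search".split()
ranked = sorted(corpus, key=lambda d: bm25_score(query, d, corpus), reverse=True)
```

Note how BM25 rewards term overlap only: a document phrased with synonyms but no shared terms scores zero, which is exactly the gap semantic (dense) retrieval addresses.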
Neural IR improves on this with dense embeddings (e.g., BERT). LLMs go even further, enabling understanding and text generation.
Large Language Models: A Quick Overview
LLMs are transformer-based models:
- Encoder-only (understanding)
- Decoder-only (generation)
- Encoder-decoder (flexible)
Learning styles:
- In-context learning
- Fine-tuning
- RAG (retrieval-augmented generation)
Query Rewriting
Query rewriting is the first step in the IR pipeline: it refines the user's raw query before retrieval. LLMs enhance this through:
- Prompting: zero/few-shot reformulation.
- Fine-tuning: domain-specific transformations.
- Knowledge distillation: compressing LLM capabilities into smaller models.
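The prompting approach above can be sketched as follows. Here `llm` is a stand-in for any text-completion callable (an API client, a local model), and the few-shot example pairs are invented for illustration, not drawn from a real dataset.

```python
# Invented few-shot pairs: (raw query, well-formed rewrite).
FEW_SHOT_EXAMPLES = [
    ("capital france", "What is the capital city of France?"),
    ("python list sort", "How do I sort a list in Python?"),
]

def rewrite_query(query, llm, examples=FEW_SHOT_EXAMPLES):
    """Build a few-shot rewriting prompt and return the model's rewrite."""
    lines = ["Rewrite the search query as a clear, well-formed question.\n"]
    for raw, rewritten in examples:
        lines.append(f"Query: {raw}\nRewrite: {rewritten}\n")
    lines.append(f"Query: {query}\nRewrite:")
    prompt = "\n".join(lines)
    return llm(prompt).strip()
```

Dropping the example pairs turns the same template into zero-shot reformulation; fine-tuning and distillation replace the generic `llm` with a smaller, specialized model.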
Query2Doc
Query2Doc reframes query rewriting as a text generation task, using few-shot prompting (e.g., with MS MARCO examples) to create pseudo-documents, improving retrieval performance by simulating relevant passages.
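A Query2Doc-style expansion can be sketched as below: prompt the model for a pseudo-passage, then concatenate it with the original query, repeating the query several times so a sparse retriever does not underweight the original terms (the paper uses this repetition trick for BM25). `llm` is again a stand-in callable, and the prompt wording is illustrative.

```python
def expand_query(query, llm, repeats=5):
    """Generate a pseudo-document for the query and build an expanded query string."""
    prompt = f"Write a short passage that answers the query.\nQuery: {query}\nPassage:"
    pseudo_doc = llm(prompt).strip()
    # Expanded query for a sparse retriever such as BM25: the original query
    # is repeated so its terms keep weight against the longer pseudo-document.
    return " ".join([query] * repeats + [pseudo_doc])
```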
Concept Drift and Trade-Offs
LLMs may introduce unrelated details during rewriting (concept drift), diluting the original intent. Expansion may help weak retrievers but harm strong ones if the corpus diverges from training data. Query rewriting must balance quality with retriever performance.
Retriever
LLMs enhance retrieval via:
- Data augmentation: generating synthetic queries and relevance labels (e.g., InPars, ART).
- Model enhancement: building dense retrievers (e.g., GTR, RepLLaMA) and generative retrievers (e.g., DSI, LLM-URL).
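The dense-retrieval idea can be sketched as follows: embed the query and every document into vectors and rank by cosine similarity. The bag-of-words "embedding" below is a deliberately crude stand-in so the sketch runs anywhere; models like GTR or RepLLaMA replace it with a learned LLM encoder that captures semantics rather than surface terms.

```python
import math
from collections import Counter

def embed(text):
    # Toy "embedding": raw token counts. A real dense retriever uses a
    # learned encoder that maps text to a fixed-size semantic vector.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, k=2):
    """Return the top-k documents by similarity to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

In production the document vectors are precomputed and stored in an approximate nearest-neighbor index, so only the query is embedded at search time.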
Reranker
Rerankers refine candidate documents:
- Supervised rerankers: fine-tuned on labeled data (e.g., monoBERT, monoT5, RankLLaMA).
- Unsupervised rerankers: use prompting to rank documents (pointwise, listwise, pairwise).
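The pointwise variant of prompt-based reranking can be sketched as below: judge each candidate independently and sort by the judged score. `llm` is a stand-in callable; real systems typically score by the model's probability of the "yes" token rather than parsing its text, and listwise/pairwise variants put several documents in one prompt instead.

```python
def pointwise_rerank(query, docs, llm):
    """Rerank candidates by an LLM's per-document relevance judgment."""
    def score(doc):
        prompt = (
            f"Query: {query}\nDocument: {doc}\n"
            "Is this document relevant to the query? Answer yes or no."
        )
        # Crude binary score from the model's text answer.
        return 1.0 if llm(prompt).strip().lower().startswith("yes") else 0.0
    return sorted(docs, key=score, reverse=True)
```

Pointwise prompting needs one model call per document, which is where the efficiency trade-off mentioned earlier bites; listwise prompting amortizes this at the cost of longer prompts.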
Reader
Readers generate answers from top-ranked documents:
- Passive Readers: receive documents from the retriever and generate answers (e.g., RAG, RETRO, FLARE).
- Active Readers: autonomously generate follow-up queries (e.g., Self-Ask).
- Compressors: reduce content for LLM input (e.g., LeanContext, TCRA).
Retrieval-Augmented Generation (RAG)
RAG combines retriever and generator in one architecture, balancing factual grounding with fluent generation. However, it risks hallucinations if the retriever’s context is insufficient.
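A minimal end-to-end RAG loop, in sketch form: retrieve top-k documents, pack them into the prompt, and instruct the generator to stay grounded in them. Word-overlap retrieval and the `llm` callable are stand-ins for a real retriever and generator; the grounding instruction in the prompt is one common (not authoritative) way to reduce hallucination when context is thin.

```python
def retrieve(query, corpus, k=2):
    """Toy retriever: rank documents by word overlap with the query."""
    def overlap(doc):
        return len(set(query.lower().split()) & set(doc.lower().split()))
    return sorted(corpus, key=overlap, reverse=True)[:k]

def rag_answer(query, corpus, llm, k=2):
    """Retrieve context, then prompt the generator with it."""
    context = "\n".join(retrieve(query, corpus, k))
    prompt = (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)
```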
Search Agents
LLM search agents like WebGPT and ReAct emulate human browsing, searching and synthesizing information autonomously. They represent the frontier of interactive search.
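The agent behavior described above can be sketched as a ReAct-style loop: the model emits an action, the environment returns an observation, and the cycle repeats until the model finishes. The `Search:`/`Finish:` action format, the single `search` tool, and the `llm` callable are all simplifying assumptions for this sketch; real agents like WebGPT and ReAct parse richer action spaces and call live browsing or search APIs.

```python
def react_loop(question, llm, search, max_steps=5):
    """Alternate model actions with tool observations until a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)           # model proposes the next action
        transcript += step + "\n"
        if step.startswith("Finish:"):   # model is done: return its answer
            return step[len("Finish:"):].strip()
        if step.startswith("Search:"):   # model wants to query the tool
            observation = search(step[len("Search:"):].strip())
            transcript += f"Observation: {observation}\n"
    return None  # step budget exhausted without an answer
```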
Conclusion
LLMs are reshaping IR across all components, offering more interactive, personalized, and factual information access. Future work should address personalization, efficiency, and factuality.
References
- Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Haonan Chen, Zheng Liu, Zhicheng Dou, and Ji-Rong Wen. Large Language Models for Information Retrieval: A Survey. 2024. arXiv:2308.07107.