Large Language Models for Information Retrieval

This survey investigates how Large Language Models (LLMs) are transforming the field of Information Retrieval (IR), moving beyond traditional keyword-based search to enable semantic understanding, conversational interactions, and end-to-end answer generation.

LLMs can be integrated into every stage of the IR pipeline, enhancing its capabilities while also introducing new challenges.

Introduction

Search is no longer just about retrieving links. While classical IR focused on ranking documents, modern systems powered by LLMs understand, summarize, and directly answer user queries. Tools like Bing, Brave, Perplexity, and Gemini showcase this shift, providing instant summaries alongside search results.

Why This Survey Matters

LLMs not only generate text but also transform how information is accessed and retrieved. Classical IR—built on indexing and keyword matching—struggles to grasp user intent. In contrast, LLMs offer:

  • Semantic understanding
  • Conversational context
  • End-to-end answer generation

From IR Pipeline to LLM-Enhanced Modules

The survey examines how LLMs augment each module of the traditional IR pipeline:

  1. Query Rewriter
  2. Retriever
  3. Reranker
  4. Reader

Each module benefits from LLM capabilities but also inherits challenges like hallucinations and efficiency trade-offs.

Background

Traditional IR uses:

  • Keyword matching (e.g., BM25)
  • Vector space models (e.g., cosine similarity)
  • Statistical language models

Neural IR improves on this with dense embeddings (e.g., BERT). LLMs go further still, combining deeper query understanding with fluent text generation.
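To make the classical baseline concrete, here is a minimal sketch of Okapi-style BM25 scoring in Python. The parameter defaults (k1=1.5, b=0.75) are common conventions, and the toy tokenization (lowercased whitespace split) is an assumption for illustration, not part of any particular system:

```python
import math

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each document against the query with a minimal Okapi BM25."""
    tokenized = [d.lower().split() for d in docs]
    N = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / N
    q_terms = query.lower().split()
    # Document frequency for each query term.
    df = {t: sum(1 for d in tokenized if t in d) for t in q_terms}
    scores = []
    for d in tokenized:
        score = 0.0
        for t in q_terms:
            tf = d.count(t)
            if tf == 0:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            score += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
        scores.append(score)
    return scores
```

Because scoring is purely lexical, a document sharing no terms with the query scores exactly zero, which is precisely the intent-matching gap that dense and LLM-based methods aim to close.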

Large Language Models: A Quick Overview

LLMs are Transformer-based models, typically grouped by architecture:

  • Encoder-only (understanding)
  • Decoder-only (generation)
  • Encoder-decoder (flexible)

Common adaptation strategies:

  • In-context learning
  • Fine-tuning
  • RAG (retrieval-augmented generation)

Query Rewriting

Query rewriting is the first step in the IR pipeline: it reformulates ambiguous or underspecified user queries so they better match relevant documents. LLMs enhance this through:

  • Prompting: zero/few-shot reformulation.
  • Fine-tuning: domain-specific transformations.
  • Knowledge distillation: compressing LLM capabilities into smaller models.
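The prompting approach above can be sketched as a few-shot prompt builder. The example pairs below are hypothetical (a real system would draw them from query logs or a dataset such as MS MARCO), and the instruction wording is an assumption:

```python
# Hypothetical few-shot examples, not taken from any real dataset.
FEW_SHOT = [
    ("whos the ceo of openai", "Who is the current CEO of OpenAI?"),
    ("pytorch install gpu", "How do I install PyTorch with GPU (CUDA) support?"),
]

def build_rewrite_prompt(query, examples=FEW_SHOT):
    """Assemble a few-shot prompt asking an LLM to reformulate a raw query."""
    lines = ["Rewrite each search query as a clear, well-formed question.", ""]
    for raw, rewritten in examples:
        lines.append(f"Query: {raw}")
        lines.append(f"Rewrite: {rewritten}")
        lines.append("")  # blank line between demonstrations
    lines.append(f"Query: {query}")
    lines.append("Rewrite:")
    return "\n".join(lines)
```

The returned string would be sent to an LLM, whose completion after the final "Rewrite:" serves as the reformulated query; with zero examples this degenerates to zero-shot prompting.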

Query2Doc

Query2Doc reframes query expansion as a text generation task: a few-shot prompt (e.g., built from MS MARCO examples) asks the LLM to write a pseudo-document that answers the query, and this generated passage is combined with the original query, improving retrieval by simulating a relevant passage.
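A minimal sketch of the expansion step, assuming any text generator as the pseudo-document source (the stub below stands in for an LLM call; repeating the query several times is the paper's trick for keeping its terms from being drowned out by the longer passage in sparse retrieval):

```python
def query2doc_expand(query, generate_pseudo_doc, n_repeats=5):
    """Query2Doc-style expansion: append an LLM-written pseudo-document
    to the query, repeating the original query to preserve its weight."""
    pseudo_doc = generate_pseudo_doc(query)
    return " ".join([query] * n_repeats + [pseudo_doc])

# Stub standing in for a real LLM call (assumption for illustration).
fake_llm = lambda q: "BM25 is a classic ranking function used by search engines."

expanded = query2doc_expand("what is bm25", fake_llm)
```

The expanded string is then handed to the retriever in place of the raw query.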

Concept Drift and Trade-Offs

LLMs may introduce unrelated details during rewriting (concept drift), diluting the original intent. Expansion may help weak retrievers but harm strong ones if the corpus diverges from training data. Query rewriting must balance quality with retriever performance.

Retriever

LLMs enhance retrieval via:

  • Data augmentation: generating synthetic queries and relevance labels (e.g., InPars, ART).
  • Model enhancement: building dense retrievers (e.g., GTR, RepLLaMA) and generative retrievers (e.g., DSI, LLM-URL).
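The dense-retrieval idea above reduces to embedding queries and documents and ranking by similarity. The sketch below uses toy term-count vectors purely for illustration; a real dense retriever such as GTR or RepLLaMA would substitute a trained neural encoder for `embed`:

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a sparse term-count vector (stand-in for a trained encoder)."""
    return Counter(text.lower().split())

def cosine(u, v):
    dot = sum(u[t] * v[t] for t in u)
    norm = lambda w: math.sqrt(sum(c * c for c in w.values()))
    return dot / (norm(u) * norm(v)) if u and v else 0.0

def retrieve(query, docs, k=2):
    """Rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]
```

Swapping the encoder is the whole point: the ranking machinery stays identical whether vectors come from term counts, BERT, or an LLM-based encoder.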

Reranker

Rerankers refine candidate documents:

  • Supervised rerankers: fine-tuned on labeled data (e.g., monoBERT, T5, RankLLaMA).
  • Unsupervised rerankers: use prompting to rank documents (pointwise, listwise, pairwise).
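The pairwise variant can be sketched as a bubble-style pass that repeatedly asks which of two passages better answers the query. Here `prefers` is a stand-in for a prompted LLM comparison; the lexical-overlap stub in the test is an assumption for illustration only:

```python
def pairwise_rerank(query, docs, prefers):
    """Pairwise-prompting reranker sketch: bubble better documents toward
    the front. `prefers(query, a, b)` should return True when passage `a`
    answers the query better than passage `b` (an LLM call in practice)."""
    docs = list(docs)
    for i in range(len(docs)):
        for j in range(len(docs) - 1 - i):
            if prefers(query, docs[j + 1], docs[j]):
                docs[j], docs[j + 1] = docs[j + 1], docs[j]
    return docs
```

Pairwise comparisons cost O(n²) LLM calls in the worst case, which is why pointwise and listwise prompting exist as cheaper (if sometimes less reliable) alternatives.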

Reader

Readers generate answers from top-ranked documents:

  • Passive Readers: receive documents from the retriever and generate answers (e.g., RAG, RETRO, FLARE).
  • Active Readers: autonomously generate follow-up queries (e.g., Self-Ask).
  • Compressors: reduce content for LLM input (e.g., LeanContext, TCRA).
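The compressor idea can be sketched with a simple budget-constrained selection. This is inspired by, but does not reproduce, the actual LeanContext/TCRA algorithms; the word budget and lexical-overlap scoring are assumptions for illustration:

```python
def compress_context(passages, query, budget):
    """Keep the passages most lexically similar to the query until a
    word budget is exhausted (a crude stand-in for learned compression)."""
    q = set(query.lower().split())
    scored = sorted(passages,
                    key=lambda p: len(q & set(p.lower().split())),
                    reverse=True)
    kept, used = [], 0
    for p in scored:
        n = len(p.split())
        if used + n <= budget:  # skip passages that would overflow the budget
            kept.append(p)
            used += n
    return kept
```

The payoff is fitting more relevant evidence into a fixed LLM context window at the cost of possibly discarding passages whose relevance is not lexically obvious.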

Retrieval-Augmented Generation (RAG)

RAG combines retriever and generator in one architecture, balancing factual grounding with fluent generation. However, it still risks hallucinations when the retrieved context is insufficient or irrelevant.
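The retrieve-then-generate loop can be sketched end to end. Both `retrieve` and `generate` below are stubs standing in for a real retriever and LLM, and the prompt template is an assumption, not a prescribed format:

```python
def rag_answer(query, corpus, retrieve, generate, k=2):
    """Minimal RAG sketch: fetch top-k passages, splice them into the
    prompt, and let the generator answer grounded in that context."""
    context = "\n".join(retrieve(query, corpus, k))
    prompt = f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"
    return generate(prompt)

def overlap_retrieve(query, corpus, k):
    """Toy retriever: rank by lexical overlap with the query."""
    key = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(corpus, key=key, reverse=True)[:k]

echo_llm = lambda prompt: prompt  # stub: a real LLM would produce an answer
```

The grounding-versus-hallucination trade-off lives entirely in this seam: if `retrieve` returns weak context, `generate` must either abstain or invent.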

Search Agents

LLM search agents like WebGPT and ReAct emulate human browsing, searching and synthesizing information autonomously. They represent the frontier of interactive search.
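A ReAct-style control loop can be sketched as alternating thought, action, and observation steps. The policy and tool names below are all assumptions for illustration; in a real agent the policy is an LLM emitting these tuples as text:

```python
def react_loop(question, policy, tools, max_steps=5):
    """ReAct-style agent sketch: the policy emits (thought, action, arg)
    tuples; actions invoke tools until the policy emits 'finish'."""
    history = []
    for _ in range(max_steps):
        thought, action, arg = policy(question, history)
        if action == "finish":
            return arg
        observation = tools[action](arg)
        history.append((thought, action, arg, observation))
    return None  # step budget exhausted without an answer

# Scripted stand-ins for an LLM policy and a search tool (assumptions).
def scripted_policy(question, history):
    if not history:
        return ("I should search.", "search", question)
    return ("The observation answers it.", "finish", history[-1][3])

tools = {"search": lambda q: "Rome" if "Italy" in q else "unknown"}
```

The `max_steps` cap matters in practice: without it, an agent that never emits `finish` loops (and bills) forever.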

Conclusion

LLMs are reshaping IR across all components, offering more interactive, personalized, and factual information access. Future work should address personalization, efficiency, and factuality.

References

  • Yutao Zhu, Huaying Yuan, Shuting Wang, Jiongnan Liu, Wenhan Liu, Chenlong Deng, Haonan Chen, Zheng Liu, Zhicheng Dou, and Ji-Rong Wen. Large Language Models for Information Retrieval: A Survey. 2024. arXiv:2308.07107.
© 2025 Manuel Di Agostino. All rights reserved.