The paper “Query Rewriting for Retrieval-Augmented Large Language Models” is summarized section by section as follows:

  1. Introduction: This paper proposes a new framework, “Rewrite-Retrieve-Read,” that improves on the standard retrieve-then-read pipeline for retrieval-augmented large language models (LLMs). Unlike traditional retrieve-then-read approaches, which use the original input text directly as the search query, this framework rewrites the query first, closing the gap between the input as given and the query actually needed to retrieve the relevant knowledge. The pipeline rewrites the query, retrieves relevant context with a web search engine, and then has the LLM read that context to produce an answer (a minimal sketch follows this paragraph). Additionally, a small language model is trained as the rewriter to adapt queries to the downstream LLM and improve its performance.
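As a rough illustration of the pipeline, the sketch below wires the three steps together; `rewrite_model`, `search`, and `llm_read` are hypothetical callables standing in for the rewriter, the web search engine, and the frozen LLM reader, not names from the paper.

```python
# Minimal sketch of the Rewrite-Retrieve-Read pipeline.
# The three callables are placeholders for the small rewriter, the web search
# engine, and the frozen LLM reader.

def rewrite_retrieve_read(question: str, rewrite_model, search, llm_read) -> str:
    # Step 1: the rewriter reformulates the input into a search-friendly query.
    query = rewrite_model(f"Rewrite this question into a web search query: {question}")

    # Step 2: the retriever (a web search engine) fetches relevant documents.
    documents = search(query)

    # Step 3: the frozen LLM reader answers the question conditioned on the documents.
    context = "\n".join(documents)
    prompt = f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    return llm_read(prompt)
```

In the trainable setting described later, `rewrite_model` is the small fine-tuned rewriter while `llm_read` remains a black-box LLM accessed through an API.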

  2. Related Work: This section situates the framework among prior research. It reviews methods that incorporate external knowledge to address knowledge gaps and “hallucination,” where LLMs generate plausible but incorrect information. Because large LLMs are commonly available only as black boxes through inference APIs, the section emphasizes retrieval-augmentation methods that cooperate with such black-box models rather than modifying them.

  3. Methodology: This section details the structure of the “Rewrite-Retrieve-Read” framework: the input is rewritten into a search query, the query is used to retrieve relevant context, and the LLM reads the retrieved context to produce an answer. A trainable scheme is then proposed in which a small rewriter is optimized with reinforcement learning, using feedback from the LLM reader as the reward, so the retrieved context can be adapted flexibly even though the reader itself is a black box (a simplified reward sketch follows).
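The feedback that drives this reinforcement learning can be pictured as a per-query scalar reward. The sketch below assumes a simple exact-match task reward plus a KL-style penalty toward the warm-up rewriter; this follows the general shape of such policy-optimization setups rather than the paper’s exact formulation, and the coefficient is illustrative.

```python
def rewrite_reward(reader_answer: str, gold_answer: str,
                   policy_logprob: float, reference_logprob: float,
                   kl_coef: float = 0.2) -> float:
    """Per-sample reward for the trainable rewriter (illustrative only).

    The task term is 1.0 when the frozen reader's answer matches the gold answer
    (a stand-in for metrics such as exact match or F1); a KL-style penalty keeps
    the trained rewriter close to its warm-up initialization. The coefficient and
    the exact metric mix are assumptions, not the paper's reported values.
    """
    task_reward = 1.0 if reader_answer.strip().lower() == gold_answer.strip().lower() else 0.0
    kl_penalty = kl_coef * (policy_logprob - reference_logprob)
    return task_reward - kl_penalty
```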

  4. Implementation: The implementation details of the framework are described in this section:

  • Rewriter: ChatGPT (used as a frozen, prompted rewriter) and T5 (used as the trainable rewriter) generate search queries from the input with specific prompt formats.
  • Retriever: The Bing search engine API serves as the retriever, pulling information from the open web and eliminating the need to build and maintain a search index (see the sketch after this list).
  • Reader: ChatGPT and Vicuna-13B serve as readers that interpret the retrieved context and generate answers; the readers stay frozen and are adapted to each task through prompting.
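A hedged sketch of the retrieval step is shown below, assuming the public Bing Web Search v7 endpoint; the API key, result count, and snippet handling are illustrative and not taken from the paper’s code.

```python
import requests

# Sketch of the retrieval step against the Bing Web Search API (v7).
# The endpoint and header name follow the public Bing Web Search documentation;
# the api_key and top_k values are placeholders.
BING_ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"

def bing_search(query: str, api_key: str, top_k: int = 5) -> list[str]:
    """Return the top-k web result snippets for a rewritten query."""
    response = requests.get(
        BING_ENDPOINT,
        headers={"Ocp-Apim-Subscription-Key": api_key},
        params={"q": query, "count": top_k, "textDecorations": False},
        timeout=10,
    )
    response.raise_for_status()
    results = response.json().get("webPages", {}).get("value", [])
    return [item["snippet"] for item in results]
```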
  5. Experiments: The effectiveness of the proposed method is validated on open-domain question answering (HotpotQA, AmbigNQ, PopQA) and multiple-choice QA (MMLU). The method is compared against several baselines: direct prompting without retrieval, the standard retrieve-then-read pipeline, a frozen LLM used as the rewriter, and the trainable rewriter. Results show consistent gains on both open-domain QA and multiple-choice QA from query rewriting, with the trainable rewriter further improving the fit between the retriever and the LLM reader.

  6. Analysis: Further analysis of the experimental results examines how the proposed method achieves its performance improvement.

Training Process: Performance improves as training progresses, with the rewriter getting stronger at each stage, from warm-up through reinforcement learning (an illustrative two-stage training outline follows).
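For orientation, the outline below sketches that two-stage schedule under stated assumptions; every callable (`warmup_finetune`, `rollout`, `policy_update`) is a hypothetical placeholder, not the authors’ code.

```python
from typing import Callable, Iterable, List, Tuple

def two_stage_training(
    warmup_finetune: Callable[[List[Tuple[str, str]]], None],
    rollout: Callable[[str], Tuple[str, float]],
    policy_update: Callable[[str, float], None],
    warmup_pairs: List[Tuple[str, str]],
    questions: Iterable[str],
) -> None:
    """Outline of a warm-up + reinforcement-learning schedule (placeholders only).

    `warmup_finetune` fine-tunes the rewriter on pseudo-labeled (question, rewrite)
    pairs; `rollout` rewrites one question, runs retrieval and the frozen reader,
    and returns the rewrite with its scalar reward; `policy_update` applies one
    policy-optimization step (e.g. PPO) for that rewrite and reward.
    """
    # Stage 1 (warm-up): supervised fine-tuning on rewrites that led to correct answers.
    warmup_finetune(warmup_pairs)

    # Stage 2 (reinforcement learning): refine the rewriter using reader feedback.
    for question in questions:
        rewrite, reward = rollout(question)
        policy_update(rewrite, reward)
```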

Evaluation of Retrieval Results: Rewriting improves retrieval accuracy, which in turn improves the reader’s answers. The analysis measures the hit rate of the retrieved results (whether a correct answer string appears in the retrieved context) and shows how gains in hit rate translate into gains in reader performance (a sketch of this metric follows).
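A minimal version of such a hit-rate metric might look like the sketch below, assuming a simple normalized string-containment check; the authors’ exact normalization is not reproduced here.

```python
import re
import string

def normalize(text: str) -> str:
    """Lowercase, drop punctuation and articles, and collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def hit_rate(retrieved_contexts: list[str], gold_answers: list[list[str]]) -> float:
    """Fraction of examples whose retrieved context contains a gold answer string."""
    if not retrieved_contexts:
        return 0.0
    hits = 0
    for context, answers in zip(retrieved_contexts, gold_answers):
        norm_context = normalize(context)
        if any(normalize(answer) in norm_context for answer in answers):
            hits += 1
    return hits / len(retrieved_contexts)
```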

Case Study: Specific examples demonstrate how rewriting optimizes the query and consequently improves the accuracy of the reader’s responses.

Conclusion: This paper demonstrates that the “Rewrite-Retrieve-Read” framework significantly improves the performance of retrieval-augmented LLMs. By raising retrieval accuracy through query rewriting, the pipeline adapts flexibly even when the LLM is a black box, and the small trainable rewriter allows queries to be optimized for the frozen retriever and reader.

Limitations: Lastly, the limitations of the approach are discussed: the trade-off between task generalization and task specialization, constraints around tool use (search engine APIs are often paid, and because retrieved web content is unrestricted, it is hard to control what knowledge gets incorporated), and the advantages and disadvantages of web search compared to dense retrievers.

This paper demonstrates the potential of query rewriting to improve retrieval augmentation and proposes a new approach that enables adaptive performance gains even for black-box LLMs.