The paper proposes a new method called “Analogical Prompting” that uses analogies to enhance the reasoning capabilities of Large Language Models (LLMs). This summary explains each section of the paper in detail:

  1. Introduction:

    • Large Language Models (LLMs) have shown excellent performance on complex tasks such as solving mathematical problems and generating code. “Chain-of-Thought (CoT) Prompting” achieves high accuracy by having the model generate intermediate reasoning steps while solving a problem, but existing CoT methods have drawbacks such as requiring manually prepared, task-specific examples. This paper proposes a new method, Analogical Prompting, in which the LLM self-generates relevant examples by recalling its experience with similar past problems before solving the new one. This removes the need for manually prepared examples and enables reasoning that adapts to each individual problem.
  2. Related Research:

    • 2.1 Large Language Models and Prompting:
      • As Large Language Models (LLMs) have developed, prompting methods designed for specific tasks have evolved with them. Thanks to their vast number of parameters, LLMs can learn and reason from just a few examples or simple instructions. The method in this paper builds on this prompting framework, using self-generated examples to guide the model’s reasoning.
    • 2.2 Chain-of-Thought Prompting:
      • CoT Prompting improves answer accuracy by having the LLM generate intermediate reasoning steps. 0-shot CoT relies on a generic instruction such as “think step by step,” whereas Few-shot CoT requires multiple examples whose reasoning steps must be labeled by hand (see the sketch below). This study aims to remove that labeling burden by using self-generated examples.
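To make the manual-labeling burden concrete, the sketch below builds a Few-shot CoT prompt from hand-written examples. This is a minimal illustration, not the paper’s actual prompt: the example problems, the instruction wording, and the helper name `build_few_shot_cot_prompt` are assumptions.

```python
# Minimal sketch of Few-shot CoT prompt construction (illustrative only).
# Each example must be written and labeled by hand, which is the manual
# effort that Analogical Prompting aims to remove.
HAND_LABELED_EXAMPLES = [
    {
        "question": "A shop sells pens at $2 each. How much do 7 pens cost?",
        "reasoning": "Each pen costs $2, so 7 pens cost 7 * 2 = $14.",
        "answer": "14",
    },
    {
        "question": "Tom has 12 apples and gives away 5. How many remain?",
        "reasoning": "Starting with 12 apples and removing 5 leaves 12 - 5 = 7.",
        "answer": "7",
    },
]

def build_few_shot_cot_prompt(new_question: str) -> str:
    """Concatenate hand-labeled (question, reasoning, answer) examples
    in front of the new question, Few-shot CoT style."""
    parts = []
    for ex in HAND_LABELED_EXAMPLES:
        parts.append(
            f"Q: {ex['question']}\nA: {ex['reasoning']} The answer is {ex['answer']}.\n"
        )
    parts.append(f"Q: {new_question}\nA:")
    return "\n".join(parts)

if __name__ == "__main__":
    print(build_few_shot_cot_prompt(
        "A train travels 60 km/h for 3 hours. How far does it go?"))
```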
  3. Preliminary Knowledge:

    • This study focuses on problem-solving tasks, in which an LLM generates an answer to a given problem. A prompting method specifies how the input text is constructed before the LLM generates its answer from it. For example, 0-shot prompting supplies only the problem statement, while 0-shot CoT additionally appends the instruction “think step by step” (both formats are sketched below). The goal of this paper is to design prompting methods that yield more accurate answers.
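The two zero-shot formats described above can be made concrete as follows. This is a minimal sketch based on the summary’s description; the helper names and exact wording are assumptions, with only the “think step by step” trigger taken from the text.

```python
# Sketch of the two zero-shot prompt formats described in this section
# (illustrative wording; the function names are assumptions).

def zero_shot_prompt(problem: str) -> str:
    # 0-shot prompting: only the problem statement is given to the LLM.
    return f"Q: {problem}\nA:"

def zero_shot_cot_prompt(problem: str) -> str:
    # 0-shot CoT: a generic trigger instruction is appended so the model
    # produces intermediate reasoning steps before the final answer.
    return f"Q: {problem}\nA: Let's think step by step."

if __name__ == "__main__":
    problem = "If 3 notebooks cost $12, how much do 5 notebooks cost?"
    print(zero_shot_prompt(problem))
    print("---")
    print(zero_shot_cot_prompt(problem))
```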
  4. Proposed Method:

    • 4.1 Generation of Examples through Self-Generation:
      • In Analogical Prompting, the LLM is instructed to self-generate examples by recalling related problems it has encountered before, while solving the current one. The model learns in context from these examples and then derives the answer to the new problem. Concretely, an instruction along the lines of “recall related problems and explain their solutions” is added to the prompt, so the LLM first generates several examples and then solves the original problem (see the first prompt sketch after this list).
    • 4.2 Self-Generated Knowledge + Example Generation:
      • For complex tasks such as code generation, self-generated examples alone may not be enough, so the prompt additionally asks for high-level knowledge. An instruction such as “Tutorial: describe the core concepts of this problem” makes the LLM generate that knowledge first and then the concrete examples, which helps it produce solutions grounded in the relevant algorithms and core concepts (see the second sketch below).
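A minimal sketch of how the self-generated-example prompt of Section 4.1 could be assembled is shown below. The instruction wording, the default of three recalled problems, and the helper name `build_analogical_prompt` are assumptions based on this summary, not the paper’s verbatim prompt.

```python
# Sketch of an Analogical Prompting prompt with self-generated examples
# (Section 4.1). A single LLM call both recalls examples and solves the
# new problem; no labeled examples are supplied by the user.
# Wording is an assumption based on this summary.

ANALOGICAL_TEMPLATE = """\
Your task is to solve the problem below.

# Problem:
{problem}

# Instructions:
1. Recall {n} relevant and distinct problems. For each one, describe the
   problem and explain its solution.
2. Then solve the original problem step by step.
"""

def build_analogical_prompt(problem: str, n_examples: int = 3) -> str:
    """Return a single prompt that asks the LLM to self-generate examples
    before answering."""
    return ANALOGICAL_TEMPLATE.format(problem=problem, n=n_examples)

if __name__ == "__main__":
    print(build_analogical_prompt(
        "What is the sum of the first 20 positive even integers?"))
```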
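For the knowledge-augmented variant of Section 4.2, the prompt can first request high-level knowledge (a short “tutorial” on the core concepts and algorithms) and only then the examples and the final solution. Again, the wording and the helper name are illustrative assumptions.

```python
# Sketch of the knowledge + example variant (Section 4.2), aimed at harder
# tasks such as code generation. The wording is an assumption based on this
# summary, not the paper's verbatim prompt.

KNOWLEDGE_TEMPLATE = """\
Your task is to write a program that solves the problem below.

# Problem:
{problem}

# Instructions:
1. Tutorial: describe the core concepts and algorithms relevant to this
   kind of problem.
2. Recall {n} relevant problems and explain their solutions, including code.
3. Finally, solve the original problem, using the tutorial and the recalled
   problems as guidance.
"""

def build_knowledge_example_prompt(problem: str, n_examples: int = 3) -> str:
    """Return a prompt that asks for high-level knowledge first, then
    self-generated examples, then the final solution."""
    return KNOWLEDGE_TEMPLATE.format(problem=problem, n=n_examples)

if __name__ == "__main__":
    print(build_knowledge_example_prompt(
        "Given an array of integers, find the length of the longest "
        "strictly increasing subsequence."))
```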
  5. Experimental Setup:

    • 5.1 Tasks:
      • The approach is evaluated on a variety of tasks that require reasoning: mathematical problem solving (measured on the GSM8K and MATH datasets), code generation, and logical and temporal reasoning tasks from BIG-Bench, with accuracy reported for each.
    • 5.2 Models:
      • The study compares accuracy improvements across the GPT-3.5-turbo, GPT-4, and PaLM 2 models.
    • 5.3 Comparison Methods:
      • The proposed method is compared with 0-shot prompting, 0-shot CoT, and Few-shot CoT, including an acquisition-based variant that draws its examples from labeled data.
  6. Experiment Results:

    • 6.1 Key Findings:
      • For mathematical problem solving, the proposed method with self-generated examples achieved higher accuracy than both 0-shot and Few-shot CoT. On the MATH task in particular, tailoring the examples to the specific type of reasoning problem proved effective.
    • 6.2 Complementary Relationship between Knowledge and Examples:
      • In code generation tasks, self-generated knowledge complemented the examples and led to further accuracy gains, which was especially effective for challenging algorithmic problems.
    • 6.3 Comparison between Generation and Acquisition:
      • As the underlying model becomes stronger, self-generated examples outperform acquisition-based Few-shot CoT that uses labeled data, because generated examples are more closely tailored to the problem at hand.
    • 6.4 Impact of Model Size:
      • With larger models (e.g., text-davinci-003), the advantage of the proposed method over conventional methods grows.
    • 6.5 Number of Generated Examples:
      • Generating three to five appropriate examples was confirmed to be effective.
    • 6.6 Qualitative Analysis:
      • Generated examples are often helpful for problem-solving, but for difficult problems relying on them alone does not always lead to a solution, indicating the need for further improvements.

  7. Conclusion:

    • This paper proposed a new prompting method, “Analogical Prompting,” which generates customized reasoning examples for each problem without any labeled data. The method overcomes the drawbacks of 0-shot and Few-shot CoT and outperforms conventional methods on a variety of reasoning tasks.
  8. Limitations and Future Research:

    • The proposed method increases the computational cost of inference because it has to generate more tokens, and some generated examples are unhelpful unless a sufficiently powerful LLM is used. Future research should develop ways of generating examples with more broadly generalizable problem-solving ability.