GRAD: Generative Retrieval-Aligned Demonstration Sampler for Efficient Few-Shot Reasoning

Nicolas Baldwin

EMNLP 2025


November 07, 2025


Suzhou, China


Large Language Models (LLMs) excel at reasoning but often struggle with multi-step tasks due to limited contextual understanding. While Retrieval-Augmented Generation (RAG) provides external context, it relies on static datasets and struggles to generalize across diverse queries. In this work, we propose the Generative Retrieval-Aligned Demonstrator (GRAD), a dynamic demonstration-based approach in which an LLM is trained to generate concise, input-specific demonstrations. By tailoring demonstrations to each input, our method offers better contextual support than traditional RAG approaches. GRAD is grounded in a generative strategy that enables adaptive prompting without external retrieval. We demonstrate the superiority of GRAD under budget constraints, limiting both the number of tokens per demonstration and the output response budget. Although trained solely on a math dataset, GRAD consistently outperforms strong baselines on Qwen2.5-14B across tasks ranging from math reasoning to advanced STEM questions. This showcases the robust generalization of GRAD to out-of-distribution (OOD) STEM tasks such as physics, chemistry, and computer science. Since demonstrations can be generated by a different, smaller model, GRAD might reduce training cost while maintaining similar accuracy. This work introduces a scalable demonstration generator model that is the first step toward a new few-shot learning paradigm in constrained, limited-resource settings.
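To make the per-demonstration token budget concrete, here is a minimal sketch of how generated demonstrations might be clipped and assembled into a few-shot prompt for the target model. It is an illustration under stated assumptions, not the GRAD implementation: `build_budgeted_prompt` and the `generate_demos` callable are hypothetical names, the prompt template is invented, and whitespace splitting stands in for the target model's real tokenizer. The output response budget would be enforced separately, when decoding with the target model.

```python
from typing import Callable, List


def build_budgeted_prompt(
    question: str,
    generate_demos: Callable[[str, int], List[str]],  # hypothetical generator interface
    num_demos: int = 2,
    max_tokens_per_demo: int = 64,
) -> str:
    """Assemble a few-shot prompt from input-specific generated demonstrations.

    Assumption: whitespace-separated words approximate tokens; a real system
    would count tokens with the target model's tokenizer.
    """
    # Ask the (hypothetical) demonstration generator for demos tailored to this input.
    demos = generate_demos(question, num_demos)

    clipped = []
    for demo in demos[:num_demos]:
        # Enforce the per-demonstration budget by truncating to the first N "tokens".
        tokens = demo.split()
        clipped.append(" ".join(tokens[:max_tokens_per_demo]))

    # Number the demonstrations, then append the actual question.
    parts = [f"Example {i + 1}:\n{d}" for i, d in enumerate(clipped)]
    parts.append(f"Question:\n{question}\nAnswer:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    # Toy stand-in for a trained demonstration generator (hypothetical).
    def toy_generator(q: str, k: int) -> List[str]:
        return ["Add 2 and 2 to get 4 so the answer is 4", "Add 1 and 5 to get 6"][:k]

    print(build_budgeted_prompt("What is 2 + 3?", toy_generator, max_tokens_per_demo=5))
```

Keeping the generator behind a plain callable reflects the abstract's point that demonstrations can come from a different, smaller model than the one that answers the question.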