Multi-Hop Question Answering (MHQA) requires step-by-step reasoning across multiple pieces of information to answer complex questions. Cache-aided Retrieval-Augmented Generation (RAG) can accelerate external knowledge retrieval at each reasoning step of MHQA. However, existing methods focus on the cache's internal structure and ignore the misalignment between the order in which queries arrive and the order in which they hit the cache. To tackle this, we propose Mnemosyne, a cache hit order fitting method designed to accelerate the RAG process for MHQA. Specifically, our cache-aware order fitting strategy adjusts the arrival order of queries via graph reordering to better align it with the cache hit order, thereby reducing the likelihood of failed or unproductive retrieval attempts. A multi-granularity caching storage mechanism loosens the strict hit condition into multiple modes of approximate semantic matching, so that relevant documents can still be retrieved when an exact match is absent. Experiments on four multi-hop QA datasets demonstrate that Mnemosyne effectively reduces retrieval latency while improving answer F1 score, achieving a superior trade-off between efficiency and effectiveness.
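The abstract does not specify Mnemosyne's implementation, but the idea of loosening a cache's strict (exact-match) hit condition to semantic matching can be sketched minimally. Below, a toy `SemanticCache` (a hypothetical name, not from the paper) treats a lookup as a hit whenever a stored query embedding is similar enough to the incoming one, rather than requiring identical query strings; the cosine threshold and plain-list storage are illustrative assumptions only.

```python
import math

def cosine(u, v):
    # Cosine similarity between two equal-length vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(a * a for a in v))
    return dot / (nu * nv) if nu and nv else 0.0

class SemanticCache:
    """Toy semantic cache: a lookup hits if ANY stored query embedding
    is within a similarity threshold of the incoming query embedding,
    instead of demanding an exact key match (illustrative sketch only).
    """
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (query_embedding, retrieved_docs)

    def get(self, query_emb):
        # Return the docs of the most similar cached query, or None on a miss.
        best_docs, best_sim = None, self.threshold
        for emb, docs in self.entries:
            sim = cosine(query_emb, emb)
            if sim >= best_sim:
                best_docs, best_sim = docs, sim
        return best_docs

    def put(self, query_emb, docs):
        self.entries.append((query_emb, docs))

cache = SemanticCache(threshold=0.9)
cache.put([1.0, 0.0, 0.0], ["doc A"])
# A slightly different query embedding still hits the cache...
hit = cache.get([0.95, 0.05, 0.0])
# ...while a dissimilar one misses and would trigger real retrieval.
miss = cache.get([0.0, 1.0, 0.0])
```

In a real system the embeddings would come from a sentence encoder and the linear scan would be replaced by an approximate nearest-neighbor index; a multi-granularity variant could store keys at several levels (full question, sub-question, entity) and try each in turn.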