Dense retrieval models commonly use flat indexes to achieve high-precision retrieval by computing exact distances between embedding vectors. However, flat indexes are memory-intensive and inefficient, limiting their scalability in large-scale retrieval tasks. In contrast, quantized indexes enable faster retrieval with significantly lower memory usage, but their accuracy tends to decrease. We therefore propose a scalable and efficient training method for dual-encoder models that improves retrieval accuracy on quantized indexes. Our approach combines direct gradient updates to cached target embeddings with large-scale similarity-based negative sampling, significantly reducing computational overhead and GPU memory usage. Target embeddings are initialized with a pre-trained encoder and stored in a memory buffer that is updated directly via backpropagation, avoiding repeated re-encoding of the full corpus. To build a rich set of negatives, we retrieve the top-$k$ most similar targets for each query from the cached embeddings using the quantized index, including both query-specific and cross-batch top-$k$ results. This design effectively approximates the truncated softmax distribution. Experiments show that our method performs exceptionally well on quantized indexes, providing a practical and scalable solution for real-world retrieval systems.
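The core idea can be illustrated with a minimal PyTorch sketch. Everything below is a simplified stand-in for the method described above: the buffer sizes, the toy linear query encoder, and the exact dot-product scoring (which a quantized index would only approximate) are all illustrative assumptions, not the authors' actual implementation.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
num_targets, dim, batch, k = 1000, 32, 8, 16

# Cached target embeddings: in the described method these are initialized
# with a pre-trained encoder; here we initialize randomly. The buffer is a
# Parameter, so backpropagation updates it directly.
target_buffer = torch.nn.Parameter(torch.randn(num_targets, dim))

# Toy stand-in for the dual encoder's query side.
query_encoder = torch.nn.Linear(dim, dim)

optimizer = torch.optim.SGD([target_buffer, *query_encoder.parameters()], lr=0.1)

queries = torch.randn(batch, dim)
positives = torch.randint(0, num_targets, (batch,))  # gold target ids

q = query_encoder(queries)           # (batch, dim)
scores = q @ target_buffer.t()       # exact similarity; a quantized index
                                     # would return approximate top-k instead

# Per-query top-k negatives, pooled across the batch (cross-batch top-k),
# approximating the truncated softmax over the hardest candidates.
with torch.no_grad():
    topk = scores.topk(k, dim=1).indices          # (batch, k)
cand = torch.unique(torch.cat([topk.flatten(), positives]))

# Softmax cross-entropy restricted to the candidate pool.
sub_scores = q @ target_buffer[cand].t()          # (batch, |cand|)
labels = (cand.unsqueeze(0) == positives.unsqueeze(1)).float().argmax(dim=1)
loss = F.cross_entropy(sub_scores, labels)

loss.backward()
optimizer.step()

# Only the rows of target_buffer that entered the candidate pool receive
# gradient, so the full corpus is never re-encoded or updated.
```

Because the loss touches only the candidate rows of the buffer, the gradient to `target_buffer` is sparse, which is what makes the direct-update scheme cheap relative to re-encoding the corpus every step.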
