Large Language Model (LLM)-based self-refinement methods have significantly enhanced data analysis performance, especially in correcting errors for Text-to-SQL. However, their effectiveness diminishes on SQL semantic errors: LLM hallucinations introduce persistent biases into the semantic understanding of questions, leaving such errors uncorrectable. To solve this problem, we propose Test-driven Self-refinement for Text-to-SQL (TS-SQL). It leverages a collaborative LLM agent framework to automatically synthesize high-quality test cases, comprising both test data and test code. These test cases then provide execution feedback that guides LLM self-refinement of SQL semantic errors. Rigorous evaluation shows the superiority of TS-SQL: on BIRD-dev, TS-SQL improves on existing SQL self-refinement methods by at least 6%; on Spider-dev, TS-SQL identifies and corrects 131 gold SQL errors, exposing systematic flaws in benchmark rigor. For reproducibility, we release the modified Spider-dev benchmark to foster further research.
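The core idea of execution feedback from synthesized test cases can be illustrated with a minimal sketch. This is not the authors' implementation: the schema, helper name `run_test_case`, and feedback format are all hypothetical, showing only how a candidate SQL query might be executed against synthesized test data and compared to an expected result, with mismatches turned into feedback text a refinement prompt could consume.

```python
# Minimal sketch (hypothetical helpers, not the TS-SQL codebase) of test-driven
# feedback for Text-to-SQL: execute a candidate query against synthesized test
# data and compare its result to the expected output of the test case.
import sqlite3

def run_test_case(candidate_sql, schema_sql, test_rows, expected):
    """Return (passed, feedback) for one synthesized test case."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(schema_sql)                    # build the test schema
    conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", test_rows)
    actual = conn.execute(candidate_sql).fetchall()   # run the candidate SQL
    conn.close()
    if actual == expected:
        return True, "test passed"
    # Execution feedback that a self-refinement prompt could include
    return False, f"expected {expected}, got {actual}"

schema = "CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);"
rows = [(1, "Ann", 90000.0), (2, "Bo", 50000.0)]
# Semantically wrong candidate for "name of the highest-paid employee":
# it sorts ascending, so it returns the lowest-paid employee instead.
wrong_sql = "SELECT name FROM employees ORDER BY salary ASC LIMIT 1"
passed, feedback = run_test_case(wrong_sql, schema, rows, [("Ann",)])
print(passed, feedback)
```

Note that the wrong candidate may still be syntactically valid and executable, which is why plain execution checks miss it; only comparing against the expected output of a test case exposes the semantic error.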