We introduce SQLSpace, a human-interpretable, generalizable, and compact representation for examples from the text-to-SQL task, in which natural language questions are translated to executable SQL queries. This representation is derived semi-automatically with minimal human intervention. We demonstrate its utility in evaluation by closely analyzing (i) the composition of widely used benchmarks and (ii) model performance at a granular level beyond overall accuracy scores. Our analyses reveal not only example subsets that challenge all models, including those with the strongest overall performance, but also, more importantly, specific example classes where smaller, cheaper models perform comparably to frontier models. Finally, we show a practical application of SQLSpace at inference time: we use our representation to predict which natural language questions will likely yield incorrect SQL from a text-to-SQL model, and we rewrite such questions to improve accuracy.