Questionnaire data serve as a valuable resource across numerous scientific domains, offering insights into human behavior, health, and social trends. Traditional downsampling-based representation learning methods, such as standardization and one-hot encoding, reformat these data into tabular structures that inherently discard semantic richness and obscure inter-sample and inter-feature relationships. Consequently, advanced deep learning models often underperform compared to simpler approaches like gradient-boosted decision trees (GBDT), due to their limited capacity to extract meaningful representations from semantically sparse inputs. To address this limitation, we introduce SemantiQ, a novel upsampling-based representation learning framework that embeds questionnaire responses into a unified semantic space. Leveraging Retrieval-Augmented Generation (RAG) in conjunction with large language models (LLMs), SemantiQ transforms question text, option text, and external knowledge into semantically enriched natural language statements. These statements are then encoded into semantic embeddings, which are further refined through a three-stage training mechanism and test-time training (TTT), enabling the model to capture complex sample- and feature-wise dependencies. Extensive experiments on multiple real-world datasets demonstrate that SemantiQ consistently outperforms state-of-the-art baselines.
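To make the "upsampling" idea concrete, the sketch below illustrates the verbalization step described in the abstract: rather than one-hot encoding a response, a (question, selected option) pair is rendered as a natural-language statement and augmented with retrieved external knowledge, producing text that an encoder can later embed. This is a minimal illustrative sketch, not the paper's implementation; the function name `verbalize_response`, the `KNOWLEDGE` dictionary (a stand-in for a real RAG retriever), and the statement template are all assumptions.

```python
# Illustrative stand-in for external knowledge retrieved via RAG.
# In SemantiQ this would come from an actual retrieval pipeline, not a dict.
KNOWLEDGE = {
    "smoking_status": "Smoking is a known risk factor for cardiovascular disease.",
}

def verbalize_response(question: str, option: str, feature: str) -> str:
    """Render a questionnaire response as an enriched natural-language statement.

    Combines the question text and the selected option text, then appends any
    retrieved knowledge associated with the feature (hypothetical schema).
    """
    statement = f'To the question "{question}", the respondent answered "{option}".'
    context = KNOWLEDGE.get(feature)
    return f"{statement} {context}" if context else statement

# Example: a single tabular cell (smoking_status = "Yes, daily") becomes a
# semantically rich sentence instead of a sparse one-hot vector.
enriched = verbalize_response("Do you currently smoke?", "Yes, daily", "smoking_status")
print(enriched)
```

The resulting statements would then be passed to a text encoder to obtain the semantic embeddings that the three-stage training mechanism and TTT refine.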