
Workshop paper
ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models
keywords:
conversational qa
synthetic data
large language models
knowledge graphs
question answering
The evolving landscape of Large Language Models (LLMs) and conversational assistants has ushered in a need for dynamic, up-to-date, scalable, and configurable conversational datasets to train and evaluate systems. Ideally, these datasets are tailored for different user interaction settings, such as text and voice, all of which introduce distinct nuances and modeling challenges. Knowledge Graphs (KGs), with their structured and continuously evolving nature, serve as an ideal reservoir for harnessing current and precise knowledge. While there exist human-curated conversational datasets grounded on KGs, it is hard to rely solely on them, as the information needs of users are in constant flux. Addressing this lacuna, we introduce ConvKGYarn, a scalable and effective method to generate up-to-date, configurable synthetic conversational KGQA datasets. Qualitative psychometric analyses elucidate the effectiveness of ConvKGYarn in generating high-quality conversational data that rivals a popular conversational KGQA dataset on various metrics while making strides in additional desirable properties like adhering to human interaction configurations and functioning at a much larger scale. We further demonstrate the utility of ConvKGYarn by testing LLMs on varied conversations to explore model behavior on conversational KGQA sets with different configurations grounded on the same fact set from the KG. Through our work, we aim to fortify the underpinnings of KGQA and evaluate the parametric knowledge of LLMs.
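The high-level pipeline the abstract describes — grounding multi-turn QA in a set of KG facts under a user-interaction configuration (e.g., text vs. voice) — can be illustrated with a minimal sketch. All names below (`KGFact`, `ConvConfig`, the prompt template, and the template-based stand-in for the LLM call) are illustrative assumptions, not the paper's actual implementation:

```python
from dataclasses import dataclass
from typing import List

# Hypothetical schema for a KG fact; the paper's actual format may differ.
@dataclass
class KGFact:
    subject: str
    relation: str
    obj: str

# Hypothetical interaction configuration (modality, conversation length).
@dataclass
class ConvConfig:
    modality: str = "text"   # e.g. "text" or "voice"
    num_turns: int = 2

def facts_to_prompt(facts: List[KGFact], cfg: ConvConfig) -> str:
    """Build an LLM prompt asking for a multi-turn conversation in
    which every answer is supported by the supplied KG facts."""
    fact_lines = "\n".join(
        f"- ({f.subject}, {f.relation}, {f.obj})" for f in facts
    )
    return (
        f"Generate a {cfg.num_turns}-turn {cfg.modality} conversation.\n"
        f"Every answer must be grounded in these facts:\n{fact_lines}\n"
    )

def template_conversation(facts: List[KGFact], cfg: ConvConfig) -> List[dict]:
    """Cheap template-based stand-in for the LLM call, so the shape of
    the pipeline (fact set -> grounded QA turns) can be run offline."""
    turns = []
    for f in facts[: cfg.num_turns]:
        turns.append({
            "question": f"What is the {f.relation} of {f.subject}?",
            "answer": f.obj,
            "grounding": (f.subject, f.relation, f.obj),
        })
    return turns

facts = [
    KGFact("Marie Curie", "birthplace", "Warsaw"),
    KGFact("Marie Curie", "field", "physics"),
]
convo = template_conversation(facts, ConvConfig(modality="voice", num_turns=2))
```

Because every generated turn carries its grounding triple, the same fact set can be re-spun under different configurations (modality, turn count) to compare model behavior, which is the evaluation setting the abstract describes.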