VIDEO DOI: https://doi.org/10.48448/6rsd-4v33

poster

ACL 2024

August 12, 2024

Bangkok, Thailand

Developing PUGG for Polish: A Modern Approach to KBQA, MRC, and IR Dataset Construction

keywords: polish, llm, dataset construction, mrc, wikidata, knowledge base question answering, kbqa, kg, machine reading comprehension, large language models, ir, natural language processing, knowledge graph, information retrieval, question answering, dataset

Advances in AI and natural language processing have transformed human-machine language interaction, with question answering (QA) systems playing a pivotal role. The knowledge base question answering (KBQA) task, which draws on structured knowledge graphs (KGs), makes it possible to handle a wide range of knowledge-intensive questions. However, KBQA datasets remain scarce, especially for low-resource languages. Many existing construction pipelines for these datasets are outdated and labor-intensive, and they do not use modern assistive tools such as large language models (LLMs) to reduce the workload. To address this, we designed and implemented a modern, semi-automated approach to dataset creation that covers KBQA, machine reading comprehension (MRC), and information retrieval (IR) and is tailored specifically to low-resource environments. We executed this pipeline and introduce the PUGG dataset, the first Polish KBQA dataset, along with novel datasets for MRC and IR. Additionally, we provide a comprehensive implementation, insightful findings, detailed statistics, and an evaluation of baseline models.
