keywords:
corpus benchmarking
tutorial transcripts
engineering discourse
pdtb-style annotation
discourse relation parsing
Discourse relation parsing plays a crucial role in uncovering the logical structure of text, yet existing corpora focus almost exclusively on general-domain genres, leaving specialized fields such as engineering under-resourced. We introduce ENG-DRB, the first PDTB-style discourse relation corpus derived from transcripts of hands-on engineering tutorial videos. ENG-DRB comprises 11 tutorials spanning civil, mechanical, and electrical/electronics engineering (155 minutes in total) with 1,215 annotated relations. Compared to general-domain benchmarks, the dataset features a high proportion of explicit relations, dense causal and temporal relations, and frequent overlapping and embedded senses. Our benchmarking experiments underscore the dataset's difficulty. A top-performing parser (HITS) detects segment boundaries well (98.6% F1), but its relation classification F1 is more than 11 points lower than on the standard PDTB. In addition, state-of-the-art LLMs (OpenAI o4-mini, Claude 3.7, LLaMA-3.1) achieve at best 41% F1 on explicit relations and less than 9% F1 on implicit relations, revealing systematic errors in temporal and causal sense detection. The dataset can be accessed at https://doi.org/10.57967/hf/6895. Code to reproduce our results is available at https://github.com/chengzhangedu/ENG-DRB.
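To make the F1 figures above more concrete, here is a minimal sketch of one common way to score relation predictions: micro precision, recall, and F1 over exact-match relation tuples. Several details are illustrative assumptions and are not ENG-DRB's or HITS's actual scoring code. These include the tuple layout (two argument spans plus a PDTB-3-style sense label), the hypothetical function name `relation_f1`, and the strict exact-match criterion. Consult the linked repository for the evaluation the authors actually used.

```python
from collections import Counter


def relation_f1(gold, predicted):
    """Micro precision/recall/F1 over exact-match relation tuples.

    Hypothetical representation: each relation is an
    (arg1_span, arg2_span, sense) tuple. Both lists are treated as
    multisets, so a duplicated prediction can match at most as many
    gold copies as exist.
    """
    gold_counts = Counter(gold)
    pred_counts = Counter(predicted)
    # True positives: tuples present in both lists, respecting multiplicity.
    tp = sum((gold_counts & pred_counts).values())
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Toy data (invented for illustration, not taken from ENG-DRB).
gold = [
    ((0, 12), (13, 30), "Temporal.Asynchronous.Precedence"),
    ((31, 50), (51, 70), "Contingency.Cause.Result"),
    ((71, 90), (91, 110), "Expansion.Conjunction"),
]
predicted = [
    ((0, 12), (13, 30), "Temporal.Asynchronous.Precedence"),
    ((31, 50), (51, 70), "Contingency.Cause.Reason"),  # right arguments, wrong sense
]

if __name__ == "__main__":
    p, r, f = relation_f1(gold, predicted)
    print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")
```

In this toy example, one of three gold relations is recovered and one prediction has the wrong sense, so the script prints `P=0.500 R=0.333 F1=0.400`. Under this strict criterion, a prediction with the correct arguments but a confused causal or temporal sense earns no credit. This helps explain why sense-level errors of the kind described above pull relation F1 well below boundary-detection F1.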