Knowledge graphs (KGs) enhance pretrained language models by incorporating additional knowledge, improving their performance in specialized fields: for example, they can help models learn domain-specific relationships between documents that would otherwise be missed. In the process industry, text logs contain crucial information about daily operations, such as events, instructions, and incident reports, and are often structured as sparse KGs. This paper explores how SciNCL, a graph-aware neighborhood contrastive learning methodology originally designed for scientific publications, can be adapted to the process industry domain. We use several KGs to train graph embedding (GE) models, which we then use to generate synthetic triplet training datasets for a domain-specific text encoder. Our experiments demonstrate that language models fine-tuned with triplets derived from GEs outperform a state-of-the-art mE5-large text encoder by 12-13.5% (6.68-7.54 points) on the proprietary process industry text embedding benchmark (PITEB) while being 3-5 times smaller.
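The core idea of SciNCL-style neighborhood contrastive sampling can be illustrated with a minimal sketch. The snippet below is an assumption-laden simplification, not the authors' exact procedure: given node embeddings produced by a graph embedding model, each anchor's neighbors are ranked by cosine similarity, a positive is drawn from a near band of that ranking, and a hard negative from a farther band. The function name `sample_triplets` and the band boundaries are illustrative choices.

```python
import numpy as np

def sample_triplets(embeddings, pos_band=(1, 5), neg_band=(20, 30), seed=0):
    """Simplified SciNCL-style neighborhood sampling (illustrative sketch).

    For each anchor node, rank all other nodes by cosine similarity in
    graph-embedding space, then draw a positive from a near band of the
    ranking and a hard negative from a farther band.
    """
    rng = np.random.default_rng(seed)
    # L2-normalize so dot products equal cosine similarities
    emb = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = emb @ emb.T
    triplets = []
    for anchor in range(len(emb)):
        order = np.argsort(-sims[anchor])   # most similar first
        order = order[order != anchor]      # drop the anchor itself
        pos = int(order[rng.integers(*pos_band)])
        neg = int(order[rng.integers(*neg_band)])
        triplets.append((anchor, pos, neg))
    return triplets
```

The resulting (anchor, positive, negative) index triplets would then be mapped back to the log texts attached to each KG node and used to fine-tune the text encoder with a standard triplet or contrastive loss.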