

workshop paper
Harnessing CLIP for Evidence Identification in Scientific Literature: A Multimodal Approach to Context24 Shared Task
keywords:
preprocessing techniques
dataset enhancement
multimodal scientific evidence identification
CLIP
Knowing whether scientific claims are supported by evidence is fundamental to scholarly communication and evidence-based decision-making. We present our approach to Task 1 of the Context24 Shared Task on Contextualizing Scientific Figures and Tables (SDP@ACL2024), which focuses on identifying multimodal evidence from scientific publications that supports claims. We fine-tune CLIP, a state-of-the-art model for image-text similarity tasks, to identify and rank figures and tables in papers that substantiate specific claims. Our approach centers on text and image preprocessing and on augmenting the organizer-provided training data with labeled examples from the SciMMIR and MedICaT datasets. Our best-performing model achieved NDCG@5 and NDCG@10 scores of 0.26 and 0.30, respectively, on the Context24 test split. Our findings underscore the effectiveness of data augmentation and preprocessing in improving the model's ability to match claims to supporting evidence.
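
As a rough illustration of the ranking setup described in the abstract (not the authors' exact pipeline), the sketch below scores candidate figure images against a claim with an off-the-shelf CLIP checkpoint via Hugging Face Transformers and returns the top-k matches. The checkpoint name, the `rank_figures` helper, and the file-path inputs are illustrative assumptions; the paper fine-tunes its own model.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Assumed off-the-shelf checkpoint; the paper fine-tunes CLIP on augmented data.
MODEL_NAME = "openai/clip-vit-base-patch32"
model = CLIPModel.from_pretrained(MODEL_NAME).eval()
processor = CLIPProcessor.from_pretrained(MODEL_NAME)

def rank_figures(claim: str, image_paths: list[str], k: int = 5):
    """Score candidate figures/tables (as images) against a claim; return top-k."""
    images = [Image.open(p).convert("RGB") for p in image_paths]
    inputs = processor(text=[claim], images=images,
                       return_tensors="pt", padding=True, truncation=True)
    with torch.no_grad():
        out = model(**inputs)
    scores = out.logits_per_text[0]            # (num_images,) similarity logits
    top = scores.argsort(descending=True)[:k]  # indices of best-matching images
    return [(image_paths[i], scores[i].item()) for i in top]
```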
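The reported NDCG@5 and NDCG@10 figures are standard ranked-retrieval measures. A minimal computation is sketched below, assuming the common linear-gain DCG formulation; the shared task's official scorer may use a different variant.

```python
import math

def dcg_at_k(relevances: list[float], k: int) -> float:
    # Linear-gain DCG: rel_i / log2(i + 1) with 1-based positions.
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances: list[float], k: int) -> float:
    # Normalize by the DCG of an ideal (descending-relevance) ordering.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Example: one relevant figure ranked third among five candidates.
print(ndcg_at_k([0, 0, 1, 0, 0], k=5))  # 0.5
```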