Lecture image placeholder

Premium content

Access to this content requires a subscription. You must be a premium user to view this content.

Monthly subscription - $9.99Pay per view - $4.99Access through your institutionLogin with Underline account
Need help?
Contact us
Lecture placeholder background
VIDEO DOI: https://doi.org/10.48448/7fgs-qg54

poster

ACL 2024

August 12, 2024

Bangkok, Thailand

Boosting Textural NER with Synthetic Image and Instructive Alignment

keywords:

text-image alignment

multimodal name entity recognition

Named entity recognition (NER) is a pivotal task reliant on textual data, often impeding the disambiguation of entities due to the absence of context. To tackle this challenge, conventional methods often incorporate images crawled from the internet as auxiliary information. However, the images often lack sufficient entities or would introduce noise. Even with high-quality images, it is still challenging to efficiently use images as auxiliaries (i.e., fine-grained alignment with texts). We introduce a novel method named InstructNER to address these issues. Leveraging the rich real-world knowledge and image synthesis capabilities of a large pre-trained stable diffusion (SD) model, InstructNER transforms the text-only NER into a multimodal NER (MNER) task. A selection process automatically identifies the best synthetic image by comparing fine-grained similarities with internet-crawled images through a visual bag-of-words strategy. Note, during the image synthesis, a cross-attention matrix between synthetic images and raw text emerges, which inspires a soft attention guidance alignment (AGA) mechanism. AGA optimizes the MNER task and concurrently facilitates instructive alignment in MNER. Empirical experiments on prominent MNER datasets show that our method surpasses all text-only baselines, improving F1-score by 1.4% to 2.3%. Remarkably, even when compared to fully multimodal baselines, our approach maintains competitive. Furthermore, we open-source a comprehensive synthetic image dataset and the code to supplement existing raw dataset. The code and datasets are available in https://github.com/Heyest/InstructNER.

Downloads

SlidesTranscript English (automatic)

Next from ACL 2024

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism
poster

A Shocking Amount of the Web is Machine Translated: Insights from Multi-Way Parallelism

ACL 2024

+2
Brian Thompson and 4 other authors

12 August 2024

Stay up to date with the latest Underline news!

Select topic of interest (you can select more than one)

PRESENTATIONS

  • All Lectures
  • For Librarians
  • Resource Center
  • Free Trial
Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2023 Underline - All rights reserved