Thailand

Large language models (LLMs) excel in various tasks but are primarily trained on text data, limiting their application scope. Expanding LLM capabilities to include vision-language understanding is vital, yet training them on multimodal data from scratch is challenging and costly. Existing instruction tuning methods, e.g., LLAVA, often connects a pretrained CLIP vision encoder and LLMs via fully fine-tuning LLMs to bridge the modality gap. However, full fine-tuning is plagued by catastrophic forgetting, i.e., forgetting previous knowledge, and high training costs particularly in the era of increasing tasks and modalities. To solve this issue, we introduce MoExtend, an effective framework designed to streamline the modality adaptation and extension of Mixture-of-Experts (MoE) models. MoExtend seamlessly integrates new experts into pre-trained MoE models, endowing them with novel knowledge without the need to tune pretrained models such as MoE and vision encoders. This approach enables rapid adaptation and extension to new modal data or tasks, effectively addressing the challenge of accommodating new modalities within LLMs. Furthermore, MoExtend avoids tuning pretrained models, thus mitigating the risk of catastrophic forgetting. Experimental results demonstrate the efficacy and efficiency of MoExtend in enhancing the multimodal capabilities of LLMs, contributing to advancements in multimodal AI research.

ACL 2024

MoExtend: Tuning New Experts for Modality and Task Extension

modality extension

poster

### Welcome!
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) will take place in Bangkok, Thailand from August 11th to 16th, 2024. Our Virtual Poster Sessions will take place online Thursday, August 22, 2024.

You are required to register for this event. **Please register [here](https://2024.aclweb.org/registration). **

If you have already registered, please check your inbox for an email from Underline granting you access to ACL 2024 content.

Please register!

The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) will take place in Bangkok, Thailand from August 11th to 16th, 2024. More information will be announced soon.

Collaborative argumentation holds significant potential for enhancing students' learning outcomes within classroom settings. Consequently, researchers have explored the application of artificial intelligence (AI) to automatically analyze argumentation in these contexts. Despite the remarkable performance of deep learning models in this task, their lack of interpretability poses a critical challenge, leading to teachers' skepticism and limited utilization. To cultivate trust among teachers, this PhD thesis proposal aims to leverage explainable AI techniques to provide explanations for these deep learning models. Specifically, the study develops two deep learning models for automated analysis of argument moves (claim, evidence, and warrant) and specificity levels (low, medium, and high) within collaborative argumentation. To address the interpretability issue, four explainable AI methods are proposed: gradient sensitivity, gradient input, integrated gradient, and LIME. Computational experiments demonstrate the efficacy of these methods in elucidating model predictions by computing word contributions, with LIME delivering exceptional performance. Moreover, a quasi-experiment is designed to evaluate the impact of model explanations on user trust and knowledge, serving as a future study of this PhD proposal. By tackling the challenges of interpretability and trust, this PhD thesis proposal aims to contribute to fostering user trust in AI and facilitating the practical implementation of AI in educational contexts.

On the Interpretability of Deep Learning Models for Collaborative Argumentation Analysis in Classrooms

Acquiring large-scale parallel corpora is crucial for NLP tasks such as Neural Machine Translation, and web crawling has become a popular methodology for this purpose. Previous studies have been conducted based on sentence unit division (SUD) when aligning documents in various languages which are obtained through web crawling. Among them, the TK-PERT method achieved state-of-the-art results and well addressed the boilerplate text in web crawling data through a down-weighting approach. However, there remains a problem about how to better handle long-text encoding. Thus, we introduce the strategy of Overlapping Fixed-Length Segmentation (OFLS) in place of SUD, and observe a pronounced enhancement when performing the same approach for document alignment. In this paper, we compare the SUD and OFLS using three previous methods, Mean-Pool, TK-PERT, and Optimal Transport, on the WMT16 document alignment shared task for French-English, as well as on our self-established Japanese-English dataset MnRN. As a result, for the WMT16 task, various SUD based methods showed an increase in recall by 1\% ~ 10\% after reproduction with OFLS. For MnRN data, OFLS demonstrated significant improvements in accuracy and also exhibited faster document embedding speed.

Document Alignment based on Overlapping Fixed-Length Segments

Providing example sentences that are diverse and aligned with learners' proficiency levels is essential for fostering effective language acquisition. This study examines the use of Pre-trained Language Models (PLMs) to produce example sentences targeting L2 Japanese learners. We utilize PLMs in two ways: as quality scoring components in a retrieval system that draws from a newly curated corpus of Japanese sentences, and as direct sentence generators using zero-shot learning. We evaluate the quality of sentences by considering multiple aspects such as difficulty, diversity, and naturalness, with a panel of raters consisting of learners of Japanese, native speakers -- and GPT-4. Our findings suggest that there is inherent disagreement among participants on the ratings of sentence qualities, except for difficulty. Despite that, the retrieval approach was preferred by all evaluators, especially for beginner and advanced target proficiency, while the generative approaches received lower scores on average. Even so, our experiments highlight the potential for using PLMs to enhance the adaptability of sentence suggestion systems and therefore improve the language learning journey.

Automatically Suggesting Diverse Example Sentences for L2 Japanese Learners Using Pre-Trained Language Models

Coreference Resolution (CR) and Zero Pronoun Resolution (ZPR) are vital for extracting meaningful information from text. However, limited research and datasets pose significant challenges in Thai language. To address this, we developed an annotated joint CR and ZPR dataset. Additionally, we introduced the Z-coref model, capable of simultaneously handling CR and ZPR tasks by adjusting the span definition of a prior CR architecture to include token gaps. The proposed model trained on our dataset outperformed the state-of-the-art in resolving both coreference resolution and zero-pronoun resolution, while taking less time to train.

Z-coref: Thai Coreference and Zero Pronoun Resolution

Large Language Models (LLMs) have significant potential for facilitating intelligent end-user applications in healthcare. However, hallucinations remain an inherent problem with LLMs, making it crucial to address this issue with extensive medical knowledge and data. In this work, we propose a Retrieve-and-Medically-Augmented-Generation with Knowledge Reduction (ReMAG-KR) pipeline, employing a carefully curated knowledge base using cross-encoder re-ranking strategies. The pipeline is tested on medical MCQ-based QA datasets as well as general QA datasets. It was observed that when the knowledge base is reduced, the model's performance decreases by 2-8%, while the inference time improves by 47%.

ReMAG-KR: Retrieval and Medically Assisted Generation with Knowledge Reduction for Medical Question Answering

Retrieving relevant plots from the book for a query is a critical task, which can improve the reading experience and efficiency of readers. Readers usually only give an abstract and vague description as the query based on their own understanding, summaries, or speculations of the plot, which requires the retrieval model to have a strong ability to estimate the abstract semantic associations between the query and candidate plots. However, existing information retrieval (IR) datasets cannot reflect this ability well. In this paper, we propose PlotRetrieval, a labeled dataset to train and evaluate the performance of IR models on the novel task Plot Retrieval. Text pairs in PlotRetrieval have less word overlap and more abstract semantic association, which can reflect the ability of the IR models to estimate the abstract semantic association, rather than just traditional lexical or semantic matching. Extensive experiments across various lexical retrieval, sparse retrieval, dense retrieval, and cross-encoder methods compared with human studies on PlotRetrieval show current IR models still struggle in capturing abstract semantic association between texts. PlotRetrieval can be the benchmark for further research on the semantic association modeling ability of IR models.

Plot Retrieval as an Assessment of Abstract Semantic Association

Recent advancements in multilingual models for automatic speech recognition (ASR) have been able to achieve a high accuracy for languages with extremely limited resources. This study examines ASR modeling for the Mvskoke language, an indigenous language of America. The parameter efficiency of adapter training is contrasted with training entire models, and it is demonstrated how performance varies with different amounts of data. Additionally, the models are evaluated with trigram language model decoding, and the outputs are compared across different types of speech recordings. Results show that training an adapter is both parameter efficient and gives higher accuracy for a relatively small amount of data.

Fine-Tuning ASR models for Very Low-Resource Languages: A Study on Mvskoke

Prompt-based learning has shown its effectiveness in few-shot text classification. A key factor in its success is a verbalizer, which translates output from a language model into a predicted class. Notably, the simplest and widely acknowledged verbalizer employs manual labels to represent the classes. However, manual selection may not yield the optimal words for a given language model, potentially leading to subpar classification performance, especially in mid-to-low resource languages with weaker language models. Therefore, we propose Label-Aware Automatic Verbalizer (LAAV), effectively augmenting manual labels for improved few-shot classification results. Specifically, we utilize the label name along with the conjunction "and" to induce the model to generate more effective words for the verbalizer. Experimental results on four mid-to-low resource Southeast Asian languages demonstrate that LAAV significantly outperforms existing verbalizers.

Label-Aware Automatic Verbalizer for Few-Shot Text Classification in Mid-To-Low Resource Languages

Multiword Expressions (MWEs) make a good case study for linguistic diversity due to their idiosyncratic nature. Defining MWE canonical forms as types, may be measured notably through disparity, based on pairwise distances between types. To this aim, we train static MWE-aware word embeddings for verbal MWEs in 14 languages, and we show interesting properties of these vector spaces. Then use these vector spaces to implement the so-called functional diversity measure. We apply this measure to the results of several MWE identification systems. We find that, although MWE vector spaces are meaningful at a local scale, the disparity measure aggregating them at a global scale strongly correlates with the number of types, which questions its usefulness in presence of simpler diversity metrics such as variety. We make the vector spaces we generated available.

Vector Spaces for Quantifying Disparity of Multiword Expressions in Annotated Text

Any report frames issues to favor a particular interpretation by highlighting or excluding certain aspects of a story. Despite the widespread use of framing in disinformation, framing properties and detection methods remain underexplored outside the English-speaking world. We explore how multilingual framing of the same issue differs systematically. We use eight years of Russia-backed disinformation campaigns, spanning 8k news articles in 4 languages targeting 15 countries. We find that disinformation campaigns consistently and intentionally favor specific framing, depending on the target language of the audience. We further discover how Russian-language articles consistently highlight selected frames depending on the region of the media coverage. We find that the two most prominent models for automatic frame analysis underperform and show high disagreement, highlighting the need for further research.

Downloads

Next from ACL 2024

On the Interpretability of Deep Learning Models for Collaborative Argumentation Analysis in Classrooms

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

.css-70qvj9{display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;}Downloads

Next from ACL 2024

On the Interpretability of Deep Learning Models for Collaborative Argumentation Analysis in Classrooms

Stay up to date with the latest Underline news!

PRESENTATIONS

CONFERENCES

COMPANY

RESOURCES

Downloads