Large vision-language models frequently struggle to accurately predict responses provided by multiple human annotators, particularly when those responses exhibit human uncertainty. In this study, we focus on the Visual Question Answering (VQA) task, and we comprehensively evaluate how well the state-of-the-art vision-language model correlates with the distribution of human responses. To do so, we categorize our samples based on their levels (low, medium, high) of human uncertainty in disagreement (HUD) and employ not only accuracy but also three new human-correlated metrics for the first time in VQA, to better investigate the impact of HUD. To better align models with humans, we also verify the effect of common calibration and human calibration. Our results show that even BEiT3, currently the best model for this task, struggles to capture the multi-label distribution inherent in diverse human responses. Additionally, we observe that the commonly used accuracy-oriented calibration technique adversely affects BEiT3's ability to capture HUD, further widening the gap between model predictions and human distributions. In contrast, we show the benefits of calibrating models towards human distributions for VQA, better aligning model confidence with human uncertainty. Our findings highlight that for VQA, the consistent alignment between human responses and model predictions is understudied and should become the next crucial target of future studies.
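
As a rough illustration of what comparing a model's predictive distribution with a distribution of human responses can look like, the sketch below computes two generic distances. The abstract does not name its three human-correlated metrics, so total variation distance and KL divergence are stand-ins chosen for illustration only; the function names, answers, and probabilities are all hypothetical.

```python
import math
from collections import Counter

def human_distribution(answers):
    """Turn a list of annotator answers into a probability distribution."""
    counts = Counter(answers)
    total = sum(counts.values())
    return {answer: n / total for answer, n in counts.items()}

def total_variation(p, q):
    """Total variation distance between two distributions given as dicts."""
    support = set(p) | set(q)
    return 0.5 * sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in support)

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q); q is floored at eps so missing answers do not divide by zero."""
    return sum(
        pa * math.log(pa / max(q.get(a, 0.0), eps))
        for a, pa in p.items()
        if pa > 0
    )

# Hypothetical data: ten annotators answer the same visual question.
human = human_distribution(["red"] * 6 + ["orange"] * 3 + ["pink"])
# Hypothetical over-confident model output for that question.
model = {"red": 0.97, "orange": 0.02, "pink": 0.01}

print(round(total_variation(human, model), 3))  # 0.37
print(round(kl_divergence(human, model), 3))
```

In this made-up case the model's top answer agrees with the human majority, so accuracy alone would count it as correct, while both distances show that the predicted distribution is far more peaked than the human one.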

Mind the Uncertainty in Human Disagreement: Evaluating Discrepancies Between Model Predictions and Human Responses in VQA

One of the major aspects contributing to the striking performance of large language models (LLMs) is the vast amount of factual knowledge accumulated during pre-training. Yet, many LLMs suffer from self-inconsistency, which raises doubts about their trustworthiness and reliability. This paper focuses on entity type ambiguity, analyzing the proficiency and consistency of state-of-the-art LLMs in applying factual knowledge when prompted with ambiguous entities. To do so, we propose an evaluation protocol that disentangles knowing from applying knowledge, and test state-of-the-art LLMs on 49 ambiguous entities. Our experiments reveal that LLMs struggle with choosing the correct entity reading, achieving an average accuracy of only 85%, and as low as 75% with underspecified prompts. The results also reveal systematic discrepancies in LLM behavior, showing that while the models may possess knowledge, they struggle to apply it consistently, exhibit biases toward preferred readings, and display self-inconsistencies. This highlights the need to address entity ambiguity in the future for more trustworthy LLMs.

To Know or Not To Know? Analyzing Self-Consistency of Large Language Models under Ambiguity

Knights and knaves problems represent a classic genre of logical puzzles where characters either tell the truth or lie. The objective is to logically deduce each character's identity based on their statements. The challenge arises from the truth-telling or lying behavior, which influences the logical implications of each statement. Solving these puzzles requires not only direct deductions from individual statements, but also the ability to assess the truthfulness of statements by reasoning through various hypothetical scenarios. As such, knights and knaves puzzles serve as compelling examples of suppositional reasoning. In this paper, we introduce *TruthQuest*, a benchmark for suppositional reasoning based on the principles of knights and knaves puzzles. Our benchmark presents problems of varying complexity, considering both the number of characters and the types of logical statements involved. Evaluations on *TruthQuest* show that large language models like Llama 3 and Mixtral-8x7B exhibit significant difficulties solving these tasks. A detailed error analysis of the models' output reveals that lower-performing models exhibit a diverse range of reasoning errors, frequently failing to grasp the concept of truth and lies. In comparison, more proficient models primarily struggle with accurately inferring the logical implications of potentially false statements.
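
The suppositional reasoning described above can be made concrete with a small brute-force solver: suppose an identity for every character, then keep only the worlds in which each statement is true exactly when its speaker is a knight. This is a minimal sketch with a hypothetical two-character puzzle; it is not the benchmark's own code or data format.

```python
from itertools import product

def solve(names, statements):
    """Return every assignment (True = knight, False = knave) in which each
    character's statement is true exactly when that character is a knight."""
    solutions = []
    for values in product([True, False], repeat=len(names)):
        world = dict(zip(names, values))
        # A knight's claim must hold; a knave's claim must fail.
        if all(world[speaker] == claim(world) for speaker, claim in statements):
            solutions.append(world)
    return solutions

# Hypothetical puzzle:
#   A says: "B is a knave."
#   B says: "A and I are both knights."
names = ["A", "B"]
statements = [
    ("A", lambda w: not w["B"]),
    ("B", lambda w: w["A"] and w["B"]),
]

print(solve(names, statements))  # [{'A': True, 'B': False}]
```

Exhaustive search grows as 2^n in the number of characters, which is fine for small puzzles and makes each hypothetical scenario explicit.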

Liar, Liar, Logical Mire: A Benchmark for Suppositional Reasoning in Large Language Models

Recent advances in Large Language Models (LLMs) have sparked wide interest in validating and comprehending the human-like cognitive-behavioral traits LLMs may capture and convey. These cognitive-behavioral traits typically include Attitudes, Opinions, and Values (AOVs). However, measuring AOVs embedded within LLMs remains opaque, and different evaluation methods may yield different results. This has led to a lack of clarity on how different studies are related to each other and how they can be interpreted. This paper aims to bridge this gap by providing a comprehensive overview of recent works on the evaluation of AOVs in LLMs. Moreover, we survey related approaches in different stages of the evaluation pipeline in these works. By doing so, we address the potential and challenges with respect to understanding the model, human-AI alignment, and downstream application in social sciences. Finally, we provide practical insights into evaluation methods, model enhancement, and interdisciplinary collaboration, thereby contributing to the evolving landscape of evaluating AOVs in LLMs.

The Potential and Challenges of Evaluating Attitudes, Opinions, and Values in Large Language Models

Human label variation (HLV) is a valuable source of information that arises when multiple human annotators provide different labels for valid reasons. In Natural Language Inference (NLI), earlier approaches to capturing HLV involve either collecting annotations from many crowd workers to represent human judgment distribution (HJD) or using expert linguists to provide detailed explanations for their chosen labels. While the former method provides denser HJD information, obtaining it is resource-intensive. In contrast, the latter offers richer textual information, but it is challenging to scale up to many human judges. Besides, large language models (LLMs) are increasingly used as evaluators ("LLM judges") but with mixed results, and few works aim to study HJDs. This study proposes to exploit LLMs to approximate HJDs using a small number of expert labels and explanations. Our experiments show that a few explanations significantly improve LLMs' ability to approximate HJDs with and without explicit labels, thereby providing a solution to scale up annotations for HJD. However, fine-tuning smaller soft-label aware models with the LLM-generated model judgment distributions (MJDs) presents partially inconsistent results: while similar in distance, their resulting fine-tuned models and visualized distributions differ substantially. We show the importance of complementing instance-level distance measures with a global-level shape metric and visualization to more effectively evaluate MJDs against human judgment distributions.

Seeing the Big through the Small: Can LLMs Approximate Human Judgment Distributions on NLI from a Few Explanations?

The open-ended nature of language generation makes the evaluation of autoregressive large language models (LLMs) challenging. One common evaluation approach uses multiple-choice questions to limit the response space. The model is then evaluated by ranking the candidate answers by the log probability of the first token prediction. However, first tokens may not consistently reflect the final response output, due to models' diverse response styles such as starting with "Sure" or refusing to answer. Consequently, first-token evaluation is not indicative of model behaviour when interacting with users. But by how much? We evaluate how aligned first-token evaluation is with the text output along several dimensions, namely final option choice, refusal rate, choice distribution and robustness under prompt perturbation. Our results show that the two approaches are severely misaligned *on all dimensions*, reaching mismatch rates over 60%. Models heavily fine-tuned on conversational or safety data are especially impacted. Crucially, models remain misaligned even when we increasingly constrain prompts, i.e., force them to start with an option letter or example template. Our findings i) underscore the importance of inspecting the text output as well and ii) caution against relying solely on first-token evaluation.
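
To make the comparison between first-token evaluation and text output concrete, the sketch below picks the option with the highest first-token log probability, extracts an option letter from the generated text, and reports how often the two disagree. The log probabilities, responses, and the regex-based answer parser are hypothetical simplifications, not the paper's actual pipeline.

```python
import re

OPTIONS = ["A", "B", "C", "D"]

def first_token_choice(logprobs):
    """Pick the option letter with the highest first-token log probability."""
    return max(OPTIONS, key=lambda option: logprobs.get(option, float("-inf")))

def text_choice(response):
    """Extract a standalone option letter from free text.
    Returns None for refusals or unparsable answers (simplistic parser)."""
    match = re.search(r"\b([ABCD])\b", response)
    return match.group(1) if match else None

def mismatch_rate(examples):
    """Fraction of examples where the first-token choice differs from the text answer."""
    mismatches = sum(
        first_token_choice(logprobs) != text_choice(text)
        for logprobs, text in examples
    )
    return mismatches / len(examples)

# Hypothetical (first-token log probabilities, generated text) pairs.
examples = [
    ({"A": -0.2, "B": -2.1, "C": -3.0, "D": -3.5}, "A"),
    ({"A": -0.9, "B": -1.5, "C": -1.1, "D": -2.0}, "Sure! My answer is C."),
    ({"A": -1.0, "B": -0.7, "C": -1.9, "D": -2.2}, "I'm sorry, I can't answer that."),
    ({"A": -2.5, "B": -0.3, "C": -2.8, "D": -3.1}, "The answer is B."),
]

print(mismatch_rate(examples))  # 0.5
```

The second and third examples show the two failure modes named in the abstract: a response that opens with "Sure" before stating a different option, and a refusal that first-token ranking still maps to an option letter.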

"My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models

Natural language processing (NLP) has largely focused on modelling standardized languages. More recently, attention has increasingly shifted to local, non-standardized languages and dialects. However, the relevant speaker populations' needs and wishes with respect to NLP tools are largely unknown. In this paper, we focus on dialects and regional languages related to German – a group of varieties that is heterogeneous in terms of prestige and standardization. We survey speakers of these varieties (N=327) and present their opinions on hypothetical language technologies for their dialects. Although attitudes vary among subgroups of our respondents, we find that respondents are especially in favour of potential NLP tools that work with dialectal input (especially audio input) such as virtual assistants, and less so for applications that produce dialectal output such as machine translation or spellcheckers.

What Do Dialect Speakers Want? A Survey of Attitudes Towards Language Technology for German Dialects

Deductive reasoning plays a pivotal role in the formulation of sound and cohesive arguments. It allows individuals to draw conclusions that logically follow, given the truth value of the information provided. Recent progress in the domain of large language models (LLMs) has showcased their capability in executing deductive reasoning tasks. Nonetheless, a significant portion of research primarily assesses the accuracy of LLMs in solving such tasks, often overlooking a deeper analysis of their reasoning behavior. In this study, we draw upon principles from cognitive psychology to examine inferential strategies employed by LLMs, through a detailed evaluation of their responses to propositional logic problems. Our findings indicate that LLMs display reasoning patterns akin to those observed in humans, including strategies like *supposition following* or *chain construction*. Moreover, our research demonstrates that the architecture and scale of the model significantly affect its preferred method of reasoning, with more advanced models tending to adopt strategies more frequently than less sophisticated ones. Importantly, we assert that a model's accuracy, that is the correctness of its final conclusion, does not necessarily reflect the validity of its reasoning process. This distinction underscores the necessity for more nuanced evaluation procedures in the field.

Comparing Inferential Strategies of Humans and Large Language Models in Deductive Reasoning

With the aim of improving the state-of-the-art (SOTA) on a target task, a standard strategy in Natural Language Processing (NLP) research is to design a new model, or modify the existing SOTA, and then benchmark its performance on the target task. We argue in favor of enriching this chain of actions by a preliminary error-guided analysis: First, explore weaknesses by analyzing the hard cases where the existing model fails, and then target the improvement based on those. Interpretable evaluation has received little attention for structured prediction tasks. Therefore, we propose the first in-depth analysis suite for Relation Classification (RC), and show its effectiveness through a case study. We propose a set of potentially influential attributes to focus on (e.g., entity distance, sentence length). Then, we bucket our datasets based on these attributes, and weigh their importance through correlations. This allows us to identify highly challenging scenarios for the RC model. By exploiting the findings of our analysis, with a carefully targeted adjustment to our architecture, we effectively improve the performance over the baseline by >3 Micro-F1.

What's wrong with your model? A Quantitative Analysis of Relation Classification

We introduce Universal NER (UNER), an open, community-driven project to develop gold-standard NER benchmarks in many languages. The overarching goal of UNER is to provide high-quality, cross-lingually consistent annotations to facilitate and standardize multilingual NER research. UNER v1 contains 19 datasets annotated with named entities in a cross-lingually consistent schema across 13 diverse languages. In this paper, we detail the dataset creation and composition of UNER; we also provide initial modeling baselines on both in-language and cross-lingual learning settings. We will release the data, code, and fitted models to the public.

Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark

Mainstream cross-lingual task-oriented dialogue (ToD) systems leverage the transfer learning paradigm by training a joint model for intent recognition and slot-filling in English and applying it, zero-shot, to other languages.
We address a gap in prior research, which often overlooked the transfer to lower-resource colloquial varieties due to limited test data.
Inspired by prior work on English varieties, we craft and manually evaluate perturbation rules that transform German sentences into colloquial forms and use them to synthesize test sets in four ToD datasets.
Our perturbation rules cover 18 distinct language phenomena, enabling us to explore the impact of each perturbation on slot and intent performance.
Using these new datasets, we conduct an experimental evaluation across six different transformers.
Here, we demonstrate that when applied to colloquial varieties, ToD systems maintain their intent recognition performance, losing 6% (4.62 percentage points) in accuracy on average. 
However, they exhibit a significant drop in slot detection, with a decrease of 31% (21 percentage points) in slot F$_1$ score.
Our findings are further supported by a transfer experiment from Standard American English to synthetic Urban African American Vernacular English.

Exploring the Robustness of Task-oriented Dialogue Systems for Colloquial German Varieties

In Natural Language Processing, entity linking (EL) has centered around Wikipedia, yet it remains underexplored for the job market domain. Disambiguating skill mentions can help us get insight into the current labor market demands. In this work, we are the first to explore EL in this domain, specifically targeting the linkage of occupational skills to the ESCO taxonomy (le Vrang et al., 2014). Previous efforts linked coarse-grained (full) sentences to a corresponding ESCO skill. In this work, we link more fine-grained span-level mentions of skills. We tune two high-performing neural EL models, a bi-encoder (Wu et al., 2020) and an autoregressive model (Cao et al., 2021), on a synthetically generated mention-skill pair dataset and evaluate them on a human-annotated skill-linking benchmark. Our findings reveal that both models are capable of linking implicit mentions of skills to their correct taxonomy counterparts. Empirically, BLINK outperforms GENRE in strict evaluation, but GENRE performs better in loose evaluation (accuracy@k). The source code can be found at https://anonymous.4open.science/r/el_esco-E629

Entity Linking in the Job Market Domain

The labor market is changing rapidly, prompting increased interest in the automatic extraction of occupational skills from text. With the advent of English benchmark job description datasets, there is a need for systems that handle their diversity well. We tackle the complexity in occupational skill dataset tasks: combining and leveraging multiple datasets for skill extraction, identifying rarely observed skills within a dataset, and overcoming the scarcity of skills across datasets. In particular, we investigate the retrieval-augmentation of language models, employing an external datastore for retrieving similar skills in a dataset-unifying manner. Our proposed method, **N**earest **N**eighbor **O**ccupational **S**kill **E**xtraction (NNOSE), effectively leverages multiple datasets by retrieving neighboring skills from other datasets in the datastore. This improves skill extraction *without* additional fine-tuning. Crucially, we observe a performance gain in predicting infrequent patterns, with substantial gains of up to 30% span-F1 in cross-dataset settings.

NNOSE: Nearest Neighbor Occupational Skill Extraction

With the rise of increasingly powerful and user-facing NLP systems, there is growing interest in assessing whether they have a good _representation of uncertainty_ by evaluating the quality of their predictive distribution over outcomes. We identify two main perspectives that drive starkly different evaluation protocols. The first treats predictive probability as an indication of model confidence; the second as an indication of human label variation. We discuss their merits and limitations, and take the position that both are crucial for trustworthy and fair NLP systems, but that exploiting a single predictive distribution is limiting. We recommend tools and highlight exciting directions towards models with disentangled representations of uncertainty about predictions and uncertainty about human labels.

Interpreting Predictive Probabilities: Model Confidence or Human Label Variation?

In Natural Language Generation (NLG) tasks, for any input, multiple communicative goals are plausible, and any goal can be put into words, or produced, in multiple ways. We characterise the extent to which human production varies lexically, syntactically, and semantically across four NLG tasks, connecting human production variability to aleatoric or data uncertainty. We then inspect the space of output strings shaped by a generation system's predicted probability distribution and decoding algorithm to probe its uncertainty. For each test input, we measure the generator's calibration to human production variability. Following this instance-level approach, we analyse NLG models and decoding strategies, demonstrating that probing a generator with multiple samples and, when possible, multiple references, provides the level of detail necessary to gain understanding of a model's representation of uncertainty.

What Comes Next? Evaluating Uncertainty in Neural Text Generators Against Human Production Variability

Representational spaces learned via language modeling are fundamental to Natural Language Processing (NLP); however, there has been limited understanding regarding how and when during training various types of linguistic information emerge and interact. Leveraging a novel information theoretic probing suite, which enables direct comparisons of not just task performance, but their representational subspaces, we analyze nine tasks covering syntax, semantics and reasoning, across 2M pre-training steps and five seeds. We identify critical learning phases across tasks and time, during which subspaces emerge, share information, and later disentangle to specialize. Across these phases, syntactic knowledge is acquired rapidly after 0.5% of full training. Continued performance improvements primarily stem from the acquisition of open-domain knowledge, while semantics and reasoning tasks benefit from later boosts to long-range contextualization and higher specialization. Measuring cross-task similarity further reveals that linguistically related tasks share information throughout training, and do so more during the critical phase of learning than before or after. Our findings have implications for model interpretability, multi-task learning, and learning from limited data.

Subspace Chronicles: How Linguistic Information Emerges, Shifts and Interacts during Language Model Training

Poster Session 2

poster

We are pleased to announce the Thirty-Ninth AAAI Conference on Artificial Intelligence (AAAI-25), which will be held in Philadelphia, Pennsylvania at the Pennsylvania Convention Center from February 25 to March 4, 2025.

The purpose of the AAAI conference series is to promote research in Artificial Intelligence (AI) and foster scientific exchange between researchers, practitioners, scientists, students, and engineers across the entirety of AI and its affiliated disciplines. AAAI-25 will feature technical paper presentations, special tracks, invited speakers, workshops, tutorials, poster sessions, senior member presentations, competitions, and exhibit programs, and a range of other activities to be announced.

### [Invited Speakers](https://aaai.org/conference/aaai/aaai-25/aaai-25-invited-speakers/)

Register [here](https://aaai.org/conference/aaai/aaai-25/registration/)

AAAI 2025

This poster session includes Main Conference posters and Findings from the following areas:

Ethics, Bias, and Fairness • Interpretability and Analysis of Models for NLP

In-Person Poster Session F (Jasmine)

## Welcome to EMNLP 2024! 
We are excited to welcome you to one of the most prominent conferences in the field of Natural Language Processing. This year, EMNLP 2024 is being held in a hybrid format,
offering both virtual and in-person participation in beautiful Miami. Due to a record-breaking number of submissions, we've expanded the total number of accepted papers to accommodate more cutting-edge research from around the globe.
### [Conference Handbook](https://drive.google.com/file/d/1WPROgxjLAC96AJL7Ugy0tEnYm7dkrbHt/view?usp=sharing)

You are required to register for this event. **Please register [here](https://2024.emnlp.org/registration/).** The EMNLP 2024 event page on Underline will be open to the public one week prior to the event.

Please register!

EMNLP 2024

EMNLP 2024 will take place in Miami, Florida from Nov 12th to Nov 16th, 2024, at the Hyatt Regency Miami Hotel and on Underline for remote participants.

This poster session includes Main Conference posters and Findings from the following areas:

Speech Processing and Spoken Language Understanding •  Resources and Evaluation • Human-centered NLP • NLP Applications

In-Person Poster Session D (Riverfront Hall)

Findings In-Person Poster Session 3

### Welcome!
The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) will take place in Bangkok, Thailand from August 11th to 16th, 2024. Our Virtual Poster Sessions will take place online Thursday, August 22, 2024.

You are required to register for this event. **Please register [here](https://2024.aclweb.org/registration).**

If you have already registered, please check your inbox for an email from Underline granting you access to ACL 2024 content.

ACL 2024

The 62nd Annual Meeting of the Association for Computational Linguistics (ACL 2024) will take place in Bangkok, Thailand from August 11th to 16th, 2024. More information will be announced soon.

This poster session includes Main Conference posters, TACL and CL, Demos and SRW papers.

In-Person Poster Session 3

This poster session includes Main Conference posters, TACL and Demos.

In-Person Poster Session 2

**About this workshop:** SEM brings together researchers interested in the semantics of (many and diverse!) natural languages and its computational modeling. The conference embraces data-driven, neural, and probabilistic approaches, as well as symbolic approaches and everything in between; practical applications as well as theoretical contributions are welcome. The long-term goal of SEM is to provide a stable forum for the growing number of NLP researchers working on all aspects of semantics of (many and diverse!) natural languages. [**Read more**](https://sites.google.com/view/starsem2024) [**Schedule**](https://sites.google.com/view/starsem2024/schedule) 
[![](https://assets.underline.io/markdown_image/1/image/12613369ab2f6dd75393ffa8794ccdd3.png)](https://app.gather.town/app/xZsGnDNvfEKqoBmF/NAACL%202024%20Workshops?spawnToken=oX98M2y1RZu1Phsqx6AX)

W1: *SEM 2024: The 13th Joint Conference on Lexical and Computational Semantics (SEM)

workshop paper

## Welcome to 2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics!
This year, the conference is in Mexico City. NAACL was actually already planned for Mexico City in 2021, but due to the pandemic the entire conference was moved online. This year, finally, we get to go! So it is my sincere pleasure to welcome you to Mexico City, whether in person or virtually. Having the conference in Mexico City is a good opportunity to emphasize that NAACL is our flagship conference for ACL members not only in North America but also in Central and South America, even though NAACL has been bearing “North” in its name. At this year’s conference, we have a theme to match, with a theme track on the Languages of Latin America to showcase the linguistic diversity of the region.
 
The opportunity to present at NAACL should not depend on a researcher’s travel budget, or their family status. This is why it is so important to make virtual participation at NAACL as good an experience as possible – but we want to also provide a good experience for in-person participants. As a community, we are still working out the best way to do that. This year at NAACL, we are trying out a big virtual poster session ahead of the conference, with the hope that this will make for a lively and interactive experience. At the same time, we are reducing virtual oral presentations, which seem to be particularly tricky to make work well. A big thanks to the NAACL program chairs and to Luciana Benotti for all their ideas and work to improve the virtual experience. And participants, virtual as well as in-person: Please let us know what worked for you and what didn’t, so we can continue to improve hybrid conferences. 
 
I have been lucky to work with many amazing people. Without their insight, dedication and patience, and without the many hours of work they put in, NAACL would not have been possible. A huge thank you to the program chairs Helena Gomez, Kevin Duh, and Steve Bethard – you are the best! 
 
Finally, I would like to thank all authors, invited speakers and panelists, area chairs and reviewers, the volunteers organizing and chairing sessions, and all attendees, in-person and virtual. Thank you for helping us make NAACL 2024 come to life. 
 
Welcome and hope you all enjoy the conference! 
Katrin Erk 
The University of Texas at Austin 
NAACL-2024 General Chair 
*You can read the full Welcome message in the [Conference Handbook (downloadable)](https://drive.google.com/file/d/1H1NvW0VASQjkSCYgw3mr-3l4yYDSxz6B/view?usp=sharing)*

To access the event page you need to register [**here.**](http://acl.swoogo.com/naacl2024) 
Your access to the event page is limited based on your registration type. If you registered for workshops only, you will gain access to full workshops content on the day of the workshops program.

NAACL 2024

2024 Annual Conference of the North American Chapter of the Association for Computational Linguistics

This poster session includes Main Conference posters and Demos from the following areas: 
Discourse and Pragmatics • Generation • Machine Translation • Resources and Evaluation • Special Theme: Languages of Latin America

In-Person Poster and Demo Session E

Welcome to the 18th Conference of the European Chapter of the Association for Computational Linguistics (EACL). Continuing its mission of expanding and involving the science community of all European countries, EACL has selected the Malta community for the 18th EACL. Considering the importance of physical interaction among researchers, the conference will be held at the Hotel Radisson Blu, St. Julians, in Malta, from 17 to 22 March 2024. The conference will also feature the possibility to participate virtually. As the flagship European conference in the field of computational linguistics, EACL welcomes European and international researchers covering a broad spectrum of research areas that are concerned with computational approaches to natural language. 
#### [Conference Handbook](https://drive.google.com/file/d/13Wn38q0yev6U3RTgnf4Tn5T9GuAuZ-xW/view?usp=sharing) *downloadable*

To access the EACL 2024 event page you need to register [**here**](https://acl.swoogo.com/EACL2024). 

EACL 2024

In-Person Poster and Demo Session D

NLP Applications

technical paper

In-Person Poster and Demo Session B

Interpretability, Interactivity, and Analysis of Models for NLP 2

**Welcome to EMNLP 2023!**

On behalf of the EMNLP 2023 Organizing Committee, I extend a warm and heartfelt welcome to all of you to the 2023 Conference on Empirical Methods in Natural Language Processing (EMNLP). It is with immense pleasure and excitement that we gather here in Singapore, a vibrant hub of innovation and technological advancement.

The conference program is packed with insightful presentations, thought-provoking workshops, and engaging networking opportunities. In addition to the technical sessions, we have also planned several social events that will provide you with the opportunity to connect with your colleagues.

The conference is held in person at the Resorts World Convention Centre in Singapore, and available on-line with the help of Underline.

For those in Singapore, I hope that you will find time to explore the exciting Sentosa Island, Gardens by the Bay, Singapore Botanic Gardens and the many other attractions unique to Singapore.

Of course, we are grateful to our sponsors and partners for their generous support of EMNLP 2023, whose contributions make it possible for us to host this world-class event.

Yuji Matsumoto (RIKEN AIP) 
EMNLP 2023 General Chair

To access the **EMNLP 2023** event page on Underline, you need to register for the Conference. 
Please follow **[this link](https://2023.emnlp.org/registration/)** for more details.

EMNLP 2023

EMNLP 2023 took place in Singapore from Dec 6th to Dec 10th, 2023.

**Are you attending this poster session virtually?** 
In-person printed posters are available for in-person attendees only.
 
**Are you attending this poster session in person?** 
Hybrid posters are displayed in the East Foyer.

Barbara Plank
