
Anej Svete
PhD Student @ ETH Zürich
Topics: language models, recurrent neural networks, transformers, finite-state automata, prompt tuning, n-grams, linear temporal logic, failure transitions, semirings, pathsum, backward algorithm, chain of thought, formal language theory
Short Bio
Anej is a second-year PhD fellow at the ETH AI Center, where he is co-advised by Prof. Ryan Cotterell and Prof. Valentina Boeva.
His main research interests lie at the intersection of formal language theory and language models: he seeks to understand the formal properties of architectures such as recurrent neural networks and transformers by relating them to weighted models of computation. He is also interested in representation learning and its interpretability.
Presentations

Can Transformer Language Models Learn n-gram Language Models?
Anej Svete and 4 other authors

On Efficiently Representing Regular Languages as RNNs
Anej Svete and 2 other authors

On the Representational Capacity of Neural Language Models with Chain-of-Thought Reasoning
Franz Nowak and 3 other authors

The Role of n-gram Smoothing in the Age of Neural Networks
Luca Malagutti and 5 other authors

Transformers Can Represent n-gram Language Models
Anej Svete and 1 other author

On the Relationship Between Non-deterministic FSLMs and RNN LMs
Anej Svete and 3 other authors

Recurrent Neural Language Models as Probabilistic Finite-state Automata
Anej Svete and 1 other author

On the Representational Capacity of Recurrent Neural Language Models
Franz Nowak and 3 other authors