
Dongkuan Xu
Topics: overfitting, transformer-based models, sparse progressive distillation, knowledge-retention pruning
SHORT BIO
Dongkuan (DK) Xu is a Ph.D. student at Penn State, advised by Prof. Xiang Zhang. His research interest is resource-efficient deep learning for AI at scale, focusing on improving the efficiency of deep learning systems to achieve Pareto optimality between resources (e.g., parameters, data, computation) and performance (e.g., inference, training). DK has published more than 25 papers in top conferences and journals, including NeurIPS, AAAI, ACL, NAACL, and IJCAI, with more than 1,400 citations. He has served as a (senior) PC member or regular reviewer for over 28 major conferences and 14 journals, and has worked as an instructor or teaching assistant for 8 courses. DK also has extensive research experience in industry: he has interned at Microsoft Research Redmond, Moffett AI, and NEC Labs America, and holds 8 US patents/applications. DK's long-term research goal is to democratize AI to serve a broader range of populations and domains. More information can be found on DK's personal website at http://www.personal.psu.edu/dux19/
Presentations

Towards Robust Pruning: An Adaptive Knowledge-Retention Pruning Strategy for Language Models
Jianwei Li and 3 other authors

Sparse Progressive Distillation: Resolving Overfitting under Pretrain-and-Finetune Paradigm
Shaoyi Huang and 10 other authors

Rethinking Network Pruning -- under the Pre-train and Fine-tune Paradigm
Dongkuan Xu and 3 other authors

Longitudinal Deep Kernel Gaussian Process Regression
Junjie Liang and 3 other authors

Transformer-Style Relational Reasoning with Dynamic Memory Updating for Temporal Network Modeling
Dongkuan Xu and 5 other authors

Multi-Task Recurrent Modular Networks
Dongkuan Xu and 9 other authors

How Do We Move: Modeling Human Movement with System Dynamics
Hua Wei and 3 other authors