
Sheng Shen
Keywords: transformers, data augmentation, low resource, large language models, bloom, synthetic data, reservoir computing, training efficiency, finetuning, multilingual models, gpt, llm, instruction-tuning, multitask finetuning, bloomz
6 presentations
SHORT BIO
First-year CS Ph.D. student at UC Berkeley, focusing on efficient NLP and Vision-and-Language.
Presentations

LLM2LLM: Boosting LLMs with Novel Iterative Data Enhancement
Nicholas Lee and 8 other authors

Crosslingual Generalization through Multitask Finetuning
Niklas Muennighoff and 12 other authors

What’s Hidden in a One-layer Randomly Weighted Transformer?
Zhewei Yao and 1 other author

Reservoir Transformers
Sheng Shen and 1 other author

Noisy Self-Knowledge Distillation for Text Summarization
Yang Liu and 2 other authors

ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Zhewei Yao and 5 other authors