Ireland

The crystallization of modeling methods around the Transformer architecture has been a boon for practitioners. Simple, well-motivated architectural variations can transfer across tasks and scale, increasing the impact of modeling research. However, with the emergence of state-of-the-art 100B+ parameter models, large language models are increasingly expensive to accurately design and train. Notably, it can be difficult to evaluate how modeling decisions may impact emergent capabilities, given that these capabilities arise mainly from sheer scale. Targeting a multilingual language model at the 100B+ parameter scale, our goal is to identify an architecture and training setup that makes the best use of our 1,000,000 A100-GPU-hours budget. Specifically, we perform an ablation study comparing different modeling practices and their impact on zero-shot generalization. We perform all our experiments on 1.3B models, providing a compromise between compute costs and the likelihood that our conclusions will hold for the target 100B+ model. In addition, we study the impact of various popular pretraining corpora on zero-shot generalization. We also study the performance of a multilingual model and how it compares to the English-only one. Finally, we consider the scaling behaviour of Transformers to choose the target model size, shape, and training setup.
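To make the budget framing above concrete, here is a rough back-of-the-envelope sketch of how a GPU-hour budget translates into trainable tokens. It is not the authors' method. It assumes the common rule of thumb that training costs about 6 FLOPs per parameter per token, an A100 dense BF16 peak of 312 TFLOP/s, and a hypothetical 40% hardware utilization; the function names are illustrative only.

```python
# Back-of-the-envelope sketch: how many tokens can a GPU-hour budget train?
# Assumptions (not from the source): the C ~ 6 * N * D rule of thumb for
# training FLOPs, an A100 dense BF16 peak of 312 TFLOP/s, and a hypothetical
# 40% hardware utilization.

A100_PEAK_FLOPS = 312e12  # FLOP/s, assumed dense BF16 peak
SECONDS_PER_HOUR = 3600


def budget_flops(gpu_hours: float, utilization: float) -> float:
    """Total useful FLOPs delivered by the budget at the assumed utilization."""
    return gpu_hours * SECONDS_PER_HOUR * A100_PEAK_FLOPS * utilization


def trainable_tokens(n_params: float, total_flops: float) -> float:
    """Token count D such that 6 * N * D equals the available FLOPs."""
    return total_flops / (6.0 * n_params)


if __name__ == "__main__":
    flops = budget_flops(gpu_hours=1_000_000, utilization=0.4)
    # Compare the 1.3B ablation scale with a 100B-parameter target.
    for n_params in (1.3e9, 100e9):
        tokens = trainable_tokens(n_params, flops)
        print(f"{n_params / 1e9:.1f}B params: ~{tokens / 1e9:.0f}B tokens")
```

The result is an order-of-magnitude estimate only: it scales linearly with the assumed utilization, and the 6 FLOPs per parameter per token figure ignores attention and other overheads.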

ACL 2022

What Language Model to Train if You Have One Million GPU Hours?

scaling

architecture

transformer

# Welcome everyone to ACL 2022!

The 60th Annual Meeting of the Association for Computational Linguistics is taking place May 22-27, 2022 as a hybrid event, in Dublin and online. We are happy to welcome all of you to this anniversary edition, with an almost 50-50 split between in-person and virtual participation.
The main conference program features oral presentations, in-person and virtual posters and demo sessions, a plenary session for our best paper presentations and awards, three amazing keynote events and two new initiatives of invited talks: Spotlight Talks for Young Rising Stars (STIRS) and The Next Big Idea Talks. Posters (including Findings of ACL 2022) and demos are grouped by areas for both the in-person and the virtual sessions. For the virtual component, the talks will be on Zoom and the posters and the demos will be in GatherTown. The Student Research Workshop will have an oral session and a poster session as part of Poster Session 1. The program also features eight Tutorials and 28 Workshops. 

 
We wish you a wonderful conference! 
[**The ACL 2022 Organizing Committee**](https://www.2022.aclweb.org/organisers)
 
[**Conference Handbook**](https://drive.google.com/file/d/1_BUCMfhMVrjG9E2e71aHdHeE28KSje0l/view?usp=sharing) 
[**Mini Handbook**](https://drive.google.com/file/d/1qlBKl0wzmlVF1oCeMQl3BahLd9nLP5Ce/view?usp=sharing) 
[**Posters and Demo guides**](https://drive.google.com/file/d/1UucMAoCNncIOaH1rMMDa0owuG9qgvJTG/view?usp=sharing)

The Association for Computational Linguistics (ACL) is the premier international scientific and professional society for people working on computational problems involving human language, a field often referred to as either computational linguistics or natural language processing (NLP). 

workshop paper

We benchmark different lightweight strategies for adding new languages (German and Korean) to BigScience’s pretrained multilingual language model, which has 1.3 billion parameters and currently supports 13 languages. We investigate the factors that affect the language adaptability of the model and the trade-offs between computational costs and expected performance.


Adapting BigScience Multilingual Model to Unseen Languages

Large NLP models have recently shown impressive performance in language understanding tasks, typically evaluated by fine-tuning tasks. Alternatively, probing has received increasing attention as a lightweight method for interpreting the intrinsic mechanisms of large NLP models. In probing, post-hoc classifiers are trained on "out-of-domain" datasets that diagnose specific abilities. While probing language models has led to insightful findings, these findings appear disjointed from the development of models. This paper explores the utility of probing deep NLP models to extract a proxy signal widely used in model development: the fine-tuning performance. We find that it is possible to use the accuracies of only three probing results to predict the fine-tuning performance with errors 40%-80% smaller than baselines. We further show the possibility of incorporating specialized probing datasets into developing deep NLP models.

Predicting Fine-tuning Performance with Probing

Panel on challenges for scaling large language models

Scaling Large Language Models

The panelists discuss why we need large-scale, open collaboration in machine learning research and share their thoughts, experience, challenges and opportunities around collaborative approaches, projects and organizations including GEM, BigBench, Masakhane, EleutherAI and ML Collective.

Large-scale collaborations

Mind the Gaps and Normal Accidents

Human Values in Recommender Systems: a Multidisciplinary Discussion

Problematic Information on Social Media Platforms: Understanding and Countering

Datasets, Learner Annotation, and Bias in Multiple Languages

Supporting professional fact-checking: how can NLP/AI help?

Model robustness and spurious correlations have received increasing attention in the NLP community, both in methods and evaluation. The term “spurious correlation” has been abused to refer to any undesirable shortcuts learned by the model, as judged by domain knowledge. However, in NLP, many features (e.g. word overlap and negation) are not spurious in the sense that the background is a spurious feature for classifying the object in an image. They carry important information that humans need to make predictions. In this talk, we argue that it is more productive to consider features from the aspects of necessity and sufficiency, and discuss the implications of this categorization in representation, learning, and evaluation.

Next from ACL 2022

Adapting BigScience Multilingual Model to Unseen Languages

Similar lecture

Emotion Analysis in Tamil Text using Language Agnostic Embeddings

