Catherine Arnett

Graduate student @ University of California San Diego

multilingual language models

tokenization

morphology

agent

Spanish

abstraction

representation

human-robot interaction

agreement

linguistic structure

offline reinforcement learning

psycholinguistics

segment+ framework

long-form text processing

structured notes

4 presentations

SHORT BIO

Catherine Arnett is a PhD candidate in the Linguistics Department at UC San Diego. Her dissertation describes the meaning and usage of reduplication in Mandarin Chinese. She is also interested in the capabilities of multilingual language models and in language modeling for low-resource languages.

Presentations

BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training

Catherine Arnett and 3 other authors

When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages

Tyler Chang and 3 other authors

Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement

Catherine Arnett

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models

James Michaelov and 3 other authors

Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2025 Underline - All rights reserved