
Catherine Arnett
Graduate student @ University of California San Diego
multilingual language models
tokenization
morphology
agent
spanish
abstraction
representation
human-robot interaction
agreement
linguistic structure
offline reinforcement learning
psycholinguistics
segment+ framework
long-form text processing
structured notes
4 presentations
SHORT BIO
Catherine Arnett is a PhD candidate in the Linguistics Department at UC San Diego. Her dissertation focuses on describing the meaning and usage of reduplication in Mandarin Chinese. She is also interested in the capabilities of multilingual language models and in language modeling for low-resource languages.
Presentations

BPE Gets Picky: Efficient Vocabulary Refinement During Tokenizer Training
Catherine Arnett and 3 other authors

When Is Multilinguality a Curse? Language Modeling for 250 High- and Low-Resource Languages
Tyler Chang and 3 other authors

Different Tokenization Schemes Lead to Comparable Performance in Spanish Number Agreement
Catherine Arnett

Structural Priming Demonstrates Abstract Grammatical Representations in Multilingual Language Models
James Michaelov and 3 other authors