
Isaac Caswell
Research Scientist @ Google
data, low-resource, evaluation, text, web, nmt, noise, language, mt, mining, langid, lid
3 presentations
5 views
SHORT BIO
I stare at data, specifically data from low-resource languages.
Presentations

Separating the Wheat from the Chaff with BREAD: An open-source benchmark and metrics to detect redundancy in text
Isaac Caswell and 2 other authors

Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
Julia Kreutzer and 51 other authors

Language ID in the Wild: Unexpected Challenges on the Path to a Thousand-Language Web Text Corpus
Isaac Caswell and 3 other authors