
Tom Kocmi
evaluation
machine translation
large language models
human evaluation
natural language generation
diversifying references
error span annotation
metrics
Presentations

AI-Assisted Human Evaluation of Machine Translation
Vilém Zouhar and 2 other authors

Navigating the Metrics Maze: Reconciling Score Magnitudes and Accuracies
Tom Kocmi and 3 other authors

Not All Metrics Are Guilty: Improving NLG Evaluation by Diversifying References
Tianyi Tang and 7 other authors