
Mitesh Khapra
Associate Professor @ IIT Madras
indian languages
benchmark
indic languages
multilinguality
natural language generation
datasets
transformers
dataset
indicnlp
question answering
summarization
neural machine translation
benchmarking
multilingual
pre-training
25 presentations
30 views
SHORT BIO
Mitesh M. Khapra is an Associate Professor in the Department of Computer Science and Engineering at IIT Madras. He heads the AI4Bharat Research Lab at IIT Madras, which focuses on building datasets, tools, models and applications for Indian languages. His research has been published in several top conferences and journals, including TACL, ACL, NeurIPS, TALLIP, EMNLP, EACL and AAAI. He has also served as Area Chair or Senior PC member at top conferences such as ICLR and AAAI. Prior to IIT Madras, he was a Researcher at IBM Research India for four and a half years, where he worked on problems in Statistical Machine Translation, Cross Language Learning, Multimodal Learning, Argument Mining and Deep Learning. Before joining IBM, he completed his PhD (January 2012) and M.Tech (July 2008) at IIT Bombay. His PhD thesis dealt with the problem of reusing resources for multilingual computation. During his PhD he received the IBM PhD Fellowship (2011) and the Microsoft Rising Star Award (2011). He is also a recipient of the Google Faculty Research Award (2018), the IITM Young Faculty Recognition Award (2019), the Prof. B. Yegnanarayana Award for Excellence in Research and Teaching (2020) and the Srimathi Marti Annapurna Gurunath Award for Excellence in Teaching (2022).
Presentations

Cross-Lingual Auto Evaluation for Assessing Multilingual LLMs
Sumanth Doddapaneni and 5 other authors

Can Vision-Language Models Evaluate Handwritten Math?
Oikantik Nath and 3 other authors

FairI Tales: Evaluation of Fairness in Indian Contexts with a Focus on Bias and Stereotypes
Janki Navle and 4 other authors

Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 13 Indian Languages
Ashwin Sankar and 8 other authors

Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users
Yash Madhani and 7 other authors

Finding Blind Spots in Evaluator LLMs with Interpretable Checklists
Sumanth Doddapaneni and 3 other authors

How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?
Anushka Singh and 5 other authors

IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Tahir Javed and 20 other authors

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages
Ananya Sai and 6 other authors

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages
Sumanth Doddapaneni and 6 other authors

Bhasha-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages
Yash Madhani and 2 other authors

IndicSUPERB: A Speech processing Universal Performance Benchmark for Indian languages
Tahir Javed and 5 other authors

IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages
Raj Dabre and 8 other authors

Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages
Gowtham Ramesh and 17 other authors

IndicBART: A Pre-trained Model for Indic Natural Language Generation
Raj Dabre and 5 other authors