Mitesh Khapra
benchmark
indian languages
multilinguality
indic languages
transformers
dataset
indicnlp
datasets
natural language generation
neural machine translation
self-supervised learning
pre-training
annotations
slu
benchmarking
22 presentations
SHORT BIO
Mitesh M. Khapra is an Associate Professor in the Department of Computer Science and Engineering at IIT Madras. He heads the AI4Bharat Research Lab at IIT Madras, which focuses on building datasets, tools, models and applications for Indian languages. His research has been published in top conferences and journals, including TACL, ACL, NeurIPS, TALLIP, EMNLP, EACL and AAAI. He has also served as Area Chair or Senior PC member at top conferences such as ICLR and AAAI. Prior to IIT Madras, he was a Researcher at IBM Research India for four and a half years, where he worked on problems in Statistical Machine Translation, Cross Language Learning, Multimodal Learning, Argument Mining and Deep Learning. Before joining IBM, he completed his M.Tech (July 2008) and PhD (January 2012) at IIT Bombay. His PhD thesis dealt with the problem of reusing resources for multilingual computation. During his PhD he received the IBM PhD Fellowship (2011) and the Microsoft Rising Star Award (2011). He has also received the Google Faculty Research Award (2018), the IITM Young Faculty Recognition Award (2019), the Prof. B. Yegnanarayana Award for Excellence in Research and Teaching (2020) and the Srimathi Marti Annapurna Gurunath Award for Excellence in Teaching (2022).
Presentations
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users
Yash Madhani and 7 other authors
How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?
Anushka Singh and 5 other authors
IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages
Tahir Javed and 20 other authors
IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages
Ananya Sai and 6 other authors
Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages
Sumanth Doddapaneni and 6 other authors
Bhasa-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages
Yash Madhani and 2 other authors
IndicSUPERB: A Speech processing Universal Performance Benchmark for Indian languages
Tahir Javed and 5 other authors
IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages
Raj Dabre and 8 other authors
Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages
Gowtham Ramesh and 17 other authors
IndicBART: A Pre-trained Model for Indic Natural Language Generation
Raj Dabre and 5 other authors
OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages
Prem Selvaraj and 3 other authors
Input-specific Attention Subnetworks for Adversarial Detection
Emil Biju and 3 other authors
Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons
Akash Kumar Mohankumar and 1 other author
Towards Building ASR Systems for the Next Billion Users
Tahir Javed and 7 other authors
Perturbation CheckLists for Evaluating NLG Evaluation Metrics
Ananya Sai and 4 other authors