Mitesh Khapra

Topics: benchmark, indian languages, multilinguality, indic languages, transformers, dataset, indicnlp, datasets, natural language generation, neural machine translation, self-supervised learning, pre-training, annotations, slu, benchmarking

22 presentations, 27 views

SHORT BIO

Mitesh M. Khapra is an Associate Professor in the Department of Computer Science and Engineering at IIT Madras. He heads the AI4Bharat Research Lab at IIT Madras, which focuses on building datasets, tools, models and applications for Indian languages. His research has been published in top conferences and journals such as TACL, ACL, NeurIPS, TALLIP, EMNLP, EACL and AAAI. He has also served as an Area Chair or Senior PC member for top conferences such as ICLR and AAAI. Before joining IIT Madras, he was a Researcher at IBM Research India for four and a half years, where he worked on problems in Statistical Machine Translation, Cross Language Learning, Multimodal Learning, Argument Mining and Deep Learning. Before joining IBM, he completed his M.Tech (July 2008) and PhD (January 2012) at IIT Bombay. His PhD thesis addressed the problem of reusing resources for multilingual computation. During his PhD, he received the IBM PhD Fellowship (2011) and the Microsoft Rising Star Award (2011). He has also received the Google Faculty Research Award (2018), the IITM Young Faculty Recognition Award (2019), the Prof. B. Yegnanarayana Award for Excellence in Research and Teaching (2020) and the Srimathi Marti Annapurna Gurunath Award for Excellence in Teaching (2022).

Presentations

Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users

Yash Madhani and 7 other authors

How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

Anushka Singh and 5 other authors

IndicVoices: Towards building an Inclusive Multilingual Speech Dataset for Indian Languages

Tahir Javed and 20 other authors

IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation Metrics for Indian Languages

Ananya Sai and 6 other authors

Towards Leaving No Indic Language Behind: Building Monolingual Corpora, Benchmark and Models for Indic Languages

Sumanth Doddapaneni and 6 other authors

Bhasa-Abhijnaanam: Native-script and romanized Language Identification for 22 Indic languages

Yash Madhani and 2 other authors

IndicSUPERB: A Speech processing Universal Performance Benchmark for Indian languages

Tahir Javed and 5 other authors

IndicNLG Benchmark: Multilingual Datasets for Diverse NLG Tasks in Indic Languages

Raj Dabre and 8 other authors

Samanantar: The Largest Publicly Available Parallel Corpora Collection For 11 Indic Languages

Gowtham Ramesh and 17 other authors

IndicBART: A Pre-trained Model for Indic Natural Language Generation

Raj Dabre and 5 other authors

OpenHands: Making Sign Language Recognition Accessible with Pose-based Pretrained Models across Languages

Prem Selvaraj and 3 other authors

Input-specific Attention Subnetworks for Adversarial Detection

Emil Biju and 3 other authors

Active Evaluation: Efficient NLG Evaluation with Few Pairwise Comparisons

Akash Kumar Mohankumar and 1 other author

Towards Building ASR Systems for the Next Billion Users

Tahir Javed and 7 other authors

Perturbation CheckLists for Evaluating NLG Evaluation Metrics

Ananya Sai and 4 other authors

Underline Science, Inc.
1216 Broadway, 2nd Floor, New York, NY 10001, USA

© 2023 Underline - All rights reserved