I am a Research Associate at the SPIRE Lab, Indian Institute of Science (IISc) Bengaluru, working under the guidance of Dr. Prasanta Kumar Ghosh. I am a Computer Science & Engineering graduate from the National Institute of Technology (NIT), Srinagar. My research centers on Audio-Visual Speech Synthesis, Accent Conversion, Text-to-Speech (TTS), and Automatic Speech Recognition (ASR) systems for Indian languages. Passionate about Machine Learning and its vast real-world applications, I am dedicated to advancing technology that bridges language and accessibility barriers, fostering inclusive solutions in speech processing and synthesis.
RESPIN-S1.0: A read speech corpus of 10000+ hours in dialects of nine Indian Languages
Saurabh Kumar, Abhayjeet Singh, Deekshitha G, Amartya Veer, Jesuraj Bandekar, Savitha Murthy, Sumit Sharma, Sandhya Badiger, Sathvik Udupa, Amala Nagireddi, Srinivasa Raghavan K M, Rohan Saxena, Jai Nanavati, Raoul Nanavati, Janani Sridharan, Arjun Mehta, Ashish S, Sai Mora, Prashanthi Venkataramakrishnan, Gauri Date, Karthika P, Prasanta Ghosh
NeurIPS 2025 Datasets and Benchmarks Track
[PDF]
Improving Dialect Identification in Indian Languages Using Multimodal Features from Dialect Informed ASR
Saurabh Kumar, Sumit Sharma, Sathvik Udupa, Sandhya Badiger, Abhayjeet Singh, Jesuraja Bandekar, Savitha Murthy, Prasanta Kumar Ghosh
ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[PDF]
LIMMITS’24: Multi-speaker, Multi-lingual Indic TTS with voice cloning
Abhayjeet Singh, Amala Nagireddi, G Deekshitha, Jesuraja Bandekar, R Roopa, Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A Murthy, Pranaw Kumar, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich
LIMMITS’24: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
[PDF]
LIMMITS’24: IEEE Open Journal of Signal Processing
[PDF]
Lightweight, Multi-speaker, Multi-lingual Indic Text-To-Speech
Abhayjeet Singh, Amala Nagireddi, Anjali Jayakumar, G Deekshitha, Jesuraja Bandekar, R Roopa, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, Hema A Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[PDF]
LIMMITS’23: IEEE Open Journal of Signal Processing
[PDF]
Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages
Sathvik Udupa, Jesuraja Bandekar, G Deekshitha, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan, Raoul Nanavati
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
[PDF]
An End-to-End TTS Model in Chhattisgarhi, a Low-Resource Indian Language
Abhayjeet Singh, Anjali Jayakumar, G Deekshitha, Hitesh Kumar, Jesuraja Bandekar, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh
International Conference on Speech and Computer
[PDF]
An ASR Corpus in Chhattisgarhi, a Low Resource Indian Language
Abhayjeet Singh, Arjun Singh Mehta, KS Ashish Khuraishi, G Deekshitha, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, P Karthika, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, V Prashanthi, Priyanka Pai, Raoul Nanavati, Sai Praneeth Reddy Mora, Srinivasa Raghavan
International Conference on Speech and Computer
[PDF]
SPIRE-SIES: A Spontaneous Indian English Speech Corpus
Abhayjeet Singh, Charu Shah, Rajashri Varadaraj, Sonakshi Chauhan, Prasanta Kumar Ghosh
O-COCOSDA 2023
[PDF]
[Corpus Download]
Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi
Anish Bhanushali, Grant Bridgman, G Deekshitha, Prasanta Ghosh, Pratik Kumar, Saurabh Kumar, Adithya-Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, NS Vrunda, S Umesh, Sathvik Udupa, VS Lodagala, V Durga Prasad
Interspeech 2022
[PDF]
A study on native American English speech recognition by Indian listeners with varying word familiarity level
Abhayjeet Singh, Achuth Rao MV, Rakesh Vaideeswaran, Chiranjeevi Yarra, Prasanta Kumar Ghosh
O-COCOSDA 2021
[PDF]
Web Interface for estimating articulatory movements in speech production from acoustics and text
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
Interspeech 2021
[PDF] [Code]
Estimating articulatory movements in speech production with transformer networks
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
Interspeech 2021
[PDF] [Code]
Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates
Abhayjeet Singh, Aravind Illa and Prasanta Kumar Ghosh
Interspeech 2020
[PDF] [Code]
A comparative study of estimating articulatory movements from phoneme sequences and acoustic features
Abhayjeet Singh, Aravind Illa and Prasanta Kumar Ghosh
ICASSP 2020
[PDF] [Code]
REcognizing SPeech in INdian languages (RESPIN) (funded by: Gates Foundation)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
RESPIN is an initiative to create speech recognition resources for the agriculture and finance domains, targeting underserved communities, and to release them as a digital public good in the open-source domain, spurring research and innovation in speech recognition across nine Indian languages.
SYnthesizing SPeech in INdian languages (SYSPIN) (funded by: GIZ, Germany)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Develop and open source a large corpus and models for text-to-speech (TTS) systems in multiple Indian languages.
Accent Conversion
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Conversion of non-native accent to native accent for better recognition of non-native speech.
[Publication]
Vaani (funded by: Google)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Develop and open source a large corpus and models for Automatic Speech Recognition (ASR) systems in multiple Indian languages.
Estimating articulatory movements from phonemes spoken during speech production
Advisor: Prasanta Kumar Ghosh, Aravind Illa (IISc Bengaluru)
Predicted articulatory movements from phoneme sequences using encoder-decoder models with an attention mechanism to model the durational alignment between phonemes and their corresponding articulatory movements.
[Publication 1] [Publication 2] [Publication 3]
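As an illustrative sketch only (not the project's actual code), the attention step at the heart of such encoder-decoder models lets each decoder step softly select which encoder (phoneme) states to attend to. A minimal NumPy version of scaled dot-product attention, with hypothetical dimensions:

```python
import numpy as np

def scaled_dot_product_attention(query, keys, values):
    """One decoder query attends over all encoder states.
    `keys`/`values`: (T_enc, d); `query`: (d,). Shapes are illustrative."""
    d_k = keys.shape[-1]
    scores = keys @ query / np.sqrt(d_k)       # similarity per encoder step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax -> alignment weights
    context = weights @ values                 # weighted sum of encoder states
    return context, weights

# Toy example: 5 encoder states of dimension 8, one decoder query.
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))
values = rng.normal(size=(5, 8))
query = rng.normal(size=(8,))
context, weights = scaled_dot_product_attention(query, keys, values)
```

The learned alignment weights play the role of the phoneme-to-articulator duration model: a phoneme attended over several consecutive decoder steps effectively stretches in time.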
ASTNET - Prediction of Articulatory Motion in Speech Production at different rates
Advisor: Prasanta Kumar Ghosh, Aravind Illa (IISc Bengaluru)
Predicted articulatory motion at different speaking rates using an encoder-decoder model, with the dynamic time warping (DTW) algorithm for alignment. Predicting articulator trajectories at varied speaking rates can enhance the performance of real-time ASR systems.
[Publication]
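To illustrate the alignment step (a generic sketch, not ASTNET's implementation), DTW finds the minimum-cost monotonic alignment between two sequences of different lengths, which is what makes trajectories at different speaking rates comparable:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic time warping between two 1-D sequences.
    Returns the total cost of the best monotonic alignment."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])       # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # advance x only
                                 cost[i, j - 1],      # advance y only
                                 cost[i - 1, j - 1])  # advance both
    return cost[n, m]

# A slow (time-stretched) rendition aligns perfectly with a fast one.
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
fast = [0.0, 1.0, 2.0]
print(dtw_distance(slow, fast))  # 0.0
```

In practice the same idea runs over multidimensional articulator frames with a Euclidean local distance; the 1-D version above just keeps the sketch short.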
Sign Language Recognition using CNN
Advisor: Prof. RN Mir & Ab Rouf Khan (NIT Srinagar, India)
Classifying various hand gestures as English-alphabet letters in real time using convolutional neural networks. [Code]
Language Identification System
Advisor: Prof. Arun Balaji Buduru (IIIT Delhi)
Identification of various Indian languages using a convolutional recurrent neural network (CRNN). The CRNN model was trained on grayscale images of the audio's spectrogram. [Code]