I am a Research Associate at the SPIRE Lab, Indian Institute of Science (IISc) Bengaluru, working under the guidance of Dr. Prasanta Kumar Ghosh. I hold a degree in Computer Science & Engineering from NIT Srinagar. My research centers on Audio-Visual Speech Synthesis, Accent Conversion, and the development of Text-to-Speech (TTS) and Automatic Speech Recognition (ASR) systems for multiple Indian languages. Passionate about Machine Learning and its vast real-world applications, I am dedicated to advancing technology that bridges language and accessibility barriers, fostering inclusive solutions in speech processing and synthesis.
LIMMITS’24: Multi-speaker, Multi-lingual Indic TTS with voice cloning
Abhayjeet Singh, Amala Nagireddi, G Deekshitha, Jesuraja Bandekar, R Roopa, Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A Murthy, Pranaw Kumar, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich
LIMMITS’24: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
[PDF]
Lightweight, Multi-speaker, Multi-lingual Indic Text-To-Speech
Abhayjeet Singh, Amala Nagireddi, Anjali Jayakumar, G Deekshitha, Jesuraja Bandekar, R Roopa, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, Hema A Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich
LIMMITS’23: IEEE Open Journal of Signal Processing
[PDF]
Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages
Sathvik Udupa, Jesuraja Bandekar, G Deekshitha, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan, Raoul Nanavati
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
[PDF]
An End-to-End TTS Model in Chhattisgarhi, a Low-Resource Indian Language
Abhayjeet Singh, Anjali Jayakumar, G Deekshitha, Hitesh Kumar, Jesuraja Bandekar, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh
International Conference on Speech and Computer
[PDF]
An ASR Corpus in Chhattisgarhi, a Low-Resource Indian Language
Abhayjeet Singh, Arjun Singh Mehta, KS Ashish Khuraishi, G Deekshitha, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, P Karthika, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, V Prashanthi, Priyanka Pai, Raoul Nanavati, Sai Praneeth Reddy Mora, Srinivasa Raghavan
International Conference on Speech and Computer
[PDF]
SPIRE-SIES: A Spontaneous Indian English Speech Corpus
Abhayjeet Singh, Charu Shah, Rajashri Varadaraj, Sonakshi Chauhan, Prasanta Kumar Ghosh
O-COCOSDA 2023
[PDF]
Gram Vaani ASR challenge on spontaneous telephone speech recordings in regional variations of Hindi
Anish Bhanushali, Grant Bridgman, G Deekshitha, Prasanta Ghosh, Pratik Kumar, Saurabh Kumar, Adithya-Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, NS Vrunda, S Umesh, Sathvik Udupa, VS Lodagala, V Durga Prasad
Interspeech 2022
[PDF]
A study on native American English speech recognition by Indian listeners with varying word familiarity level
Abhayjeet Singh, Achuth Rao MV, Rakesh Vaideeswaran, Chiranjeevi Yarra, Prasanta Kumar Ghosh
O-COCOSDA 2021
[PDF]
Web Interface for estimating articulatory movements in speech production from acoustics and text
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
Interspeech 2021
[PDF] [Code]
Estimating articulatory movements in speech production with transformer networks
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
Interspeech 2021
[PDF] [Code]
Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates
Abhayjeet Singh, Aravind Illa and Prasanta Kumar Ghosh
Interspeech 2020
[PDF] [Code]
A comparative study of estimating articulatory movements from phoneme sequences and acoustic features
Abhayjeet Singh, Aravind Illa and Prasanta Kumar Ghosh
ICASSP 2020
[PDF] [Code]
REcognizing SPeech in INdian languages (RESPIN) (funded by: Gates Foundation)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
An initiative to create speech recognition resources in the domains of agriculture and finance for underserved communities, and to release them as a digital public good in the open-source domain, spurring research and innovation in speech recognition across nine Indian languages.
SYnthesizing SPeech in INdian languages (SYSPIN) (funded by: GIZ, Germany)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Developing and open-sourcing a large corpus and models for text-to-speech (TTS) systems in multiple Indian languages.
Accent Conversion
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Converting non-native accents to a native accent for better recognition of non-native speech.
[Publication]
Vaani (funded by: Google)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Developing and open-sourcing a large corpus and models for Automatic Speech Recognition (ASR) systems in multiple Indian languages.
Estimating articulatory movements from phonemes spoken during speech production
Advisor: Prasanta Kumar Ghosh, Aravind Illa (IISc Bengaluru)
Predicting articulatory movements from phonemes using Encoder-Decoder models with an Attention mechanism to model the durations between phonemes and the corresponding articulatory movements.
[Publication 1] [Publication 2] [Publication 3]
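The attention step at the heart of this project can be sketched as plain dot-product attention between phoneme encoder states and articulatory-frame queries. This is a minimal numpy illustration with hypothetical shapes, not the models used in the publications above:

```python
import numpy as np

def dot_product_attention(queries, keys, values):
    """Scaled dot-product attention.
    queries: (T_out, d), keys: (T_in, d), values: (T_in, d_v)."""
    scores = queries @ keys.T / np.sqrt(keys.shape[1])   # (T_out, T_in)
    scores -= scores.max(axis=1, keepdims=True)          # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)        # softmax over input steps
    return weights @ values, weights                     # context: (T_out, d_v)

# Hypothetical toy sizes: 5 phoneme states, 12 articulatory-frame queries.
rng = np.random.default_rng(0)
enc = rng.standard_normal((5, 8))
dec = rng.standard_normal((12, 8))
ctx, w = dot_product_attention(dec, enc, enc)
assert ctx.shape == (12, 8) and np.allclose(w.sum(axis=1), 1.0)
```

The attention weights act as a soft alignment between each output articulatory frame and the phonemes, which is what lets the model handle variable phoneme durations.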
ASTNET - Prediction of Articulatory Motion in Speech Production at different rates
Advisor: Prasanta Kumar Ghosh, Aravind Illa (IISc Bengaluru)
Prediction of articulatory motion at different speaking rates using an Encoder-Decoder model, with the Dynamic Time Warping algorithm for alignment. Predicting articulatory movements at varied speaking rates can help enhance the real-time performance of ASR systems.
[Publication]
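The Dynamic Time Warping alignment used in this project can be sketched as the classic quadratic DP on 1-D sequences; this toy numpy version (not the project's implementation) shows how two trajectories at different speaking rates align with zero cost:

```python
import numpy as np

def dtw(x, y):
    """Classic DTW cumulative cost between two 1-D sequences."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            # Extend the cheapest of the three allowed warping moves.
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Same trajectory spoken at half speed aligns with zero cost.
slow = np.array([0.0, 0.0, 1.0, 1.0, 2.0, 2.0])
fast = np.array([0.0, 1.0, 2.0])
assert dtw(slow, fast) == 0.0
```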
Sign Language Recognition using CNN
Advisor: Prof. RN Mir & Ab Rouf Khan (NIT Srinagar, India)
Classifying hand gestures as letters of the English alphabet in real time using Convolutional Neural Networks. [Code]
Language Identification System
Advisor: Prof. Arun Balaji Budru (IIIT Delhi)
Detection of various Indian languages using a convolutional recurrent neural network (CRNN). The CRNN model was trained on greyscale images of the audio's spectrogram. [Code]
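The greyscale-spectrogram input described above can be sketched with a plain numpy STFT; the frame size, hop, and scaling here are hypothetical choices for illustration, not the project's actual preprocessing:

```python
import numpy as np

def greyscale_spectrogram(signal, n_fft=256, hop=128):
    """Magnitude STFT in dB, scaled to a 0-255 uint8 greyscale image."""
    frames = [signal[i:i + n_fft] * np.hanning(n_fft)
              for i in range(0, len(signal) - n_fft + 1, hop)]
    mag = np.abs(np.fft.rfft(np.array(frames), axis=1)).T  # (freq, time)
    db = 20 * np.log10(mag + 1e-8)                         # log magnitude
    img = (db - db.min()) / (db.max() - db.min()) * 255    # min-max to 0-255
    return img.astype(np.uint8)

# 1 s of a 440 Hz tone at an assumed 8 kHz sampling rate.
t = np.linspace(0, 1, 8000, endpoint=False)
img = greyscale_spectrogram(np.sin(2 * np.pi * 440 * t))
assert img.dtype == np.uint8 and img.ndim == 2
```

Each such image is then fed to the CRNN like any single-channel input.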