I am a Research Associate at the SPIRE Lab, Indian Institute of Science (IISc) Bengaluru, working under the guidance of Dr. Prasanta Kumar Ghosh. I am a Computer Science & Engineering graduate from the National Institute of Technology (NIT), Srinagar. My research centers on Audio-Visual Speech Synthesis, Accent Conversion, Text-to-Speech (TTS), and Automatic Speech Recognition (ASR) systems for Indian languages. Passionate about Machine Learning and its vast real-world applications, I am dedicated to advancing technology that bridges language and accessibility barriers, fostering inclusive solutions in speech processing and synthesis.
RESPIN-S1.0: A read speech corpus of 10000+ hours in dialects of nine Indian Languages
Saurabh Kumar, Abhayjeet Singh, Deekshitha G, Amartya Veer, Jesuraj Bandekar, Savitha Murthy, Sumit Sharma, Sandhya Badiger, Sathvik Udupa, Amala Nagireddi, Srinivasa Raghavan K M, Rohan Saxena, Jai Nanavati, Raoul Nanavati, Janani Sridharan, Arjun Mehta, Ashish S, Sai Mora, Prashanthi Venkataramakrishnan, Gauri Date, Karthika P, Prasanta Ghosh
NeurIPS 2025 Datasets and Benchmarks Track
[PDF]
Improving Dialect Identification in Indian Languages Using Multimodal Features from Dialect Informed ASR
Saurabh Kumar, Sumit Sharma, Sathvik Udupa, Sandhya Badiger, Abhayjeet Singh, Jesuraja Bandekar, Savitha Murthy, Prasanta Kumar Ghosh
ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[PDF]
LIMMITS’24: Multi-speaker, Multi-lingual Indic TTS with voice cloning
Abhayjeet Singh, Amala Nagireddi, G Deekshitha, Jesuraja Bandekar, R Roopa, Sandhya Badiger, Sathvik Udupa, Prasanta Kumar Ghosh, Hema A Murthy, Pranaw Kumar, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich
LIMMITS’24: 2024 IEEE International Conference on Acoustics, Speech, and Signal Processing Workshops (ICASSPW)
[PDF]
LIMMITS’24: IEEE Open Journal of Signal Processing
[PDF]
Lightweight, Multi-speaker, Multi-lingual Indic Text-To-Speech
Abhayjeet Singh, Amala Nagireddi, Anjali Jayakumar, G Deekshitha, Jesuraja Bandekar, R Roopa, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, Hema A Murthy, Heiga Zen, Pranaw Kumar, Kamal Kant, Amol Bole, Bira Chandra Singh, Keiichi Tokuda, Mark Hasegawa-Johnson, Philipp Olbrich
ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
[PDF]
LIMMITS’23: IEEE Open Journal of Signal Processing
[PDF]
Gated Multi Encoders and Multitask Objectives for Dialectal Speech Recognition in Indian Languages
Sathvik Udupa, Jesuraja Bandekar, G Deekshitha, Saurabh Kumar, Prasanta Kumar Ghosh, Sandhya Badiger, Abhayjeet Singh, Savitha Murthy, Priyanka Pai, Srinivasa Raghavan, Raoul Nanavati
2023 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
[PDF]
An End-to-End TTS Model in Chhattisgarhi, a Low-Resource Indian Language
Abhayjeet Singh, Anjali Jayakumar, G Deekshitha, Hitesh Kumar, Jesuraja Bandekar, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh
International Conference on Speech and Computer
[PDF]
An ASR Corpus in Chhattisgarhi, a Low Resource Indian Language
Abhayjeet Singh, Arjun Singh Mehta, KS Ashish Khuraishi, G Deekshitha, Gauri Date, Jai Nanavati, Jesuraja Bandekar, Karnalius Basumatary, P Karthika, Sandhya Badiger, Sathvik Udupa, Saurabh Kumar, Prasanta Kumar Ghosh, V Prashanthi, Priyanka Pai, Raoul Nanavati, Sai Praneeth Reddy Mora, Srinivasa Raghavan
International Conference on Speech and Computer
[PDF]
SPIRE-SIES: A Spontaneous Indian English Speech Corpus
Abhayjeet Singh, Charu Shah, Rajashri Varadaraj, Sonakshi Chauhan, Prasanta Kumar Ghosh
O-COCOSDA 2023
[PDF]
[Corpus Download]
Gram Vaani ASR Challenge on spontaneous telephone speech recordings in regional variations of Hindi
Anish Bhanushali, Grant Bridgman, G Deekshitha, Prasanta Ghosh, Pratik Kumar, Saurabh Kumar, Adithya-Raj Kolladath, Nithya Ravi, Aaditeshwar Seth, Ashish Seth, Abhayjeet Singh, NS Vrunda, S Umesh, Sathvik Udupa, VS Lodagala, V Durga Prasad
Interspeech 2022
[PDF]
A study on native American English speech recognition by Indian listeners with varying word familiarity level
Abhayjeet Singh, Achuth Rao MV, Rakesh Vaideeswaran, Chiranjeevi Yarra, Prasanta Kumar Ghosh
O-COCOSDA 2021
[PDF]
Web Interface for estimating articulatory movements in speech production from acoustics and text
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
Interspeech 2021
[PDF] [Code]
Estimating articulatory movements in speech production with transformer networks
Sathvik Udupa, Anwesha Roy, Abhayjeet Singh, Aravind Illa, Prasanta Kumar Ghosh
Interspeech 2021
[PDF] [Code]
Attention and Encoder-Decoder based models for transforming articulatory movements at different speaking rates
Abhayjeet Singh, Aravind Illa and Prasanta Kumar Ghosh
Interspeech 2020
[PDF] [Code]
A comparative study of estimating articulatory movements from phoneme sequences and acoustic features
Abhayjeet Singh, Aravind Illa and Prasanta Kumar Ghosh
ICASSP 2020
[PDF] [Code]
REcognizing SPeech in INdian languages (RESPIN) (funded by: Gates Foundation)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
RESPIN is an initiative to create speech recognition resources for the agriculture and finance domains, targeting underserved communities, and to release them as a digital public good in the open-source domain, spurring research and innovation in speech recognition across nine Indian languages.
SYnthesizing SPeech in INdian languages (SYSPIN) (funded by: GIZ, Germany)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Develop and open source a large corpus and models for text-to-speech (TTS) systems in multiple Indian languages.
Accent Conversion
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Conversion of non-native accent to native accent for better recognition of non-native speech.
[Publication]
Vaani (funded by: Google)
Advisor: Prof. Prasanta Kumar Ghosh (IISc Bangalore)
Develop and open source a large corpus and models for Automatic Speech Recognition (ASR) systems in multiple Indian languages.
Estimating articulatory movements from phonemes spoken during speech production
Advisor: Prasanta Kumar Ghosh, Aravind Illa (IISc Bengaluru)
Predicted articulatory movements from phoneme sequences using encoder-decoder models with an attention mechanism to model the durational alignment between phonemes and their corresponding articulatory movements.
[Publication 1] [Publication 2] [Publication 3]
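As an illustrative sketch only (not the project's actual code), the attention step at the heart of such encoder-decoder models lets each decoder step softly select which encoder (phoneme) states to attend to. A minimal NumPy version of scaled dot-product attention, with hypothetical dimensions:

```python
import numpy as np

def scaled_dot_product_attention(query, keys, values):
    """One decoder query attends over all encoder states.
    `keys`/`values`: (T_enc, d); `query`: (d,). Shapes are illustrative."""
    d_k = keys.shape[-1]
    scores = keys @ query / np.sqrt(d_k)       # similarity per encoder step
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax -> alignment weights
    context = weights @ values                 # weighted sum of encoder states
    return context, weights

# Toy example: 5 encoder states of dimension 8, one decoder query.
rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))
values = rng.normal(size=(5, 8))
query = rng.normal(size=(8,))
context, weights = scaled_dot_product_attention(query, keys, values)
```

The learned alignment weights play the role of the phoneme-to-articulator duration model: a phoneme attended over several consecutive decoder steps effectively stretches in time.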
ASTNET - Prediction of Articulatory Motion in Speech Production at different rates
Advisor: Prasanta Kumar Ghosh, Aravind Illa (IISc Bengaluru)
Predicted articulatory motion at different speaking rates using an encoder-decoder model, with the dynamic time warping (DTW) algorithm for alignment. Predicting articulator trajectories at varied speaking rates can enhance the performance of real-time ASR systems.
[Publication]
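To illustrate the alignment step (a generic sketch, not ASTNET's implementation), DTW finds the minimum-cost monotonic alignment between two sequences of different lengths, which is what makes trajectories at different speaking rates comparable:

```python
import numpy as np

def dtw_distance(x, y):
    """Classic dynamic time warping between two 1-D sequences.
    Returns the total cost of the best monotonic alignment."""
    n, m = len(x), len(y)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(x[i - 1] - y[j - 1])       # local frame distance
            cost[i, j] = d + min(cost[i - 1, j],      # advance x only
                                 cost[i, j - 1],      # advance y only
                                 cost[i - 1, j - 1])  # advance both
    return cost[n, m]

# A slow (time-stretched) rendition aligns perfectly with a fast one.
slow = [0.0, 0.0, 1.0, 1.0, 2.0, 2.0]
fast = [0.0, 1.0, 2.0]
print(dtw_distance(slow, fast))  # 0.0
```

In practice the same idea runs over multidimensional articulator frames with a Euclidean local distance; the 1-D version above just keeps the sketch short.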
Sign Language Recognition using CNN
Advisor: Prof. RN Mir & Ab Rouf Khan (NIT Srinagar, India)
Classifying various hand gestures as English-alphabet letters in real time using convolutional neural networks. [Code]
Language Identification System
Advisor: Prof. Arun Balaji Buduru (IIIT Delhi)
Identification of various Indian languages using a convolutional recurrent neural network (CRNN). The CRNN model was trained on grayscale images of the audio's spectrogram. [Code]