Huawei India RnD - Data Science Intern (July 2020)

I developed a machine learning model for spoken language detection that converts audio into mel-spectrogram images and applies multi-label classification techniques from computer vision. Using CNN architectures such as InceptionV3, DenseNet201, and VGG16, I trained models on datasets such as LibriSpeech and Common Voice, achieving 97% accuracy in detecting European languages. For Automatic Speech Recognition (ASR), I trained the Jasper network with CTC loss on English and applied transfer learning to Spanish, reaching a 19.78% word error rate (WER) on Spanish. To improve transcription accuracy, I added a spelling correction step using tools such as DeepPavlov and Enchant, which reduced English ASR errors by 7%. I also explored ASR for music lyric transcription by isolating vocals with Spleeter; residual background noise and artistic variations in pronunciation remained the main obstacles.
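For readers unfamiliar with the WER figure quoted above, here is a minimal sketch of how word error rate is commonly computed: the word-level edit distance (substitutions, insertions, deletions) between a reference transcript and the ASR output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are illustrative assumptions, not the evaluation code used during the internship.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); splits on whitespace and
    does no text normalization such as lowercasing or punctuation removal.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over six reference words.
example = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a 19.78% WER corresponds to roughly one word-level error for every five reference words.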


Vineet Bhat
