Huawei India R&D - Data Science Intern (July 2020)
I developed a machine learning model for spoken language detection that treats mel-spectrograms as images and applies multi-label classification techniques from computer vision. Using CNN architectures such as InceptionV3, DenseNet201, and VGG16, I trained models on datasets including LibriSpeech and Common Voice, reaching 97% accuracy in detecting European languages. For automatic speech recognition (ASR), I trained the Jasper network with CTC loss on English and applied transfer learning to Spanish, achieving a competitive word error rate (WER) of 19.78% for Spanish ASR. To improve transcription accuracy, I integrated spelling-correction tools such as DeepPavlov and Enchant, which reduced English ASR errors by 7%. I also explored ASR for song lyric transcription by isolating vocals with Spleeter, and identified the remaining challenges: residual background noise and artistic variations in pronunciation.