Sitemap

A list of all the posts and pages found on the site. For you robots out there, an XML version is also available for digesting.

Posts

Future Blog Post

less than 1 minute read

Published: January 01, 2199

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

experience

MedPrime Technologies - Data Science Intern (December 2019)

MedPrime Technologies is a pioneering medical device company dedicated to creating customer-centric solutions for global healthcare needs. During the second year of my undergraduate degree, I interned there and worked closely with the CTO to design Python-based backend software that helps doctors predict the suitability of sperm samples for human IVF.

Huawei India RnD - Data Science Intern (July 2020)

I developed a machine learning model for speech language detection, leveraging mel-spectrogram images and multi-label classification techniques from computer vision. Using CNN architectures like InceptionV3, DenseNet201, and VGG16, I trained models on datasets such as LibriSpeech and CommonVoice, achieving 97% accuracy in detecting European languages. For Automatic Speech Recognition (ASR), I trained the Jasper Network with CTC loss for English and applied transfer learning to Spanish, achieving a competitive 19.78% WER for Spanish ASR. To improve transcription accuracy, I integrated spelling correction models like DeepPavlov and Enchant, reducing English ASR errors by 7%. Additionally, I explored ASR for music lyric transcription by isolating vocals using Spleeter but identified challenges due to background noise and artistic pronunciation variations.
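The word error rate (WER) quoted above is conventionally the word-level edit distance between the reference transcript and the ASR output, divided by the number of reference words. Below is a minimal sketch of that standard definition; the function name and the example sentences are illustrative assumptions, not taken from the internship code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substitution ("sat" -> "sit") and one deletion ("the").
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

A WER of 19.78% therefore means roughly one word-level edit for every five reference words.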

ABInBev India - Automation Intern (May 2021)

I analyzed €8.4M in correction amounts for the Belgium Business Division, identifying invoicing errors and developing seven KPIs to enhance billing accuracy while cataloging SAP tables for data aggregation. For the Mexico Business Division, I debugged 10+ data metrics and developed knowledge models for Overdues and Accounts Receivable, enabling effective rework tracking. I also designed three global sales order trackers and documented analysis dashboards for business-friendly use.

Accenture Labs India - Research Intern (Oct 2021)

I developed a framework to enhance the diversity of a small in-lab gesture video dataset by researching state-of-the-art synthetic video generation methods and implementing augmentations using OpenCV and Pillow. The automated framework increased the number of video samples 100-fold through systematic application of blurring, random cropping, and affine transformations. Using the synthetic dataset, I trained and evaluated deep learning models like MoCoGAN and TecoGAN, identifying GAN-based methods as optimal for limited gold-standard datasets. This approach led to a 10% relative improvement in classification performance over competitive baselines. Additionally, I identified the limitations of mechanical augmentations and recommended diffusion models for more sophisticated data augmentation.

projects

3D Scene-Aware Vision-Language Action Modeling for Robot Manipulation

Masala-CHAI: A Large-Scale SPICE Netlist Dataset for Analog Circuits by Harnessing AI

HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models

Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback

DISCO: A large scale human annotated corpus for disfluency correction in Indo-European languages

VAKTA-SETU: A Speech-to-Speech Machine Translation Service in Select Indic Languages

Adversarial Training for Low-Resource Disfluency Correction

DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction

publications

DisfluencyFixer: A tool to enhance Language Learning through Speech To Speech Disfluency Correction

Published in Interspeech (Show and Tell Demonstrations), 2023

In this work, we create a web service demonstrating the capabilities of speech-to-speech disfluency correction. Given a disfluent speech utterance in English or Hindi, the model first transcribes the speech using a powerful speech-to-text system. The transcribed text is then cleaned using our SOTA disfluency correction models, followed by speech synthesis.

Download Paper

Adversarial Training for Low-Resource Disfluency Correction

Published in Association for Computational Linguistics (ACL) (Findings), 2023

This paper introduces Seq-GAN-BERT, a novel few-shot machine learning approach that leverages a small set of labelled examples and a large set of unlabelled examples for low-resource disfluency correction.

Download Paper

DISCO: A Large Scale Human Annotated Corpus for Disfluency Correction in Indo-European Languages

Published in Empirical Methods in Natural Language Processing (EMNLP) (Findings), 2023

Through this project, we created a novel dataset for disfluency correction in English and three under-represented languages: Hindi, German and French. We benchmarked various transformer architectures for sequence-labelling-style disfluency correction and demonstrated key features of the proposed dataset that previous work lacked.

Download Paper
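To illustrate what sequence-labelling-style disfluency correction means, here is a minimal sketch: each token gets a binary label, and the fluent sentence is whatever remains after the tokens marked disfluent are dropped. The 0/1 label scheme, the function name, and the hand-written labels are assumptions for illustration; in practice a trained tagger would predict the labels, and the DISCO annotation format may differ.

```python
from typing import List


def remove_disfluencies(tokens: List[str], labels: List[int]) -> str:
    """Keep tokens labelled 0 (fluent) and drop tokens labelled 1 (disfluent)."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")
    return " ".join(tok for tok, lab in zip(tokens, labels) if lab == 0)


# Hypothetical utterance with a self-correction: "to Boston uh I mean" is
# abandoned and replaced by "to Denver". Labels are hand-written here.
tokens = "I want a flight to Boston uh I mean to Denver".split()
labels = [0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

fluent = remove_disfluencies(tokens, labels)
```

Framing the task as per-token tagging lets a standard transformer encoder with a token classification head be benchmarked directly, since the output never requires generating new words.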

Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback

Published in ArXiv Preprint, 2024

This paper presents a novel approach to robotic task planning that uses a two-LLM system to break a complex task into executable plans and then ground them in the robotic environment. We leverage environmental state information and error messages during execution to guide the LLM planner toward task resolution. Our approach achieves improved results in the popular VirtualHome robotic simulation environment.

Download Paper
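The closed-loop idea in this abstract (feeding execution errors back to the planner) can be sketched roughly as follows. This is not the paper's implementation: the planner and executor here are hypothetical stand-in functions rather than LLM or VirtualHome calls, and the function names, feedback string format, and replanning budget are all assumptions.

```python
from typing import Callable, List, Optional, Tuple

StepResult = Tuple[bool, Optional[str]]  # (succeeded, error message if any)


def run_closed_loop(
    task: str,
    planner: Callable[[str, List[str], Optional[str]], List[str]],
    execute: Callable[[str], StepResult],
    max_replans: int = 3,
) -> List[str]:
    """Execute a plan step by step, replanning with the error text on failure."""
    executed: List[str] = []
    feedback: Optional[str] = None
    for _ in range(max_replans + 1):
        plan = planner(task, executed, feedback)
        failed = False
        for action in plan:
            ok, error = execute(action)
            if not ok:
                feedback = f"action '{action}' failed: {error}"
                failed = True
                break
            executed.append(action)
        if not failed:
            return executed
    raise RuntimeError("no executable plan found within the replanning budget")


# Toy stand-ins (hypothetical): the planner only learns that the fridge must
# be opened after the environment reports the failure.
def toy_planner(task: str, done: List[str], feedback: Optional[str]) -> List[str]:
    if feedback and "fridge is closed" in feedback:
        return ["open fridge", "grab milk"]
    return ["walk to fridge", "grab milk"]


state = {"fridge_open": False}


def toy_execute(action: str) -> StepResult:
    if action == "open fridge":
        state["fridge_open"] = True
    if action == "grab milk" and not state["fridge_open"]:
        return False, "fridge is closed"
    return True, None


result = run_closed_loop("get milk", toy_planner, toy_execute)
```

In this toy run the first plan fails at "grab milk", the error message is passed back, and the second plan completes the task.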

Vineet Bhat
