Adversarial Training for Low-Resource Disfluency Correction

We introduce an adversarially trained sequence-tagging model for Disfluency Correction (DC) that leverages a small labeled dataset together with large-scale unlabeled data to enhance performance. Our approach relies on synthetically generated disfluent text, enabling robust DC in Bengali, Hindi, and Marathi, as well as correction of stuttering disfluencies in ASR transcripts. We achieve a 6.15-point F1 improvement over competitive baselines, establishing a new benchmark for correcting disfluencies arising both from speech impairments such as stuttering and from conversational speech. To our knowledge, this is the first use of adversarial training for DC, marking a significant step forward in speech-to-text processing.
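To make the core idea concrete, the sketch below shows one common way to combine a token-level sequence tagger with adversarial training: FGM-style perturbations of the word embeddings (Miyato et al., 2017), where each token is tagged as fluent or disfluent. This is a minimal illustration under our own assumptions, not the authors' released implementation; the BiLSTM architecture, hyperparameters, and toy batch are all placeholders.

```python
# A minimal sketch of adversarial training for a token-level disfluency tagger.
# FGM: perturb embeddings along the loss gradient, then train on both the clean
# and the perturbed pass. Architecture and hyperparameters are illustrative.
import torch
import torch.nn as nn

class DisfluencyTagger(nn.Module):
    """BiLSTM tagger: each token is labeled FLUENT (0) or DISFLUENT (1)."""
    def __init__(self, vocab_size=10_000, emb_dim=128, hidden=256, n_tags=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True,
                               bidirectional=True)
        self.classifier = nn.Linear(2 * hidden, n_tags)

    def forward_from_embeddings(self, emb):
        hidden, _ = self.encoder(emb)
        return self.classifier(hidden)              # (batch, seq_len, n_tags)

def adversarial_step(model, optimizer, tokens, tags, epsilon=1.0):
    """One step: clean loss plus loss on adversarially perturbed embeddings."""
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()

    emb = model.embed(tokens)                       # (batch, seq_len, emb_dim)
    emb.retain_grad()                               # store grad on non-leaf tensor
    logits = model.forward_from_embeddings(emb)
    clean_loss = criterion(logits.flatten(0, 1), tags.flatten())
    clean_loss.backward()

    # FGM: move the embeddings a small step along the L2-normalized gradient.
    grad = emb.grad.detach()
    delta = epsilon * grad / (grad.norm(dim=-1, keepdim=True) + 1e-8)
    adv_logits = model.forward_from_embeddings(emb.detach() + delta)
    adv_loss = criterion(adv_logits.flatten(0, 1), tags.flatten())
    adv_loss.backward()     # encoder/classifier grads add to the clean-pass grads

    optimizer.step()
    return clean_loss.item(), adv_loss.item()

model = DisfluencyTagger()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
tokens = torch.randint(0, 10_000, (4, 12))          # toy batch of token ids
tags = torch.randint(0, 2, (4, 12))                 # toy per-token labels
print(adversarial_step(model, optimizer, tokens, tags))
```

Perturbing at the embedding layer is the standard way to adapt adversarial training to text, since discrete token ids cannot be perturbed continuously.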
