FlashSpeech: Efficient Zero-Shot Speech Synthesis
Proactive Detection of Voice Cloning with Localized Watermarking
Audiobox: Unified Audio Generation with Natural Language Prompts
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers