BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data · #Amazon #Text-to-Speech #Large Language Models #Paper #PDF #Emergence · arxiv.org · Feb 15, 2024
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers · #Text-to-Speech #Microsoft #Generative Models #Audio #Paper #PDF · arxiv.org · Apr 30, 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers · #Text-to-Speech #Paper #PDF #Microsoft · arxiv.org · Jan 9, 2023