VALL-E is a neural codec language model trained on discrete codes derived from an off-the-shelf neural audio codec model; it treats TTS as a conditional language modeling task rather than as continuous signal regression. VALL-E exhibits in-context learning capabilities and can synthesize high-quality personalized speech using only a 3-second enrolled recording of an unseen speaker as a prompt. We also extend VALL-E into VALL-E X, a multilingual conditional codec language model. VALL-E X can generate high-quality speech in the target language from just one speech utterance in the source language as a prompt, while preserving the unseen speaker’s voice, emotion, and acoustic environment.
Voicery creates natural-sounding Text-to-Speech (TTS) engines and custom brand voices for enterprises. Our solutions leverage cutting-edge deep-learning research optimized for your business use case and technical infrastructure.
Lovo - AI Voice Generator: Realistic Text to Speech & Voice Cloning
Award-winning AI Voice Generator and text to speech software with 500+ voices in 100 languages. Realistic AI Voices with Online Video Editor. Clone your own voice.
Speech to Note - Voice to Text, Note Speech & Speak Writer Solution
Explore Speech to Note for top-notch voice to text, note speech, and speak writer solutions. Our AI technology, powered by GPT-4o, makes it easy to convert your voice into written notes.
Thorsten-Voice: A free-to-use, offline, high-quality German TTS voice should be available for every project without any licensing struggles. - thorstenMueller/Thorsten-Voice
This project is aimed at developing and maintaining the NVDA IBMTTS driver. IBMTTS is a synthesizer similar to Eloquence. Please send your ideas and contributions here! - davidacm/NVDA-IBMTTS-Driver