Audio computing

136 bookmarks

EaseText - Text to Speech, Image to Text, Audio to Text

EaseText provides simple, convenient offline Image to Text and Audio to Text converter software.

TTS Synthesizer

·easetext.com·Mar 23, 2025

Speechelo - The Best Text To Speech Software

TTS Synthesizer

·speechelo-offer.com·Mar 23, 2025

Camb.ai: AI Voice Translation & Dubbing for Videos

Camb.ai is an AI-driven video content localization platform built for content creators and media producers. Join 100s of video-first companies who use Camb.ai to d

Speech recognition #OS Compatibility: web app

·camb.ai·Mar 23, 2025

Plachtaa/VALL-E-X: An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available in https://plachtaa.github.io/vallex/

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. Demo is available at https://plachtaa.github.io/vallex/ - Plachtaa/VALL-E-X

TTS Synthesizer #Source Code: GitHub #Type: Open-Source

·github.com·Mar 23, 2025

rsxdalv/tts-generation-webui: TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS)

TTS Generation Web UI (Bark, MusicGen + AudioGen, Tortoise, RVC, Vocos, Demucs, SeamlessM4T, MAGNet, StyleTTS2, MMS, Stable Audio, Mars5, F5-TTS, ParlerTTS) - rsxdalv/tts-generation-webui

TTS Synthesizer #❤️#Source Code: GitHub #Type: Open-Source

·github.com·Mar 23, 2025

OpenAI.fm

An interactive demo for developers to try the new text-to-speech model in the OpenAI API

Voice samples

·openai.fm·Mar 22, 2025

Bland AI | Automate Phone Calls with Conversational AI for Enterprises

Transform your enterprise communication with Bland AI. Automate inbound and outbound phone calls using AI that sounds human. Perfect for sales, customer support, and operations with customizable voices and seamless integrations.

·bland.ai·Mar 13, 2025

AssemblyAI | AI models to transcribe and understand speech

With AssemblyAI's industry-leading Speech AI models, transcribe speech to text and extract insights from your voice data.

Speech recognition

·assemblyai.com·Feb 19, 2025

Nuance - Dragon Speech Recognition

Work faster and smarter: speed up document creation and automate workflows with the world's best-selling speech recognition solution.

Speech recognition

·nuance.com·Dec 28, 2024

Text to Speech: Generate natural sounding voices and voice overs

Download voices as MP3. Create phone announcements, YouTube, Explainer, E-learning Videos and more.

TTS Synthesizer

·voiceovermaker.io·Dec 28, 2024

Voicy Speech to Text

Speech to Text Chrome Extension. Write with your voice on every website. AI-powered dictation tool.

Speech recognition #Software: Extension

·usevoicy.com·Dec 28, 2024

DuRT - Speech Recognition

Speech recognition #OS compatibility: macOS

·durt.dudufuture.top·Dec 11, 2024

Transcriptor

Convert voice to text in real time! The UI couldn't be simpler! You can edit, search and share all your transcriptions. Your transcriptions are automatically saved to iCloud. Supported languages: English, Arabic, Chinese, Dutch, French, German, Hindi, Indonesian, Italian, Japanese, Korean, Polish…

Speech recognition #OS compatibility: macOS

·apps.apple.com·Dec 10, 2024

ken107/piper-browser-extension · GitHub

Provides Piper neural text-to-speech voices as a browser extension - ken107/piper-browser-extension

TTS model #Source Code: GitHub #Type: Open-Source

·github.com·Dec 4, 2024

TTSMaker - Free Text to Speech Online

TTSMaker is a free text-to-speech tool and online text reader. As an AI voice generator, it supports 100+ languages and 300+ voice styles, and its neural network makes speech sound more natural. You can listen online or download audio files in MP3 or WAV format.

TTS Synthesizer #OS Compatibility: web app

·ttsmaker.com·Dec 4, 2024

abus-aikorea/voice-pro · GitHub

Comprehensive Gradio WebUI for audio processing, powered by Whisper engines (Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer, zero-shot Voice Cloning (E2, F5-TTS), YouTube dow...

Speech recognition #Source Code: GitHub #❤️#Type: Open-Source

·github.com·Nov 30, 2024

X to Voice | ElevenLabs

Analyze your X profile to generate a unique voice using ElevenLabs' new Voice Design feature

#Type: Open-Source

·xtovoice.com·Nov 2, 2024

open-mmlab/Amphion: Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audio, music, and speech generation research and development.

Amphion (/æmˈfaɪən/) is a toolkit for Audio, Music, and Speech Generation. Its purpose is to support reproducible research and help junior researchers and engineers get started in the field of audi...

#Source Code: GitHub #Type: Open-Source

·github.com·Oct 28, 2024

Parrot AI - Celebrity Voice Generator

Parrot AI is the top celebrity voice generator. Create fun audio clips to roast your friends, send birthday messages, and light up your group chat!

TTS Synthesizer

·tryparrotai.com·Oct 26, 2024

w-okada/voice-changer · GitHub

Realtime Voice Changer. Contribute to w-okada/voice-changer development by creating an account on GitHub.

#Source Code: GitHub #Type: Open-Source

·github.com·Oct 26, 2024

RVC-Project/Retrieval-based-Voice-Conversion-WebUI: Easily train a good VC model with voice data <= 10 mins!

Easily train a good VC model with voice data <= 10 mins!

#Source Code: GitHub #Type: Open-Source

·github.com·Oct 26, 2024

neonbjb/tortoise-tts · GitHub

A multi-voice TTS system trained with an emphasis on quality - neonbjb/tortoise-tts

TTS model #Source Code: GitHub #❤️#Type: Open-Source

·github.com·Oct 19, 2024

SWivid/F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching" - SWivid/F5-TTS

TTS Synthesizer #Source Code: GitHub #Type: Open-Source

·github.com·Oct 19, 2024

Applio

At the forefront of innovation as an open-source ecosystem that hosts cutting-edge AI voice cloning technologies.

Speech recognition #Type: Open-Source

·applio.org·Oct 17, 2024

HoldSpeak - Type 3x faster with AI powered voice-to-text

HoldSpeak is an AI-powered app that lets you type 3x faster.

Speech recognition

·holdspeak.com·Oct 4, 2024

voxforge.org - Free Speech... Recognition (Linux, Windows and Mac)

Speech recognition #Type: Open-Source

·voxforge.org·Oct 2, 2024

VALL-E

VALL-E is a neural codec language model that uses discrete codes derived from an off-the-shelf neural audio codec model and treats TTS as a conditional language modeling task. VALL-E exhibits emergent in-context learning capabilities and can synthesize high-quality personalized speech with only a 3-second enrolled recording of an unseen speaker as a prompt. We also extend VALL-E and train a multi-lingual conditional codec language model. VALL-E X can generate high-quality speech in the target language from just one speech utterance in the source language as a prompt, while preserving the unseen speaker’s voice, emotion, and acoustic environment.

TTS Synthesizer

·microsoft.com·Sep 25, 2024
