Unmute: Make LLMs listen and speak
kyutai-labs/moshi: Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
fixie-ai/ultravox: A fast multimodal LLM for real-time voice
SpeechBrain: Open-Source Conversational AI for Everyone
LiveKit: Instantly transport audio and video between LLMs and your users.
pipecat-ai/pipecat: Open Source framework for voice and multimodal conversational AI
kyutai: open science AI lab