kyutai-labs/moshi
Moshi is a speech-text foundation model and full-duplex spoken dialogue framework. It uses Mimi, a state-of-the-art streaming neural audio codec.
AudioPen | The easiest way to convert messy thoughts into clear text
AudioPen transcribes and summarizes unstructured voice notes into text that’s easy to read and ready to share.
If you like thinking out loud, you'll love AudioPen. It's like having a personal assistant who records and summarizes your thoughts.
AssemblyAI Speech-to-Text API | Automatic Speech Recognition
Accurately convert speech to text with powerful AI models. Used by Fortune 500 companies, startups, and developers. Voted Best API of 2020 and funded by Insight Partners, Accel, and Y Combinator.