Introducing OpenVoice: Unprecedented Speed and Accuracy in Voice Cloning
A new open-source AI model, developed by researchers at MIT, Tsinghua University, and Canadian startup MyShell, offers voice cloning with unprecedented speed and accuracy. Called OpenVoice, it needs just seconds of audio to clone a voice and allows granular control over tone, emotion, accent, rhythm, and more.
Technology Overview
OpenVoice comprises two AI models that work together: one for text-to-speech conversion and one for voice tone cloning. The first model handles language style, accents, emotion, and other speech patterns; it was trained on 30,000 audio samples with varying emotions from English, Chinese, and Japanese speakers. The second, a “tone converter” model, was trained on over 300,000 samples encompassing 20,000 voices.
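The division of labor between the two models can be pictured with a small toy sketch. This is not OpenVoice's actual code or API: every name below (`StyleControls`, `base_speaker_tts`, `extract_tone`, `tone_converter`, `clone_voice`) is hypothetical, and strings stand in for audio. It only illustrates the idea that the first stage decides what is said and how, while the second stage changes whose voice it sounds like.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StyleControls:
    """Hypothetical style settings handled by the first stage."""
    emotion: str = "neutral"
    accent: str = "default"
    speed: float = 1.0


@dataclass(frozen=True)
class Speech:
    """Toy stand-in for audio: the words, the style, and the voice tone."""
    text: str
    style: StyleControls
    tone: str


def base_speaker_tts(text: str, style: StyleControls) -> Speech:
    """Stage 1 (toy): produce styled speech in a fixed base voice."""
    return Speech(text=text, style=style, tone="base-speaker")


def extract_tone(reference_clip: str) -> str:
    """Toy tone extractor; a real system would analyze the audio itself."""
    return f"tone-of:{reference_clip}"


def tone_converter(speech: Speech, target_tone: str) -> Speech:
    """Stage 2 (toy): swap only the tone, leaving text and style untouched."""
    return Speech(text=speech.text, style=speech.style, tone=target_tone)


def clone_voice(text: str, reference_clip: str,
                style: StyleControls = StyleControls()) -> Speech:
    """Run both stages: styled base speech first, then tone conversion."""
    styled = base_speaker_tts(text, style)
    return tone_converter(styled, extract_tone(reference_clip))
```

For example, `clone_voice("Hello", "alice.wav", StyleControls(emotion="cheerful"))` returns speech that keeps the cheerful style chosen in the first stage but carries the tone taken from the reference clip. Keeping the two concerns separate is what lets style be adjusted independently of the cloned voice.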
Speed and Efficiency
Because the first model already supplies general speech patterns, OpenVoice only needs a short user-provided voice sample to capture a speaker's tone, so it can clone voices with very little data. This helps it generate cloned speech significantly faster than alternatives like Meta’s Voicebox.
MyShell and OpenVoice
OpenVoice comes from Canadian startup MyShell, founded in 2023. With $5.6 million in early funding and over 400,000 users already, MyShell bills itself as a decentralized platform for creating and discovering AI apps.
OpenVoice Monetization
MyShell has open-sourced its voice cloning capabilities while monetizing its broader app ecosystem. The approach stands to grow the user base of both OpenVoice and the platform while advancing an open model of AI development.
Additional MyShell Products
In addition to pioneering instant voice cloning, MyShell offers original text-based chatbot personalities, meme generators, user-created text RPGs, and more. Some content is locked behind a subscription fee. The company also charges bot creators to promote their bots on its platform.
Learn More about OpenVoice and MyShell
Visit MyShell on Hugging Face to try OpenVoice and explore the company's AI app ecosystem.