
An Open-Source AI Tool for Voice Generation
Are you a scientist, a developer, or just a tinkerer like me? Are you fascinated by the power of AI to generate and clone a human voice for use in your work? OpenAudio might be what you are looking for. Pinokio, a browser that enables you to install, run, and automate any AI on your computer, makes it easy to download and install OpenAudio. In this brief introduction, I am using an M3 MacBook Air with 16 GB RAM. Follow these instructions to install Pinokio on your computer and discover how easy AI-generated speech can become.
Now that Pinokio is installed, I click on the ‘Discover’ button at the top right of the application browser and look for OpenAudio, which is the first application listed in the Apps section. Pinokio is open source with an MIT license, and OpenAudio is open source with an Apache 2.0 license. OpenAudio is based on FishSpeech; the project recently rebranded itself under the OpenAudio name.

The project has seventy-seven contributors and states on its website: “We are incredibly excited to unveil OpenAudio S1, a cutting-edge text-to-speech (TTS) model that redefines the boundaries of voice generation. Trained on an extensive dataset of over 2 million hours of audio, OpenAudio S1 delivers unparalleled naturalness, expressiveness, and instruction-following capabilities.”
This model was easy to install on Pinokio, and you can quickly and easily start producing your AI-generated speech with it. Your experience may vary depending on your processor and RAM.

Once OpenAudio is installed, you will be presented with this easy-to-use interface.

These four lines of text took 77 seconds to generate and produced 8 seconds of WAV audio in a 684 KB file. There is a download button at the top right of the playback window.
Listen to the audio and judge for yourself.
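If you like to keep score while you tinker, the figures from my run work out to roughly 9.6 seconds of processing for each second of speech (77 divided by 8) and about 85 KB per second of audio (684 divided by 8). That is close to what uncompressed 16-bit mono audio at 44.1 kHz would take, although that format is my assumption and not something the interface confirmed. Here is a minimal Python sketch of the same arithmetic, using only the standard library. The function names and the file name in the comment are hypothetical and are not part of OpenAudio or Pinokio.

```python
import wave


def real_time_factor(generation_seconds, audio_seconds):
    """Seconds of processing needed per second of generated speech."""
    return generation_seconds / audio_seconds


def wav_stats(path):
    """Return (duration in seconds, PCM bytes per second) for a WAV file."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        bytes_per_frame = wav.getsampwidth() * wav.getnchannels()
    return frames / rate, rate * bytes_per_frame


# Figures from the run described above: 77 s to generate 8 s of audio.
print(f"Real-time factor: {real_time_factor(77, 8):.1f}")

# File size per second of audio, from the 684 KB download.
print(f"KB per second of audio: {684 / 8:.1f}")

# To inspect a file you downloaded yourself (hypothetical file name):
# duration, bytes_per_second = wav_stats("openaudio_output.wav")
```

Run it against your own download to compare your machine with mine.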
In addition to text-to-speech synthesis, OpenAudio supports voice cloning. You can use your voice or upload a sample. Five to ten seconds of reference audio is helpful for the generation of the cloned voice. There is a dialogue box at the lower left of the display where this is accomplished, along with other controls that override the default settings.
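Before uploading a sample, it can help to check that the clip falls in that five-to-ten-second range. The sketch below does this for a WAV file with Python's standard library. The helper names and the file name in the comment are hypothetical, and the range is only the rule of thumb mentioned above, not a limit enforced by OpenAudio.

```python
import wave

# Rule-of-thumb range for reference audio, taken from the text above.
MIN_SECONDS = 5.0
MAX_SECONDS = 10.0


def clip_seconds(path):
    """Duration of a WAV file in seconds (frame count / sample rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_good_reference_length(seconds, low=MIN_SECONDS, high=MAX_SECONDS):
    """True when the clip length falls inside the suggested range."""
    return low <= seconds <= high


# Example with a hypothetical recording of your own voice:
# seconds = clip_seconds("my_voice_sample.wav")
# print(seconds, is_good_reference_length(seconds))
```

This only handles WAV files; a recording in another format would need converting first.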
Use of this model is governed by the Creative Commons CC BY-NC-SA 4.0 license. The project also includes a caveat:
“We do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.”
OpenAudio is a text-to-speech model developed by Fish Audio and based on VQ-GAN and Llama. There are links to the source code and models. The project maintains a Discord channel and a presence on X. Visit the OpenAudio blog for up-to-date information and research.
Have some fun and install Pinokio and OpenAudio on your computer today. Leverage the power of open source and AI in your projects and join their community of developers if you are inclined.