{"id":11437,"date":"2025-08-11T01:02:00","date_gmt":"2025-08-11T05:02:00","guid":{"rendered":"https:\/\/www.both.org\/?p=11437"},"modified":"2026-02-04T13:53:03","modified_gmt":"2026-02-04T18:53:03","slug":"an-open-source-ai-tool-for-voice-generation","status":"publish","type":"post","link":"http:\/\/www.both.org\/?p=11437","title":{"rendered":"An Open-Source AI Tool for Voice Generation"},"content":{"rendered":"<p>Are you a scientist, developer or just a tinkerer like me? Are you fascinated with the power of AI to generate and clone a human voice to include in your work? OpenAudio might be what you are looking for. Leveraging the power of Pinokio, it\u2019s easy to download and install OpenAudio on your computer. In this brief introduction, I am using an M3 MacBook Air with 16 GB RAM. Follow these instructions to&nbsp;install&nbsp;Pinokio on your computer and discover how easy AI-generated speech can become.&nbsp;<a href=\"https:\/\/pinokio.co\/\">Pinokio<\/a>&nbsp;is a browser that enables you to install, run, and automate any AI on your computer.<\/p>\n\n\n\n<p><span style=\"box-sizing: border-box; margin: 0px; padding: 0px;\">Now that Pinokio is installed, I click the \u2018Discover\u2019 button at the top right of the application browser and look for&nbsp;<a href=\"https:\/\/github.com\/fishaudio\" target=\"_blank\">OpenAudio<\/a>,&nbsp;which is the first application listed in the Apps section.<\/span> Pinokio
is open source with an MIT license, and OpenAudio is&nbsp;<a href=\"https:\/\/github.com\/fishaudio\/fish-speech?tab=Apache-2.0-1-ov-file#readme\">open source<\/a>&nbsp;with an Apache 2.0 license. OpenAudio is based on Fish Speech, and the project recently rebranded itself under the OpenAudio name.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"676\" height=\"1364\" src=\"http:\/\/www.both.org\/wp-content\/uploads\/2025\/08\/AppImage.png\" alt=\"Image of OpenAudio to download from Pinokio app store\" class=\"wp-image-11439\" style=\"width:406px;height:auto\"\/><\/figure>\n\n\n\n<p>The project has seventy-seven contributors, and its website states: \u201cWe are incredibly excited to unveil OpenAudio S1, a cutting-edge text-to-speech (TTS) model that redefines the boundaries of voice generation. Trained on an extensive dataset of over 2 million hours of audio, OpenAudio S1 delivers unparalleled naturalness, expressiveness, and instruction-following capabilities.\u201d<\/p>\n\n\n\n<p>The model was easy to install with Pinokio, and you can quickly start producing AI-generated speech with it. Your experience may vary depending on your processor and RAM.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"337\" src=\"http:\/\/www.both.org\/wp-content\/uploads\/2025\/08\/OpenAudio.png\" alt=\"OpenAudio Install button\" class=\"wp-image-11440\" style=\"width:773px;height:auto\"\/><\/figure>\n\n\n\n<p>Once it is installed, you are presented with this easy-to-use interface.<\/p>\n\n\n\n<figure class=\"wp-block-image aligncenter size-full is-resized\"><img loading=\"lazy\" decoding=\"async\" width=\"768\" height=\"260\" src=\"http:\/\/www.both.org\/wp-content\/uploads\/2025\/08\/FishSpeech.png\" alt=\"Fish Speech interface for text-to-speech generation and voice cloning.
\" class=\"wp-image-11442\" style=\"width:924px;height:auto\"\/><\/figure>\n\n\n\n<p>Generating speech from these four lines of text took 77 seconds and produced 8 seconds of audio as a 684 KB&nbsp;<strong>WAV<\/strong>&nbsp;file. There is a download button at the top right of the playback window.<\/p>\n\n\n\n<p>Listen to the audio and judge for yourself.<\/p>\n\n\n\n<figure class=\"wp-block-audio\"><audio controls src=\"http:\/\/www.both.org\/wp-content\/uploads\/2025\/08\/ATO_FishAudio.wav\"><\/audio><\/figure>\n\n\n\n<p>In addition to text-to-speech synthesis, OpenAudio supports voice cloning. You can use your own voice or upload a sample. Five to ten seconds of reference audio helps generate the cloned voice. A dialog box at the lower left of the display handles this, alongside other controls that override the default settings.<\/p>\n\n\n\n<p>Use of this model is governed by the Creative Commons CC BY-NC-SA 4.0 license. The project also includes a caveat:<\/p>\n\n\n\n<p>\u201cWe do not hold any responsibility for any illegal usage of the codebase. Please refer to your local laws about DMCA and other related laws.\u201d<\/p>\n\n\n\n<p>The model, developed by&nbsp;<a href=\"https:\/\/fish.audio\/\" target=\"_blank\" rel=\"noreferrer noopener\">Fish Audio<\/a>, is a text-to-speech model based on VQ-GAN and Llama. There are links to the&nbsp;<a href=\"https:\/\/github.com\/fishaudio\/fish-speech\">source code<\/a>&nbsp;and&nbsp;<a href=\"https:\/\/huggingface.co\/fishaudio\/fish-speech-1.5\">models<\/a>. The project maintains a&nbsp;<a href=\"https:\/\/discord.com\/invite\/Es5qTB9BcN\">Discord<\/a>&nbsp;channel and a presence on&nbsp;<a href=\"https:\/\/openaudio.com\/blogs\/s1\">X<\/a>. Visit the OpenAudio&nbsp;<a href=\"https:\/\/openaudio.com\/blogs\/s1\">blog<\/a>&nbsp;for up-to-date information and research.<\/p>\n\n\n\n<p>Have some fun and install Pinokio and OpenAudio on your computer today.
Leverage the power of open source and AI in your projects and join their community of developers if you are inclined.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Are you a scientist, developer or just a tinkerer like me? Are you fascinated with the power of<\/p>\n","protected":false},"author":32,"featured_media":9763,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_lmt_disableupdate":"","_lmt_disable":"","footnotes":""},"categories":[307,5,503],"tags":[],"class_list":["post-11437","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai","category-linux","category-linux-101"],"modified_by":"David Both","_links":{"self":[{"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11437","targetHints":{"allow":["GET"]}}],"collection":[{"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/users\/32"}],"replies":[{"embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=11437"}],"version-history":[{"count":5,"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11437\/revisions"}],"predecessor-version":[{"id":11448,"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/posts\/11437\/revisions\/11448"}],"wp:featuredmedia":[{"embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=\/wp\/v2\/media\/9763"}],"wp:attachment":[{"href":"http:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=11437"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=11437"},{"taxonomy":"post_tag","embeddable":true,"href":"http:\/\/www.both.org\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=11437"}],"curies":[{"name":"wp",
"href":"https:\/\/api.w.org\/{rel}","templated":true}]}}