puter.ai.txt2speech()

Websites Puter Apps Node.js Workers

Converts text into speech using AI. Supports multiple languages and voices.

Syntax

puter.ai.txt2speech(text, testMode = false)
puter.ai.txt2speech(text, options)
puter.ai.txt2speech(text, language, testMode = false)
puter.ai.txt2speech(text, language, voice, testMode = false)
puter.ai.txt2speech(text, language, voice, engine, testMode = false)

Parameters

text (String) (required)

The text to convert to speech. It must be fewer than 3000 characters long. When no options are provided, the request uses the AWS Polly provider.
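Input longer than the limit has to be split before it is sent. The sketch below shows one possible approach. `splitForSpeech` and `synthesizeAll` are hypothetical helper names, not part of Puter.js, and the sentence-boundary heuristic is an assumption that will not suit every language.

```javascript
// Hypothetical helper: split text into chunks of at most maxLength characters,
// breaking after sentence-ending punctuation where possible.
function splitForSpeech(text, maxLength = 2999) {
    // Each match is a sentence plus trailing whitespace, or a final fragment
    // without closing punctuation, so no characters are dropped.
    const sentences = text.match(/[^.!?]*[.!?]+\s*|[^.!?]+$/g) || [];
    const chunks = [];
    let current = '';

    for (const sentence of sentences) {
        let rest = sentence;

        // A single sentence longer than the limit is cut at fixed offsets.
        while (rest.length > maxLength) {
            if (current) {
                chunks.push(current);
                current = '';
            }
            chunks.push(rest.slice(0, maxLength));
            rest = rest.slice(maxLength);
        }

        // Start a new chunk when this sentence would overflow the current one.
        if (current.length + rest.length > maxLength) {
            chunks.push(current);
            current = '';
        }
        current += rest;
    }

    if (current) chunks.push(current);
    return chunks;
}

// Hypothetical helper: request audio for each chunk in order.
// Assumes the global `puter` object from https://js.puter.com/v2/ is loaded.
async function synthesizeAll(text) {
    const clips = [];
    for (const chunk of splitForSpeech(text)) {
        clips.push(await puter.ai.txt2speech(chunk));
    }
    return clips;
}
```

Each resolved clip can then be played in turn. Pass a different `maxLength` if the provider you use documents a different limit.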

testMode (Boolean) (optional)

When true, the call returns a sample audio clip so you can test your code without incurring usage charges. Defaults to false.
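As a sketch of how test mode might be wired into an app, the snippet below switches it on whenever a caller-supplied flag says the code is not running in production. `withTestMode`, `speakDraft`, and `isProduction` are hypothetical names; only the `test_mode` option and the `txt2speech` call come from this page.

```javascript
// Hypothetical helper: copy the options and enable test mode outside production.
function withTestMode(options, isProduction) {
    return { ...options, test_mode: !isProduction };
}

// Hypothetical usage. Assumes the global `puter` object from
// https://js.puter.com/v2/ is loaded and that `isProduction` is a flag
// maintained by your own application.
async function speakDraft(text, isProduction) {
    const options = withTestMode({ provider: 'openai', voice: 'alloy' }, isProduction);
    const audio = await puter.ai.txt2speech(text, options);
    await audio.play();
    return audio;
}
```

With the shorthand signature, the equivalent call would be `puter.ai.txt2speech(text, true)`.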

options (Object) (optional)

Additional settings for the generation request. Available options depend on the provider.

| Option | Type | Description |
|---|---|---|
| provider | String | TTS provider to use. 'aws-polly' (default), 'openai', 'elevenlabs', 'gemini', 'xai' |
| model | String | Model identifier (provider-specific) |
| voice | String | Voice ID used for synthesis (provider-specific) |
| test_mode | Boolean | When true, returns a sample audio without using credits |

AWS Polly Options

Available when provider: 'aws-polly' (default):

| Option | Type | Description |
|---|---|---|
| voice | String | Voice ID. Defaults to 'Joanna'. See available voices |
| engine | String | Synthesis engine. Available: 'standard' (default), 'neural', 'long-form', 'generative' |
| language | String | Language code. Defaults to 'en-US'. See supported languages |
| ssml | Boolean | When true, text is treated as SSML markup |
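
When `ssml` is true, the text has to be valid SSML, so any user-supplied content should be XML-escaped before it is wrapped in markup. The sketch below illustrates this with the standard SSML `<speak>` and `<break>` elements. `escapeForSsml`, `buildSsml`, and `speakWithPauses` are hypothetical helper names, and which SSML tags each engine accepts is not covered on this page.

```javascript
// Hypothetical helper: escape characters that are special in XML.
function escapeForSsml(text) {
    return text
        .replace(/&/g, '&amp;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        .replace(/"/g, '&quot;')
        .replace(/'/g, '&apos;');
}

// Hypothetical helper: join text parts with a pause between them.
function buildSsml(parts, pauseMs = 500) {
    const body = parts.map(escapeForSsml).join(`<break time="${pauseMs}ms"/>`);
    return `<speak>${body}</speak>`;
}

// Hypothetical usage. Assumes the global `puter` object from
// https://js.puter.com/v2/ is loaded.
async function speakWithPauses() {
    const markup = buildSsml(['Fish & chips are ready.', 'Enjoy your meal.'], 300);
    const audio = await puter.ai.txt2speech(markup, {
        ssml: true,
        voice: 'Joanna',
        language: 'en-US'
    });
    await audio.play();
}
```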

OpenAI Options

Available when provider: 'openai':

| Option | Type | Description |
|---|---|---|
| model | String | TTS model. Available: 'gpt-4o-mini-tts' (default), 'tts-1', 'tts-1-hd' |
| voice | String | Voice ID. Available: 'alloy' (default), 'ash', 'ballad', 'coral', 'echo', 'fable', 'nova', 'onyx', 'sage', 'shimmer' |
| response_format | String | Output format. Available: 'mp3' (default), 'wav', 'opus', 'aac', 'flac', 'pcm' |
| instructions | String | Additional guidance for voice style (tone, speed, mood, etc.) |

For more details about each option, see the OpenAI TTS API reference.

ElevenLabs Options

Available when provider: 'elevenlabs':

| Option | Type | Description |
|---|---|---|
| model | String | TTS model. Available: 'eleven_multilingual_v2' (default), 'eleven_flash_v2_5', 'eleven_turbo_v2_5', 'eleven_v3' |
| voice | String | Voice ID. Defaults to '21m00Tcm4TlvDq8ikWAM' (Rachel sample voice) |
| output_format | String | Output format. Defaults to 'mp3_44100_128' |
| voice_settings | Object | Voice tuning options (stability, similarity boost, speed) |

For more details about each option, see the ElevenLabs API reference.
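The table above does not spell out the keys inside `voice_settings`. The sketch below assumes they follow the ElevenLabs API field names (`stability`, `similarity_boost`, `speed`) and that Puter forwards them unchanged; treat both points as assumptions to verify. `buildElevenLabsOptions` and `speakTuned` are hypothetical helper names.

```javascript
// Hypothetical helper: build an options object with voice tuning.
// The voice_settings key names are assumed from the ElevenLabs API.
function buildElevenLabsOptions({ stability, similarityBoost, speed } = {}) {
    const voice_settings = {};
    if (stability !== undefined) voice_settings.stability = stability;
    if (similarityBoost !== undefined) voice_settings.similarity_boost = similarityBoost;
    if (speed !== undefined) voice_settings.speed = speed;

    return {
        provider: 'elevenlabs',
        model: 'eleven_multilingual_v2',
        voice: '21m00Tcm4TlvDq8ikWAM',
        voice_settings
    };
}

// Hypothetical usage. Assumes the global `puter` object from
// https://js.puter.com/v2/ is loaded.
async function speakTuned(text) {
    const options = buildElevenLabsOptions({ stability: 0.5, similarityBoost: 0.75 });
    const audio = await puter.ai.txt2speech(text, options);
    await audio.play();
}
```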

Gemini Options

Available when provider: 'gemini':

| Option | Type | Description |
|---|---|---|
| model | String | TTS model. Available: 'gemini-2.5-flash-preview-tts' (default), 'gemini-2.5-pro-preview-tts', 'gemini-3.1-flash-tts-preview' |
| voice | String | Voice name. Defaults to 'Kore'. Available: 'Zephyr', 'Puck', 'Charon', 'Kore', 'Fenrir', 'Leda', 'Orus', 'Aoede', 'Callirrhoe', 'Autonoe', 'Enceladus', 'Iapetus', 'Umbriel', 'Algieba', 'Despina', 'Erinome', 'Algenib', 'Rasalgethi', 'Laomedeia', 'Achernar', 'Alnilam', 'Schedar', 'Gacrux', 'Pulcherrima', 'Achird', 'Zubenelgenubi', 'Vindemiatrix', 'Sadachbia', 'Sadaltager', 'Sulafat' |
| instructions | String | Natural language instructions to control speaking style (tone, speed, mood, etc.) |

For more details about Gemini TTS, see the Google Gemini TTS documentation.

xAI (Grok) Options

Available when provider: 'xai':

| Option | Type | Description |
|---|---|---|
| voice | String | Voice ID. Available: 'eve' (default, energetic), 'ara' (warm), 'rex' (confident), 'sal' (smooth), 'leo' (authoritative) |
| language | String | BCP-47 language code. Defaults to 'en'. Supports 'auto' for auto-detection and 20+ languages |
| output_format | String | Output codec. Available: 'mp3' (default), 'wav', 'pcm', 'mulaw', 'alaw' |

With the xAI provider, the text can include inline speech tags such as [pause] and [laugh], and wrapping tags such as <whisper>text</whisper>, for expressive delivery. Each request can contain at most 15,000 characters.

For more details, see the xAI TTS documentation.
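To illustrate the speech tags described above, the sketch below assembles a script from plain text, inline tags, and a wrapped whisper segment, and checks it against the 15,000-character limit before sending. `whisper`, `buildXaiScript`, and `speakWithTags` are hypothetical helper names, and counting the tags toward the limit is a conservative assumption.

```javascript
const XAI_MAX_CHARS = 15000;

// Hypothetical helper: wrap text in the whisper tag.
function whisper(text) {
    return `<whisper>${text}</whisper>`;
}

// Hypothetical helper: join script parts and enforce the request limit.
// Tags are counted toward the limit here, which is an assumption.
function buildXaiScript(parts) {
    const script = parts.join(' ');
    if (script.length > XAI_MAX_CHARS) {
        throw new RangeError(`Script is ${script.length} characters; the limit is ${XAI_MAX_CHARS}.`);
    }
    return script;
}

// Hypothetical usage. Assumes the global `puter` object from
// https://js.puter.com/v2/ is loaded.
async function speakWithTags() {
    const script = buildXaiScript([
        'Here is a secret.',
        '[pause]',
        whisper('Nobody else knows.'),
        '[laugh]'
    ]);
    const audio = await puter.ai.txt2speech(script, {
        provider: 'xai',
        voice: 'ara',
        language: 'en'
    });
    await audio.play();
}
```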

Return value

A Promise that resolves to an HTMLAudioElement. The element’s src points at a blob or remote URL containing the synthesized audio.
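Because the resolved value is a standard HTMLAudioElement, the usual media events apply. The sketch below wraps playback in a Promise that settles when the clip finishes or fails; `playToEnd` and `speakThenLog` are hypothetical helper names, not part of Puter.js.

```javascript
// Hypothetical helper: play an audio element and resolve once playback ends.
function playToEnd(audio) {
    return new Promise((resolve, reject) => {
        audio.addEventListener('ended', () => resolve(audio), { once: true });
        audio.addEventListener(
            'error',
            () => reject(new Error('Audio playback failed')),
            { once: true }
        );
        // play() returns a Promise in current browsers; it rejects when the
        // browser blocks playback, for example without a user gesture.
        Promise.resolve(audio.play()).catch(reject);
    });
}

// Hypothetical usage. Assumes the global `puter` object from
// https://js.puter.com/v2/ is loaded.
async function speakThenLog(text) {
    const audio = await puter.ai.txt2speech(text);
    await playToEnd(audio);
    console.log('Finished speaking');
}
```

Awaiting the end of playback this way makes it straightforward to queue several clips one after another.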

Examples

Convert text to speech (Shorthand)

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <button id="play">Speak!</button>
    <script>
        document.getElementById('play').addEventListener('click', ()=>{
            puter.ai.txt2speech(`Hello world! Puter is pretty amazing, don't you agree?`).then((audio)=>{
                audio.play();
            });
        });
    </script>
</body>
</html>

Convert text to speech using options

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <button id="play">Speak with options!</button>
    <script>
        document.getElementById('play').addEventListener('click', ()=>{
            puter.ai.txt2speech(`Hello world! This is using a neural voice.`, {
                voice: "Joanna",
                engine: "neural",
                language: "en-US"
            }).then((audio)=>{
                audio.play();
            });
        });
    </script>
</body>
</html>

Use OpenAI voices

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <button id="play">Use OpenAI voice</button>
    <script>
        document.getElementById('play').addEventListener('click', async ()=>{
            const audio = await puter.ai.txt2speech(
                "Hello! This sample uses the OpenAI alloy voice.",
                {
                    provider: "openai",
                    model: "gpt-4o-mini-tts",
                    voice: "alloy",
                    response_format: "mp3",
                    instructions: "Sound cheerful but not overly fast."
                }
            );
            audio.play();
        });
    </script>
</body>
</html>

Use ElevenLabs voices

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <button id="play">Use ElevenLabs voice</button>
    <script>
        document.getElementById('play').addEventListener('click', async ()=>{
            const audio = await puter.ai.txt2speech(
                "Hello! This sample uses an ElevenLabs voice.",
                {
                    provider: "elevenlabs",
                    model: "eleven_multilingual_v2",
                    voice: "21m00Tcm4TlvDq8ikWAM",
                    output_format: "mp3_44100_128"
                }
            );
            audio.play();
        });
    </script>
</body>
</html>

Use Gemini voices

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <button id="play">Use Gemini voice</button>
    <script>
        document.getElementById('play').addEventListener('click', async ()=>{
            const audio = await puter.ai.txt2speech(
                "Hello! This sample uses the Gemini Puck voice.",
                {
                    provider: "gemini",
                    model: "gemini-2.5-flash-preview-tts",
                    voice: "Puck",
                    instructions: "Speak in a friendly, upbeat tone."
                }
            );
            audio.play();
        });
    </script>
</body>
</html>

Use xAI (Grok) voices

<html>
<body>
    <script src="https://js.puter.com/v2/"></script>
    <button id="play">Use xAI voice</button>
    <script>
        document.getElementById('play').addEventListener('click', async ()=>{
            const audio = await puter.ai.txt2speech(
                "Hello! This sample uses the xAI Eve voice.",
                {
                    provider: "xai",
                    voice: "eve",
                    language: "en"
                }
            );
            audio.play();
        });
    </script>
</body>
</html>

Compare different engines

<html>
<head>
    <style>
        body { font-family: Arial, sans-serif; max-width: 600px; margin: 0 auto; padding: 20px; }
        textarea { width: 100%; height: 80px; margin: 10px 0; }
        button { margin: 5px; padding: 10px 15px; cursor: pointer; }
        .status { margin: 10px 0; padding: 5px; font-size: 14px; }
    </style>
</head>
<body>
    <script src="https://js.puter.com/v2/"></script>
    
    <h1>Text-to-Speech Engine Comparison</h1>
    
    <textarea id="text-input" placeholder="Enter text to convert to speech...">Hello world! This is a test of the text-to-speech engines.</textarea>
    
    <div>
        <button onclick="playAudio('standard')">Standard Engine</button>
        <button onclick="playAudio('neural')">Neural Engine</button>
        <button onclick="playAudio('generative')">Generative Engine</button>
    </div>
    
    <div id="status" class="status"></div>

    <script>
        const textInput = document.getElementById('text-input');
        const statusDiv = document.getElementById('status');
        
        async function playAudio(engine) {
            const text = textInput.value.trim();
            
            if (!text) {
                statusDiv.textContent = 'Please enter some text first!';
                return;
            }
            
            if (text.length >= 3000) {
                statusDiv.textContent = 'Text must be less than 3000 characters!';
                return;
            }
            
            statusDiv.textContent = `Converting with ${engine} engine...`;
            
            try {
                const audio = await puter.ai.txt2speech(text, {
                    voice: "Joanna",
                    engine: engine,
                    language: "en-US"
                });
                
                statusDiv.textContent = `Playing ${engine} audio`;
                audio.play();
            } catch (error) {
                statusDiv.textContent = `Error: ${error.message}`;
            }
        }
    </script>
</body>
</html>