Text-to-Speech
Note
Before proceeding, you should be familiar with the OpenAI Text-to-Speech and the relevant OpenAI API reference
Download a TTS model
export SPEACHES_BASE_URL="http://localhost:8000"
# Listing all available TTS models
uvx speaches-cli registry ls --task text-to-speech | jq '.data | [].id'
# Downloading a TTS model
uvx speaches-cli model download speaches-ai/Kokoro-82M-v1.0-ONNX
# Check that the model has been installed
uvx speaches-cli model ls --task text-to-speech | jq '.data | map(select(.id == "speaches-ai/Kokoro-82M-v1.0-ONNX"))'
Usage
Curl
export SPEACHES_BASE_URL="http://localhost:8000"
export MODEL_ID="speaches-ai/Kokoro-82M-v1.0-ONNX"
# List available voices
# TODO
export VOICE_ID="af_heart"
# Generate speech
curl "$SPEACHES_BASE_URL/v1/audio/speech" -s -H "Content-Type: application/json" \
--output audio.mp3 \
--data @- << EOF
{
"input": "Hello World!",
"model": "$MODEL_ID",
"voice": "$VOICE_ID"
}
EOF
curl "$SPEACHES_BASE_URL/v1/audio/speech" -s -H "Content-Type: application/json" \
--output audio.wav \
--data @- << EOF
{
"input": "Hello World!",
"model": "$MODEL_ID",
"voice": "$VOICE_ID",
"response_format": "wav"
}
EOF
curl "$SPEACHES_BASE_URL/v1/audio/speech" -s -H "Content-Type: application/json" \
--output audio.mp3 \
--data @- << EOF
{
"input": "Hello World!",
"model": "$MODEL_ID",
"voice": "$VOICE_ID",
"speed": 2.0
}
EOF
Python
from pathlib import Path
import httpx
client = httpx.Client(base_url="http://localhost:8000/")
model_id = "speaches-ai/Kokoro-82M-v1.0-ONNX"
voice_id = "af_heart"
res = client.post(
"v1/audio/speech",
json={
"model": model_id,
"voice": voice_id,
"input": "Hello, world!",
"response_format": "mp3",
"speed": 1,
},
).raise_for_status()
with Path("output.mp3").open("wb") as f:
f.write(res.read())
OpenAI SDKs
Note
Although this project doesn't require an API key, all OpenAI SDKs require an API key. Therefore, you will need to set it to a non-empty value. Additionally, you will need to overwrite the base URL to point to your server.
This can be done by setting the OPENAI_API_KEY
and OPENAI_BASE_URL
environment variables or by passing them as arguments to the SDK.
from pathlib import Path
from openai import OpenAI
openai = OpenAI(base_url="http://localhost:8000/v1", api_key="cant-be-empty")
model_id = "speaches-ai/Kokoro-82M-v1.0-ONNX"
voice_id = "af_heart"
res = openai.audio.speech.create(
model=model_id,
voice=voice_id,
input="Hello, world!",
response_format="mp3",
speed=1,
)
with Path("output.mp3").open("wb") as f:
f.write(res.response.read())
See OpenAI libraries
Limitations
response_format
:opus
andaac
are not supported- Maximum audio generation length is 10 seconds for Piper models