Speech-to-Text

Note

Before proceeding, you should be familiar with the OpenAI Speech-to-Text guide and the relevant OpenAI API reference.

Download an STT model

export SPEACHES_BASE_URL="http://localhost:8000"

# List all available STT models
uvx speaches-cli registry ls --task automatic-speech-recognition | jq '.data | .[].id'

# Download the Systran/faster-distil-whisper-small.en model
uvx speaches-cli model download Systran/faster-distil-whisper-small.en

# Check that the model has been installed
uvx speaches-cli model ls --task automatic-speech-recognition | jq '.data | map(select(.id == "Systran/faster-distil-whisper-small.en"))'
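
The `jq` filters above operate on a JSON object whose `data` field is a list of model entries, each carrying an `id`. The same check can be sketched in Python. The helper name `installed_models` and the sample payload are made up for illustration and only mirror the shape those filters assume; the real CLI output may contain more fields.

```python
import json


def installed_models(payload: str, model_id: str) -> list[dict]:
    """Return the entries in `data` whose `id` equals `model_id`."""
    models = json.loads(payload).get("data", [])
    return [m for m in models if m.get("id") == model_id]


# Hypothetical output of `speaches-cli model ls`, reduced to the field used here.
sample = json.dumps(
    {
        "data": [
            {"id": "Systran/faster-distil-whisper-small.en"},
            {"id": "some-org/another-model"},
        ]
    }
)

matches = installed_models(sample, "Systran/faster-distil-whisper-small.en")
print("installed" if matches else "not installed")
```

An empty result means the model is not in the list, which is the same signal as `jq` printing `[]`.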

Usage

Curl

export SPEACHES_BASE_URL="http://localhost:8000"
export TRANSCRIPTION_MODEL_ID="Systran/faster-distil-whisper-small.en"

curl -s "$SPEACHES_BASE_URL/v1/audio/transcriptions" -F "file=@audio.wav" -F "model=$TRANSCRIPTION_MODEL_ID"

Python

httpx

import httpx

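with open('audio.wav', 'rb') as f:
    files = {'file': ('audio.wav', f)}
    # Send the model ID as a form field, as the curl example does
    data = {'model': 'Systran/faster-distil-whisper-small.en'}
    response = httpx.post('http://localhost:8000/v1/audio/transcriptions', files=files, data=data)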

print(response.text)
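
The example prints the raw response body. Assuming the server follows OpenAI's default `json` response format, that body is a JSON object with a `text` field, and the transcript can be pulled out as sketched below. The helper name `transcript_text` and the example body are illustrative, not part of the project.

```python
import json


def transcript_text(body: str) -> str:
    """Extract the transcript from a JSON transcription response body."""
    payload = json.loads(body)
    if "text" not in payload:
        # Error responses or other response formats will not have this field.
        raise ValueError(f"unexpected response: {payload!r}")
    return payload["text"]


# Made-up body in the shape of the default `json` response format.
example_body = '{"text": "Hello world."}'
print(transcript_text(example_body))
```

With the httpx example above, this would be called as `transcript_text(response.text)`.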

OpenAI SDKs

Note

Although this project doesn't require an API key, all OpenAI SDKs do, so you will need to set it to a non-empty value. You will also need to override the base URL to point to your server.

This can be done by setting the OPENAI_API_KEY and OPENAI_BASE_URL environment variables or by passing them as arguments to the SDK.
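
A minimal sketch of the second option, passing the values as arguments: the OpenAI Python SDK constructor accepts `base_url` and `api_key` keyword arguments. The helper names `sdk_settings` and `make_client` are hypothetical, and the fallback values simply mirror the ones used elsewhere on this page.

```python
import os


def sdk_settings(env=None) -> dict:
    """Build keyword arguments for `openai.OpenAI(...)`.

    Falls back to a local server and a placeholder key when the
    environment variables are unset or empty.
    """
    env = os.environ if env is None else env
    return {
        "base_url": env.get("OPENAI_BASE_URL") or "http://localhost:8000/v1",
        "api_key": env.get("OPENAI_API_KEY") or "cant-be-empty",
    }


def make_client(env=None):
    # Imported lazily so this module loads even without the SDK installed.
    from openai import OpenAI

    return OpenAI(**sdk_settings(env))


print(sdk_settings({}))
```

Calling `make_client()` then gives a client that can be used in place of `OpenAI()` without exporting any environment variables.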

Python

from pathlib import Path

from openai import OpenAI

client = OpenAI()

with Path("audio.wav").open("rb") as audio_file:
    transcription = client.audio.transcriptions.create(model="Systran/faster-distil-whisper-small.en", file=audio_file)

print(transcription.text)

CLI

export OPENAI_BASE_URL=http://localhost:8000/v1/
export OPENAI_API_KEY="cant-be-empty"
openai api audio.transcriptions.create -m Systran/faster-distil-whisper-small.en -f audio.wav --response-format text

Other

See OpenAI libraries.