# Audio Support
`llms.py` supports audio file inputs through audio-capable models, enabling transcription, summarization, and analysis of audio content.
## CLI Usage

```sh
# Use the default audio template (transcribe)
llms --audio ./recording.mp3

# With a custom prompt
llms --audio ./meeting.wav "Summarize this meeting"

# Remote audio URL
llms --audio https://example.org/podcast.mp3 "Key points?"

# With a specific audio model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract topics"
llms -m gemini-2.5-flash --audio talk.mp3 "Transcribe"

# Combined with a system prompt
llms -s "You're a transcription specialist" --audio talk.mp3 "Detailed transcript"
```

## Web UI
- Click the audio attachment icon
- Select an audio file
- Add your prompt
- Send
## Supported Formats

- MP3
- WAV
- M4A (support varies by provider)
## Audio Sources

- Local files: Absolute or relative paths
- Remote URLs: HTTP/HTTPS URLs (automatically downloaded)
- Base64 Data: Base64-encoded audio
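For the base64 route, the encoding itself can be done with the Python standard library. A minimal sketch (the file name and bytes below are stand-ins so the snippet is self-contained, not real audio):

```python
import base64

# Write stand-in bytes so the snippet runs on its own;
# in practice this would be a real recording
with open("recording.mp3", "wb") as f:
    f.write(b"ID3fake-audio-bytes")

# Read the file and base64-encode it for embedding in a request body
with open("recording.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")
```

The resulting string can then be placed wherever the provider's request format expects inline audio data (see Custom Templates below).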
## Audio-Capable Models

Popular models that support audio processing:
- OpenAI: gpt-4o-audio-preview
- Google: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
## Custom Templates

You can create custom request templates for audio processing:
### Audio Template Example

```json
{
  "model": "gpt-4o-audio-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {"data": "", "format": "mp3"}
        },
        {
          "type": "text",
          "text": "Please transcribe this audio"
        }
      ]
    }
  ]
}
```

Usage:
```sh
llms --chat audio-request.json --audio recording.mp3
```
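Under the hood, `--audio` fills the template's empty `data` field with the encoded file. That substitution can be sketched directly in Python (field names follow the template above; the audio bytes are a stand-in, not a real MP3):

```python
import base64
import json

# The template from the example above, with an empty data field
template = {
    "model": "gpt-4o-audio-preview",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "input_audio",
             "input_audio": {"data": "", "format": "mp3"}},
            {"type": "text", "text": "Please transcribe this audio"},
        ],
    }],
}

# Base64-encode stand-in audio bytes and inject them into the template
audio_b64 = base64.b64encode(b"fake-mp3-bytes").decode("ascii")
template["messages"][0]["content"][0]["input_audio"]["data"] = audio_b64

# Serialized request body ready to send to the provider
request_json = json.dumps(template)
```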
## Next Steps

- Image Support - Analyze images with vision models
- File Support - Work with PDFs and documents
- Web UI - Use multi-modal features in the UI
- Providers - See which providers support audio