# Audio Support
`llms.py` supports audio file inputs through audio-capable models, enabling transcription, summarization, and analysis of audio content.
## CLI Usage

```sh
# Use the default audio template (transcribe)
llms --audio ./recording.mp3

# With a custom prompt
llms --audio ./meeting.wav "Summarize this meeting"

# Remote audio URL
llms --audio https://example.org/podcast.mp3 "Key points?"

# With a specific audio model
llms -m gpt-4o-audio-preview --audio interview.mp3 "Extract topics"
llms -m gemini-2.5-flash --audio talk.mp3 "Transcribe"

# Combined with a system prompt
llms -s "You're a transcription specialist" --audio talk.mp3 "Detailed transcript"
```

## Web UI
- Click the audio attachment icon
- Select an audio file
- Add your prompt
- Send
## Supported Formats

- MP3
- WAV
- M4A (support varies by provider)
## Audio Sources

- Local files: Absolute or relative paths
- Remote URLs: HTTP/HTTPS URLs (automatically downloaded)
- Base64 Data: Base64-encoded audio
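For the base64 route, the encoding itself can be done with the Python standard library. A minimal sketch (the file name and bytes below are stand-ins so the snippet is self-contained, not real audio):

```python
import base64

# Write stand-in bytes so the snippet runs on its own;
# in practice this would be a real recording
with open("recording.mp3", "wb") as f:
    f.write(b"ID3fake-audio-bytes")

# Read the file and base64-encode it for embedding in a request body
with open("recording.mp3", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode("ascii")
```

The resulting string can then be placed wherever the provider's request format expects inline audio data (see Custom Templates below).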
## Audio-Capable Models

Popular models that support audio processing:
- OpenAI: gpt-4o-audio-preview
- Google: gemini-2.5-pro, gemini-2.5-flash, gemini-2.5-flash-lite
## Custom Templates

You can create custom request templates for audio processing:
### Audio Template Example

```json
{
  "model": "gpt-4o-audio-preview",
  "messages": [
    {
      "role": "user",
      "content": [
        {
          "type": "input_audio",
          "input_audio": {"data": "", "format": "mp3"}
        },
        {
          "type": "text",
          "text": "Please transcribe this audio"
        }
      ]
    }
  ]
}
```

Usage:
```sh
llms --chat audio-request.json --audio recording.mp3
```
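Under the hood, `--audio` fills the template's empty `data` field with the encoded file. That substitution can be sketched directly in Python (field names follow the template above; the audio bytes are a stand-in, not a real MP3):

```python
import base64
import json

# The template from the example above, with an empty data field
template = {
    "model": "gpt-4o-audio-preview",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "input_audio",
             "input_audio": {"data": "", "format": "mp3"}},
            {"type": "text", "text": "Please transcribe this audio"},
        ],
    }],
}

# Base64-encode stand-in audio bytes and inject them into the template
audio_b64 = base64.b64encode(b"fake-mp3-bytes").decode("ascii")
template["messages"][0]["content"][0]["input_audio"]["data"] = audio_b64

# Serialized request body ready to send to the provider
request_json = json.dumps(template)
```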
## Next Steps

- Image Support - Analyze images with vision models
- File Support - Work with PDFs and documents
- Web UI - Use multi-modal features in the UI
- Providers - See which providers support audio