Image Support
llms.py supports image inputs through vision-capable models, allowing you to analyze, describe, and extract information from images.
CLI Usage

```sh
# Use default image template
llms --image ./screenshot.png

# With custom prompt
llms --image ./photo.jpg "What's in this image?"

# Remote image URL
llms --image https://example.org/photo.jpg "Describe this photo"

# With specific vision model
llms -m gemini-2.5-flash --image chart.png "Analyze this chart"
llms -m qwen2.5vl --image document.jpg "Extract text"

# Combined with system prompt
llms -s "You are a data analyst" --image graph.png "What trends do you see?"
```

Web UI
- Click the image attachment icon
- Drag & drop or select an image
- Or paste an image from clipboard
- Add your prompt
- Send
Supported Formats
- PNG
- WEBP
- JPG/JPEG
- GIF
- BMP
- TIFF
- ICO
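If you want to pre-check a file against this list before sending it, a minimal sketch (the helper below is hypothetical, not part of llms.py itself):

```python
import os

# Extension-to-MIME map covering the supported formats listed above
# (hypothetical helper, not part of llms.py itself)
SUPPORTED_IMAGE_TYPES = {
    ".png": "image/png",
    ".webp": "image/webp",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".bmp": "image/bmp",
    ".tiff": "image/tiff",
    ".ico": "image/x-icon",
}

def image_mime_type(path: str) -> str:
    """Return the MIME type for a supported image, else raise ValueError."""
    ext = os.path.splitext(path)[1].lower()
    try:
        return SUPPORTED_IMAGE_TYPES[ext]
    except KeyError:
        raise ValueError(f"unsupported image format: {path}") from None
```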
Image Sources
- Local files: Absolute or relative paths
- Remote URLs: HTTP/HTTPS URLs (automatically downloaded)
- Data URIs: Base64-encoded images
- Clipboard: Paste directly in web UI
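For the data-URI source, you can build one from a local file using only the standard library; a sketch (the function name is illustrative):

```python
import base64
import mimetypes

def to_data_uri(path: str) -> str:
    """Base64-encode a local image file as a data URI."""
    mime = mimetypes.guess_type(path)[0] or "application/octet-stream"
    with open(path, "rb") as f:
        encoded = base64.b64encode(f.read()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

Since data URIs are an accepted image source, the resulting string should be usable wherever an image URL is, e.g. `llms --image "data:image/png;base64,..."`.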
Vision-Capable Models
Popular models that support image analysis:
- OpenAI: GPT-4o, GPT-4o-mini, GPT-4.1
- Anthropic: Claude Sonnet 4.0, Claude Opus 4.1
- Google: Gemini 2.5 Pro, Gemini Flash
- Qwen: Qwen2.5-VL, Qwen3-VL, QVQ-max
- Ollama: qwen2.5vl, llava
Custom Templates
You can create custom request templates for image processing:
Image Template Example
```json
{
  "model": "qwen2.5vl",
  "messages": [
    {
      "role": "user",
      "content": [
        { "type": "image_url", "image_url": { "url": "" } },
        { "type": "text", "text": "Caption this image" }
      ]
    }
  ]
}
```

Usage:

```sh
llms --chat image-request.json --image photo.jpg
```

Next Steps
- Audio Support - Process audio files
- File Support - Work with PDFs and documents
- Web UI - Use multi-modal features in the UI
- Providers - See which providers support vision