Use Cases

llms.py is versatile and can be used in many different scenarios. Here are some common use cases with practical examples.

Use llms.py as a centralized gateway for all LLM provider access:

from openai import OpenAI

# Point all your code to llms.py
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

# Use any model from any provider
response = client.chat.completions.create(
    model="kimi-k2",  # Free via Groq
    messages=[{"role": "user", "content": "Hello!"}]
)

Benefits:

  • Single integration point
  • Easy to switch providers
  • Automatic failover
  • Cost optimization

Route requests to the cheapest available providers automatically:

{
  "providers": {
    "groq": {"enabled": true},            // Free tier, tried first
    "openrouter_free": {"enabled": true}, // Free tier, tried second
    "openai": {"enabled": true}           // Paid, tried last
  }
}

Benefits:

  • Minimize API costs
  • Use free tiers first
  • Fallback to paid when needed

Easily switch between models for comparison:

# Test different models
llms -m grok-4 "Explain quantum computing"
llms -m claude-sonnet-4-0 "Explain quantum computing"
llms -m gemini-2.5-pro "Explain quantum computing"
# Compare response times
llms --check groq
llms --check anthropic
llms --check google

Benefits:

  • Quick model comparison
  • Performance testing
  • Quality evaluation

Use local models for development, cloud for production:

# Development: Use Ollama (free, private)
export ENV=development
llms --enable ollama
llms -m llama3.3 "test query"
# Production: Use premium models
export ENV=production
llms --enable openai anthropic
llms -m gpt-4o "production query"

Combine local Ollama models with cloud APIs in ComfyUI:

  1. Install llms.py in ComfyUI environment
  2. Enable both Ollama and cloud providers
  3. Use llms.py node in workflows
  4. Automatic routing based on availability (see the sketch below)
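
A minimal sketch of the routing idea behind step 4, assuming the llms.py server is running on localhost:8000 with both Ollama and cloud providers enabled; the helper function is illustrative, not part of the llms.py node itself:

import requests

def ask_llm(prompt: str, model: str = "llama3.3") -> str:
    # The gateway routes the request to an enabled provider that serves
    # this model (local Ollama or a cloud API) and fails over to another
    # provider if the first is unavailable.
    response = requests.post(
        "http://localhost:8000/v1/chat/completions",
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]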

Benefits:

  • Zero dependency conflicts
  • Mix local and cloud models
  • Cost control
  • Provider flexibility

No dependency management headaches:

# Just one dependency
pip install llms-py
# Works immediately in ComfyUI

Avoid lock-in to any single LLM provider:

{
  "providers": {
    "openai": {"enabled": true},
    "anthropic": {"enabled": true},
    "google": {"enabled": true},
    "ollama": {"enabled": true}
  }
}

Benefits:

  • No vendor lock-in
  • Easy migration
  • Negotiating leverage
  • Risk mitigation

Distribute load across multiple providers:

# Enable multiple providers for same model
llms --enable groq openrouter openai
# Automatic load distribution with failover

Benefits:

  • Higher throughput
  • Better reliability
  • Load balancing
  • Reduced rate limiting

Keep sensitive data local while using cloud for general tasks:

# Sensitive data: Use local Ollama
llms -m llama3.3 --file sensitive-doc.pdf "Analyze"
# General tasks: Use cloud
llms -m gpt-4o "General query"

Benefits:

  • Data sovereignty
  • Privacy compliance
  • Flexible deployment
  • Cost optimization

Intelligent routing to optimize costs:

{
  "providers": {
    "groq": {"enabled": true},        // Free tier
    "google_free": {"enabled": true}, // Free tier
    "openai": {"enabled": false}      // Disabled to control costs
  }
}

Track spending with analytics:

  • View cost analytics by day/month
  • Monitor per-model costs
  • Track per-provider spending

Compare capabilities of different models:

# Test reasoning
llms -m o3 "Solve this logic puzzle..."
llms -m qwq-plus "Solve this logic puzzle..."
# Test vision
llms -m gpt-4o --image chart.png "Analyze"
llms -m gemini-2.5-pro --image chart.png "Analyze"
# Test multilingual
llms -m qwen3-max "Translate to Chinese..."
llms -m gemini-2.5-pro "Translate to Chinese..."

Track costs for research projects:

# Enable analytics
llms --serve 8000
# View cost analytics in UI
# Export data for analysis

Easy experimentation with different providers:

# Try new models quickly
llms --enable new_provider
llms -m new-model "test"
# Compare with existing
llms -m existing-model "test"

Use different models for different tasks:

# Brainstorming: Fast, creative model
llms -m gemini-2.5-flash "Blog post ideas about AI"
# Drafting: Balanced model
llms -m claude-sonnet-4-0 "Write blog post about..."
# Editing: Detail-oriented model
llms -m gpt-4o "Improve this text..."

Analyze images for content creation:

# Describe images
llms -m qwen2.5vl --image photo.jpg "Describe for alt text"
# Extract text
llms -m gemini-2.5-pro --image screenshot.png "Extract text"
# Analyze charts
llms -m gpt-4o --image chart.png "Summarize data"

Transcribe and summarize audio content:

# Transcribe
llms -m gpt-4o-audio-preview --audio interview.mp3 "Transcribe"
# Summarize
llms -m gemini-2.5-flash --audio podcast.mp3 "Key points"

Process and analyze documents:

# Summarize PDFs
llms -m gpt-5 --file report.pdf "Executive summary"
# Extract data
llms -m qwen3-max --file invoice.pdf "Extract line items"
# Compare documents
llms -m claude-opus-4-1 --file doc1.pdf "Compare with doc2"

Analyze code and generate documentation:

# Explain code
llms -m grok-4 "Explain this Python code: ..."
# Generate docs
llms -m claude-sonnet-4-0 "Document this function: ..."
# Code review
llms -m gpt-4o "Review this code for issues: ..."

Learn new topics with AI help:

# Explanations
llms -s "You are a patient teacher" "Explain quantum computing"
# Practice
llms "Give me a Python coding challenge"
# Feedback
llms "Review my solution: ..."

Boost productivity with AI:

# Email drafting
llms "Draft email to client about project delay"
# Meeting notes
llms --audio meeting.mp3 "Summarize and extract action items"
# Research
llms "Summarize recent developments in AI"

Use local models for private queries:

# Enable only Ollama
llms --disable openai anthropic google
llms --enable ollama
# All queries stay local
llms -m llama3.3 "Private query"

Point existing framework integrations at the llms.py gateway. With LangChain:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model="kimi-k2"
)
# Use in chains, agents, etc.

With LlamaIndex:

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:8000/v1",
    api_key="not-needed",
    model="kimi-k2"
)
# Use in indexes, queries, etc.

Or call the OpenAI-compatible API directly:

import requests

response = requests.post(
    "http://localhost:8000/v1/chat/completions",
    json={
        "model": "kimi-k2",
        "messages": [{"role": "user", "content": "Hello!"}]
    }
)
print(response.json()["choices"][0]["message"]["content"])
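
Because every model sits behind the same OpenAI-compatible endpoint, one prompt can also be fanned out to several models at once, which pairs naturally with the model-comparison workflow above. This is a minimal sketch using the official openai async client against a gateway assumed to be running on localhost:8000; the model names are the ones used elsewhere on this page:

import asyncio
from openai import AsyncOpenAI

client = AsyncOpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

async def ask(model: str, prompt: str) -> tuple[str, str]:
    # One request per model, all routed through the llms.py gateway
    response = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return model, response.choices[0].message.content

async def main() -> None:
    prompt = "Explain quantum computing"
    results = await asyncio.gather(
        ask("kimi-k2", prompt),
        ask("claude-sonnet-4-0", prompt),
        ask("gemini-2.5-pro", prompt),
    )
    for model, answer in results:
        print(f"=== {model} ===\n{answer}\n")

asyncio.run(main())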