Server Mode
llms.py can run as an OpenAI-compatible HTTP server, allowing you to use it as a drop-in replacement for the OpenAI API with any compatible client.
Starting the Server
```sh
# Start on port 8000
llms --serve 8000

# With verbose logging
llms --serve 8000 --verbose

# With custom config
llms --serve 8000 --config /path/to/config.json
```

This launches:

- Web UI at http://localhost:8000
- OpenAI-compatible API at http://localhost:8000/v1/chat/completions
API Endpoints
Chat Completions
Endpoint: POST /v1/chat/completions
Request:
```sh
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
```

Response:

```json
{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "kimi-k2",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
```

OpenAI Compatibility
The server implements the OpenAI Chat Completions API, making it compatible with:
Python OpenAI SDK
```python
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"  # API key not required
)

response = client.chat.completions.create(
    model="kimi-k2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)

print(response.choices[0].message.content)
```

LangChain
```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model="kimi-k2"
)

response = llm.invoke("Hello!")
print(response.content)
```

LlamaIndex
```python
from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:8000/v1",
    api_key="not-needed",
    model="kimi-k2"
)

response = llm.complete("Hello!")
print(response.text)
```

Any OpenAI-Compatible Client
Any tool or library that supports OpenAI’s API can use llms.py by:

- Setting the base URL to http://localhost:8000/v1 (a sketch using environment variables follows this list)
- Using any model name from your enabled providers
- Passing any placeholder API key (it is not validated)
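For tools you cannot easily modify, the official OpenAI Python SDK also reads OPENAI_BASE_URL and OPENAI_API_KEY from the environment, so existing OpenAI-based code can often be pointed at the local server without code changes. A minimal sketch, assuming the OpenAI Python SDK v1+ and the kimi-k2 model from the examples above:

```python
import os
from openai import OpenAI

# The OpenAI SDK picks these up when no explicit arguments are passed,
# so existing OpenAI-based code can target the local llms.py server unchanged.
os.environ["OPENAI_BASE_URL"] = "http://localhost:8000/v1"
os.environ["OPENAI_API_KEY"] = "not-needed"  # any placeholder; not validated by llms.py

client = OpenAI()  # base URL and key come from the environment

response = client.chat.completions.create(
    model="kimi-k2",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```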
Request Parameters
Supported parameters in chat completion requests (an example combining several of them follows the list):
- model: Model name (required)
- messages: Array of message objects (required)
- temperature: 0.0 to 2.0 (default varies by model)
- max_tokens / max_completion_tokens: Maximum response length
- top_p: 0.0 to 1.0 for nucleus sampling
- frequency_penalty: -2.0 to 2.0
- presence_penalty: -2.0 to 2.0
- stop: String or array of stop sequences
- seed: For reproducible outputs
- stream: Enable streaming responses
- logprobs: Include log probabilities
- top_logprobs: Number of top logprobs (0-20)
- reasoning_effort: For reasoning models
- enable_thinking: Enable thinking mode (Qwen)
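As an illustration, the sketch below combines several of these parameters using the OpenAI Python SDK; the specific values (temperature, max_tokens, stop, seed) and the kimi-k2 model are placeholders, and streaming works the same way as against the OpenAI API:

```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Non-streaming request with sampling parameters
response = client.chat.completions.create(
    model="kimi-k2",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    temperature=0.7,
    max_tokens=128,
    stop=["\n\n"],
    seed=42,
)
print(response.choices[0].message.content)

# Streaming: set stream=True and iterate over the returned chunks
stream = client.chat.completions.create(
    model="kimi-k2",
    messages=[{"role": "user", "content": "Count to five."}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```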
Model Routing
The server automatically routes requests to available providers:
- Checks enabled providers in order (from llms.json)
- Finds the first provider that supports the requested model
- If the request fails, tries the next available provider
- Returns an error if no provider can handle the request
Example Routing
If kimi-k2 is available in:
- groq (free tier) - tried first
- openrouter (paid) - tried if groq fails
This ensures cost optimization and automatic failover.
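The routing behavior described above amounts to a first-success failover loop. The sketch below is purely illustrative (it is not llms.py’s actual code; the provider objects and their send() method are hypothetical) and just restates the algorithm: try each enabled provider that supports the model, in configuration order, and return the first successful response.

```python
# Illustrative sketch of first-success failover routing; not llms.py's actual implementation.
# `providers` is a hypothetical list ordered as in llms.json, e.g. [groq, openrouter].

def route_chat_completion(providers, request):
    errors = []
    for provider in providers:
        if request["model"] not in provider.models:  # skip providers without the model
            continue
        try:
            return provider.send(request)            # first successful response wins
        except Exception as exc:                      # on failure, fall through to the next
            errors.append((provider.name, exc))
    raise RuntimeError(f"No provider could handle {request['model']}: {errors}")
```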
Docker Deployment
Using Docker
```sh
docker run -p 8000:8000 \
  -e GROQ_API_KEY=$GROQ_API_KEY \
  ghcr.io/servicestack/llms:latest
```

Using Docker Compose
```yaml
version: '3.8'

services:
  llms:
    image: ghcr.io/servicestack/llms:latest
    ports:
      - "8000:8000"
    environment:
      - OPENROUTER_API_KEY=${OPENROUTER_API_KEY}
      - GROQ_API_KEY=${GROQ_API_KEY}
    volumes:
      - llms-data:/home/llms/.llms
    restart: unless-stopped

volumes:
  llms-data:
```

Start with:

```sh
docker compose up -d
```

Health Checks
The Docker image includes health checks:
```sh
# Check container health
docker ps

# View health check logs
docker inspect --format='{{json .State.Health}}' llms-server
```

Next Steps
- Web UI - Use the web interface
- Docker Guide - Detailed Docker setup
- API Reference - Complete API documentation