
API Reference

llms.py implements the OpenAI Chat Completions API, making it compatible with any OpenAI-compatible client.

When running in server mode:

http://localhost:8000/v1

API key authentication is optional and not validated. You can use any value or omit it entirely.

Endpoint: POST /v1/chat/completions

Description: Create a chat completion using any configured model.

Example request:

{
  "model": "kimi-k2",
  "messages": [
    {
      "role": "system",
      "content": "You are a helpful assistant."
    },
    {
      "role": "user",
      "content": "Hello!"
    }
  ],
  "temperature": 0.7,
  "max_tokens": 150
}
| Parameter | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | Model name from enabled providers |
| messages | array | Yes | Array of message objects |
| temperature | number | No | 0.0 to 2.0, controls randomness |
| max_tokens | integer | No | Maximum tokens in response |
| max_completion_tokens | integer | No | Alternative to max_tokens |
| top_p | number | No | 0.0 to 1.0, nucleus sampling |
| frequency_penalty | number | No | -2.0 to 2.0, penalizes frequency |
| presence_penalty | number | No | -2.0 to 2.0, penalizes presence |
| stop | string/array | No | Stop sequences |
| seed | integer | No | For reproducible outputs |
| stream | boolean | No | Enable streaming responses |
| logprobs | boolean | No | Include log probabilities |
| top_logprobs | integer | No | 0-20, number of top logprobs |
| reasoning_effort | string | No | low, medium, high |
| enable_thinking | boolean | No | Enable thinking mode (Qwen) |
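Setting "stream": true returns the response incrementally (server-sent events, as in the OpenAI protocol) instead of a single JSON body. A minimal sketch of consuming the stream with the OpenAI Python SDK against the local server; the prompt and sampling values here are only illustrative:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# stream=True yields incremental chunks; each chunk carries a content delta
stream = client.chat.completions.create(
    model="kimi-k2",
    messages=[{"role": "user", "content": "Write a haiku about the sea."}],
    temperature=0.7,
    stream=True,
)

for chunk in stream:
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()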
Message object:

{
  "role": "user|assistant|system",
  "content": "string or array"
}

Text Content:

{
  "role": "user",
  "content": "Hello!"
}

Multi-Modal Content:

{
  "role": "user",
  "content": [
    {
      "type": "text",
      "text": "What's in this image?"
    },
    {
      "type": "image_url",
      "image_url": {
        "url": "data:image/jpeg;base64,..."
      }
    }
  ]
}
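The same multi-modal content shape can be built from the Python SDK. A minimal sketch, assuming the configured model accepts image input; the file name is only an example:

import base64
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

# Read a local image and embed it as a base64 data URL
with open("photo.jpg", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode("utf-8")

response = client.chat.completions.create(
    model="kimi-k2",  # assumes this model is vision-capable in your configuration
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What's in this image?"},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }
    ],
)
print(response.choices[0].message.content)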
Example response:

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "kimi-k2",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hello! How can I help you today?"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 9,
    "completion_tokens": 12,
    "total_tokens": 21
  }
}
| Field | Type | Description |
|---|---|---|
| id | string | Unique completion ID |
| object | string | Always "chat.completion" |
| created | integer | Unix timestamp |
| model | string | Model used |
| choices | array | Array of completion choices |
| usage | object | Token usage information |
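The usage object maps directly onto the SDK response, which makes it easy to track token consumption per request. A small sketch using the OpenAI Python client configured as in the examples below:

from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

response = client.chat.completions.create(
    model="kimi-k2",
    messages=[{"role": "user", "content": "Hello!"}],
)

# Mirrors the prompt_tokens / completion_tokens / total_tokens fields above
u = response.usage
print(f"prompt={u.prompt_tokens} completion={u.completion_tokens} total={u.total_tokens}")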
cURL:
curl -X POST http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "kimi-k2",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
Python (OpenAI SDK):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed"
)

response = client.chat.completions.create(
    model="kimi-k2",
    messages=[
        {"role": "user", "content": "Hello!"}
    ]
)
print(response.choices[0].message.content)
JavaScript (OpenAI SDK):

import OpenAI from 'openai';

const client = new OpenAI({
  baseURL: 'http://localhost:8000/v1',
  apiKey: 'not-needed',
});

const response = await client.chat.completions.create({
  model: 'kimi-k2',
  messages: [
    { role: 'user', content: 'Hello!' }
  ],
});
console.log(response.choices[0].message.content);
LangChain:

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    base_url="http://localhost:8000/v1",
    api_key="not-needed",
    model="kimi-k2"
)

response = llm.invoke("Hello!")
print(response.content)
LlamaIndex:

from llama_index.llms.openai import OpenAI

llm = OpenAI(
    api_base="http://localhost:8000/v1",
    api_key="not-needed",
    model="kimi-k2"
)

response = llm.complete("Hello!")
print(response.text)
Model not found:

{
  "error": {
    "message": "Model 'unknown-model' not found in any enabled provider",
    "type": "invalid_request_error",
    "code": "model_not_found"
  }
}
All providers failed:

{
  "error": {
    "message": "All providers failed for model 'kimi-k2'",
    "type": "api_error",
    "code": "provider_error"
  }
}
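Because errors come back as OpenAI-style error objects, the official SDKs raise their usual exception types. An illustrative sketch of handling the two cases above with the OpenAI Python SDK, assuming the server maps them to non-2xx status codes:

import openai
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

try:
    response = client.chat.completions.create(
        model="unknown-model",
        messages=[{"role": "user", "content": "Hello!"}],
    )
except openai.NotFoundError as e:
    # e.g. the model is not available from any enabled provider
    print("Model error:", e)
except openai.APIStatusError as e:
    # e.g. all providers failed for the requested model
    print("Provider error:", e.status_code, e.message)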

Requests are automatically routed to available providers:

  1. Finds all enabled providers that support the requested model
  2. Tries providers in order (as defined in llms.json)
  3. If a provider fails, tries the next one
  4. Returns an error if all providers fail

This provides:

  • Automatic failover: High availability
  • Cost optimization: Free/cheap providers first
  • Load distribution: Spread across providers
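Conceptually, the failover loop can be pictured like the sketch below. This is illustrative only, not the actual llms.py implementation; the Provider type and its chat_completion callable are hypothetical stand-ins:

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Provider:
    name: str
    enabled: bool
    models: List[str]
    chat_completion: Callable[[dict], dict]  # hypothetical request handler

def route_request(model: str, request: dict, providers: List[Provider]) -> dict:
    # 1. Keep only enabled providers that support the requested model
    candidates = [p for p in providers if p.enabled and model in p.models]
    if not candidates:
        raise LookupError(f"Model '{model}' not found in any enabled provider")

    last_error = None
    # 2-3. Try providers in their llms.json order, falling through on failure
    for provider in candidates:
        try:
            return provider.chat_completion(request)
        except Exception as exc:
            last_error = exc
    # 4. Every candidate provider failed
    raise RuntimeError(f"All providers failed for model '{model}'") from last_error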