Server API

basert serve exposes an OpenAI-compatible HTTP API. Point any OpenAI client at the server's base URL (http://127.0.0.1:8080/v1 in the examples on this page) and use your --api-key value as the client's API key. See Serving an API for launch flags.

Authentication

If --api-key is set, every request must send:

Authorization: Bearer <api-key>
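
A minimal Python sketch of attaching that header, using only the standard library. The helper name `build_request` and the `BASE_URL` constant are illustrative assumptions, not part of the server; the address matches the curl examples on this page. The helper only builds the request, so nothing is sent until it is passed to `urllib.request.urlopen`.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8080"  # assumed local address, as in the curl examples


def build_request(path, payload=None, api_key=None):
    """Build (but do not send) a request for the server.

    Hypothetical helper: the Bearer header is added only when a key is
    given, mirroring a server started with or without --api-key.
    """
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(BASE_URL + path, data=data)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    if api_key:
        req.add_header("Authorization", f"Bearer {api_key}")
    return req
```

For example, `urllib.request.urlopen(build_request("/v1/models", api_key=key))` would list the loaded models on a running server.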

Endpoints

Method · Path	Purpose
`POST /v1/chat/completions`	Chat completions (streaming via `"stream": true`). Tool/function calling supported.
`POST /v1/completions`	Text completions.
`POST /v1/embeddings`	Embedding vectors.
`POST /v1/rerank`	Rerank documents against a query.
`POST /v1/audio/transcriptions`	Whisper-class transcription.
`POST /v1/tokenize`	Tokenize text (count/inspect tokens).
`GET /v1/models`	List loaded models.
`POST /v1/models/load` · `POST /v1/models/unload`	Load/unload a model at runtime.
`POST /v1/lora/load` · `POST /v1/lora/unload`	Manage LoRA adapters.
`POST /v1/files` · `GET /v1/files/...`	File storage (requires `--files-dir`).
`POST /v1/batches` · `GET /v1/batches/...`	Batch jobs (requires `--files-dir`).
`GET /health`	Liveness probe.
`GET /metrics` · `GET /v1/metrics`	Server metrics.

Chat completions

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-4B",
        "messages": [
          {"role": "system", "content": "You are concise."},
          {"role": "user", "content": "What is RoPE?"}
        ],
        "temperature": 0.7,
        "max_tokens": 256
      }'
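
The same request can be sketched in Python with the standard library alone. `chat` and `extract_reply` are hypothetical helper names, and the response shape (`choices[0].message.content`) is assumed from the OpenAI schema that this server follows.

```python
import json
import urllib.request


def extract_reply(response):
    """Return the assistant text from an OpenAI-style chat completion dict."""
    return response["choices"][0]["message"]["content"]


def chat(messages, model="Qwen3-4B", api_key=None,
         base_url="http://127.0.0.1:8080/v1", **params):
    """POST to /v1/chat/completions and return the parsed JSON response.

    Extra keyword arguments (temperature, max_tokens, ...) are passed
    through into the request body unchanged.
    """
    body = json.dumps({"model": model, "messages": messages, **params}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    req = urllib.request.Request(f"{base_url}/chat/completions", data=body, headers=headers)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With a server running, `extract_reply(chat([{"role": "user", "content": "What is RoPE?"}], temperature=0.7, max_tokens=256))` should return the model's answer as a string.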

Streaming

Set "stream": true to receive Server-Sent Events (data: {…} chunks ending with data: [DONE]):

curl -N http://127.0.0.1:8080/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"model":"Qwen3-4B","messages":[{"role":"user","content":"Hi"}],"stream":true}'
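
A client has to strip the `data:` prefix from each event, stop at the `[DONE]` sentinel, and join the text fragments. The sketch below does this for an iterable of already-decoded text lines. It assumes each event fits on a single `data:` line and that chunks carry text in `choices[0].delta.content`, as in OpenAI's streaming format; the function names are illustrative.

```python
import json


def iter_sse_chunks(lines):
    """Yield parsed JSON chunks from an iterable of SSE text lines.

    Stops at the `data: [DONE]` sentinel; blank lines and non-data
    fields (comments, `event:` lines) are skipped.
    """
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return
        yield json.loads(payload)


def collect_text(chunks):
    """Concatenate the `choices[0].delta.content` pieces of each chunk."""
    parts = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})
        parts.append(delta.get("content") or "")
    return "".join(parts)
```

When reading from `urllib.request.urlopen`, the response yields bytes, so decode each line first, for example `iter_sse_chunks(l.decode("utf-8") for l in resp)`.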

Tool calling

Pass a tools array (OpenAI function-calling schema) in the request body; the model returns tool_calls in the response. When "stream": true is set, the tool calls arrive incrementally across chunks.
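
As a rough illustration, the sketch below builds a request body with one tool and reassembles streamed tool-call fragments. The `get_weather` tool and both helper names are hypothetical, and the delta shape (an `index` plus partial `id`, `function.name`, and `function.arguments` strings) is assumed from OpenAI's streaming format rather than confirmed for this server.

```python
import json

# Hypothetical tool definition in the OpenAI function-calling schema.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(model, messages, tools):
    """Return a chat completions request body that offers `tools` to the model."""
    return {"model": model, "messages": messages, "tools": tools}


def merge_tool_call_deltas(deltas):
    """Assemble streamed tool_call fragments into complete calls.

    Fragments are grouped by `index`; name and argument strings are
    concatenated in arrival order.
    """
    calls = {}
    for d in deltas:
        call = calls.setdefault(d["index"], {"id": "", "name": "", "arguments": ""})
        call["id"] = d.get("id") or call["id"]
        fn = d.get("function") or {}
        call["name"] += fn.get("name") or ""
        call["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]
```

Once a call is complete, `json.loads(call["arguments"])` gives the arguments to pass to your own implementation of the tool.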

Embeddings

curl http://127.0.0.1:8080/v1/embeddings \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"model":"my-embed-model","input":["hello world","second doc"]}'
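
A common next step is comparing the returned vectors. The sketch below assumes the OpenAI response shape, where `data` is a list of objects with `index` and `embedding` fields; the helper names are illustrative.

```python
import math


def embeddings_from_response(response):
    """Return embedding vectors ordered by their `index` field."""
    items = sorted(response["data"], key=lambda item: item["index"])
    return [item["embedding"] for item in items]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Sorting by `index` keeps each vector aligned with its position in the `input` array, regardless of the order in which the server lists the results.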

Using the OpenAI SDKs

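import os

from openai import OpenAI

# Read the key from the environment; Python does not expand a literal "$API_KEY".
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key=os.environ["API_KEY"])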
resp = client.chat.completions.create(
    model="Qwen3-4B",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)

NOTE

Request and response fields follow the OpenAI schema. Some endpoints are conditional: /v1/files and /v1/batches require the --files-dir server flag, and LoRA management depends on the loaded model's capabilities.