Quickstart

This walks you from zero to a running model. It assumes you've completed Installation and have basert and the engine tools on your PATH.

Pull a model

basert pull downloads a model from HuggingFace and converts it to .base on the fly (or fetches a pre-converted model from the catalog):

basert pull Qwen/Qwen3-4B

Models are stored in ~/.cache/baseRT/models (override with $BASERT_MODELS_DIR). List what you have:

basert list                 # installed models
basert list --remote        # also show catalog models you haven't pulled

You can also pull a smaller model to start:

basert pull Qwen/Qwen3-0.6B

TIP

Gated or private repos

Set HF_TOKEN (or log in with the HuggingFace CLI) before pulling gated models. BaseRT uses the standard token chain.

Chat

basert chat Qwen/Qwen3-4B

You're dropped into an interactive prompt. Type /clear to reset the conversation, quit to exit. Common flags:

basert chat Qwen/Qwen3-4B --temp 0.7 --top-p 0.9 --max-tokens 512 --max-context 8192

See Chat & completion for all flags (sampling, thinking mode, KV-cache width, paged-KV).

One-shot completion

For scripting, basert complete runs a single prompt and prints the result:

basert complete Qwen/Qwen3-4B --prompt "Write a haiku about Metal GPUs." --max-tokens 64

Serve an OpenAI-compatible API

basert serve --model Qwen/Qwen3-4B --api-key "$(uuidgen)" --port 8080

Call it like any OpenAI endpoint:

curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{
        "model": "Qwen3-4B",
        "messages": [{"role": "user", "content": "Hello!"}]
      }'

Stream tokens with "stream": true. See Serving an API and the Server API reference for endpoints, batching, and deployment.

From your language

import baseRT

m = baseRT.Model("~/.cache/baseRT/models/Qwen/Qwen3-4B/default-q4/model.base")
for chunk in m.stream("Explain RoPE in one sentence.", max_tokens=128):
    print(chunk, end="", flush=True)

See Bindings for Python, Node, Rust, and Swift.

Convert a local checkpoint

Already have a GGUF / HF / MLX checkpoint? Convert it directly:

basert convert ./path/to/checkpoint \
    --profile base-convert/profiles/default-q4.json \
    --output models/my-model.base