Base Compute · Documentation

BaseRT

A fast LLM inference runtime for Apple Silicon. Pull a model from HuggingFace, chat with it, or serve an OpenAI-compatible API — all through one CLI.

$ basert serve basecompute/gemma-4-E4B-it

Read the docs GitHub

InstallInstallationGet the engine and the basert CLI onto your PATH.Start hereQuickstartPull a model and chat in under a minute.ServeServing an APIRun the OpenAI-compatible server for chat, completions, and more.ReferenceCLI referenceEvery basert command and flag.IntegrateBindingsPython, Node, Rust, and Swift over a stable C API.ReferenceServer APIThe OpenAI-compatible HTTP endpoints in detail.