Base Compute · Documentation

BaseRT

A fast LLM inference runtime for Apple Silicon. Pull a model from HuggingFace, chat with it, or serve an OpenAI-compatible API — all through one CLI.

$ basert serve basecompute/gemma-4-E4B-it