A fast LLM inference runtime for Apple Silicon. Pull a model from HuggingFace, chat with it, or serve an OpenAI-compatible API — all through one CLI.
$ basert serve basecompute/gemma-4-E4B-it