Python

ctypes-based bindings over the BaseRT C API. Requires Python 3.9+.

Install

cd bindings/python
pip install -e .

Point the bindings at the engine dylib if it isn't in build/:

export BASERT_LIB_PATH=/path/to/build   # dir containing libbaseRT.dylib
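
For reference, a lookup along these lines is one way a ctypes binding could honor that variable. This is a minimal sketch, not the bindings' actual loader: `find_engine_lib` is a hypothetical helper, and the fallback to build/ is only inferred from the sentence above.

```python
import os
from pathlib import Path

LIB_NAME = "libbaseRT.dylib"  # file name taken from the export comment above


def find_engine_lib(env=None, default_dir="build"):
    """Return the path to the engine library, or None if it is not found.

    Hypothetical helper: checks the directory named by BASERT_LIB_PATH first,
    then falls back to default_dir.
    """
    env = os.environ if env is None else env
    candidates = []
    if env.get("BASERT_LIB_PATH"):
        candidates.append(Path(env["BASERT_LIB_PATH"]) / LIB_NAME)
    candidates.append(Path(default_dir) / LIB_NAME)
    for path in candidates:
        if path.is_file():
            return path
    return None
```

If a path comes back, `ctypes.CDLL(str(path))` would load it. A `None` result usually means BASERT_LIB_PATH points at the wrong directory.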

Generate text

import baseRT

m = baseRT.Model("models/your-model.base")
print(m.generate_text("The capital of France is", max_tokens=64, temperature=0.7))

Stream tokens

for chunk in m.stream("Explain RoPE in one sentence.", max_tokens=128):
    print(chunk, end="", flush=True)

Lower-level control

tokens = m.encode("Once upon a time")
stats = m.generate(tokens, max_tokens=256, temperature=0.7,
                   top_k=40, top_p=0.9)        # streaming via callback under the hood
print(stats.generated_tokens, stats.prefill_tokens_per_sec)
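
The sampling arguments presumably follow the usual conventions for temperature, top-k, and nucleus (top-p) sampling. The sketch below shows what those conventions typically mean on a toy list of logits. It is an assumption about standard semantics, not the engine's code, and `filter_probs` is a hypothetical name.

```python
import math


def filter_probs(logits, temperature=0.7, top_k=40, top_p=0.9):
    """Return {token_id: probability} after temperature, top-k, and top-p.

    Illustrative only: standard definitions, not BaseRT's implementation.
    Requires temperature > 0.
    """
    # Temperature scaling, then a numerically stable softmax.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep the k most probable token ids.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    kept, mass = [], 0.0
    for i in ranked:
        kept.append(i)
        mass += probs[i]
        if mass >= top_p:
            break

    # Renormalize over the surviving tokens.
    return {i: probs[i] / mass for i in kept}
```

Lower temperature sharpens the distribution, while smaller top_k or top_p values shrink the set of tokens that can be sampled.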

Other Model methods: config(), memory_bytes(), position(), reset(), decode_token(), prefill() / prefill_image(), decode_step(), embed() / embed_text() / embedding_dim(), format_chat(system, user) / chat_template(), token_count(), tensors(), and transcribe() for Whisper-class models.

Embeddings

m = baseRT.Model("models/embedding-model.base")
vec = m.embed_text("hello world")
print(m.embedding_dim(), len(vec))
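
Embedding vectors are usually compared with cosine similarity. A minimal sketch, assuming embed_text() returns a flat sequence of floats (the len(vec) call above suggests a sequence, but the element type is not stated here):

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)
```

With a loaded embedding model this would be called as `cosine_similarity(m.embed_text("cat"), m.embed_text("kitten"))`.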

Transcription

m = baseRT.Model("models/whisper-base.base")
# pass PCM samples / audio per the transcribe() signature
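
The input format transcribe() expects is not spelled out here, so check its signature first. As one possible starting point, the sketch below reads a 16-bit PCM WAV file into mono float samples using only the standard library. `load_wav_mono` is a hypothetical helper, and whether transcribe() takes floats, a sample rate, or something else is an assumption to verify.

```python
import array
import sys
import wave


def load_wav_mono(path):
    """Read a 16-bit PCM WAV file; return (samples, sample_rate).

    Samples are floats in [-1.0, 1.0); multi-channel audio is averaged to mono.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit PCM")
        channels = wav.getnchannels()
        rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    pcm = array.array("h", raw)      # signed 16-bit integers
    if sys.byteorder == "big":
        pcm.byteswap()               # WAV sample data is little-endian
    samples = [
        sum(pcm[i:i + channels]) / (channels * 32768.0)
        for i in range(0, len(pcm), channels)
    ]
    return samples, rate
```

Whisper-class models commonly expect 16 kHz mono audio, so resampling may also be needed. Confirm that against the model and the bindings before relying on it.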

See bindings/python for the full surface and tests.