Quant profiles

A profile is a reusable JSON file that maps tensor-name globs to per-tensor quant rules. basert convert --profile <path> applies it; the bundle records the profile name for audit.

The generic profiles ship in base-convert/profiles/ (default-q4, default-q8, and scale-dtype variants). Full guidance is in profiles/PROFILES.md.

Schema

{
  "name": "my-profile-v1",          // recorded in the bundle's quant_profile
  "arch": "llama",                   // checked against the model's arch
  "calibration": {                    // optional; omit for RTN-only
    "method": "awq",
    "tokens": 1024,
    "dataset": "wikitext-103"
  },
  "rules": [                          // first match wins, per tensor
    { "pattern": "model.embed_tokens.weight", "dtype": "bf16" },
    { "pattern": "model.layers.*.self_attn.{q,k,v,o}_proj.weight",
      "dtype": "base_q4", "scale_dtype": "bf16", "group_size": 64 },
    { "pattern": "lm_head.weight", "dtype": "base_q8" },
    { "pattern": "**.weight", "dtype": "base_q4" }    // catch-all
  ]
}

Glob syntax

Token	Matches
`*`	anything except `.` (within one name segment)
`**`	anything, including `.` (any number of segments)
`{a,b,c}`	alternation (expanded at load time)

Rules are evaluated top-down; the first matching rule wins. Include a catch-all (**.weight) so every tensor is covered, or pair the profile with --target as the fallback.

Per-rule fields

Field	Required	Notes
`pattern`	yes	Tensor-name glob.
`dtype`	yes	`base_q2`…`base_q8`, `bf16`, `f16`, `f32`.
`group_size`	no	Defaults to the dtype's canonical group size.
`scale_dtype`	no	`bf16` (default) / `f16` / `e8m0` / `e4m3` (q8 only).
`symmetric`	no	Default `false` (asymmetric).

Tips

Keep norms, routers, and (often) embeddings at bf16/f16 — they're small and precision-sensitive.
Validate by converting a small model and running basert inspect to confirm the per-tensor dtypes resolved as intended.