# Kimi-K2.6-Med-JANGTQ
A ~720B-A32B MoE, 167 GB on disk (down from Kimi K2.6's ~610 GB / ~1T-parameter base) via a 35% routed-expert prune plus 2-bit JANGTQ quantization; runnable on a 256 GB Apple Silicon Mac.
- Source: moonshotai/Kimi-K2.6 (Moonshot AI's MLA / MoE flagship, INT4 pack-quantized base)
- Prune: 35% of routed experts removed via REAP saliency ranking, computed over the v3 calibration corpus (24% code / 20% agentic / 20% general / 13% cyber / 8% science / 8% CN / 5% systems / 2% longctx); see the sketch after this list
- Quantization: JANGTQ2, i.e. a 2-bit MXTQ codebook (Hadamard-rotated, Lloyd-Max optimized) on routed-expert weights; 8-bit affine on attention / dense MLP / shared experts / embed / lm_head; fp16 passthrough on norms, router gate, and biases
- Bundle size: 167 GB across 178 shards
- Runs on: Mac Studio M3 Ultra 256 GB (recommended)
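The exact REAP saliency criterion isn't reproduced here; as a minimal sketch under assumptions (scores as router-probability-weighted expert output norms over the calibration corpus; `expert_saliency` and `keep_mask` are illustrative names, not jang-tools API), per-layer pruning looks like:

```python
import numpy as np

def expert_saliency(router_probs: np.ndarray, expert_out_norms: np.ndarray) -> np.ndarray:
    """Illustrative saliency: mean gate-probability-weighted output norm.
    router_probs, expert_out_norms: [tokens, n_experts] over the calibration set."""
    return (router_probs * expert_out_norms).mean(axis=0)

def keep_mask(saliency: np.ndarray, prune_frac: float = 0.35) -> np.ndarray:
    """Keep the top (1 - prune_frac) experts by saliency; drop the rest."""
    n_keep = int(round(saliency.size * (1.0 - prune_frac)))
    keep = np.zeros(saliency.size, dtype=bool)
    keep[np.argsort(saliency)[-n_keep:]] = True
    return keep

# 384 routed experts, 35% pruned -> 250 kept, matching this card's Med variant.
rng = np.random.default_rng(0)
sal = expert_saliency(rng.random((4096, 384)), rng.random((4096, 384)))
print(int(keep_mask(sal).sum()))  # 250
```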
## Variants
The Kimi K2.6 × JANGTQ line ships in three variants:
| Variant | Prune | Experts kept | Size | HF |
|---|---|---|---|---|
| Kimi-K2.6-Small | 45% | 211 of 384 | 153 GB | JANGQ-AI/Kimi-K2.6-Small-JANGTQ |
| Kimi-K2.6-Med (this card) | 35% | 250 of 384 | 167 GB | JANGQ-AI/Kimi-K2.6-Med-JANGTQ |
| Kimi-K2.6-Large | 25% | 288 of 384 | ~190 GB | (building) |
More experts → less prune damage on long-tail / rare-domain prompts; the trade-off is RAM. Med is the sweet spot if you have 256 GB.
## What's in the bundle
| Module | Bundle dtype |
|---|---|
| Routed experts (250 × 3 mats × 60 layers) | 2-bit MXTQ + sidecar |
| Shared expert | 8-bit affine g=64 |
| Attention (q/k/v/o, MLA latents) | 8-bit affine g=64 |
| Dense MLP (first first_k_dense_replace=1 layer) | 8-bit affine g=64 |
| embed_tokens / lm_head | 8-bit affine g=64 |
| RMSNorms / router gate / e_score_correction_bias | fp16 / int64 passthrough |
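For reference, a minimal sketch of the 8-bit affine, group-size-64 scheme listed above (per-group scale and zero-point from min/max; the bundle's exact packing in jang-tools may differ):

```python
import numpy as np

def quant_affine_g64(w: np.ndarray, g: int = 64):
    """Quantize to uint8 per group of g weights: code = round((w - lo) / scale)."""
    groups = w.reshape(-1, g)
    lo = groups.min(axis=1, keepdims=True)
    hi = groups.max(axis=1, keepdims=True)
    scale = (hi - lo) / 255.0
    scale[scale == 0] = 1.0  # guard all-constant groups
    q = np.clip(np.round((groups - lo) / scale), 0, 255).astype(np.uint8)
    return q, scale, lo

def dequant_affine_g64(q, scale, lo, shape):
    return (q.astype(np.float32) * scale + lo).reshape(shape)

w = np.random.randn(7168, 2048).astype(np.float32)
q, s, z = quant_affine_g64(w)
err = np.abs(dequant_affine_g64(q, s, z, w.shape) - w).max()
print(f"max abs error: {err:.4f}")  # bounded by half a quantization step per group
```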
A `jangtq_runtime.safetensors` sidecar (36 KB) for the Swift runtime covers the (in_features={2048,7168}, seed=42, bits=2) codebook plus the sign-flip vectors.
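The MXTQ decode path is not public; purely as a conceptual sketch (function and variable names are assumptions, not the real codec), a 2-bit codebook decode with sign-flips followed by undoing a seeded randomized-Hadamard rotation could look like:

```python
import numpy as np
from scipy.linalg import hadamard

def dequant_mxtq2(codes, codebook, sign_flips, n, seed=42):
    """codes: [rows, n] ints in {0..3}; codebook: [4] Lloyd-Max centroids;
    sign_flips: [n] entries in {-1, +1}; n: in_features. n must be a power
    of two here; 7168 would need a blocked/factored transform in practice."""
    w_rot = codebook[codes] * sign_flips             # 2-bit decode + sign-flip
    rng = np.random.default_rng(seed)                # seeded random diagonal
    d = rng.choice([-1.0, 1.0], size=n)
    H = hadamard(n).astype(np.float64) / np.sqrt(n)  # orthonormal Hadamard
    return (w_rot @ H.T) * d                         # undo w -> (w * d) @ H

cb = np.array([-1.5, -0.5, 0.5, 1.5])                # toy 4-entry codebook
codes = np.random.default_rng(2).integers(0, 4, size=(8, 2048))
flips = np.random.default_rng(1).choice([-1.0, 1.0], size=2048)
print(dequant_mxtq2(codes, cb, flips, 2048).shape)   # (8, 2048)
```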
## Loading
```bash
pip install jang-tools mlx-lm
```

```python
from jang_tools import load_jangtq

model, tokenizer = load_jangtq("JANGQ-AI/Kimi-K2.6-Med-JANGTQ")
```
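Assuming the returned `model` / `tokenizer` are mlx-lm compatible (not verified here), generation follows the standard mlx-lm pattern:

```python
from mlx_lm import generate

# Assumption: the objects returned by load_jangtq plug into mlx_lm.generate.
text = generate(model, tokenizer, prompt="Explain expert pruning in one paragraph.", max_tokens=128)
print(text)
```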
vMLX (Swift) auto-loads from `config.json`'s `routed_expert_bits=2` and `mxtq_seed=42` plus the sidecar.
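As a quick sanity check, the two fields named above can be read straight from the config (field names are from this card; the rest of the schema is unverified):

```python
import json

with open("config.json") as f:
    cfg = json.load(f)
assert cfg["routed_expert_bits"] == 2
assert cfg["mxtq_seed"] == 42
```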
## Credits
JANGTQ codec, REAP calibration, JANGQ-AI bundle, conversion tooling: Jinho Jang (eric@jangq.ai).
Source model and its license belong to Moonshot AI (modified-MIT).