Kimi-K2.6-Med-JANGTQ

~720B-A32B MoE at 167 GB on disk (vs. ~610 GB for the full ~1T-parameter Kimi K2.6 base): a 35% routed-expert prune plus 2-bit JANGTQ quantization, runnable on a 256 GB Apple Silicon Mac.

  • Source: moonshotai/Kimi-K2.6 (Moonshot AI's MLA / MoE flagship, INT4 pack-quantized base)
  • Prune: 35% of routed experts removed via REAP saliency ranking, computed over the v3 calibration corpus (24% code / 20% agentic / 20% general / 13% cyber / 8% science / 8% CN / 5% systems / 2% longctx); a minimal sketch of the ranking follows this list
  • Quantization: JANGTQ2, i.e. a 2-bit MXTQ codebook (Hadamard-rotated, Lloyd-Max optimized) on routed-expert weights; 8-bit affine on attention, dense MLP, shared experts, embed, and lm_head; fp16 passthrough on norms, router gate, and biases
  • Bundle size: 167 GB across 178 shards
  • Runs on: Mac Studio M3 Ultra 256 GB (recommended)
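
How the prune picks survivors, as a minimal sketch: REAP ranks each routed expert by its router-gate-weighted output norm averaged over the calibration tokens, then drops the bottom 35% per layer. The criterion below matches the published REAP idea, but every name is illustrative; this is not the actual JANGQ tooling.

import numpy as np

def expert_saliency(gate_weights, expert_out_norms):
    # gate_weights: (tokens, experts) router probabilities, 0 where a token
    # was not routed to that expert.
    # expert_out_norms: (tokens, experts) L2 norms of each expert's output.
    weighted = gate_weights * expert_out_norms
    routed = (gate_weights > 0).sum(axis=0).clip(min=1)
    return weighted.sum(axis=0) / routed   # mean over routed tokens only

def keep_mask(saliency, prune_frac=0.35):
    # Keep the top (1 - prune_frac) experts per layer: 0.35 on 384 experts
    # leaves the 250 quoted in the variants table below.
    n_keep = int(round(saliency.size * (1.0 - prune_frac)))
    mask = np.zeros(saliency.size, dtype=bool)
    mask[np.argsort(saliency)[-n_keep:]] = True
    return mask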

Variants

The Kimi K2.6 × JANGTQ line ships in three variants:

Variant                     Prune   Experts kept   Size      HF
Kimi-K2.6-Small             45%     211 of 384     153 GB    JANGQ-AI/Kimi-K2.6-Small-JANGTQ
Kimi-K2.6-Med (this card)   35%     250 of 384     167 GB    JANGQ-AI/Kimi-K2.6-Med-JANGTQ
Kimi-K2.6-Large             25%     288 of 384     ~190 GB   (building)

More experts → less prune damage on long-tail / rare-domain prompts; the trade-off is RAM. Med is the sweet spot if you have 256 GB.

What's in the bundle

Module                                              Bundle dtype
Routed experts (250 × 3 mats × 60 layers)           2-bit MXTQ + sidecar
Shared expert                                       8-bit affine, g=64
Attention (q/k/v/o, MLA latents)                    8-bit affine, g=64
Dense MLP (first layer only, first_k_dense_replace=1)   8-bit affine, g=64
embed_tokens / lm_head                              8-bit affine, g=64
RMSNorms / router gate / e_score_correction_bias    fp16 / int64 passthrough

A 36 KB jangtq_runtime.safetensors sidecar ships for the Swift runtime; it carries the codebooks for (in_features = {2048, 7168}, seed = 42, bits = 2) plus the sign-flip vectors. A hedged inspection/decode sketch follows.
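
What a loader does with the sidecar, sketched under loud assumptions: the tensor names and the decode order (codebook lookup, then sign flip, with the inverse Hadamard rotation left to the runtime) are guesses; only the safetensors API is real.

import numpy as np
from safetensors import safe_open

with safe_open("jangtq_runtime.safetensors", framework="numpy") as f:
    print(list(f.keys()))                      # discover the real layout first
    codebook = f.get_tensor("codebook.2048")   # hypothetical name: 4 Lloyd-Max levels (bits=2)
    signs = f.get_tensor("signs.2048")         # hypothetical name: per-element sign flips

def decode_row(packed_u32, codebook, signs):
    # Unpack 2-bit codes (16 per uint32 word), look up codebook levels,
    # apply sign flips; the runtime would then invert the Hadamard rotation.
    shifts = np.arange(16, dtype=np.uint32) * np.uint32(2)
    codes = ((packed_u32[:, None] >> shifts) & 0b11).reshape(-1)
    return codebook[codes] * signs[:codes.size]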

Loading

# shell
pip install jang-tools mlx-lm

# python
from jang_tools import load_jangtq
model, tokenizer = load_jangtq("JANGQ-AI/Kimi-K2.6-Med-JANGTQ")
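
Assuming load_jangtq returns MLX-LM-compatible model/tokenizer objects (its return contract isn't documented here), generation then goes through the stock mlx_lm call:

from mlx_lm import generate

reply = generate(
    model,
    tokenizer,
    prompt="Explain mixture-of-experts routing in two sentences.",
    max_tokens=128,
)
print(reply)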

The vMLX (Swift) runtime auto-detects the format from config.json (routed_expert_bits=2, mxtq_seed=42) together with the sidecar.
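
For a quick sanity check from Python, the same two detection keys can be read straight out of config.json (their top-level placement is an assumption):

import json

with open("config.json") as f:
    cfg = json.load(f)

assert cfg.get("routed_expert_bits") == 2, "not a JANGTQ2 bundle"
assert cfg.get("mxtq_seed") == 42, "codebook seed mismatch"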

Credits

JANGTQ codec, REAP calibration, JANGQ-AI bundle, conversion tooling: Jinho Jang (eric@jangq.ai).

The source model belongs to Moonshot AI and is distributed under its Modified MIT license.
