Kimi-K2.6-Med-JANGTQ

~720B-A32B MoE at 167 GB on disk (vs. ~610 GB for the full ~1T-parameter Kimi K2.6 base): a 35% routed-expert prune plus 2-bit JANGTQ quantization, runnable on a 256 GB Apple Silicon Mac.

  • Source: moonshotai/Kimi-K2.6 (Moonshot AI's MLA / MoE flagship, INT4 pack-quantized base)
  • Prune: 35% of routed experts removed via REAP saliency ranking, computed over the v3 calibration corpus (24% code / 20% agentic / 20% general / 13% cyber / 8% science / 8% CN / 5% systems / 2% longctx); a minimal sketch of the ranking follows this list
  • Quantization: JANGTQ2, i.e. a 2-bit MXTQ codebook (Hadamard-rotated, Lloyd-Max optimized) on routed-expert weights; 8-bit affine on attention, dense MLP, shared experts, embed, and lm_head; fp16 passthrough on norms, router gate, and biases
  • Bundle size: 167 GB across 178 shards
  • Runs on: Mac Studio M3 Ultra 256 GB (recommended)
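
How the prune picks survivors, as a minimal sketch: REAP ranks each routed expert by its router-gate-weighted output norm averaged over the calibration tokens, then drops the bottom 35% per layer. The criterion below matches the published REAP idea, but every name is illustrative; this is not the actual JANGQ tooling.

import numpy as np

def expert_saliency(gate_weights, expert_out_norms):
    # gate_weights: (tokens, experts) router probabilities, 0 where a token
    # was not routed to that expert.
    # expert_out_norms: (tokens, experts) L2 norms of each expert's output.
    weighted = gate_weights * expert_out_norms
    routed = (gate_weights > 0).sum(axis=0).clip(min=1)
    return weighted.sum(axis=0) / routed   # mean over routed tokens only

def keep_mask(saliency, prune_frac=0.35):
    # Keep the top (1 - prune_frac) experts per layer: 0.35 on 384 experts
    # leaves the 250 quoted in the variants table below.
    n_keep = int(round(saliency.size * (1.0 - prune_frac)))
    mask = np.zeros(saliency.size, dtype=bool)
    mask[np.argsort(saliency)[-n_keep:]] = True
    return mask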

Variants

The Kimi K2.6 × JANGTQ line ships in three variants:

Variant                     Prune   Experts kept   Size      HF
Kimi-K2.6-Small             45%     211 of 384     153 GB    JANGQ-AI/Kimi-K2.6-Small-JANGTQ
Kimi-K2.6-Med (this card)   35%     250 of 384     167 GB    JANGQ-AI/Kimi-K2.6-Med-JANGTQ
Kimi-K2.6-Large             25%     288 of 384     ~190 GB   (building)

More experts → less prune damage on long-tail / rare-domain prompts; the trade-off is RAM. Med is the sweet spot if you have 256 GB.

What's in the bundle

Module                                              Bundle dtype
Routed experts (250 × 3 mats × 60 layers)           2-bit MXTQ + sidecar
Shared expert                                       8-bit affine, g=64
Attention (q/k/v/o, MLA latents)                    8-bit affine, g=64
Dense MLP (first layer only, first_k_dense_replace=1)   8-bit affine, g=64
embed_tokens / lm_head                              8-bit affine, g=64
RMSNorms / router gate / e_score_correction_bias    fp16 / int64 passthrough

A 36 KB jangtq_runtime.safetensors sidecar ships for the Swift runtime; it carries the codebooks for (in_features = {2048, 7168}, seed = 42, bits = 2) plus the sign-flip vectors. A hedged inspection/decode sketch follows.
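
What a loader does with the sidecar, sketched under loud assumptions: the tensor names and the decode order (codebook lookup, then sign flip, with the inverse Hadamard rotation left to the runtime) are guesses; only the safetensors API is real.

import numpy as np
from safetensors import safe_open

with safe_open("jangtq_runtime.safetensors", framework="numpy") as f:
    print(list(f.keys()))                      # discover the real layout first
    codebook = f.get_tensor("codebook.2048")   # hypothetical name: 4 Lloyd-Max levels (bits=2)
    signs = f.get_tensor("signs.2048")         # hypothetical name: per-element sign flips

def decode_row(packed_u32, codebook, signs):
    # Unpack 2-bit codes (16 per uint32 word), look up codebook levels,
    # apply sign flips; the runtime would then invert the Hadamard rotation.
    shifts = np.arange(16, dtype=np.uint32) * np.uint32(2)
    codes = ((packed_u32[:, None] >> shifts) & 0b11).reshape(-1)
    return codebook[codes] * signs[:codes.size]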

Loading

# shell
pip install jang-tools mlx-lm

# python
from jang_tools import load_jangtq
model, tokenizer = load_jangtq("JANGQ-AI/Kimi-K2.6-Med-JANGTQ")
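
Assuming load_jangtq returns MLX-LM-compatible model/tokenizer objects (its return contract isn't documented here), generation then goes through the stock mlx_lm call:

from mlx_lm import generate

reply = generate(
    model,
    tokenizer,
    prompt="Explain mixture-of-experts routing in two sentences.",
    max_tokens=128,
)
print(reply)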

The vMLX (Swift) runtime auto-detects the format from config.json (routed_expert_bits=2, mxtq_seed=42) together with the sidecar.
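
For a quick sanity check from Python, the same two detection keys can be read straight out of config.json (their top-level placement is an assumption):

import json

with open("config.json") as f:
    cfg = json.load(f)

assert cfg.get("routed_expert_bits") == 2, "not a JANGTQ2 bundle"
assert cfg.get("mxtq_seed") == 42, "codebook seed mismatch"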

Credits

JANGTQ codec, REAP calibration, JANGQ-AI bundle, conversion tooling: Jinho Jang (eric@jangq.ai).

The source model belongs to Moonshot AI and is distributed under its Modified MIT license.
