Qwopus3.5-9B-v3-abliterated-TQ3_4S

A pure (single tensor type, via llama-quantize --pure) TQ3_4S GGUF built locally from huihui-ai/Huihui-Qwopus3.5-9B-v3-abliterated.

Base source:

  • huihui-ai/Huihui-Qwopus3.5-9B-v3-abliterated
  • upstream family: Jackrong/Qwopus3.5-9B-v3

Artifacts:

  • pure GGUF size: 4,491,580,736 bytes (~4.18 GiB)
  • source F16 GGUF size: 17,920,693,568 bytes (~16.69 GiB)

Workflow used:

  1. Download safetensors from Hugging Face.
  2. Convert to F16 GGUF with convert_hf_to_gguf.py from turbo-tan/llama.cpp-tq3.
  3. Quantize with llama-quantize --pure from the same repo.
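
For reference, a sketch of steps 1 and 2 (the huggingface-cli invocation and all paths here are illustrative, not the exact commands that were run):

# fetch the safetensors checkpoint from Hugging Face
huggingface-cli download huihui-ai/Huihui-Qwopus3.5-9B-v3-abliterated \
  --local-dir ./Huihui-Qwopus3.5-9B-v3-abliterated
# convert to an F16 GGUF with the fork's converter
python convert_hf_to_gguf.py ./Huihui-Qwopus3.5-9B-v3-abliterated \
  --outtype f16 \
  --outfile /path/to/Qwopus3.5-9B-v3-abliterated-f16.gguf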

Important implementation note:

  • The public llama.cpp-tq3 checkout required local fixes before llama-quantize could quantize to TQ3_4S end-to-end:
    • expose TQ3_1S and TQ3_4S in tools/quantize/quantize.cpp
    • map LLAMA_FTYPE_MOSTLY_TQ3_1S/TQ3_4S in src/llama-quant.cpp
    • wire GGML_TYPE_TQ3_4S quantization in ggml/src/ggml.c
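
With those patches applied, the fork builds like stock llama.cpp. A typical build sequence (the GitHub URL and CMake flags are assumptions, not the exact setup used; CUDA is optional):

git clone https://github.com/turbo-tan/llama.cpp-tq3
cd llama.cpp-tq3
# apply the local TQ3_1S/TQ3_4S fixes described above, then build
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j --target llama-quantize llama-server llama-perplexity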

Exact quantization command used:

./build/bin/llama-quantize --pure \
  /path/to/Qwopus3.5-9B-v3-abliterated-f16.gguf \
  /path/to/Qwopus3.5-9B-v3-abliterated-TQ3_4S.gguf \
  TQ3_4S \
  16

Runtime:

  • target runtime: turbo-tan/llama.cpp-tq3

Example server command:

./build/bin/llama-server \
  -m /path/to/Qwopus3.5-9B-v3-abliterated-TQ3_4S.gguf \
  --host 127.0.0.1 --port 8080 \
  -ngl 99 -np 2 --kv-unified -c 32768 \
  -ctk q8_0 -ctv q8_0 -fa on \
  --jinja --reasoning on --reasoning-format deepseek --reasoning-budget 2048 \
  --alias qwopus-local

Local smoke test:

  • OpenAI-compatible server started successfully
  • /health returned {"status":"ok"}
  • /v1/models returned model id qwopus-local
  • completion prompt "Write only the word ok." returned "ok"
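
These checks are reproducible with plain curl against the server command above (host, port, and the chat payload are illustrative):

# liveness and model listing
curl http://127.0.0.1:8080/health
curl http://127.0.0.1:8080/v1/models
# minimal chat completion against the OpenAI-compatible endpoint
curl http://127.0.0.1:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "qwopus-local", "messages": [{"role": "user", "content": "Write only the word ok."}]}'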

Perplexity:

  • dataset: wikitext/wiki.test.raw (see the fetch sketch below)
  • successful evaluation command:
./build/bin/llama-perplexity \
  -m /path/to/Qwopus3.5-9B-v3-abliterated-TQ3_4S.gguf \
  -f /path/to/wiki.test.raw \
  -ngl 0 -c 2048 -b 512 -ub 512 -fa off --no-kv-offload
  • final estimate: PPL = 10.7488 +/- 0.07717
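
To fetch the dataset, upstream llama.cpp ships a helper script; assuming the fork retains it:

./scripts/get-wikitext-2.sh
# downloads and extracts wikitext-2-raw/wiki.test.raw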

Notes:

  • Lower perplexity is better.
  • The first high-throughput GPU evaluation attempt crashed late with a CUDA synchronization error; the conservative settings above completed successfully and produced the perplexity reported here.