Updates

02/01/2026: moonshotai has published an updated chat_template.jinja, and I have updated the GGUFs in this repository accordingly, so please re-download the first shard (00001) of your desired quant (see the download sketch after the list below).

  • The default system prompt might cause confusion for users and unexpected behaviours, so it has been removed.
  • The token <|media_start|> is incorrect; it has been replaced with <|media_begin|> in the chat template.
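
If you only need to refresh those shards, here is a minimal sketch using the huggingface_hub CLI; the include pattern is an assumption, so adjust it to the quant you downloaded:

huggingface-cli download AesSedai/Kimi-K2.5-GGUF \
    --include "*00001-of-*.gguf" \
    --local-dir ./Kimi-K2.5-GGUF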

Model

This is a text-and-image-only GGUF quantization of moonshotai/Kimi-K2.5. Video input is not supported by these GGUFs and will not be available until support is added upstream in llama.cpp.

MMPROJ files for image input are provided, but you will need to pull and build the code from my PR (https://github.com/ggml-org/llama.cpp/pull/19170) to use the vision component.
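
A minimal sketch of building that PR and running an image prompt with llama-mtmd-cli; the model and mmproj file names below are placeholders for whichever quant and MMPROJ file you downloaded:

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
# Fetch and check out the PR branch via the standard GitHub pull ref.
git fetch origin pull/19170/head:pr-19170 && git checkout pr-19170
cmake -B build && cmake --build build --config Release -j

# Point -m at the first shard of your quant and --mmproj at the downloaded projector.
./build/bin/llama-mtmd-cli \
    -m Kimi-K2.5-Q4_X-00001-of-NNNNN.gguf \
    --mmproj mmproj-Kimi-K2.5.gguf \
    --image photo.jpg \
    -p "Describe this image."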

To produce this quant, I modified config.json: I removed the text_config key, hoisted its inner values to the top level of the JSON (deduplicating where they overlapped), updated the architecture to the DeepSeek (DS) one, and changed model_type to "kimi_k2". I also removed the mm and vision_tower entries from model.safetensors.index.json.
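
A rough sketch of those JSON edits using jq; it assumes the overlapping values live under text_config, and the architecture change is omitted since its exact string isn't reproduced here:

# Hoist text_config to the top level and set model_type (architectures edit not shown).
jq '(.text_config // {}) as $tc | del(.text_config) | . + $tc | .model_type = "kimi_k2"' \
    config.json > config.tmp && mv config.tmp config.json

# Drop the mm.* and vision_tower.* entries from the weight map.
jq '.weight_map |= with_entries(select((.key | test("^(mm|vision_tower)\\.")) | not))' \
    model.safetensors.index.json > index.tmp && mv index.tmp model.safetensors.index.json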

I used jukofyork's instructions here to modify llama.cpp and recompile it, then decompressed the HF safetensors files into a BF16 GGUF and produced the quant from that.
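
A minimal sketch of the BF16 conversion step, assuming the stock convert_hf_to_gguf.py script that ships with llama.cpp and a placeholder model path:

python convert_hf_to_gguf.py /path/to/Kimi-K2.5 \
    --outfile Kimi-K2.5-BF16.gguf \
    --outtype bf16

From the BF16 GGUF, the quant was then produced with: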

./build/bin/llama-quantize \
    --tensor-type attn_kv_a_mqa=q8_0 \
    --tensor-type attn_k_b=q8_0 \
    --tensor-type attn_v_b=q8_0 \
    --tensor-type _exps=q4_0 \
    Kimi-K2.5-BF16.gguf Kimi-K2.5-Q4_X.gguf Q8_0

This Q4_X quant is the "full quality" equivalent, since the conditional experts are natively INT4 in the original model and quantize directly to Q4_0, while the rest of the model is Q8_0. I also produced and tested a Q8_0 / Q4_K quant: the model size was identical and the PPL was only marginally higher. Since the two performed about the same, I've only uploaded the Q4_X variant.

| Quant | Size | Mixture | PPL | Uploaded? |
|-------|------|---------|-----|-----------|
| Q4_X | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_0 | 1.8248 +/- 0.00699 | ✅ |
| Q4_K | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_K | 1.8256 +/- 0.00700 | ❌ |
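
For reference, PPL figures like these are typically measured with llama.cpp's llama-perplexity tool; a minimal sketch, with a placeholder evaluation file since the exact text used for the numbers above isn't stated:

./build/bin/llama-perplexity \
    -m Kimi-K2.5-Q4_X.gguf \
    -f wiki.test.raw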