## Updates
02/01/2026: moonshotai has published an updated chat_template.jinja, so I have updated the GGUFs in this repository; please re-download the first shard (00001) of your desired quant.
- The default system prompt might confuse users and cause unexpected behaviour, so it has been removed.
- The token <|media_start|> was incorrect; it has been replaced with <|media_begin|> in the chat template.
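If you want to patch an older local copy of the template rather than re-download, the token rename amounts to a single substitution (a sketch using a stand-in file; the upstream chat_template.jinja is the authoritative fix):

```shell
# Write a minimal stand-in template containing the old token (illustration only)
printf 'ROLE<|media_start|>DATA' > chat_template.jinja

# The rename amounts to one substitution: <|media_start|> -> <|media_begin|>
sed -i 's/<|media_start|>/<|media_begin|>/g' chat_template.jinja
cat chat_template.jinja
```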
## Model
This is a text-and-image-only GGUF quantization of moonshotai/Kimi-K2.5. This means that video input is not present in this GGUF, and will not be available until support is added upstream in llama.cpp.
MMPROJ files for image vision input have been provided, but you will need to pull and compile the code from my PR (https://github.com/ggml-org/llama.cpp/pull/19170) to use the vision component.
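Checking out and building a GitHub PR locally follows the usual pattern (a sketch; the local branch name `pr-19170` is arbitrary, and you may want extra CMake flags such as `-DGGML_CUDA=ON` for your hardware):

```shell
# Fetch the vision-support PR (#19170) on top of upstream llama.cpp
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
git fetch origin pull/19170/head:pr-19170
git switch pr-19170

# Build the binaries (llama-server, llama-quantize, etc.)
cmake -B build
cmake --build build --config Release -j
```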
To produce this quant, I modified config.json: I removed the text_config key, de-indented and deduplicated its inner values so they sit at the top level of the JSON, updated the arch to DS, and updated the model_type to "kimi_k2". I also removed the mm and vision_tower entries from model.safetensors.index.json.
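The config surgery above can be sketched in a few lines of Python (the sample input dict and key names other than text_config and model_type are illustrative, not copied from the real config.json):

```python
import json


def flatten_text_config(config: dict, arch: str, model_type: str) -> dict:
    """Hoist the keys under text_config to the top level, keeping the
    existing top-level value when a key appears in both (deduplication),
    then override the architecture and model_type fields."""
    cfg = dict(config)
    inner = cfg.pop("text_config", {})
    for key, value in inner.items():
        cfg.setdefault(key, value)  # prefer the top-level value on duplicates
    cfg["architectures"] = [arch]
    cfg["model_type"] = model_type
    return cfg


# Illustrative input resembling a multimodal config.json
config = {
    "architectures": ["KimiVLForConditionalGeneration"],
    "model_type": "kimi_vl",
    "text_config": {"hidden_size": 7168, "model_type": "kimi_k2"},
}
flat = flatten_text_config(config, "DS", "kimi_k2")
print(json.dumps(flat, indent=2))
```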
I used jukofyork's instructions here to modify llama.cpp, recompiled, converted the HF safetensors files into a BF16 GGUF, and then produced the quant:
```shell
./build/bin/llama-quantize \
    --tensor-type attn_kv_a_mqa=q8_0 \
    --tensor-type attn_k_b=q8_0 \
    --tensor-type attn_v_b=q8_0 \
    --tensor-type _exps=q4_0 \
    Kimi-K2.5-BF16.gguf Kimi-K2.5-Q4_X.gguf Q8_0
```
This Q4_X quant is the "full quality" equivalent: the conditional experts are natively INT4 quantized directly from the original model, and the rest of the model is Q8_0. I also produced and tested a Q8_0 / Q4_K quant; its size was identical and its PPL barely higher. Since their performance was about the same, I've only uploaded the Q4_X variant.
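For reference, the PPL that llama.cpp's perplexity tool reports is the exponential of the mean per-token negative log-likelihood, so the Q4_X/Q4_K gap of ~0.0008 is well inside the listed +/- error bars (a minimal sketch of the definition, not llama.cpp's exact evaluation code):

```python
import math


def perplexity(token_logprobs: list[float]) -> float:
    """PPL = exp(-(1/N) * sum(log p(token))) over the evaluated tokens."""
    n = len(token_logprobs)
    return math.exp(-sum(token_logprobs) / n)


# Sanity check: a model that assigns uniform probability 1/4 to every
# token has a perplexity of exactly 4.
logprobs = [math.log(0.25)] * 10
print(perplexity(logprobs))
```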
| Quant | Size | Mixture | PPL | Uploaded? |
|---|---|---|---|---|
| Q4_X | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_0 | 1.8248 +/- 0.00699 | Yes |
| Q4_K | 543.62 GiB (4.55 BPW) | Q8_0 / Q4_K | 1.8256 +/- 0.00700 | No |
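As a sanity check on the table, bits-per-weight is just total file bits divided by parameter count; inverting the listed size and BPW recovers a parameter count of roughly one trillion, consistent with the model's scale (a back-of-envelope calculation, not from the source):

```python
GIB = 1024 ** 3  # GGUF sizes above are reported in GiB


def bits_per_weight(size_gib: float, n_params: float) -> float:
    """BPW = total bits in the GGUF file / number of model parameters."""
    return size_gib * GIB * 8 / n_params


# Invert the table row: 543.62 GiB at 4.55 BPW implies ~1.03e12 parameters.
implied_params = 543.62 * GIB * 8 / 4.55
print(f"{implied_params:.3e}")
```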