Qwen/Qwen3-Next-80B-A3B-Thinking has MMLU_PRO 82.7 but you guys get 0.7271
#2 by hlxxxxxx
What is the difference?
quantization?
Not possible; I mean the MMLU-Pro benchmark score of the baseline (unquantized) model.
We use lm-eval-harness to test the model, which is widely adopted in the community. As we only care about the gap between the BF16 and INT4 models, we have no extra bandwidth to root-cause the issue. Instead, you could submit an issue to lm-eval-harness.
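For reference, here is a minimal sketch of how such a score could be reproduced with the lm-eval-harness Python API. The task name `mmlu_pro` and the model arguments are assumptions; check the task list of your installed harness version:

```python
# Minimal sketch: scoring MMLU-Pro with lm-eval-harness (pip install lm-eval).
# Assumption: the installed harness version exposes a task named "mmlu_pro".
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    model_args="pretrained=Qwen/Qwen3-Next-80B-A3B-Thinking,dtype=bfloat16",
    tasks=["mmlu_pro"],
    batch_size="auto",
)

# The harness reports accuracy as a fraction, not a percentage,
# so 0.7271 corresponds to 72.71%.
print(results["results"]["mmlu_pro"])
```

Note that the harness's fractional score (0.7271, i.e. 72.71%) is on a different scale than the reported 82.7; gaps of this size between frameworks are commonly due to differences in prompting and answer extraction.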