# Qwen3-VL-235B-A22B-Instruct — MLX mxfp4
MLX-format conversion of Qwen/Qwen3-VL-235B-A22B-Instruct (BF16 full precision) for Apple Silicon inference.
## Quantization

| Parameter | Value |
|---|---|
| Format | MLX safetensors |
| Quantization | mxfp4 |
| Bits per weight | 4.279 |
| Group size | 32 |
| Shards | 24 |
| Total size | ~117 GB |
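
The bits-per-weight figure follows from the format: each quantized weight is a 4-bit FP4 value, and every group of 32 weights shares one scale. A back-of-the-envelope check (assuming an 8-bit shared scale per group, as in the OCP microscaling spec):

```python
# Back-of-the-envelope bits per weight for mxfp4 with group size 32.
# Assumption: one 8-bit shared scale (E8M0) per group, per the OCP microscaling spec.
element_bits = 4          # FP4 (E2M1) value per weight
scale_bits = 8            # shared scale per group
group_size = 32

bpw = element_bits + scale_bits / group_size
print(bpw)                # 4.25 for the quantized tensors
```

The reported 4.279 average is slightly higher, likely because some tensors (norms, embeddings, and similar) stay in higher precision.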
## Usage

```bash
pip install mlx-vlm
```

```bash
# Text generation
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4 \
  --prompt "What model are you?" \
  --max-tokens 128
```

```bash
# Vision
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4 \
  --image photo.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 256
```
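
The model can also be driven from Python. A minimal sketch following the mlx-vlm Python API (`load`, `apply_chat_template`, `generate`); signatures can shift between mlx-vlm releases, so treat this as a starting point rather than a guaranteed interface:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4"

# Loads the quantized weights and the multimodal processor.
model, processor = load(model_path)
config = load_config(model_path)

images = ["photo.jpg"]
prompt = apply_chat_template(
    processor, config, "Describe this image in detail.", num_images=len(images)
)

output = generate(model, processor, prompt, images, max_tokens=256, verbose=False)
print(output)
```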
## Hardware Requirements

- Apple Silicon with ≥128 GB unified memory (tested on an M3 Ultra with 512 GB); a quick memory check is sketched after this list
- macOS 15+, MLX 0.30.4+
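
As referenced above, a quick check that the machine actually has the required unified memory (assumes `mlx.core.metal.device_info()` is available and exposes `memory_size`, as in recent MLX releases):

```python
import mlx.core as mx

# Total unified memory visible to Metal; the mxfp4 weights alone are ~117 GB,
# so 128 GB is a practical minimum for inference.
total_gb = mx.metal.device_info()["memory_size"] / 1e9
print(f"Unified memory: {total_gb:.0f} GB")
if total_gb < 128:
    print("Below the recommended 128 GB minimum for this model.")
```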
## Model Details

- Architecture: Qwen3-VL (Vision-Language Model) with Mixture of Experts (128 experts, top-k routing; a conceptual routing sketch follows this list)
- Parameters: 235B total, ~22B active per token
- Capabilities: Text, image, and video understanding
- Source: converted from the BF16 full-precision checkpoint using a patched mlx-vlm that materializes weights per tensor, avoiding Metal GPU timeouts on very large models
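
For intuition about the ~22B-active figure, here is a conceptual top-k expert-routing sketch in MLX. It is illustrative only: the expert count, top-k value, and per-token loop show the shape of the mechanism, not the model's actual implementation.

```python
import mlx.core as mx

def moe_layer(x, router_w, expert_ws, top_k=8):
    """Toy top-k MoE routing: each token activates only its top_k experts,
    which is why only a fraction of the total parameters run per token."""
    logits = x @ router_w                                  # (tokens, num_experts)
    top_idx = mx.argsort(logits, axis=-1)[:, -top_k:]      # top_k expert ids per token
    gates = mx.softmax(mx.take_along_axis(logits, top_idx, axis=-1), axis=-1)

    rows = []
    for t in range(x.shape[0]):                            # naive per-token loop for clarity
        mixed = mx.zeros((expert_ws[0].shape[1],))
        for j in range(top_k):
            e = top_idx[t, j].item()
            mixed = mixed + gates[t, j] * (x[t] @ expert_ws[e])
        rows.append(mixed)
    return mx.stack(rows)

# Toy usage: 4 tokens, hidden size 64, 16 experts, top-4 routing.
x = mx.random.normal((4, 64))
router_w = mx.random.normal((64, 16))
expert_ws = [mx.random.normal((64, 64)) for _ in range(16)]
print(moe_layer(x, router_w, expert_ws, top_k=4).shape)    # (4, 64)
```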
## Conversion

Converted with mlx-vlm (patched for 235B+ model support):

```bash
python -m mlx_vlm convert \
  --hf-path Qwen/Qwen3-VL-235B-A22B-Instruct \
  -q --q-bits 4 --q-mode mxfp4 --q-group-size 32 \
  --mlx-path Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4
```
Models above roughly 100B parameters require a patch: lazy weights are materialized one tensor at a time before quantization to prevent a Metal command buffer timeout. See LibraxisAI/mlx-vlm for the fixes.
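
The idea behind the patch, in rough form (a conceptual sketch of the workaround, not the actual LibraxisAI/mlx-vlm code; `quantize_fn` is a placeholder):

```python
import mlx.core as mx

def materialize_and_quantize(weights, quantize_fn):
    """Evaluate each lazy weight on its own before quantizing it, so no single
    Metal command buffer has to realize the entire 235B-parameter graph at once."""
    quantized = {}
    for name, w in weights.items():
        mx.eval(w)                        # force materialization of just this tensor
        quantized[name] = quantize_fn(w)  # e.g. mxfp4 quantization of the realized tensor
        mx.eval(quantized[name])          # flush before moving on to the next tensor
    return quantized
```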
## See Also

- mxfp8 version — higher precision, ~243 GB
Created by M&K (c)2026 The LibraxisAI Team