Qwen3-VL-235B-A22B-Instruct — MLX mxfp4

MLX-format mxfp4 quantization of Qwen/Qwen3-VL-235B-A22B-Instruct, converted from the BF16 full-precision checkpoint for inference on Apple Silicon.

Quantization

Parameter         Value
Format            MLX safetensors
Quantization      mxfp4
Bits per weight   4.279
Group size        32
Shards            24
Total size        ~117 GB
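
With mxfp4, each group of 32 weights stores 4-bit (E2M1) values plus one shared 8-bit scale, i.e. roughly 4 + 8/32 = 4.25 bits per quantized weight; the reported 4.279 average presumably also reflects tensors kept at higher precision.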

Usage

pip install mlx-vlm

# Text generation
python -m mlx_vlm generate \
    --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4 \
    --prompt "What model are you?" \
    --max-tokens 128

# Vision
python -m mlx_vlm generate \
    --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4 \
    --image photo.jpg \
    --prompt "Describe this image in detail." \
    --max-tokens 256
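
The model can also be driven from Python. Below is a minimal sketch following the mlx-vlm API; exact signatures and return types may differ between mlx-vlm releases.

from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4"
model, processor = load(model_path)
config = load_config(model_path)

images = ["photo.jpg"]
prompt = "Describe this image in detail."

# Wrap the prompt in the model's chat template, declaring one image slot.
formatted_prompt = apply_chat_template(processor, config, prompt, num_images=len(images))

output = generate(model, processor, formatted_prompt, images, max_tokens=256, verbose=False)
print(output)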

Hardware Requirements

  • Apple Silicon with ≥128 GB unified memory (tested on M3 Ultra 512 GB)
  • macOS 15+, MLX 0.30.4+
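
A quick way to sanity-check whether a machine can hold the ~117 GB of weights plus KV cache, assuming mx.metal.device_info() is available in your MLX build:

import mlx.core as mx

info = mx.metal.device_info()
total_gb = info["memory_size"] / 1024**3      # unified memory visible to Metal
print(f"Unified memory: {total_gb:.0f} GB")
if total_gb < 128:
    print("Below the recommended 128 GB; the mxfp4 weights (~117 GB) are unlikely to fit.")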

Model Details

  • Architecture: Qwen3-VL (vision-language model) with Mixture-of-Experts layers (128 experts, top-k routing; see the routing sketch after this list)
  • Parameters: 235B total, ~22B active per token
  • Capabilities: Text, image, and video understanding
  • Source: Converted from the BF16 full-precision checkpoint with a patched mlx-vlm that materializes weights per tensor, avoiding Metal GPU command-buffer timeouts on very large models
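
The MoE feed-forward layers are why only ~22B of the 235B parameters are active per token: a learned router scores the experts and each token is processed by just its top-k. The following is an illustrative sketch of top-k routing in MLX; the class name, dimensions, and expert/top-k counts are placeholders, not Qwen3-VL's actual implementation or configuration.

import mlx.core as mx
import mlx.nn as nn

class TopKMoE(nn.Module):
    """Toy top-k mixture-of-experts layer (Qwen3-VL uses 128 experts per MoE layer)."""

    def __init__(self, dim=64, num_experts=16, top_k=4):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)                # token -> expert scores
        self.experts = [nn.Linear(dim, dim) for _ in range(num_experts)]

    def __call__(self, x):                                       # x: (tokens, dim)
        scores = mx.softmax(self.router(x), axis=-1)             # (tokens, num_experts)
        rows = []
        for t in range(x.shape[0]):
            topk = mx.argsort(scores[t])[-self.top_k:].tolist()  # experts chosen for this token
            # Only the selected experts run, weighted by their router scores.
            rows.append(sum(scores[t, e] * self.experts[e](x[t]) for e in topk))
        return mx.stack(rows)

moe = TopKMoE()
y = moe(mx.random.normal((4, 64)))
print(y.shape)  # (4, 64)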

Conversion

Converted with mlx-vlm (patched for 235B+ model support):

python -m mlx_vlm convert \
    --hf-path Qwen/Qwen3-VL-235B-A22B-Instruct \
    -q --q-bits 4 --q-mode mxfp4 --q-group-size 32 \
    --mlx-path Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4

For models larger than ~100B parameters, the conversion needs patches that materialize lazy weights one tensor at a time before quantization, preventing Metal command-buffer timeouts. See LibraxisAI/mlx-vlm for the fixes.
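
The core idea behind the patch, sketched below, is to force each lazily loaded weight onto the device in its own small evaluation instead of letting the quantizer trigger one giant graph evaluation; the function name and structure are illustrative, not the actual LibraxisAI/mlx-vlm code.

import mlx.core as mx

def materialize_per_tensor(weights):
    # Evaluate each lazy weight separately so no single Metal command buffer
    # has to realize the full 235B-parameter model at once.
    for _name, w in weights.items():
        mx.eval(w)
    return weights

# Hypothetical call site: run on the loaded (lazy) weights before quantization.
# weights = materialize_per_tensor(weights)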

Created by M&K (c)2026 The LibraxisAI Team
