# Qwen3-VL-235B-A22B-Instruct — MLX mxfp4
MLX-format conversion of Qwen/Qwen3-VL-235B-A22B-Instruct (BF16 full precision) for Apple Silicon inference.
## Quantization

| Parameter | Value |
|---|---|
| Format | MLX safetensors |
| Quantization | mxfp4 |
| Bits per weight | 4.279 |
| Group size | 32 |
| Shards | 24 |
| Total size | ~117 GB |
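
The bits-per-weight figure follows from the format: each quantized weight is a 4-bit FP4 value, and every group of 32 weights shares one scale. A back-of-the-envelope check (assuming an 8-bit shared scale per group, as in the OCP microscaling spec):

```python
# Back-of-the-envelope bits per weight for mxfp4 with group size 32.
# Assumption: one 8-bit shared scale (E8M0) per group, per the OCP microscaling spec.
element_bits = 4          # FP4 (E2M1) value per weight
scale_bits = 8            # shared scale per group
group_size = 32

bpw = element_bits + scale_bits / group_size
print(bpw)                # 4.25 for the quantized tensors
```

The reported 4.279 average is slightly higher, likely because some tensors (norms, embeddings, and similar) stay in higher precision.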
## Usage

```bash
pip install mlx-vlm
```

```bash
# Text generation
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4 \
  --prompt "What model are you?" \
  --max-tokens 128
```

```bash
# Vision
python -m mlx_vlm generate \
  --model LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4 \
  --image photo.jpg \
  --prompt "Describe this image in detail." \
  --max-tokens 256
```
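
The model can also be driven from Python. A minimal sketch following the mlx-vlm Python API (`load`, `apply_chat_template`, `generate`); signatures can shift between mlx-vlm releases, so treat this as a starting point rather than a guaranteed interface:

```python
from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "LibraxisAI/Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4"

# Loads the quantized weights and the multimodal processor.
model, processor = load(model_path)
config = load_config(model_path)

images = ["photo.jpg"]
prompt = apply_chat_template(
    processor, config, "Describe this image in detail.", num_images=len(images)
)

output = generate(model, processor, prompt, images, max_tokens=256, verbose=False)
print(output)
```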
## Hardware Requirements

- Apple Silicon with ≥128 GB unified memory (tested on an M3 Ultra with 512 GB); a quick memory check is sketched after this list
- macOS 15+, MLX 0.30.4+
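
As referenced above, a quick check that the machine actually has the required unified memory (assumes `mlx.core.metal.device_info()` is available and exposes `memory_size`, as in recent MLX releases):

```python
import mlx.core as mx

# Total unified memory visible to Metal; the mxfp4 weights alone are ~117 GB,
# so 128 GB is a practical minimum for inference.
total_gb = mx.metal.device_info()["memory_size"] / 1e9
print(f"Unified memory: {total_gb:.0f} GB")
if total_gb < 128:
    print("Below the recommended 128 GB minimum for this model.")
```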
## Model Details

- Architecture: Qwen3-VL (Vision-Language Model) with Mixture of Experts (128 experts, top-k routing; a conceptual routing sketch follows this list)
- Parameters: 235B total, ~22B active per token
- Capabilities: Text, image, and video understanding
- Source: converted from the BF16 full-precision checkpoint using a patched mlx-vlm that materializes weights per tensor, avoiding Metal GPU timeouts on very large models
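
For intuition about the ~22B-active figure, here is a conceptual top-k expert-routing sketch in MLX. It is illustrative only: the expert count, top-k value, and per-token loop show the shape of the mechanism, not the model's actual implementation.

```python
import mlx.core as mx

def moe_layer(x, router_w, expert_ws, top_k=8):
    """Toy top-k MoE routing: each token activates only its top_k experts,
    which is why only a fraction of the total parameters run per token."""
    logits = x @ router_w                                  # (tokens, num_experts)
    top_idx = mx.argsort(logits, axis=-1)[:, -top_k:]      # top_k expert ids per token
    gates = mx.softmax(mx.take_along_axis(logits, top_idx, axis=-1), axis=-1)

    rows = []
    for t in range(x.shape[0]):                            # naive per-token loop for clarity
        mixed = mx.zeros((expert_ws[0].shape[1],))
        for j in range(top_k):
            e = top_idx[t, j].item()
            mixed = mixed + gates[t, j] * (x[t] @ expert_ws[e])
        rows.append(mixed)
    return mx.stack(rows)

# Toy usage: 4 tokens, hidden size 64, 16 experts, top-4 routing.
x = mx.random.normal((4, 64))
router_w = mx.random.normal((64, 16))
expert_ws = [mx.random.normal((64, 64)) for _ in range(16)]
print(moe_layer(x, router_w, expert_ws, top_k=4).shape)    # (4, 64)
```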
## Conversion

Converted with mlx-vlm (patched for 235B+ model support):

```bash
python -m mlx_vlm convert \
  --hf-path Qwen/Qwen3-VL-235B-A22B-Instruct \
  -q --q-bits 4 --q-mode mxfp4 --q-group-size 32 \
  --mlx-path Qwen3-VL-235B-A22B-Instruct-mlx-mxfp4
```
Models above roughly 100B parameters require a patch: lazy weights are materialized one tensor at a time before quantization to prevent a Metal command buffer timeout. See LibraxisAI/mlx-vlm for the fixes.
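
The idea behind the patch, in rough form (a conceptual sketch of the workaround, not the actual LibraxisAI/mlx-vlm code; `quantize_fn` is a placeholder):

```python
import mlx.core as mx

def materialize_and_quantize(weights, quantize_fn):
    """Evaluate each lazy weight on its own before quantizing it, so no single
    Metal command buffer has to realize the entire 235B-parameter graph at once."""
    quantized = {}
    for name, w in weights.items():
        mx.eval(w)                        # force materialization of just this tensor
        quantized[name] = quantize_fn(w)  # e.g. mxfp4 quantization of the realized tensor
        mx.eval(quantized[name])          # flush before moving on to the next tensor
    return quantized
```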
## See Also

- mxfp8 version — higher precision, ~243 GB
Created by M&K (c)2026 The LibraxisAI Team