Z-Image Turbo Control Tile Upscaler
This repository hosts a specialized version of the Z-Image Turbo Control Unified V2 model, fine-tuned specifically for Tile-based upscaling. This model uses the ControlNet Tile architecture to intelligently add detail and increase the resolution of low-quality images while preserving the original composition.
The model architecture integrates control layers directly into the transformer structure, enabling Unified GGUF Quantization. This allows the entire model to be quantized (e.g., Q4_K_M, Q8_0) and run efficiently on consumer hardware with limited VRAM, making high-resolution upscaling accessible.
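To make the VRAM argument concrete, the weight footprint can be estimated from the number of bits stored per weight. The sketch below is only back-of-the-envelope arithmetic: the parameter count is a placeholder assumption (not a figure from this repository), the Q4_K_M value is approximate because K-quants mix block types, and activations, the text encoder and the VAE are ignored.

```python
# Rough weight-memory estimate for different storage formats.
# ASSUMPTION: PARAM_COUNT is a placeholder, not the real size of this model.
PARAM_COUNT = 6_000_000_000

# Bits stored per weight. Q8_0 packs 32 int8 weights plus one fp16 scale
# per block (34 bytes / 32 weights = 8.5 bits). The Q4_K_M figure is an
# approximation, since K-quants combine several block types.
BITS_PER_WEIGHT = {
    "BF16": 16.0,
    "Q8_0": 8.5,
    "Q4_K_M": 4.85,  # approximate
}


def weight_gib(param_count: int, bits_per_weight: float) -> float:
    """Return the size of the weights alone, in GiB."""
    return param_count * bits_per_weight / 8 / 1024**3


for name, bits in BITS_PER_WEIGHT.items():
    print(f"{name:7s} ~{weight_gib(PARAM_COUNT, bits):.1f} GiB")
```

The ratios matter more than the absolute numbers: under these assumptions an 8-bit file needs roughly half the memory of BF16, and a 4-bit file roughly a third.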
Installation
To set up the environment, create a virtual environment and install the dependencies:
# Create a virtual environment
python -m venv venv
# Activate your venv (Windows: venv\Scripts\activate, Linux/macOS: source venv/bin/activate)
# Upgrade pip
python -m pip install --upgrade pip
# Install requirements
pip install -r requirements.txt
Note: This repository contains a diffusers_local folder with the custom ZImageControlUnifiedPipeline and transformer logic required to run this specific architecture.
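After installing, a quick check that the key packages can be found may save a failed first run. This is a minimal sketch using only the standard library. The package names in REQUIRED are illustrative assumptions, and requirements.txt remains the authoritative list.

```python
import importlib.util
import sys

# ASSUMPTION: these package names are illustrative; the authoritative list
# is whatever requirements.txt in this repository specifies.
REQUIRED = ["torch", "diffusers", "gguf"]


def missing_packages(names):
    """Return the names that cannot be located for import."""
    return [n for n in names if importlib.util.find_spec(n) is None]


missing = missing_packages(REQUIRED)
print("python:", sys.version.split()[0])
print("missing:", ", ".join(missing) if missing else "none")
```

Run it inside the activated virtual environment; anything listed as missing points to an incomplete install.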
Usage
The primary script for upscaling images is infer_tile.py. It's designed to take a low-resolution image and generate a high-resolution version based on your text prompt.
Hardware Options
Option 1: Low VRAM (GGUF) - Recommended
Use this version if you have limited VRAM (e.g., 6GB - 8GB). It loads the model from a quantized GGUF file. To use it, ensure use_gguf = True in infer_tile.py and provide the path to your .gguf file (e.g., z_image_turbo_control_unified_v2.1_tile_q4_k_m.gguf or z_image_turbo_control_unified_v2.1_tile_q8_0.gguf).
Key Features:
- Loads the unified transformer from a single 4-bit or 8-bit quantized file.
- Enables aggressive group_offload to fit large models on consumer GPUs.
Option 2: High Precision (Diffusers/BF16)
Use this version if you have ample VRAM (e.g., 24GB+). Set use_gguf = False in the script to load the full BFloat16 precision model from a standard Hugging Face directory structure.
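The choice between the two options can be written down as a small helper. This is a hypothetical sketch, not code from infer_tile.py: the function name and the returned dictionary are invented for illustration, and the 8 GB boundary between the 4-bit and 8-bit files is an assumption. Only the 24 GB threshold and the file names come from the text above.

```python
Q4_FILE = "z_image_turbo_control_unified_v2.1_tile_q4_k_m.gguf"
Q8_FILE = "z_image_turbo_control_unified_v2.1_tile_q8_0.gguf"


def pick_loading_mode(vram_gb: float) -> dict:
    """Suggest a use_gguf setting and GGUF file for a given amount of VRAM (GB)."""
    if vram_gb >= 24:
        # Ample VRAM: load the full BF16 weights, no GGUF file needed.
        return {"use_gguf": False, "gguf_file": None}
    if vram_gb > 8:
        # ASSUMPTION: cards between 8 GB and 24 GB are given the 8-bit file.
        return {"use_gguf": True, "gguf_file": Q8_FILE}
    # Limited VRAM: smallest quantized file.
    return {"use_gguf": True, "gguf_file": Q4_FILE}


print(pick_loading_mode(8))
print(pick_loading_mode(24))
```

The returned values would still have to be copied into infer_tile.py by hand, since the script is configured by editing use_gguf and the model path directly.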
Model Features & Configuration
- Tile Upscaling: Intelligently adds detail to low-resolution images by processing them in tiles, guided by a text prompt.
- Refiner Scale (controlnet_refiner_conditioning_scale): Provides fine-grained control over the influence of the initial refining layers for better detail enhancement.
- Optional Refiner (add_control_noise_refiner=False): You can disable the control noise refiner layers when loading the model to save memory.
- Group Offload Fixes: The underlying code includes crucial fixes to ensure diffusers' group_offload works correctly with use_stream=True, enabling efficient memory management.
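As a generic illustration of tile-based processing, the sketch below computes an overlapping grid of tile boxes for an image. It is not taken from this repository: how (or whether) infer_tile.py splits the image is not stated here, and the function name, tile size and overlap are hypothetical values chosen for the example.

```python
def tile_boxes(width, height, tile=1024, overlap=128):
    """Return (left, top, right, bottom) boxes that cover a width x height image."""
    if overlap >= tile:
        raise ValueError("overlap must be smaller than tile")
    stride = tile - overlap

    def starts(size):
        # A single tile is enough when the image fits inside one.
        if size <= tile:
            return [0]
        positions = list(range(0, size - tile, stride))
        positions.append(size - tile)  # last tile sits flush with the far edge
        return positions

    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in starts(height)
        for x in starts(width)
    ]


for box in tile_boxes(2048, 1024):
    print(box)
```

Overlapping neighbouring tiles is a common way to hide seams when the processed tiles are blended back together.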
Upscaling Examples
The following examples show how a detailed prompt can guide the model to transform a low-resolution input into a sharp, high-quality image.
Example 1: low-resolution input and high-resolution output (images not preserved).
Prompt: "masterpiece, 8k, photorealistic, sharp focus, fantasy character portrait of an anthropomorphic iguana sage. Intricate, hyper-detailed iridescent blue and green facial scales, realistic reptile skin texture. Expressive, intelligent golden-amber eyes. A vibrant blue, feathery throat fan (dewlap). Wearing a coarse, rustic woven hood with orange and brown tones. Dramatic studio lighting, soft side light, deep shadows, moody dark gray background. DSLR, macro details."
Example 2: low-resolution input and high-resolution output (images not preserved).
Prompt: "masterpiece, best quality, 8k, photorealistic, documentary photo, sharp focus, Somali people, African people, queue, line, man, woman, children, turban, hijab, colorful clothes, man using cellphone, detailed skin texture, fabric texture, dawn lighting, soft light, outdoors."
Example 3: low-resolution input and high-resolution output (images not preserved).
Prompt: "Photo of a bright living room, viewed from a dark hallway. The polished hardwood floor reflects light from large windows. The room has modern furniture and a TV. Strong natural lighting, high contrast, interior shot, ultra detailed."
Example 4: low-resolution input and high-resolution output (images not preserved).
Prompt: "Photo of a sunny day in a bustling Times Square, New York. A crowd of people walks through an intersection surrounded by skyscrapers with huge digital billboards. A yellow taxi is visible. Crisp street photography, ultra detailed, vibrant colors."
Repository Structure
- ./transformer/: Directory for model weights (GGUF or standard).
- infer_tile.py: The primary script for Tile-based upscaling.
- infer_t2i.py: Script for standard Text-to-Image generation.
- infer_i2i.py: Script for standard Image-to-Image generation.
- diffusers_local/: Directory containing custom pipeline code.
- requirements.txt: Python dependencies.
- assets/: Folder for your input images.
- outputs/: Folder where generated images will be saved.