Z-Image Turbo Control Tile Upscaler

Github Original Repo

This repository hosts a version of the Z-Image Turbo Control Unified V2 model fine-tuned for tile-based upscaling. It uses the ControlNet Tile architecture to add detail to low-quality images and increase their resolution while preserving the original composition.

Because the control layers are integrated directly into the transformer, the whole model can be quantized as a single unit (Unified GGUF Quantization), for example to Q4_K_M or Q8_0. The quantized model runs efficiently on consumer GPUs with limited VRAM, which makes high-resolution upscaling accessible on that hardware.
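To see why quantization matters here, a rough weight-memory estimate helps. The sketch below is a back-of-the-envelope calculation, not part of the repository: it assumes the 10B parameter count reported on the model page, uses the GGUF block layouts for Q8_0 (8.5 bits per weight) and Q4_K (4.5 bits per weight), and ignores the text encoder, VAE, and activations. Q4_K_M mixes in some higher-precision tensors, so treat the Q4_K figure as a lower bound for it.

```python
# Rough weight-memory estimate for different storage formats (weights only).
# Bits-per-weight values follow the GGUF block layouts:
#   BF16: 16 bits per weight
#   Q8_0: blocks of 32 int8 weights + one fp16 scale -> 34 bytes / 32 weights
#   Q4_K: super-blocks of 256 weights stored in 144 bytes
# Q4_K_M files mix Q4_K with higher-precision types for some tensors,
# so real Q4_K_M files come out somewhat larger than the Q4_K estimate.

BITS_PER_WEIGHT = {
    "bf16": 16.0,
    "q8_0": 34 * 8 / 32,    # 8.5 bits per weight
    "q4_k": 144 * 8 / 256,  # 4.5 bits per weight
}


def weight_bytes(num_params: float, fmt: str) -> float:
    """Approximate bytes needed to store the weights alone (no activations)."""
    return num_params * BITS_PER_WEIGHT[fmt] / 8


if __name__ == "__main__":
    params = 10e9  # assumption: parameter count as reported on the model page
    for fmt in BITS_PER_WEIGHT:
        print(f"{fmt:>5}: ~{weight_bytes(params, fmt) / 1e9:.1f} GB")
```

With 10B parameters this gives roughly 20 GB (BF16), 10.6 GB (Q8_0), and 5.6 GB (Q4_K) for the weights alone.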

📥 Installation

To set up the environment, create a virtual environment and install the dependencies:

```shell
# Create a virtual environment
python -m venv venv

# Activate your venv
# Windows:
venv\Scripts\activate
# Linux/macOS:
source venv/bin/activate

# Upgrade pip
python -m pip install --upgrade pip

# Install requirements
pip install -r requirements.txt
```
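If you want to confirm that the activation step took effect before installing anything, the small stdlib-only helper below (hypothetical, not shipped with the repository) checks whether Python is running inside a venv:

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv created with `python -m venv`.

    An active venv sets sys.prefix to the environment directory, while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"OK: using virtual environment at {sys.prefix}")
    else:
        print("Warning: no virtual environment is active; activate venv first.")
```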

Note: This repository contains a diffusers_local folder with the custom ZImageControlUnifiedPipeline and transformer logic required to run this specific architecture.
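Because the custom pipeline lives in the local diffusers_local folder rather than in the installed diffusers package, it is only importable when the repository root is on sys.path (running python infer_tile.py from the repository adds it automatically). If you import the pipeline from your own code, a quick check like the following can save a confusing ImportError. It assumes diffusers_local sits at the repository root; the helper name is hypothetical:

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def local_package_available(repo_root: str, package: str = "diffusers_local") -> bool:
    """Check whether `package` can be imported once `repo_root` is on sys.path.

    Assumption: diffusers_local is a package directory at the repository root.
    """
    root = str(Path(repo_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)  # make the repository root importable
    importlib.invalidate_caches()  # pick up directories created after startup
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print(local_package_available("."))
```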

🚀 Usage

The primary script for upscaling images is infer_tile.py. It takes a low-resolution image and generates a higher-resolution version guided by your text prompt.
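Before upscaling, it can help to know what output size to ask for. The sketch below computes a target resolution for a given scale factor and floors it to a multiple of 16. Both the function and the multiple-of-16 constraint (8x VAE downsampling times 2x2 transformer patches) are assumptions for illustration; check infer_tile.py for how it actually sizes the output.

```python
def target_size(width: int, height: int, scale: float = 2.0,
                multiple: int = 16) -> tuple[int, int]:
    """Output size for an upscale by `scale`, floored to a multiple of `multiple`.

    Assumption: the pipeline needs dimensions divisible by 16
    (8x VAE downsampling x 2x2 patches). Verify against the actual pipeline.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    w = int(width * scale) // multiple * multiple
    h = int(height * scale) // multiple * multiple
    # Never return a zero-sized dimension for very small inputs.
    return max(w, multiple), max(h, multiple)
```

For example, target_size(768, 512) returns (1536, 1024), and target_size(500, 333) returns (992, 656).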

Hardware Options

Option 1: Low VRAM (GGUF) - Recommended

Use this version if you have limited VRAM (e.g., 6 to 8 GB). It loads the model from a quantized GGUF file. To use it, set use_gguf = True in infer_tile.py and provide the path to your .gguf file (e.g., z_image_turbo_control_unified_v2.1_tile_q4_k_m.gguf or z_image_turbo_control_unified_v2.1_tile_q8_0.gguf).
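A wrong path or a truncated download is a common reason a GGUF load fails. The helper below (hypothetical, stdlib only) reads the fixed GGUF header so you can sanity-check a file before pointing the script at it. It follows the published GGUF layout for versions 2 and 3: a 4-byte GGUF magic, a little-endian uint32 version, then uint64 tensor and metadata key/value counts.

```python
import struct
from pathlib import Path


def read_gguf_header(path: str) -> dict:
    """Read the fixed-size GGUF header (magic, version, tensor and metadata counts).

    Layout (GGUF v2/v3, little-endian): 4-byte magic b"GGUF", uint32 version,
    uint64 tensor_count, uint64 metadata_kv_count -> 24 bytes in total.
    """
    with Path(path).open("rb") as f:
        data = f.read(24)
    if len(data) < 24 or data[:4] != b"GGUF":
        raise ValueError(f"{path} is not a GGUF file")
    version, n_tensors, n_kv = struct.unpack("<IQQ", data[4:24])
    if version < 2:
        raise ValueError(f"unsupported GGUF version {version}")
    return {"version": version, "tensor_count": n_tensors, "metadata_kv_count": n_kv}
```

For example, calling read_gguf_header on transformer/z_image_turbo_control_unified_v2.1_tile_q4_k_m.gguf should return the file's version and counts; the transformer/ location is an assumption based on the repository layout.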

Key Features:

  • Loads the unified transformer from a single 4-bit or 8-bit quantized file.
  • Enables aggressive group_offload to fit large models on consumer GPUs.
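Group offloading keeps most of the transformer in CPU memory and moves groups of blocks onto the GPU only while they run. The toy model below is a conceptual illustration, not diffusers' implementation: it estimates peak GPU-resident weight memory for a given grouping, treating use_stream=True as prefetching the next group while the current one computes (so two groups can be resident at once). Block sizes are in arbitrary units, and activations and non-block modules are ignored.

```python
def peak_resident(block_sizes: list[int], blocks_per_group: int, prefetch: bool) -> int:
    """Peak GPU-resident weight memory under block-level group offloading (toy model).

    Blocks are grouped in order and only the active group lives on the GPU.
    With prefetch=True (conceptually what stream-based offloading enables),
    the next group is copied in while the current one runs, so two adjacent
    groups can be resident at the same time.
    """
    if blocks_per_group < 1:
        raise ValueError("blocks_per_group must be >= 1")
    groups = [sum(block_sizes[i:i + blocks_per_group])
              for i in range(0, len(block_sizes), blocks_per_group)]
    if not groups:
        return 0
    if not prefetch or len(groups) == 1:
        return max(groups)
    # Worst case: the current group plus the one being prefetched.
    return max(a + b for a, b in zip(groups, groups[1:]))
```

For blocks of size [1, 2, 3, 4] grouped in pairs, peak_resident returns 7 without prefetch and 10 with it: smaller groups lower the peak, at the cost of more transfers.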

Option 2: High Precision (Diffusers/BF16)

Use this version if you have ample VRAM (e.g., 24 GB or more). Set use_gguf = False in the script to load the full BFloat16 precision model from a standard Hugging Face directory structure.

🛠️ Model Features & Configuration

  • Tile Upscaling: Adds detail to low-resolution images by processing them in tiles, guided by a text prompt.
  • Refiner Scale (controlnet_refiner_conditioning_scale): Provides fine-grained control over the influence of the initial refining layers for better detail enhancement.
  • Optional Refiner (add_control_noise_refiner=False): You can disable the control noise refiner layers when loading the model to save memory.
  • Group Offload Fixes: The underlying code includes fixes that make diffusers' group_offload work correctly with use_stream=True, enabling efficient memory management.
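Two of the features above can be illustrated with framework-free sketches. apply_control shows the usual ControlNet-style rule, in which a conditioning scale multiplies the control branch's residual before it is added to the hidden states; assuming controlnet_refiner_conditioning_scale behaves like a standard conditioning scale, setting it to 0 would remove that branch's contribution. tile_boxes shows one generic way to cover an image with overlapping tiles. Both functions are hypothetical and work on plain Python lists; the repository's code operates on tensors and may tile or blend differently.

```python
def apply_control(hidden: list[float], control_residual: list[float],
                  scale: float) -> list[float]:
    """ControlNet-style injection: hidden + scale * residual, element-wise.

    scale = 0.0 disables the control signal; 1.0 applies it at full strength.
    """
    if len(hidden) != len(control_residual):
        raise ValueError("shape mismatch")
    return [h + scale * r for h, r in zip(hidden, control_residual)]


def tile_origins(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of overlapping tiles that cover [0, length) along one axis."""
    if tile <= overlap:
        raise ValueError("tile must be larger than overlap")
    if length <= tile:
        return [0]
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # last tile sits flush with the far edge
    return starts


def tile_boxes(width: int, height: int, tile: int = 512,
               overlap: int = 64) -> list[tuple[int, int, int, int]]:
    """(left, top, right, bottom) boxes for a grid of overlapping tiles."""
    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in tile_origins(height, tile, overlap)
            for x in tile_origins(width, tile, overlap)]
```

For a 1024x1024 image with 512-pixel tiles and 64 pixels of overlap, tile_boxes returns nine boxes (three per axis, starting at 0, 448, and 512).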

🏞️ Upscaling Examples

The following examples show how a detailed prompt can guide the model to transform a low-resolution input into a sharp, high-quality image.

Example 1: Low-Resolution Input vs. High-Resolution Output
Prompt: "masterpiece, 8k, photorealistic, sharp focus, fantasy character portrait of an anthropomorphic iguana sage. Intricate, hyper-detailed iridescent blue and green facial scales, realistic reptile skin texture. Expressive, intelligent golden-amber eyes. A vibrant blue, feathery throat fan (dewlap). Wearing a coarse, rustic woven hood with orange and brown tones. Dramatic studio lighting, soft side light, deep shadows, moody dark gray background. DSLR, macro details."
Example 2: Low-Resolution Input vs. High-Resolution Output
Prompt: "masterpiece, best quality, 8k, photorealistic, documentary photo, sharp focus, Somali people, African people, queue, line, man, woman, children, turban, hijab, colorful clothes, man using cellphone, detailed skin texture, fabric texture, dawn lighting, soft light, outdoors."
Example 3: Low-Resolution Input vs. High-Resolution Output
Prompt: "Photo of a bright living room, viewed from a dark hallway. The polished hardwood floor reflects light from large windows. The room has modern furniture and a TV. Strong natural lighting, high contrast, interior shot, ultra detailed."
Example 4: Low-Resolution Input vs. High-Resolution Output
Prompt: "Photo of a sunny day in a bustling Times Square, New York. A crowd of people walks through an intersection surrounded by skyscrapers with huge digital billboards. A yellow taxi is visible. Crisp street photography, ultra detailed, vibrant colors."

## 📂 Repository Structure

  • ./transformer/: Directory for model weights (GGUF or standard).
  • infer_tile.py: The primary script for Tile-based upscaling.
  • infer_t2i.py: Script for standard Text-to-Image generation.
  • infer_i2i.py: Script for standard Image-to-Image generation.
  • diffusers_local/: Directory containing custom pipeline code.
  • requirements.txt: Python dependencies.
  • assets/: Folder for your input images.
  • outputs/: Folder where generated images will be saved.
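If a script fails to find its weights or pipeline code, a quick layout check against the list above can narrow it down. This helper is hypothetical and only checks that each entry exists with the expected kind (file or directory); depending on the script, outputs/ may not exist until something has been generated.

```python
from pathlib import Path

# Entries from the repository structure list; directories end with "/".
EXPECTED = [
    "transformer/",
    "infer_tile.py",
    "infer_t2i.py",
    "infer_i2i.py",
    "diffusers_local/",
    "requirements.txt",
    "assets/",
    "outputs/",
]


def missing_paths(repo_root: str = ".") -> list[str]:
    """Return expected entries that are absent (or of the wrong kind) under repo_root."""
    root = Path(repo_root)
    missing = []
    for entry in EXPECTED:
        path = root / entry.rstrip("/")
        ok = path.is_dir() if entry.endswith("/") else path.is_file()
        if not ok:
            missing.append(entry)
    return missing


if __name__ == "__main__":
    for entry in missing_paths("."):
        print(f"missing: {entry}")
```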

Model details: GGUF format, 10B params, lumina2 architecture, available in 4-bit and 8-bit quantizations.
