This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

Pipeline Type: Krea2AutoBlocks

Description: Auto modular pipeline for text-to-image generation using Krea 2: encode text -> core denoise (symmetric CFG) -> decode.

This pipeline uses a 3-block architecture that can be customized and extended.

Example Usage

[TODO]

Pipeline Architecture

This modular pipeline is composed of the following blocks:

text_encoder (Krea2TextEncoderStep)
- Text encoder step that tokenizes the prompt(s) with the Krea 2 chat template, runs the Qwen3-VL text encoder, and stacks a fixed set of decoder-layer hidden states per token as the transformer's text conditioning. When guidance_scale > 0, the negative prompt is encoded the same way for CFG.
denoise (Krea2CoreDenoiseStep)
- Core denoising workflow for Krea 2 text-to-image: prepares the batch/latents/timesteps and the shared position ids, then runs the symmetric-CFG denoising loop, producing the denoised packed latents for the decoder.
decode (Krea2DecodeStep)
- Step that unpacks the denoised packed latents back to the spatial grid, de-normalizes them with the VAE's per-channel statistics, and decodes them through the Qwen-Image VAE into images.
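
The unpack and de-normalize operations described for the decode block can be sketched in plain Python. This is a minimal illustration, not the Krea2DecodeStep implementation. It assumes a 2x2 patch packing with channel-major feature order (a layout used by some other Diffusers pipelines, not confirmed here for Krea 2) and that latents were normalized as (x - mean) / std per channel, so the inverse is x * std + mean. The function names and the toy statistics are hypothetical.

```python
def unpack_latents(packed, channels, height, width, patch=2):
    """Rearrange one sample of packed tokens into a [channels][height][width] grid.

    Assumed layout (not confirmed for Krea 2): token t covers the patch at
    row t // (width // patch), column t % (width // patch), and feature index
    f = c * patch * patch + dy * patch + dx.
    """
    cols = width // patch
    grid = [[[0.0] * width for _ in range(height)] for _ in range(channels)]
    for t, token in enumerate(packed):
        row, col = divmod(t, cols)
        for f, value in enumerate(token):
            c, rest = divmod(f, patch * patch)
            dy, dx = divmod(rest, patch)
            grid[c][row * patch + dy][col * patch + dx] = value
    return grid


def denormalize(grid, mean, std):
    """Invert per-channel normalization: x * std[c] + mean[c]."""
    return [
        [[v * std[c] + mean[c] for v in line] for line in plane]
        for c, plane in enumerate(grid)
    ]


# Toy example: 1 channel, 2x4 latent grid packed as 2 tokens of 4 features each.
packed = [[0.0, 1.0, 4.0, 5.0], [2.0, 3.0, 6.0, 7.0]]
spatial = unpack_latents(packed, channels=1, height=2, width=4)
restored = denormalize(spatial, mean=[10.0], std=[2.0])
```

In the actual block the de-normalized grid would then be passed to the VAE decoder; that step is omitted here because it needs the model weights.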

is_distilled (default: False)

text2image

Inputs:

prompt (str): The prompt or prompts to guide image generation.
negative_prompt (str, optional): The negative prompt(s) for CFG.
guidance_scale (float, optional, defaults to 4.5): Classifier-free guidance (CFG) scale; the negative prompt is encoded only when this value is > 0.
max_sequence_length (int, optional, defaults to 512): Maximum sequence length for prompt encoding.
num_images_per_prompt (int, optional, defaults to 1): The number of images to generate per prompt.
latents (Tensor, optional): Pre-generated noisy latents for image generation.
height (int): The height in pixels of the generated image.
width (int): The width in pixels of the generated image.
generator (Generator, optional): Torch generator for deterministic generation.
num_inference_steps (int, optional, defaults to 28): The number of denoising steps.
sigmas (list, optional): Custom sigma schedule (defaults to a linear ramp).
attention_kwargs (dict, optional): Additional kwargs for attention processors.
output_type (str, optional, defaults to 'pil'): Output format, one of 'pil', 'np', or 'pt'.
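
The documented defaults can be collected in a small container to make the contract explicit. This is a sketch only: Text2ImageInputs, encodes_negative_prompt, and resolved_sigmas are hypothetical names, not part of the pipeline. The tensor-typed inputs (latents, generator, attention_kwargs) are left out to keep it dependency-free, and the exact endpoints of the default linear sigma ramp (1.0 down to 1/N here) are an assumption, since the text above only says "a linear ramp".

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Text2ImageInputs:
    """Hypothetical container mirroring the documented text2image inputs."""

    prompt: str
    height: int
    width: int
    negative_prompt: Optional[str] = None
    guidance_scale: float = 4.5
    max_sequence_length: int = 512
    num_images_per_prompt: int = 1
    num_inference_steps: int = 28
    sigmas: Optional[List[float]] = None
    output_type: str = "pil"

    def __post_init__(self):
        # Only the three documented output formats are accepted.
        if self.output_type not in ("pil", "np", "pt"):
            raise ValueError(f"unsupported output_type: {self.output_type!r}")

    @property
    def encodes_negative_prompt(self) -> bool:
        # Documented rule: the negative prompt is encoded only when guidance_scale > 0.
        return self.guidance_scale > 0

    def resolved_sigmas(self) -> List[float]:
        # A custom schedule wins; otherwise fall back to a linear ramp.
        if self.sigmas is not None:
            return list(self.sigmas)
        n = self.num_inference_steps
        if n <= 1:
            return [1.0]
        # Assumption: the ramp runs linearly from 1.0 down to 1/n.
        step = (1.0 - 1.0 / n) / (n - 1)
        return [1.0 - i * step for i in range(n)]


# Arbitrary example values for the required inputs.
inputs = Text2ImageInputs(prompt="a lighthouse at dusk", height=1024, width=1024)
default_sigmas = inputs.resolved_sigmas()
no_cfg = Text2ImageInputs(prompt="a lighthouse at dusk", height=1024, width=1024, guidance_scale=0.0)
```

Setting guidance_scale to 0.0, as in no_cfg, corresponds to the documented case where the negative prompt is not encoded.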

Outputs: