This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.

Pipeline Type: Krea2AutoBlocks

Description: Auto Modular pipeline for text-to-image generation using Krea 2: encode text -> core denoise (symmetric CFG) -> decode.

This pipeline uses a 3-block architecture that can be customized and extended.
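
As a rough picture of what a three-block layout means, the sketch below chains three plain Python functions over a shared state dictionary. It is a toy model, not the Diffusers API: the function names, the `state` dictionary, and the placeholder strings are all hypothetical. It only mirrors the encode -> denoise -> decode order and the rule that the negative prompt is encoded only when guidance_scale > 0.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Block = Callable[[State], State]


def text_encoder_step(state: State) -> State:
    # Stand-in for the text encoder block: tag the prompt with a placeholder string.
    out = dict(state)
    out["prompt_embeds"] = f"embeds({state['prompt']})"
    # The negative prompt is encoded only when guidance_scale > 0.
    if state.get("guidance_scale", 4.5) > 0:
        out["negative_prompt_embeds"] = f"embeds({state.get('negative_prompt', '')})"
    return out


def denoise_step(state: State) -> State:
    # Stand-in for the denoise block: consumes the embeddings, produces latents.
    out = dict(state)
    out["latents"] = f"denoised[{state['prompt_embeds']}]"
    return out


def decode_step(state: State) -> State:
    # Stand-in for the decode block: turns latents into a list of images.
    out = dict(state)
    out["images"] = [f"image<{state['latents']}>"]
    return out


def run_blocks(blocks: List[Block], state: State) -> State:
    # Each block reads the state left by the previous one and returns a new state.
    for block in blocks:
        state = block(state)
    return state


BLOCKS: List[Block] = [text_encoder_step, denoise_step, decode_step]

result = run_blocks(BLOCKS, {"prompt": "a red bicycle", "guidance_scale": 4.5})
print(result["images"])
```

Replacing or inserting a function in `BLOCKS` is the toy analogue of customizing or extending the pipeline.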

Example Usage

[TODO]
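
A minimal sketch of how a modular pipeline like this one is typically loaded and run with Diffusers. The repository id is not given in this card, so it is a required argument. The calls used (`ModularPipeline.from_pretrained`, `load_components`, and calling the pipeline with `output="images"`) follow the Diffusers modular pipeline documentation, but method names have changed between releases, so treat them as assumptions to check against your installed version. The function name `generate_image`, the 1024x1024 size, bfloat16, and the `cuda` device are illustrative choices.

```python
def generate_image(repo_id: str, prompt: str, height: int = 1024, width: int = 1024,
                   seed: int = 0, device: str = "cuda"):
    """Load the modular pipeline from `repo_id` and generate one image (sketch)."""
    # Imported lazily so the heavy dependencies are only needed when this runs.
    import torch
    from diffusers import ModularPipeline

    # If the repository ships custom block code, trust_remote_code=True may be needed.
    pipe = ModularPipeline.from_pretrained(repo_id)
    # Assumption: recent Diffusers releases; older ones name this load_default_components.
    pipe.load_components(torch_dtype=torch.bfloat16)
    pipe.to(device)

    # A seeded generator makes the run reproducible.
    generator = torch.Generator(device=device).manual_seed(seed)
    images = pipe(
        prompt=prompt,
        height=height,
        width=width,
        generator=generator,
        output="images",
    )
    return images[0]
```

Calling `generate_image("<repo-id>", "a red bicycle")` with the actual repository id should return a single image in the default 'pil' output format.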

Pipeline Architecture

This modular pipeline is composed of the following blocks:

  1. text_encoder (Krea2TextEncoderStep)
    • Text encoder step that tokenizes the prompt(s) with the Krea 2 chat template, runs the Qwen3-VL text encoder, and stacks a fixed set of decoder-layer hidden states per token as the transformer's text conditioning. When guidance_scale > 0, the negative prompt is encoded the same way for CFG.
  2. denoise (Krea2CoreDenoiseStep)
    • Core denoising workflow for Krea 2 text-to-image: prepares the batch/latents/timesteps and the shared position ids, then runs the symmetric-CFG denoising loop, producing the denoised packed latents for the decoder.
  3. decode (Krea2DecodeStep)
    • Step that unpacks the denoised packed latents back to the spatial grid, de-normalizes them with the VAE's per-channel statistics, and decodes them through the Qwen-Image VAE into images.
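
The de-normalization in the decode step can be pictured with a small sketch. It assumes the latents were normalized per channel as (z - mean) / std, so decoding inverts that with z * std + mean. This convention, the helper name `denormalize_latents`, and the example statistics are assumptions for illustration, not values read from the Qwen-Image VAE config. Latents are shown as nested lists shaped [channel][position] to keep the example dependency-free.

```python
from typing import List, Sequence


def denormalize_latents(
    latents: Sequence[Sequence[float]],
    mean: Sequence[float],
    std: Sequence[float],
) -> List[List[float]]:
    """Invert per-channel normalization: z * std[c] + mean[c]."""
    if not (len(latents) == len(mean) == len(std)):
        raise ValueError("latents, mean, and std must have one entry per channel")
    return [
        [z * s + m for z in channel]
        for channel, m, s in zip(latents, mean, std)
    ]


# Two channels, three positions each; the statistics are made-up numbers.
normalized = [[0.0, 1.0, -1.0], [0.5, 0.0, 2.0]]
restored = denormalize_latents(normalized, mean=[0.25, -1.0], std=[2.0, 0.5])
print(restored)
```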

Model Components

  1. text_encoder (Qwen3VLModel): The Qwen3-VL text encoder.
  2. tokenizer (AutoTokenizer): The tokenizer paired with the text encoder.
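  3. transformer (Krea2Transformer2DModel): The Krea 2 transformer used in the denoising loop.
  4. scheduler (FlowMatchEulerDiscreteScheduler): The flow-matching Euler scheduler that provides the timesteps for denoising.
  5. vae (AutoencoderKLQwenImage): The Qwen-Image VAE that decodes latents into images.
  6. image_processor (VaeImageProcessor): Converts decoded images to the requested output_type.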

Configuration Parameters

  • is_distilled (defaults to False)

Workflow Input Specification

text2image
  • prompt (str): The prompt or prompts to guide image generation.

Input/Output Specification

Inputs:

  • prompt (str): The prompt or prompts to guide image generation.
  • negative_prompt (str, optional): The negative prompt(s) for CFG.
  • guidance_scale (float, optional, defaults to 4.5): CFG scale; the negative prompt is only encoded when this is > 0.
  • max_sequence_length (int, optional, defaults to 512): Maximum sequence length for prompt encoding.
  • num_images_per_prompt (int, optional, defaults to 1): The number of images to generate per prompt.
  • latents (Tensor, optional): Pre-generated noisy latents for image generation.
  • height (int): The height in pixels of the generated image.
  • width (int): The width in pixels of the generated image.
  • generator (Generator, optional): Torch generator for deterministic generation.
  • num_inference_steps (int, optional, defaults to 28): The number of denoising steps.
  • sigmas (list, optional): Custom sigma schedule (defaults to a linear ramp).
  • attention_kwargs (dict, optional): Additional kwargs for attention processors.
  • output_type (str, optional, defaults to 'pil'): Output format, one of 'pil', 'np', or 'pt'.
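
The default sigma schedule is described only as a linear ramp. One common form in Diffusers flow-matching pipelines runs linearly from 1.0 down to 1 / num_inference_steps; the sketch below assumes those endpoints, which this card does not confirm. The helper name `linear_sigmas` is hypothetical.

```python
from typing import List


def linear_sigmas(num_inference_steps: int = 28) -> List[float]:
    """Evenly spaced sigmas from 1.0 down to 1 / num_inference_steps (assumed endpoints)."""
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if num_inference_steps == 1:
        return [1.0]
    start, end = 1.0, 1.0 / num_inference_steps
    step = (end - start) / (num_inference_steps - 1)
    return [start + i * step for i in range(num_inference_steps)]


sigmas = linear_sigmas(28)
print(len(sigmas), sigmas[0], round(sigmas[-1], 4))
```

A list built this way, or any other custom schedule, could be passed as the sigmas input to override the default.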

Outputs:

  • images (list): Generated images.
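
To keep the documented defaults in one place, the inputs can be mirrored in a small container. This is a hypothetical helper, not part of the pipeline: the class name `Krea2Inputs` and its validation rules are assumptions, while the field names and default values are copied from the Inputs list above. Fields without a listed default (height, width) are required here, and the tensor-like inputs (latents, generator, attention_kwargs) are left out to keep the sketch dependency-free.

```python
from dataclasses import asdict, dataclass
from typing import Any, Dict, List, Optional

VALID_OUTPUT_TYPES = ("pil", "np", "pt")


@dataclass
class Krea2Inputs:
    prompt: str
    height: int
    width: int
    negative_prompt: Optional[str] = None
    guidance_scale: float = 4.5
    max_sequence_length: int = 512
    num_images_per_prompt: int = 1
    num_inference_steps: int = 28
    sigmas: Optional[List[float]] = None
    output_type: str = "pil"

    def __post_init__(self) -> None:
        # Reject values outside the documented output formats.
        if self.output_type not in VALID_OUTPUT_TYPES:
            raise ValueError(f"output_type must be one of {VALID_OUTPUT_TYPES}")
        if self.num_inference_steps < 1:
            raise ValueError("num_inference_steps must be at least 1")

    def to_kwargs(self) -> Dict[str, Any]:
        """Drop None-valued fields and return the rest as keyword arguments."""
        return {k: v for k, v in asdict(self).items() if v is not None}


inputs = Krea2Inputs(prompt="a red bicycle", height=1024, width=1024)
print(inputs.to_kwargs())
```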