How to use CedricPerauer/tiny-krea2-modular-pipe with Diffusers:

    pip install -U diffusers transformers accelerate

    import torch
    from diffusers import DiffusionPipeline

    # switch to "mps" for apple devices
    pipe = DiffusionPipeline.from_pretrained("CedricPerauer/tiny-krea2-modular-pipe", dtype=torch.bfloat16, device_map="cuda")

    prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"
    image = pipe(prompt).images[0]
This is a modular diffusion pipeline built with 🧨 Diffusers' modular pipeline framework.
Pipeline Type: Krea2AutoBlocks
Description: Auto Modular pipeline for text-to-image generation using Krea 2: encode text -> core denoise (symmetric CFG) -> decode.
This pipeline uses a 3-block architecture that can be customized and extended.
Example Usage
[TODO]
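A minimal sketch of how a modular pipeline repository like this one might be loaded and run. It assumes a recent Diffusers release that exposes `ModularPipeline`, that the repository can be loaded with `trust_remote_code=True`, and that component weights are loaded with `load_components` (the modular API is experimental, so method names may differ between releases). The helper names `default_call_kwargs` and `generate`, the 256x256 image size, and the seed are illustrative and not part of this repository; the default values are the ones listed in the input specification of this card.

```python
def default_call_kwargs(prompt, height, width):
    """Collect call arguments, using the defaults documented in this card."""
    return {
        "prompt": prompt,
        "height": height,
        "width": width,
        "guidance_scale": 4.5,
        "num_inference_steps": 28,
        "num_images_per_prompt": 1,
        "max_sequence_length": 512,
        "output_type": "pil",
    }


def generate(prompt, height=256, width=256, seed=0, device="cuda"):
    """Hypothetical helper: load the modular pipeline and generate one image."""
    # Imported lazily so default_call_kwargs can be used without a GPU stack.
    import torch
    from diffusers import ModularPipeline

    repo_id = "CedricPerauer/tiny-krea2-modular-pipe"
    # Assumption: the block definitions are resolved from the Hub repository.
    pipe = ModularPipeline.from_pretrained(repo_id, trust_remote_code=True)
    # Assumption: load_components is the weight-loading call in your release.
    pipe.load_components(torch_dtype=torch.bfloat16)
    pipe.to(device)

    kwargs = default_call_kwargs(prompt, height, width)
    kwargs["generator"] = torch.Generator(device=device).manual_seed(seed)
    # output="images" asks the pipeline to return only the `images` output.
    return pipe(**kwargs, output="images")[0]
```

Calling `generate("Astronaut in a jungle")` would then return the first generated image, provided the assumptions above hold for the installed Diffusers version.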
Pipeline Architecture
This modular pipeline is composed of the following blocks:
- text_encoder (`Krea2TextEncoderStep`)
  - Text encoder step that tokenizes the prompt(s) with the Krea 2 chat template, runs the Qwen3-VL text encoder, and stacks a fixed set of decoder-layer hidden states per token as the transformer's text conditioning. When `guidance_scale > 0`, the negative prompt is encoded the same way for CFG.
- denoise (`Krea2CoreDenoiseStep`)
  - Core denoising workflow for Krea 2 text-to-image: prepares the batch/latents/timesteps and the shared position ids, then runs the symmetric-CFG denoising loop, producing the denoised packed latents for the decoder.
- decode (`Krea2DecodeStep`)
  - Step that unpacks the denoised packed latents back to the spatial grid, de-normalizes them with the VAE's per-channel statistics, and decodes them through the Qwen-Image VAE into images.
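The three-block flow above can be pictured as steps that read from and write to a shared state. The sketch below is a toy stand-in, not the Diffusers implementation: the step functions, the state keys, and the placeholder "embeddings" and "latents" are all hypothetical. Only the block order and the rule that the negative prompt is encoded when `guidance_scale > 0` are taken from the descriptions above.

```python
def text_encoder_step(state):
    # Placeholder "embedding"; the real step runs the Qwen3-VL text encoder.
    state["prompt_embeds"] = f"embeds({state['prompt']})"
    # Per the card, the negative prompt is encoded only when guidance_scale > 0.
    if state["guidance_scale"] > 0:
        negative = state.get("negative_prompt", "")
        state["negative_prompt_embeds"] = f"embeds({negative})"
    return state


def denoise_step(state):
    # Placeholder; the real step prepares latents/timesteps and runs the loop.
    uses_cfg = "negative_prompt_embeds" in state
    state["latents"] = {"steps": state["num_inference_steps"], "cfg": uses_cfg}
    return state


def decode_step(state):
    # Placeholder; the real step unpacks, de-normalizes, and VAE-decodes latents.
    state["images"] = [f"image<{state['latents']['steps']} steps>"]
    return state


# Block order mirrors the card: text_encoder -> denoise -> decode.
BLOCKS = [
    ("text_encoder", text_encoder_step),
    ("denoise", denoise_step),
    ("decode", decode_step),
]


def run_blocks(**inputs):
    """Run the toy blocks in order over one shared state dictionary."""
    state = {"guidance_scale": 4.5, "num_inference_steps": 28, **inputs}
    for _name, step in BLOCKS:
        state = step(state)
    return state
```

Because each step only touches the shared state, a block can be swapped or a new one inserted without changing the others, which is the property the modular design relies on.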
Model Components
- text_encoder (`Qwen3VLModel`): The Qwen3-VL text encoder.
- tokenizer (`AutoTokenizer`): The tokenizer paired with the text encoder.
- transformer (`Krea2Transformer2DModel`)
- scheduler (`FlowMatchEulerDiscreteScheduler`)
- vae (`AutoencoderKLQwenImage`)
- image_processor (`VaeImageProcessor`)
Configuration Parameters
is_distilled (default: False)
Workflow Input Specification
text2image
- `prompt` (`str`): The prompt or prompts to guide image generation.
Input/Output Specification
Inputs:
- `prompt` (`str`): The prompt or prompts to guide image generation.
- `negative_prompt` (`str`, optional): The negative prompt(s) for CFG.
- `guidance_scale` (`float`, optional, defaults to `4.5`): CFG scale; the negative prompt is only encoded when this is > 0.
- `max_sequence_length` (`int`, optional, defaults to `512`): Maximum sequence length for prompt encoding.
- `num_images_per_prompt` (`int`, optional, defaults to `1`): The number of images to generate per prompt.
- `latents` (`Tensor`, optional): Pre-generated noisy latents for image generation.
- `height` (`int`): The height in pixels of the generated image.
- `width` (`int`): The width in pixels of the generated image.
- `generator` (`Generator`, optional): Torch generator for deterministic generation.
- `num_inference_steps` (`int`, optional, defaults to `28`): The number of denoising steps.
- `sigmas` (`list`, optional): Custom sigma schedule (defaults to a linear ramp).
- `attention_kwargs` (`dict`, optional): Additional kwargs for attention processors.
- `output_type` (`str`, optional, defaults to `pil`): Output format: 'pil', 'np', 'pt'.
Outputs:
- `images` (`list`): Generated images.
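The `sigmas` input is described as defaulting to a linear ramp. As an illustration of what such a schedule looks like, the sketch below builds evenly spaced values from 1.0 down to 1/N for N steps. The endpoints are an assumption borrowed from other flow-matching pipelines in Diffusers and are not confirmed for this pipeline; `linear_sigmas` is a hypothetical helper name.

```python
def linear_sigmas(num_inference_steps=28):
    """Evenly spaced sigmas from 1.0 down to 1/num_inference_steps.

    The endpoints are assumed, not taken from this pipeline's source.
    """
    if num_inference_steps < 1:
        raise ValueError("num_inference_steps must be at least 1")
    if num_inference_steps == 1:
        return [1.0]
    end = 1.0 / num_inference_steps
    step = (1.0 - end) / (num_inference_steps - 1)
    return [1.0 - i * step for i in range(num_inference_steps)]
```

A list of the same form, with one value per denoising step, could be passed as `sigmas` to override the default schedule.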