PG-MAP / UG-FM for Stable Diffusion 3.5-medium

Custom diffusers pipeline for UG-FM, the flow-matching reduction of PG-MAP, on SD3.5-medium. It defaults to the paper's headline configuration (data-side gate, $K_{UG}=4$, $\eta_z=0.1$, full backprop through the velocity prediction), which reaches win-rates of 91.9% (PickScore) and 75.7% (HPS) against the static rectified-flow baseline on PartiPrompts ($n=1632$, seed 123).

NeurIPS 2026. See github.com/sophialanlan/PG-MAP for the paper, full configs, and reproduction scripts.

Install

pip install pg-map
# or
pip install git+https://github.com/sophialanlan/PG-MAP

You also need to accept the Stability AI Community License for the SD3.5 weights on huggingface.co/stabilityai/stable-diffusion-3.5-medium before the first load.

Usage

from diffusers import DiffusionPipeline
from pgmap import FrozenRewardModel
import torch

pipe = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium",
    custom_pipeline="sophialan/pg-map-sd3",
    torch_dtype=torch.float16,
).to("cuda")

reward = FrozenRewardModel("pickscore", device="cuda")

# UG-FM (default): 91.9% PickScore configuration
image = pipe(
    "a phoenix rising from ashes, vivid orange and red feathers",
    reward_model=reward,
).images[0]

For the full PG-MAP-FM (joint c + z_t with flow consistency + Gaussian priors + reward), pass pg_map_config with optimize_c=True:

from pgmap import sdxl_defaults
from dataclasses import replace

cfg = sdxl_defaults()                                     # starting point
cfg = replace(cfg, optimize_c=True, optimize_z=True)
image = pipe("a phoenix rising from ashes", pg_map_config=cfg).images[0]

Why UG-FM is the right default for flow matching

Per paper §3.2, on SD3.5 the optimal active set collapses to $\{z_t\}$ alone at data-side steps for two transport-specific reasons:

  1. Conditioning capacity. SD3.5's concatenated CLIP-L / CLIP-G / T5-XXL representation has ~1.4 M optimisable parameters, so a unit-normalised c-gradient is spread too thinly to move any single direction.
  2. Local Euler amplification. A noise-side perturbation traverses ~25 factors of $I + \Delta t_j\,\partial_z v_\theta$ and grows 5–50×, while a data-side perturbation has only 1–3 remaining factors and stays bounded (sub-pixel mean RMSE $0.61/255$).
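
The amplification argument in point 2 can be illustrated with a toy calculation. This is a minimal sketch, not the paper's analysis: it assumes a constant, mildly expansive diagonal Jacobian `J` (hypothetical values, not SD3.5's actual $\partial_z v_\theta$), a uniform step size, and a 28-step Euler schedule. The helper name `euler_amplification` is made up for this example.

```python
import numpy as np

def euler_amplification(jacobian, dt, n_steps):
    """Spectral norm of the product of n_steps identical factors (I + dt * J)."""
    dim = jacobian.shape[0]
    factor = np.eye(dim) + dt * jacobian
    product = np.linalg.matrix_power(factor, n_steps)
    return np.linalg.norm(product, ord=2)

# Assumption: a fixed toy Jacobian with one expansive direction.
J = np.diag([3.0, 1.0, -0.5])
dt = 1.0 / 28  # assumption: uniform 28-step Euler schedule

noise_side = euler_amplification(J, dt, 25)  # perturbation injected early
data_side = euler_amplification(J, dt, 3)    # perturbation injected late

print(f"noise-side growth: {noise_side:.1f}x")
print(f"data-side growth:  {data_side:.2f}x")
```

Under these assumptions the early perturbation is amplified by roughly an order of magnitude, while the late one stays close to 1×. The real Jacobian varies along the trajectory, so this shows only the mechanism, not the measured 5–50× range.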

Paper headline (SD3.5-medium, PartiPrompts $n=1632$, seed 123)

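| Method                    | PickScore | HPS   | Aesthetic | CLIP  |
|---------------------------|-----------|-------|-----------|-------|
| Static baseline           | 50.0%     | 50.0% | 50.0%     | 50.0% |
| FlowChef (always-on, K=1) | 82.4%     | 68.1% | 49.7%     | 53.9% |
| FlowChef (gating-matched) | 75.0%     | 62.5% | 46.9%     | 52.9% |
| UG-FM (Ours)              | 91.9%     | 75.7% | 51.7%     | 54.2% |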

Win-rates are measured against the same-seed static baseline. The 16.9 pp PickScore gap between UG-FM and gating-matched FlowChef isolates the effect of full backprop through $v_\theta$: FlowChef's gradient skipping (with torch.no_grad(): v = v_theta(...)) discards the Jacobian factor $I - (1-t)\,\partial_z v_\theta$, which is load-bearing.
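
To make the role of the Jacobian factor concrete, here is a minimal NumPy sketch comparing the two gradients. It is not code from pg-map or FlowChef. It assumes a toy linear velocity $v(z) = Az$, a quadratic stand-in reward, and the one-step clean estimate $\hat{x} = z - (1-t)\,v(z)$, chosen so that its Jacobian matches the factor quoted above. All names (`predict_clean`, `reward`, `grad_full`, `grad_skipped`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, t = 4, 0.8
A = 0.5 * rng.standard_normal((dim, dim))  # toy linear velocity: v(z) = A @ z
target = rng.standard_normal(dim)          # maximiser of the stand-in reward
z = rng.standard_normal(dim)

def predict_clean(z):
    """One-step clean estimate x_hat = z - (1 - t) * v(z) (assumed convention)."""
    return z - (1.0 - t) * (A @ z)

def reward(x):
    """Stand-in reward: negative squared distance to a target."""
    return -0.5 * np.sum((x - target) ** 2)

def grad_full(z):
    """Chain rule through v: (I - (1 - t) * dv/dz)^T applied to dR/dx_hat."""
    jac = np.eye(dim) - (1.0 - t) * A
    return jac.T @ (target - predict_clean(z))

def grad_skipped(z):
    """Gradient skipping: v is treated as a constant, so the Jacobian becomes I."""
    return target - predict_clean(z)

def grad_numeric(z, eps=1e-6):
    """Central finite differences of reward(predict_clean(z)), for reference."""
    g = np.zeros(dim)
    for i in range(dim):
        step = np.zeros(dim)
        step[i] = eps
        g[i] = (reward(predict_clean(z + step))
                - reward(predict_clean(z - step))) / (2 * eps)
    return g

print("full vs numeric:   ", np.linalg.norm(grad_full(z) - grad_numeric(z)))
print("skipped vs numeric:", np.linalg.norm(grad_skipped(z) - grad_numeric(z)))
```

In this toy setting the full-backprop gradient agrees with finite differences while the skipped gradient does not. How much that mismatch matters on SD3.5 is an empirical question, addressed by the table above.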

Citation

@inproceedings{sun2026pgmap,
  title={{PG-MAP}: Joint {MAP} Optimization for Inference-Time Alignment of Diffusion and Flow-Matching Models},
  author={Sun, Ruolan and Polak, Pawel},
  booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
  year={2026}
}

License

MIT (see LICENSE). SD3.5 weights are under the Stability AI Community License.
