Stars
Official repository for the paper "MICo-150K: A Comprehensive Dataset for Multi-Image Composition".
A Distributed Attention Towards Linear Scalability for Ultra-Long Context, Heterogeneous Data Training
xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
🤗 A PyTorch-native Inference Engine with Hybrid Cache Acceleration and Parallelism for DiTs: Z-Image, FLUX2, Qwen-Image, etc.
ChronoEdit: Towards Temporal Reasoning for Image Editing and World Simulation
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
[ICCV 2025] 🔥🔥 UNO: A Universal Customization Method for Both Single and Multi-Subject Conditioning
HunyuanImage-3.0: A Powerful Native Multimodal Model for Image Generation
🔥🔥 Official Repo of UMO: Scaling Multi-Identity Consistency for Image Customization via Matching Reward
Arxiv 25: Dynamic Pyramid Network for Efficient Multimodal Large Language Model
[NeurIPS 2025] An official implementation of Flow-GRPO: Training Flow Matching Models via Online RL
Qwen-Image-Lightning: Speed up Qwen-Image model with distillation
Qwen-Image is a powerful image generation foundation model capable of complex text rendering and precise image editing.
Generating Immersive, Explorable, and Interactive 3D Worlds from Words or Pixels with Hunyuan3D World Model
Consistency Distillation with Target Timestep Selection and Decoupled Guidance
This is the official implementation of our Señorita-2M [Weights and Dataset]: A High-Quality Instruction-based Dataset for General Video Editing by Video Specialists
SANA: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer
[ICLR2025 Spotlight] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models
(CVPR 2025) From Slow Bidirectional to Fast Autoregressive Video Diffusion Models
Phantom: Subject-Consistent Video Generation via Cross-Modal Alignment
Enjoy the magic of Diffusion models!
Official PyTorch implementation - Video Motion Transfer with Diffusion Transformers
Official implementation of ATI: Any Trajectory Instruction for Controllable Video Generation. https://arxiv.org/pdf/2505.22944
📹 A more flexible framework that can generate videos at any resolution and create videos from images.