Stars
[ArXiv 2025] DiffusionVL: Translating Any Autoregressive Models into Diffusion Vision Language Models
The repository provides code for running inference with the Meta Segment Anything Audio Model (SAM-Audio), links for downloading the trained model checkpoints, and example notebooks that show how t…
Official repo for paper "IC-Effect: Precise and Efficient Video Effects Editing via In-Context Learning"
GUNETR_pplus: Gradient enhanced UNETR_pplus with tumor segmentation
GUNETR_pplus: Gradient enhanced UNETR_pplus with MSD liver segmentation
[NeurIPS 2025] Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance
Implementing DeepSeek R1's GRPO algorithm from scratch
A very simple GRPO implementation for reproducing R1-like LLM thinking.
[CVPR 2025] MatAnyone: Stable Video Matting with Consistent Memory Propagation
Official inference repo for FLUX.2 models
PyTorch implementation of JiT https://arxiv.org/abs/2511.13720
OmniInsert: Mask-Free Video Insertion of Any Reference via Diffusion Transformer Models
High-Quality Text-to-Video Generation with Alpha Channel
Generative Omnimatte (CVPR 2025)
VFXMaster: Unlocking Dynamic Visual Effect Generation via In-Context Learning
"AI-Trader: Can AI Beat the Market?" Live Trading Bench: https://ai4trade.ai Tech Report Link: https://arxiv.org/abs/2512.10971
Official implementation of "DiT360: High-Fidelity Panoramic Image Generation via Hybrid Training".
Lumina-DiMOO - An Open-Sourced Multi-Modal Large Diffusion Language Model
A lightweight LMM-based Document Parsing Model
LongLive: Real-time Interactive Long Video Generation
Hunyuan 3D Part Segmentation and Generation Pipeline
Lyra: Generative 3D Scene Reconstruction via Video Diffusion Model Self-Distillation
A 15TB Collection of Physics Simulation Datasets
[ICCV 2025] Code Implementation of "ArtEditor: Learning Customized Instructional Image Editor from Few-Shot Examples"