Repositories list
67 repositories
- Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
- A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
- LOVA3: (NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
- EvolveDirector
- VideoLISA
- GUI-Narrator
- Awesome-GUI-Agent: 💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
- RingID
- MotionDirector: [ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
- videollm-online
- X-Adapter
- afformer
- BoxDiff: [ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
- DragAnything
- AssistGaze
- videogui
- VisInContext
- cosmo
- EgoVLP: [NeurIPS2022] Egocentric Video-Language Pretraining
- UniVTG: [ICCV2023] UniVTG: Towards Unified Video-Language Temporal Grounding
- VisorGPT: [NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
- Long-form-Video-Prior
- assistgui
- T2VScore
- sparseformer: (ICLR 2024, CVPR 2024) SparseFormer
- VideoSwap