Show Lab

67 repositories

Show-o
Public
Repository for Show-o, One Single Transformer to Unify Multimodal Understanding and Generation.
multimodal diffusion-models large-language-models
Python
•
Apache License 2.0
•40•930•25•0•Updated Oct 18, 2024
Awesome-Video-Diffusion
Public
A curated list of recent diffusion models for video generation, editing, restoration, understanding, etc.
awesome video-editing video-understanding video-generation diffusion-models text-to-video video-restoration text-to-motion
197•3.3k•0•0•Updated Oct 18, 2024
LOVA3
Public
(NeurIPS 2024) Learning to Visual Question Answering, Asking and Assessment
benchmark visual-question-answering multimodal-deep-learning visual-question-generation multimodal-large-language-models data-asse
Python
•1•39•0•0•Updated Oct 16, 2024
EvolveDirector
Public
[NeurIPS 2024] EvolveDirector: Approaching Advanced Text-to-Image Generation with Large Vision-Language Models.
Python
•0•34•0•0•Updated Oct 14, 2024
Awesome-MLLM-Hallucination
Public
📖 A curated list of resources dedicated to hallucination of multimodal large language models (MLLM).
13•414•1•0•Updated Oct 10, 2024
Awesome-Unified-Multimodal-Models
Public
📖 This is a repository for organizing papers, codes and other resources related to unified multimodal models.
4•177•0•1•Updated Oct 10, 2024
VideoLISA
Public
[NeurIPS 2024] One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
0•20•1•0•Updated Oct 3, 2024
MovieSeq
Public
[ECCV2024] Learning Video Context as Interleaved Multimodal Sequences
Jupyter Notebook
•1•27•1•0•Updated Oct 1, 2024
GUI-Narrator
Public
Repository of GUI Action Narrator
JavaScript
•0•3•0•0•Updated Sep 22, 2024
Awesome-GUI-Agent
Public
💻 A curated list of papers and resources for multi-modal Graphical User Interface (GUI) agents.
awesome graphical-user-interface ai-assistant llm-agent gui-agents
5•142•1•0•Updated Sep 17, 2024
RingID
Public
Python
•0•13•1•0•Updated Aug 30, 2024
MotionDirector
Public
[ECCV 2024 Oral] MotionDirector: Motion Customization of Text-to-Video Diffusion Models.
video-generation diffusion-models text-to-video text-to-motion text-to-video-generation motion-customization
Python
•
Apache License 2.0
•49•825•20•0•Updated Aug 21, 2024
videollm-online
Public
VideoLLM-online: Online Video Large Language Model for Streaming Video (CVPR 2024)
Python
•
Apache License 2.0
•26•206•15•0•Updated Aug 15, 2024
X-Adapter
Public
[CVPR 2024] X-Adapter: Adding Universal Compatibility of Plugins for Upgraded Diffusion Model
Python
•
Apache License 2.0
•43•734•17•4•Updated Aug 14, 2024
afformer
Public
Affordance Grounding from Demonstration Video to Target Image (CVPR 2023)
deep-learning pytorch
Python
•2•38•6•0•Updated Jul 26, 2024
BoxDiff
Public
[ICCV 2023] BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
text-to-image-synthesis diffusion-models
Python
•14•244•6•0•Updated Jul 21, 2024
cvpr2024-tutorial-video-diffusion-models
Public
HTML
•
MIT License
•0•1•0•0•Updated Jul 16, 2024
DragAnything
Public
[ECCV 2024] DragAnything: Motion Control for Anything using Entity Representation
Python
•13•412•20•0•Updated Jul 2, 2024
AssistGaze
Public
Python
•0•0•0•0•Updated Jun 25, 2024
videogui
Public
Official repo of "VideoGUI: A Benchmark for GUI Automation from Instructional Videos"
gui video-language llm-agent
JavaScript
•0•20•0•0•Updated Jun 16, 2024
VisInContext
Public
Official implementation of Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning
efficient in-context-learning llm mllm
Python
•1•11•1•0•Updated Jun 6, 2024
cosmo
Public
Python
•4•70•2•2•Updated May 10, 2024
EgoVLP
Public
[NeurIPS 2022] Egocentric Video-Language Pretraining
pretraining video-language egocentric-vision pytorch
Python
•20•223•5•0•Updated May 9, 2024
UniVTG
Public
[ICCV 2023] UniVTG: Towards Unified Video-Language Temporal Grounding
video-summarization video-grounding pretraining moment-retrieval highlight-detection video-language
Python
•
MIT License
•28•317•19•0•Updated May 8, 2024
VisorGPT
Public
[NeurIPS 2023] Customize spatial layouts for conditional image synthesis models, e.g., ControlNet, using GPT
image-generation gpt diffusion-models controlnet
Python
•
MIT License
•2•130•4•0•Updated May 4, 2024
Long-form-Video-Prior
Public
Python
•0•23•0•0•Updated May 3, 2024
assistgui
Public
JavaScript
•1•23•1•0•Updated Apr 16, 2024
T2VScore
Public
T2VScore: Towards A Better Metric for Text-to-Video Generation
1•76•3•0•Updated Apr 10, 2024
sparseformer
Public
(ICLR 2024, CVPR 2024) SparseFormer
computer-vision transformer efficient-neural-networks vision-transformer sparseformer
Python
•
MIT License
•1•62•1•0•Updated Mar 30, 2024
VideoSwap
Public
Code for [CVPR 2024] VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence
13•344•0•0•Updated Mar 29, 2024