Improved Baselines with Visual Instruction Tuning · #ai #llm #multimodal · arxiv.org · Oct 9, 2023
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models · #ai #llm #multimodal · arxiv.org · Mar 12, 2023
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers · #ai #llm #video #multimodal · arxiv.org · Feb 13, 2023