Improved Baselines with Visual Instruction Tuning
ImageBind: One Embedding Space To Bind Them All
Embedding Arithmetic of Multimodal Queries for Image Retrieval (Couairon et al., CVPRW 2022)
GPT-4 Technical Report
Visual ChatGPT: Talking, Drawing and Editing with Visual Foundation Models
CogVideo: Large-scale Pretraining for Text-to-Video Generation via Transformers