Data-Efficient Multimodal Fusion on a Single GPU · #Machine Learning #Computer Vision #Multimodal #Paper #PDF · arxiv.org · May 2, 2024
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training · #Large Language Models #Multimodal #Apple #Paper #PDF · arxiv.org · Mar 17, 2024
Guiding Instruction-based Image Editing via Multimodal Large Language Models · #Apple #Multimodal #Editing #Paper #PDF #Opensource · arxiv.org · Feb 7, 2024
Ferret: Refer and Ground Anything Anywhere at Any Granularity · #Apple #Large Language Models #Multimodal #Paper #PDF #Opensource · arxiv.org · Dec 26, 2023
NExT-GPT: Any-to-Any Multimodal LLM · #Large Language Models #Multimodal #Paper #PDF · arxiv.org · Sep 27, 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning | Meta AI Research · #Meta #Multimodal #Paper #PDF · ai.meta.com · Jul 14, 2023
A transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics · #Medical #Biomedical #Multimodal #Diagnostics #Paper #PDF · nature.com · Jun 13, 2023
Gong, Y., Rouditchenko, A., Liu, A. H., Harwath, D., Karlinsky, L., Kuehne, H., & Glass, J. (2022). Contrastive audio-visual masked autoencoder. arXiv preprint arXiv:2210.07839. · #Machine Learning #Multimodal #Paper #PDF · openreview.net · Jun 10, 2023
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in HuggingFace · #Large Language Models #Hugging Face #Multimodal #Multiagent #Paper #PDF · arxiv.org · Apr 5, 2023
Stable Bias: Analyzing Societal Representations in Diffusion Models · #Graphics #Bias #Paper #PDF #Text-to-Image #Multimodal #Stable Diffusion #DALL-E #Large Language Models · arxiv.org · Mar 23, 2023
ViperGPT: Visual Inference via Python Execution for Reasoning · #Multimodal #Paper #PDF #Questions and Answers · arxiv.org · Mar 19, 2023
Erasing Concepts from Diffusion Models · #Diffusion #Model #Editing #Multimodal #Paper #PDF · arxiv.org · Mar 19, 2023
ChatGPT is on the horizon: Could a large language model be all we need for Intelligent Transportation? · #Transportation #Research #Paper #PDF #Multimodal #Large Language Models · arxiv.org · Mar 10, 2023