MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training
Apple Announces MM1: A Family of Multimodal LLMs Up To 30B Parameters that are SoTA in Pre-Training Metrics and Perform Competitively after Fine-Tuning
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Ferret: Refer and Ground Anything Anywhere at Any Granularity