StyleDrop: Text-to-image generation in any style
Scaling multimodal understanding to long videos
MusicLM
UniTune: Text-Driven Image Editing by Fine Tuning an Image Generation Model on a Single Image
Imagining with Imagen
Google answers Meta's video-generating AI with its own, dubbed Imagen Video
PaLI: Scaling Language-Image Learning in 100+ Languages
Imagen: Text-to-Image Diffusion Models
Mapping Urban Trees Across North America with the Auto Arborist Dataset
How AI creates photorealistic images from text
End-to-end Generative Pre-training for Multimodal Video Captioning
Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion