End-to-end Generative Pre-training for Multimodal Video Captioning · #Multimodal #Video #Transformers #Google #Blog · ai.googleblog.com · Jun 7, 2022
Multimodal Bottleneck Transformer (MBT): A New Model for Modality Fusion · #Multimodal #Transformers #Google · ai.googleblog.com · Mar 16, 2022