V-JEPA: The next step toward advanced machine intelligence
Bard gets its biggest upgrade yet with Gemini
Efficient Video-Text Learning with Iterative Co-tokenization
End-to-end Generative Pre-training for Multimodal Video Captioning
VDTTS: Visually-Driven Text-To-Speech