Non-Line-of-Sight 3D Object Reconstruction via mmWave Surface Normal Estimation
GPS as a Control Signal for Image Generation
Scaling Language-Free Visual Representation Learning
Vid2World: Crafting Video Diffusion Models to Interactive World Models
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Automatic Creative Selection with Cross-Modal Matching
STT: Stateful Tracking with Transformers for Autonomous Driving
Data-Efficient Multimodal Fusion on a Single GPU
SAGS: Structure-Aware 3D Gaussian Splatting
Genie: Generative Interactive Environments
GPT-4V(ision) system card
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
MIME: Human-Aware 3D Scene Generation
Recognize Anything: A Strong Image Tagging Model
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection
Improving Factuality and Reasoning in Language Models through Multiagent Debate
ORCa: Glossy Objects as Radiance Field Cameras
Random-Access Neural Compression of Material Textures
Scaling Vision Transformers to 22 Billion Parameters
DINOv2: Learning Robust Visual Features without Supervision
StereoDistill: Pick the Cream from LiDAR for Distilling Stereo-based 3D Object Detection
A Good Prompt Is Worth Millions of Parameters? Low-resource...