Non-Line-of-Sight 3D Object Reconstruction via mmWave Surface Normal Estimation
GPS as a Control Signal for Image Generation
Scaling Language-Free Visual Representation Learning
Vid2World: Crafting Video Diffusion Models to Interactive World Models
Robot umpires are getting their first MLB test during spring training
Google’s Gemini AI just shattered the rules of visual processing — here’s what that means for you
AI Godmother Fei-Fei Li Has a Vision for Computer Vision
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Amazon’s new VAPR tech spotlights packages for easier deliveries
Wimbledon: Line judges to be removed and electronic calling brought in from 2025
Claude 3.5 Sonnet for vision
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Automatic Creative Selection with Cross-Modal Matching
STT: Stateful Tracking with Transformers for Autonomous Driving
Data-Efficient Multimodal Fusion on a Single GPU
SAGS: Structure-Aware 3D Gaussian Splatting
NHS AI test spots tiny cancers missed by doctors
SCIN: A new resource for representative dermatology images
Genie: Generative Interactive Environments
PIGEON: Predicting Image Geolocations
The HTML version was okay, though the PDF was not linked.
Greg Brockman on X: "ChatGPT Vision for digitizing journal entries"
Vision-controlled jetting for composite systems and robots
Formula One introduces AI 'computer vision' to monitor track breaches
Open sourcing Project Guideline: A platform for computer vision accessibility technology
SANPO: A Scene understanding, Accessibility, Navigation, Pathfinding, & Obstacle avoidance dataset
Google at ICCV 2023
How to Use ChatGPT’s New Image Features
DynIBaR: Space-time view synthesis from videos of dynamic scenes
GPT-4V(ision) system card