Google’s Gemini AI just shattered the rules of visual processing — here’s what that means for you
AI Godmother Fei-Fei Li Has a Vision for Computer Vision
OS-ATLAS: A Foundation Action Model for Generalist GUI Agents
Amazon’s new VAPR tech spotlights packages for easier deliveries
Wimbledon: Line judges to be removed and electronic calling brought in from 2025
Claude 3.5 Sonnet for vision
PuLID: Pure and Lightning ID Customization via Contrastive Alignment
Paint by Inpaint: Learning to Add Image Objects by Removing Them First
Automatic Creative Selection with Cross-Modal Matching
STT: Stateful Tracking with Transformers for Autonomous Driving
Data-Efficient Multimodal Fusion on a Single GPU
SAGS: Structure-Aware 3D Gaussian Splatting
NHS AI test spots tiny cancers missed by doctors
SCIN: A new resource for representative dermatology images
Genie: Generative Interactive Environments
PIGEON: Predicting Image Geolocations
The HTML version was okay, though the PDF was not linked.
HTML (experimental)
Greg Brockman on X: "ChatGPT Vision for digitizing journal entries"
Vision-controlled jetting for composite systems and robots
Formula One introduces AI 'computer vision' to monitor track breaches
Open sourcing Project Guideline: A platform for computer vision accessibility technology
SANPO: A Scene understanding, Accessibility, Navigation, Pathfinding, & Obstacle avoidance dataset
Google at ICCV 2023
How to Use ChatGPT’s New Image Features
DynIBaR: Space-time view synthesis from videos of dynamic scenes
GPT-4V(ision) system card
These new tools could make AI vision systems less biased
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners
MIME: Human-Aware 3D Scene Generation
Microsoft at CVPR 2023: Pushing the boundaries of computer vision - Microsoft Research
Recognize Anything: A Strong Image Tagging Model