OS-ATLAS: A Foundation Action Model for Generalist GUI Agents · #User Interfaces #Graphics #Large Language Models #Computer Vision #Opensource #Paper #PDF · arxiv.org · Nov 5, 2024
PuLID: Pure and Lightning ID Customization via Contrastive Alignment · #Computer Vision #Editing #Identification #Paper #PDF #Gradio · arxiv.org · May 2, 2024
Paint by Inpaint: Learning to Add Image Objects by Removing Them First · #Computer Vision #Editing #Paper #PDF · arxiv.org · May 2, 2024
Automatic Creative Selection with Cross-Modal Matching · #Search #Computer Vision #Apple #Paper #PDF · arxiv.org · May 2, 2024
STT: Stateful Tracking with Transformers for Autonomous Driving · #AVs #Transformers #Machine Learning #Computer Vision #Paper #PDF · arxiv.org · May 2, 2024
Data-Efficient Multimodal Fusion on a Single GPU · #Machine Learning #Computer Vision #Multimodal #Paper #PDF · arxiv.org · May 2, 2024
SAGS: Structure-Aware 3D Gaussian Splatting · #Computer Vision #Huawei #Paper #PDF · arxiv.org · May 1, 2024
Genie: Generative Interactive Environments · #Machine Learning #Computer Vision #Google #Research #Paper #PDF #Foundation Models #Games #Robotics · arxiv.org · Feb 26, 2024
GPT-4V(ision) system card · #GPT-4 #Computer Vision #OpenAI #PDF · openai.com · Sep 25, 2023
StableRep: Synthetic Images from Text-to-Image Models Make Strong Visual Representation Learners · #Machine Learning #Stable Diffusion #Training #Computer Vision #Paper #PDF · arxiv.org · Jun 4, 2023
MIME: Human-Aware 3D Scene Generation · #Computer Vision #3D #Paper #PDF · arxiv.org · Jun 22, 2023
Recognize Anything: A Strong Image Tagging Model · #Computer Vision #Image Recognition #Paper #PDF · arxiv.org · Jun 11, 2023
CAPE: Camera View Position Embedding for Multi-View 3D Object Detection · #Computer Vision #Baidu #Object Detection #Paper #PDF · arxiv.org · Jun 7, 2023
Improving Factuality and Reasoning in Language Models through Multiagent Debate · #Reasoning #Large Language Models #Machine Learning #Computer Vision #Paper #PDF · arxiv.org · May 30, 2023
ORCa: Glossy Objects as Radiance Field Cameras · #Computer Vision #Pattern Recognition #Sensor #Paper #PDF · arxiv.org · May 29, 2023
Random-Access Neural Compression of Material Textures · #Graphics #Computer Vision #Games #Nvidia #Paper #PDF · research.nvidia.com · May 6, 2023
Scaling Vision Transformers to 22 Billion Parameters · #Transformers #Computer Vision #Paper #PDF #Google · arxiv.org · Apr 23, 2023
DINOv2: Learning Robust Visual Features without Supervision · #Computer Vision #Meta #Paper #PDF · arxiv.org · Apr 18, 2023
StereoDistill: Pick the Cream from LiDAR for Distilling Stereo-based 3D Object Detection · #Computer Vision #Baidu #Paper #PDF · arxiv.org · Mar 7, 2023
A Good Prompt Is Worth Millions of Parameters? Low-resource... · #Prompt Engineering #Large Language Models #Computer Vision #PDF #Questions and Answers · arxiv.org · Dec 6, 2021