Scene Co-pilot: Procedural Text to Video Generation with Human in the Loop
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
LARP: Tokenizing Videos with a Learned Autoregressive Generative Prior
MeshRet
Retrieval-Augmented Diffusion Models for Time Series Forecasting
Continuous Speech Synthesis using per-token Latent Diffusion
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with...
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
FastTalker: Jointly Generating Speech and Conversational Gestures from Text
Intelligent Conversational Bot for Massive Online Open Courses (MOOCs)
MasterKey: Automated Jailbreak Across Multiple Large Language...
SPARK: Self-supervised Personalized Real-time Monocular Face Capture
Training Spiking Neural Networks Using Lessons From Deep Learning
Apollo: Band-sequence Modeling for High-Quality Music Restoration in Compressed Audio
AudioBERT: Audio Knowledge Augmented Language Model
Prompt2Fashion: An automatically generated fashion dataset
Startup success prediction and VC portfolio simulation using...
LiDAR-Event Stereo Fusion with Hallucinations
Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency
Accelerating Scientific Discovery with Generative Knowledge...
Proxemics and Social Interactions in an Instrumented Virtual...
Florence-2: Advancing a Unified Representation for a Variety of...
Large Motion Model for Unified Multi-Modal Motion Generation
Simple, unified analysis of Johnson-Lindenstrauss with applications
Spontaneous Theory of Mind for Artificial Intelligence