Meta-Transformer: A Unified Framework for Multimodal Learning
Brain2Music: Reconstructing Music from Human Brain Activity
Learning from Pixels with Expert Observations
EmotionPrompt: Leveraging Psychology for Large Language Models Enhancement via Emotional Stimulus
HDHumans: A Hybrid Approach for High-fidelity Digital Humans
PDP: Parameter-free Differentiable Pruning is All You Need
Lecture 4 Wilde
Quantum compression with classically simulatable circuits
Retentive Network: A Successor to Transformer for Large Language Models
Neural Haircut: Prior-Guided Strand-Based Hair Reconstruction
Robust flight navigation out of distribution with liquid neural networks
science.org
Physics-based Motion Retargeting from Sparse Inputs
DANIELE REDA
Neural Relighting with Subsurface Scattering by Learning the...
SadTalker: Learning Realistic 3D Motion Coefficients for Stylized Audio-Driven Single Image Talking Face Animation
Form at arxiv.org e-Print archive
Computer Science authors/titles Jan 1993
Better speech synthesis through scaling
GitHub - neonbjb/tortoise-tts: A multi-voice TTS system trained with an emphasis on quality
MotionGPT: Human Motion as a Foreign Language
Action-GPT
arXiv API Access - arXiv info
camenduru on Twitter
SoundStorm: Efficient Parallel Audio Generation
Scan and Snap: Understanding Training Dynamics and Token Composition in 1-layer Transformer
Role-Play with Large Language Models
EDM3: Event Detection as Multi-task Text Generation
PandaGPT: One Model To Instruction-Follow Them All
Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion
Large Language Models as Tool Makers