Found 116 bookmarks
EFloat: Entropy-coded Floating Point Format for Compressing Vector Embedding Models
In a large class of deep learning models, including vector embedding models such as word and database embeddings, we observe that floating point exponent values cluster around a few unique values, permitting entropy-based data compression. Entropy coding compresses fixed-length values with variable-length codes, encoding the most probable values with fewer bits. We propose the EFloat compressed floating point number format, which uses a variable field boundary between the exponent and significand fields. EFloat applies entropy coding to exponent values and signs to minimize the average width of the exponent and sign fields, while preserving the original FP32 exponent range unchanged. The saved bits become part of the significand field, increasing EFloat numeric precision by 4.3 bits on average compared to other reduced-precision floating point formats. EFloat makes 8-bit and even smaller floats practical without sacrificing the exponent range of a 32-bit floating point representation. We currently use the EFloat format to reduce the memory capacity and bandwidth consumption of large vector embedding models such as those used for database embeddings. Using RMS error as the metric, we demonstrate that EFloat provides higher accuracy than other floating point formats with an equal bit budget. The EF12 format, with a 12-bit budget, has less end-to-end application error than the 16-bit BFloat16. EF16, with a 16-bit budget, has an RMS error 17 to 35 times smaller than that of BF16 for a diverse set of embedding models. When making similarity and dissimilarity queries, using the NDCG ranking metric, EFloat matches the result quality of prior floating point representations with larger bit budgets.
·arxiv.org·
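The exponent-clustering observation behind EFloat is easy to check empirically. The sketch below is not the EFloat format itself: it only extracts the 8-bit biased exponent field from float32 values and computes the Shannon entropy of the exponent distribution, showing how far below 8 bits the average exponent cost can fall. The synthetic `values` list is an illustrative stand-in for real embedding weights.

```python
import math
import struct
from collections import Counter

def fp32_exponent(x: float) -> int:
    """Return the 8-bit biased exponent field of a float32."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    return (bits >> 23) & 0xFF

# Toy stand-in for embedding weights: bounded values roughly in [-0.5, 0.5].
values = [math.sin(i) * 0.5 for i in range(1, 1000)]

counts = Counter(fp32_exponent(v) for v in values)
total = sum(counts.values())

# Shannon entropy of the exponent distribution, in bits per value.
# Entropy coding can approach this average, freeing the remaining bits
# of the fixed 8-bit exponent field for extra significand precision.
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

print(f"distinct exponents used: {len(counts)} of 256")
print(f"exponent entropy: {entropy:.2f} bits vs. fixed 8-bit field")
```

Bounded weight distributions like this one occupy only a narrow band of the 256 possible exponent codes, which is exactly the condition that makes a variable-width, entropy-coded exponent field pay off.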
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals
Language models are increasingly attracting interest from writers. However, such models lack long-range semantic coherence, limiting their usefulness for longform creative writing. We address this limitation by applying language models hierarchically, in a system we call Dramatron. By building structural context via prompt chaining, Dramatron can generate coherent scripts and screenplays complete with title, characters, story beats, location descriptions, and dialogue. We illustrate Dramatron's usefulness as an interactive co-creative system with a user study of 15 theatre and film industry professionals. Participants co-wrote theatre scripts and screenplays with Dramatron and engaged in open-ended interviews. We report critical reflections both from our interviewees and from independent reviewers who watched stagings of the works to illustrate how both Dramatron and hierarchical text generation could be useful for human-machine co-creativity. Finally, we discuss the suitability of Dramatron for co-creativity, ethical considerations -- including plagiarism and bias -- and participatory models for the design and deployment of such tools.
·arxiv.org·
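The hierarchical prompt-chaining idea can be sketched in a few lines. Everything below is hypothetical: the stage names and the stub `llm` function are illustrative, not Dramatron's actual prompts or model API. The point is only that each stage's output is folded into the next stage's prompt, which is how a system like this builds long-range structure a single flat prompt cannot sustain.

```python
def llm(prompt: str) -> str:
    """Stand-in for a language model call; returns a placeholder string."""
    return f"<generated from: {prompt[:40]}...>"

def hierarchical_chain(logline: str) -> dict:
    """Generate a script top-down, each level conditioned on the ones above."""
    script = {"logline": logline}
    script["title"] = llm(f"Write a title for this story: {logline}")
    script["characters"] = llm(
        f"List the characters of '{script['title']}'. Story: {logline}"
    )
    script["beats"] = llm(
        f"Outline the story beats given these characters: {script['characters']}"
    )
    script["dialogue"] = llm(
        f"Write a scene of dialogue following this beat sheet: {script['beats']}"
    )
    return script

draft = hierarchical_chain("A detective discovers her partner is the culprit.")
print(list(draft.keys()))
```

Because every stage receives the accumulated structural context rather than raw prior text, coherence is enforced at the outline level first and only then refined into dialogue.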
Visual Attention Network
While originally designed for natural language processing tasks, the self-attention mechanism has recently taken various computer vision areas by storm. However, the 2D nature of images brings three challenges for applying self-attention in computer vision. (1) Treating images as 1D sequences neglects their 2D structure. (2) Quadratic complexity is too expensive for high-resolution images. (3) Self-attention captures only spatial adaptability and ignores channel adaptability. In this paper, we propose a novel linear attention mechanism named large kernel attention (LKA), which enables the self-adaptive and long-range correlations of self-attention while avoiding its shortcomings. Furthermore, we present a neural network based on LKA, namely the Visual Attention Network (VAN). While extremely simple, VAN surpasses similarly sized vision transformers (ViTs) and convolutional neural networks (CNNs) on various tasks, including image classification, object detection, semantic segmentation, panoptic segmentation, and pose estimation. For example, VAN-B6 achieves 87.8% accuracy on the ImageNet benchmark and sets a new state of the art (58.2 PQ) for panoptic segmentation. VAN-B2 also surpasses Swin-T by 4% mIoU (50.1 vs. 46.1) for semantic segmentation on the ADE20K benchmark and by 2.6% AP (48.8 vs. 46.2) for object detection on the COCO dataset. It provides a novel method and a simple yet strong baseline for the community. Code is available at https://github.com/Visual-Attention-Network.
·arxiv.org·
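A back-of-envelope sketch of why a decomposed large kernel stays linear in cost. The specific decomposition (a 5×5 depthwise conv, a 7×7 depthwise conv with dilation 3, and a 1×1 pointwise conv approximating a 21×21 kernel) follows the VAN paper's design; the channel count and the comparison itself are illustrative assumptions, not the authors' measurements.

```python
def conv_params(k: int, cin: int, cout: int, depthwise: bool = False) -> int:
    """Parameter count of a k x k convolution, bias ignored.

    Dilation changes the receptive field but not the parameter count,
    so it does not appear here.
    """
    return k * k * cin if depthwise else k * k * cin * cout

C = 64  # illustrative channel count

# Direct 21x21 depthwise convolution over C channels:
direct = conv_params(21, C, C, depthwise=True)

# LKA-style decomposition: 5x5 depthwise + 7x7 dilated depthwise + 1x1 pointwise.
decomposed = (
    conv_params(5, C, C, depthwise=True)
    + conv_params(7, C, C, depthwise=True)
    + conv_params(1, C, C)
)

print(f"direct 21x21 depthwise: {direct} params")
print(f"decomposed LKA stack:   {decomposed} params")
```

The stack covers a comparable receptive field at a fraction of the parameters, and, unlike self-attention, its cost grows linearly rather than quadratically with image resolution.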
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a variational framework to learn a style embedding, making it easy to modify style through latent-space manipulation or the blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs given the same input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion including fingers, with speech, spanning 19 different styles.
·arxiv.org·
NVIDIA Researchers Present 'RANA,' a Novel Artificial Intelligence Framework for Learning Relightable and Articulated Neural Avatars of Humans
Human-like articulated neural avatars have several uses in telepresence, animation, and visual content production. To be widely adopted, these neural avatars must be simple to create, simple to animate in new poses and views, capable of rendering at photorealistic image quality, and simple to relight in novel environments. Existing techniques frequently use monocular videos to train these neural avatars. While this approach permits movement and photorealistic image quality, the synthesized images remain constrained by the training video's lighting conditions. Other studies specifically address the relighting of human avatars. However, they do not provide the user control
·marktechpost.com·
Machine learning has revealed exactly how much of a Shakespeare play was wr
For much of his life, William Shakespeare was the house playwright for an acting company called the King’s Men that performed his plays on the banks of the River Thames in London. When Shakespeare died in 1616, the company needed a replacement and turned to one of the most prolific and famous playwrights of the…
·technologyreview.com·
A neural net solves the three-body problem 100 million times faster - MIT T
In the 18th century, the great scientific challenge of the age was to find a way for mariners to determine their position at sea. One of the most successful solutions was to measure the position of the moon in the sky relative to the fixed background of stars. Because of parallax effects, this measurement depends…
·technologyreview.com·
Machine vision has learned to use radio waves to see through walls and in d
Machine vision has an impressive record. It has the superhuman ability to recognize people, faces and objects. It can even recognize many different kinds of actions, albeit not quite as well as humans just yet. But there are limits to its performance. Machines have a particularly difficult time when people, faces, or objects are partially…
·technologyreview.com·