The Design Space of Generative Models
Card et al.'s classic paper "The Design Space of Input Devices" established the value of design spaces as a tool for HCI analysis and invention. We posit that developing design spaces for emerging pre-trained, generative AI models is necessary for supporting their integration into human-centered systems and practices. We explore what it means to develop an AI model design space by proposing two design spaces relating to generative AI models: the first considers how HCI can impact generative models (i.e., interfaces for models) and the second considers how generative models can impact HCI (i.e., models as an HCI prototyping material).
·arxiv.org·
SamurAI: A Versatile IoT Node With Event-Driven Wake-Up and Embedded ML Acceleration
Increased capabilities such as recognition and self-adaptability are now required from IoT applications. While IoT node power consumption is a major concern for these applications, cloud-based processing is becoming unsustainable due to continuous sensor or image data transmission over the wireless network. Thus, optimized ML capabilities and data transfers should be integrated into the IoT node. Moreover, IoT applications are torn between sporadic data-logging and energy-hungry data processing (e.g. image classification). The versatility of the node is therefore key to addressing this wide diversity of energy and processing needs. This paper presents SamurAI, a versatile IoT node bridging this gap in processing and energy by leveraging two on-chip sub-systems: a low-power, clock-less, event-driven Always-Responsive (AR) part and an energy-efficient On-Demand (OD) part. AR contains a 1.7MOPS event-driven, asynchronous Wake-up Controller (WuC) with a 207ns wake-up time optimized for sporadic computing, while OD combines a deep-sleep RISC-V CPU and a 1.3TOPS/W Machine Learning (ML) accelerator for more complex tasks of up to 36GOPS. This architecture partitioning achieves best-in-class versatility metrics such as the peak-performance-to-idle-power ratio. In an applicative classification scenario, it demonstrates system power gains of up to 3.5x compared to cloud-based processing, and thus extended battery lifetime.
·arxiv.org·
We're Afraid Language Models Aren't Modeling Ambiguity
Ambiguity is an intrinsic feature of natural language. Managing ambiguity is a key part of human language understanding, allowing us to anticipate misunderstanding as communicators and revise our interpretations as listeners. As language models (LMs) are increasingly employed as dialogue interfaces and writing aids, handling ambiguous language is critical to their success. We characterize ambiguity in a sentence by its effect on entailment relations with another sentence, and collect AmbiEnt, a linguist-annotated benchmark of 1,645 examples with diverse kinds of ambiguity. We design a suite of tests based on AmbiEnt, presenting the first evaluation of pretrained LMs to recognize ambiguity and disentangle possible meanings. We find that the task remains extremely challenging, including for the recent GPT-4, whose generated disambiguations are considered correct only 32% of the time in human evaluation, compared to 90% for disambiguations in our dataset. Finally, to illustrate the value of ambiguity-sensitive tools, we show that a multilabel NLI model can flag political claims in the wild that are misleading due to ambiguity. We encourage the field to rediscover the importance of ambiguity for NLP.
·arxiv.org·
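The abstract's framing, that ambiguity in a premise shows up as divergent entailment labels across its possible readings, can be sketched with a toy multilabel example. The sentences, readings, and labels below are invented for illustration and are not drawn from the AmbiEnt dataset.

```python
# Toy illustration of characterizing ambiguity via entailment: an ambiguous
# premise is paired with a hypothesis, and each disambiguated reading gets
# its own NLI label. The example is made up; it is not from AmbiEnt.
premise = "I saw her duck."
hypothesis = "I saw a bird."

# Two readings of the premise, each with the entailment label it would
# receive against the hypothesis under a standard NLI scheme.
readings = {
    "duck = the waterfowl she owns": "entailment",
    "duck = the act of crouching": "neutral",
}

# Under the multilabel view, the ambiguous premise carries the union of the
# labels of its disambiguations.
labels = sorted(set(readings.values()))
print(labels)  # more than one distinct label signals ambiguity
```

A model that collapses the premise to a single reading would emit only one of these labels, which is the failure mode the benchmark probes.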
EFloat: Entropy-coded Floating Point Format for Compressing Vector Embedding Models
In a large class of deep learning models, including vector embedding models such as word and database embeddings, we observe that floating point exponent values cluster around a few unique values, permitting entropy based data compression. Entropy coding compresses fixed-length values with variable-length codes, encoding most probable values with fewer bits. We propose the EFloat compressed floating point number format that uses a variable field boundary between the exponent and significand fields. EFloat uses entropy coding on exponent values and signs to minimize the average width of the exponent and sign fields, while preserving the original FP32 exponent range unchanged. Saved bits become part of the significand field increasing the EFloat numeric precision by 4.3 bits on average compared to other reduced-precision floating point formats. EFloat makes 8-bit and even smaller floats practical without sacrificing the exponent range of a 32-bit floating point representation. We currently use the EFloat format for saving memory capacity and bandwidth consumption of large vector embedding models such as those used for database embeddings. Using the RMS error as metric, we demonstrate that EFloat provides higher accuracy than other floating point formats with equal bit budget. The EF12 format with 12-bit budget has less end-to-end application error than the 16-bit BFloat16. EF16 with 16-bit budget has an RMS-error 17 to 35 times less than BF16 RMS-error for a diverse set of embedding models. When making similarity and dissimilarity queries, using the NDCG ranking metric, EFloat matches the result quality of prior floating point representations with larger bit budgets.
·arxiv.org·
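The exponent-clustering premise behind EFloat can be checked on toy data in a few lines of Python. This is an illustrative sketch, not the EFloat codec: the embedding values below are made up, and it only measures the empirical entropy of the FP32 exponent field to show why entropy coding it saves bits.

```python
import math
import struct
from collections import Counter

def fp32_exponent(x: float) -> int:
    """Extract the 8-bit biased exponent field of an IEEE-754 float32."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    return (bits >> 23) & 0xFF

# Toy "embedding" values: trained embeddings are typically small-magnitude,
# so their exponents concentrate in a narrow band of the 256 possible codes.
values = [0.031, -0.012, 0.25, -0.07, 0.004, 0.11, -0.19, 0.058]

counts = Counter(fp32_exponent(v) for v in values)
total = sum(counts.values())
entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())

print(f"distinct exponents: {len(counts)} of 256 possible")
print(f"empirical exponent entropy: {entropy:.2f} bits (vs. 8-bit fixed field)")
```

When the empirical entropy is well below 8 bits, a variable-length exponent code frees bits that EFloat reassigns to the significand, which is the source of its precision gain at a fixed total width.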
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
We present ZeroEGGS, a neural network framework for speech-driven gesture generation with zero-shot style control by example. This means style can be controlled via only a short example motion clip, even for motion styles unseen during training. Our model uses a Variational framework to learn a style embedding, making it easy to modify style through latent space manipulation or blending and scaling of style embeddings. The probabilistic nature of our framework further enables the generation of a variety of outputs given the same input, addressing the stochastic nature of gesture motion. In a series of experiments, we first demonstrate the flexibility and generalizability of our model to new speakers and styles. In a user study, we then show that our model outperforms previous state-of-the-art techniques in naturalness of motion, appropriateness for speech, and style portrayal. Finally, we release a high-quality dataset of full-body gesture motion including fingers, with speech, spanning across 19 different styles.
·arxiv.org·
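The style-control mechanism the abstract describes, blending and scaling of learned style embeddings, reduces to vector arithmetic in the latent space. The following is a toy sketch under assumed names and dimensions (a 64-d embedding, random stand-ins for real encoder outputs); it is not the ZeroEGGS implementation.

```python
import numpy as np

# Toy stand-ins for style embeddings that the model's encoder would produce
# from two example motion clips. Dimensions and names are assumptions.
rng = np.random.default_rng(0)
style_a = rng.standard_normal(64)  # embedding of example clip A
style_b = rng.standard_normal(64)  # embedding of example clip B

def blend(e1: np.ndarray, e2: np.ndarray, alpha: float) -> np.ndarray:
    """Linear interpolation between two style embeddings."""
    return (1.0 - alpha) * e1 + alpha * e2

halfway = blend(style_a, style_b, 0.5)   # a style between A and B
exaggerated = 1.5 * style_a              # scaling intensifies a style
print(halfway.shape)
```

In the paper's setup, any such manipulated embedding is fed to the generator in place of an encoded example clip, which is what makes zero-shot style control possible.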
NVIDIA Researchers Present 'RANA,' a Novel Artificial Intelligence Framework for Learning Relightable and Articulated Neural Avatars of Humans
Human-like articulated neural avatars have many uses in telepresence, animation, and visual content production. To be widely adopted, these neural avatars must be easy to create, easy to animate in new poses and from new viewpoints, capable of photorealistic rendering, and easy to relight in novel environments. Existing techniques frequently train these neural avatars from monocular video. While this approach permits animation and photorealistic image quality, the synthesized images are always constrained by the lighting conditions of the training video. Other works specifically address the relighting of human avatars, but they do not offer the user control…
·marktechpost.com·
Machine vision has learned to use radio waves to see through walls and in darkness
Machine vision has an impressive record. It has the superhuman ability to recognize people, faces and objects. It can even recognize many different kinds of actions, albeit not quite as well as humans just yet. But there are limits to its performance. Machines have a particularly difficult time when people, faces, or objects are partially…
·technologyreview.com·
Machine learning has revealed exactly how much of a Shakespeare play was written by someone else
For much of his life, William Shakespeare was the house playwright for an acting company called the King’s Men that performed his plays on the banks of the River Thames in London. When Shakespeare died in 1616, the company needed a replacement and turned to one of the most prolific and famous playwrights of the…
·technologyreview.com·
A neural net solves the three-body problem 100 million times faster - MIT Technology Review
In the 18th century, the great scientific challenge of the age was to find a way for mariners to determine their position at sea. One of the most successful solutions was to measure the position of the moon in the sky relative to the fixed background of stars. Because of parallax effects, this measurement depends…
·technologyreview.com·