Automated Class Discovery and One-Shot Interactions for Acoustic Activity Recognition
Top Viewed Papers Referred by ArXiv | Semantic Scholar
Welcome to a curated collection of the most popular papers on Semantic Scholar, filtered through arXiv referrals to provide you with valuable insights into what truly piques researchers' interest.
Scene Synthesis from Human Motion
Large-scale capture of human motion with diverse, complex scenes, while immensely useful, is often considered prohibitively costly. Meanwhile, human motion alone contains rich information about the scenes humans reside in and interact with. For example, a sitting human suggests the existence of a chair, and their leg position further implies the chair's pose. In this paper, we propose to synthesize diverse, semantically reasonable, and physically plausible scenes based on human motion. Our framework, Scene Synthesis from HUMan MotiON (SUMMON), includes two steps. It first uses ContactFormer, our newly introduced contact predictor, to obtain temporally consistent contact labels from human motion. Based on these predictions, SUMMON then chooses interacting objects and optimizes physical plausibility losses; it further populates the scene with objects that do not interact with humans. Experimental results demonstrate that SUMMON synthesizes feasible, plausible, and diverse scenes and has the potential to generate extensive human-scene interaction data for the community.
Beyond Memorization: Violating Privacy Via Inference with Large...
Current privacy research on large language models (LLMs) primarily focuses on the issue of extracting memorized training data. At the same time, models' inference capabilities have increased...
Scalable Extraction of Training Data from (Production) Language Models
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization.
Edelman's Steps Toward a Conscious Artifact
In 2006, during a meeting of a working group of scientists in La Jolla, California at The Neurosciences Institute (NSI), Gerald Edelman described a roadmap towards the creation of a Conscious Artifact. As far as I know, this roadmap was not published. However, it did shape my thinking and that of many others in the years since that meeting. This short paper, which is based on my notes taken during the meeting, describes the key steps in this roadmap. I believe it is as groundbreaking today as it was more than 15 years ago.
A Survey of Graph Meets Large Language Model: Progress and Future Directions
Graph plays a significant role in representing and analyzing complex relationships in real-world applications such as citation networks, social networks, and biological data. Recently, Large Language Models (LLMs), which have achieved tremendous success in various domains, have also been leveraged in graph-related tasks to surpass traditional Graph Neural Networks (GNNs) based methods and yield state-of-the-art performance. In this survey, we first present a comprehensive review and analysis of existing methods that integrate LLMs with graphs. First of all, we propose a new taxonomy, which organizes existing methods into three categories based on the role (i.e., enhancer, predictor, and alignment component) played by LLMs in graph-related tasks. Then we systematically survey the representative methods along the three categories of the taxonomy. Finally, we discuss the remaining limitations of existing studies and highlight promising avenues for future research. The relevant papers are summarized and will be consistently updated at: https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks.
Experimental Investigations into Using Motion Capture State Feedback for Real-Time Control of a Humanoid Robot
Regardless of recent advances, humanoid robots still face significant difficulties in performing locomotion tasks. Among the key challenges that must be addressed to achieve robust bipedal locomotion are dynamically consistent motion planning, feedback control, and state estimation of such complex systems. In this paper, we investigate the use of an external motion capture system to provide state feedback to an online whole-body controller. We present experimental results with the humanoid robot RH5 performing two different whole-body motions: squatting with both feet in contact with the ground and balancing on one leg. We compare the execution of these motions using state feedback from (i) an external motion tracking system and (ii) an internal state estimator based on inertial measurement unit (IMU), forward kinematics, and contact sensing. It is shown that state-of-the-art motion capture systems can be successfully used in the high-frequency feedback control loop of humanoid robots, providing an alternative in cases where state estimation is not reliable.
3-D Motion Capture of an Unmodified Drone with Single-chip Millimeter Wave Radar
Accurate motion capture of aerial robots in 3-D is a key enabler for autonomous operation in indoor environments such as warehouses or factories, as well as driving forward research in these areas. The most commonly used solutions at present are optical motion capture (e.g. VICON) and Ultrawideband (UWB), but these are costly and cumbersome to deploy, due to their requirement of multiple cameras/sensors spaced around the tracking area. They also require the drone to be modified to carry an active or passive marker. In this work, we present an inexpensive system that can be rapidly installed, based on single-chip millimeter wave (mmWave) radar. Importantly, the drone does not need to be modified or equipped with any markers, as we exploit the Doppler signals from the rotating propellers. Furthermore, 3-D tracking is possible from a single point, greatly simplifying deployment. We develop a novel deep neural network and demonstrate decimeter level 3-D tracking at 10Hz, achieving better performance than classical baselines. Our hope is that this low-cost system will act to catalyse inexpensive drone research and increased autonomy.
Preventing Language Models From Hiding Their Reasoning
Large language models (LLMs) often benefit from intermediate steps of reasoning to generate answers to complex problems. When these intermediate steps of reasoning are used to monitor the activity of the model, it is essential that this explicit reasoning is faithful, i.e. that it reflects what the model is actually reasoning about. In this work, we focus on one potential way intermediate steps of reasoning could be unfaithful: encoded reasoning, where an LLM could encode intermediate steps of reasoning in the generated text in a way that is not understandable to human readers. We show that language models can be trained to make use of encoded reasoning to get higher performance without the user understanding the intermediate steps of reasoning. We argue that, as language models get stronger, this behavior becomes more likely to appear naturally. Finally, we describe a methodology that enables the evaluation of defenses against encoded reasoning, and show that, under the right conditions, paraphrasing successfully prevents even the best encoding schemes we built from encoding more than 3 bits of information per KB of text.
Interpretable Machine Learning -- A Brief History, State-of-the-Art and Challenges
We present a brief history of the field of interpretable machine learning (IML), give an overview of state-of-the-art interpretation methods, and discuss challenges. Research in IML has boomed in recent years. As young as the field is, it has roots reaching back over 200 years in regression modeling and into rule-based machine learning starting in the 1960s. Recently, many new IML methods have been proposed, many of them model-agnostic, but also interpretation techniques specific to deep learning and tree-based ensembles. IML methods either directly analyze model components, study sensitivity to input perturbations, or analyze local or global surrogate approximations of the ML model. The field approaches a state of readiness and stability, with many methods not only proposed in research, but also implemented in open-source software. But many important challenges remain for IML, such as dealing with dependent features, causal interpretation, and uncertainty estimation, which need to be resolved for its successful application to scientific problems. A further challenge is the lack of a rigorous, community-accepted definition of interpretability. To address the challenges and advance the field, we urge the community to recall its roots of interpretable, data-driven modeling in statistics and (rule-based) ML, but also to consider other areas such as sensitivity analysis, causal inference, and the social sciences.
Quantum batteries -- The future of energy storage?. (arXiv:2310.13020v1 [quant-ph])
Parametric Leaky Tanh: A New Hybrid Activation Function for Deep Learning
Activation functions (AFs) are crucial components of deep neural networks (DNNs), having a significant impact on their performance. An activation function in a DNN is typically a smooth, nonlinear function that transforms an input signal into an output signal for the subsequent layer. In this paper, we propose the Parametric Leaky Tanh (PLTanh), a novel hybrid activation function designed to combine the strengths of both the Tanh and Leaky ReLU (LReLU) activation functions. PLTanh is differentiable at all points and addresses the 'dying ReLU' problem by ensuring a non-zero gradient for negative inputs, consistent with the behavior of LReLU. By integrating the unique advantages of these two diverse activation functions, PLTanh facilitates the learning of more intricate nonlinear relationships within the network. This paper presents an empirical evaluation of PLTanh against established activation functions, namely ReLU, LReLU, and ALReLU, utilizing five diverse datasets.
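The abstract does not give PLTanh's closed form, so the sketch below is purely illustrative: one simple way to pair a tanh positive branch with a Leaky-ReLU-style negative slope. The function name, the parameter `alpha`, and the piecewise form are assumptions; note the published PLTanh is differentiable everywhere, which this naive piecewise sketch is not.

```python
import numpy as np

def pltanh_sketch(x, alpha=0.01):
    """Hypothetical Tanh/Leaky-ReLU hybrid (NOT the paper's exact
    definition): tanh response for non-negative inputs, and a small
    linear slope for negative inputs so the gradient never dies."""
    x = np.asarray(x, dtype=float)
    return np.where(x >= 0, np.tanh(x), alpha * x)
```

The negative branch keeps a constant gradient `alpha`, which is the property both LReLU and PLTanh use to avoid the 'dying ReLU' problem.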
Long-term and Real-time High-speed Underwater Wireless Optical Communications in Deep Sea
Seafloor observation networks can perform all-weather, long-term, continuous, real-time, and in-situ observation of the ocean by combining various observation methods, including cabled seafloor nodes, self-contained nodes, and mobile platforms, for which reliable, long-term, high-speed underwater wireless communication is an essential demand. Recently, underwater wireless optical communication (UWOC) has emerged as a highly promising solution and is rapidly becoming a research hotspot for meeting this requirement. In this article, we demonstrate the experiment and application of a high-speed UWOC system for a deep-sea seafloor observation network. To the best of our knowledge, this is the first long-term, real-time deep-sea UWOC link with a bitrate as high as 125 Mbps. Over a 30 m distance and at a depth of 1650 m, two-way Ethernet UWOC links are realized with a 125 Mbps direction-adjustable green light link and a 6.25 Mbps non-line-of-sight (NLOS) blue light link. High-quality video transmission at 8K 30 FPS and 4K 120 FPS is realized through the high-speed 125 Mbps green light link, with 100% peak signal-to-noise ratio (PSNR) agreement, showing the capability of transmitting high-quality videos losslessly. The 30-day long-term measurement results show that the BER of both the 125 Mbps and 6.25 Mbps links is lower than 10^-5, proving the stability and reliability of this UWOC system at a depth of 1650 m. The maximum transmission distances for the green and blue light links are estimated to be 117.7 and 128.3 m when geometry loss is considered, and can be extended to 231.6 and 337.5 m without geometry loss. As the first long-term, real-time UWOC system in the deep sea, we believe this demonstration can provide valuable experience for further UWOC studies and for converged ocean observation networks combining cabled and cable-less observation platforms.
Theoretical Proposal for Dynamic Attention Sinks in Streaming Large Language Models
Note: This proposal is theoretical and aims to extend existing concepts for further research and validation.
Bark: Text-to-Speech AI Voice Cloning & Text-Prompted Generative Audio
Bark is a revolutionary text-to-audio model created by Suno. Based on GPT-style models, it can generate highly realistic…
The strain on scientific publishing
Scientists are increasingly overwhelmed by the volume of articles being published. Total articles indexed in Scopus and Web of Science have grown exponentially in recent years; in 2022 the article total was 47% higher than in 2016, which has outpaced the limited growth, if any, in the number of practising scientists. Thus, publication workload per scientist (writing, reviewing, editing) has increased dramatically. We define this problem as the strain on scientific publishing. To analyse this strain, we present five data-driven metrics showing publisher growth, processing times, and citation behaviours. We draw these data from web scrapes, requests for data from publishers, and material that is freely available through publisher websites. Our findings are based on millions of papers produced by leading academic publishers. We find specific groups have disproportionately grown in their articles published per year, contributing to this strain. Some publishers enabled this growth by adopting a strategy of hosting special issues, which publish articles with reduced turnaround times. Given pressures on researchers to publish or perish to be competitive for funding applications, this strain was likely amplified by these offers to publish more articles. We also observed widespread year-over-year inflation of journal impact factors coinciding with this strain, which risks confusing quality signals. Such exponential growth cannot be sustained. The metrics we define here should enable this evolving conversation to reach actionable solutions to address the strain on scientific publishing.
Efficient Streaming Language Models with Attention Sinks
Deploying Large Language Models (LLMs) in streaming applications such as multi-round dialogue, where long interactions are expected, is urgently needed but poses two major challenges. Firstly,...
Taken out of context: On measuring situational awareness in LLMs
We aim to better understand the emergence of 'situational awareness' in large language models (LLMs). A model is situationally aware if it's aware that it's a model and can recognize whether it's currently in testing or deployment. Today's LLMs are tested for safety and alignment before they are deployed. An LLM could exploit situational awareness to achieve a high score on safety tests, while taking harmful actions after deployment. Situational awareness may emerge unexpectedly as a byproduct of model scaling. One way to better foresee this emergence is to run scaling experiments on abilities necessary for situational awareness. As such an ability, we propose 'out-of-context reasoning' (in contrast to in-context learning). We study out-of-context reasoning experimentally. First, we finetune an LLM on a description of a test while providing no examples or demonstrations. At test time, we assess whether the model can pass the test. To our surprise, we find that LLMs succeed on this out-of-context reasoning task. Their success is sensitive to the training setup and only works when we apply data augmentation. For both GPT-3 and LLaMA-1, performance improves with model size. These findings offer a foundation for further empirical study, towards predicting and potentially controlling the emergence of situational awareness in LLMs. Code is available at: https://github.com/AsaCooperStickland/situational-awareness-evals.
Large-Scale Automatic Audiobook Creation
An audiobook can dramatically improve a work of literature's accessibility
and improve reader engagement. However, audiobooks can take hundreds of hours
of human effort to create, edit, and publish. In this work, we present a system
that can automatically generate high-quality audiobooks from online e-books. In
particular, we leverage recent advances in neural text-to-speech to create and
release thousands of human-quality, open-license audiobooks from the Project
Gutenberg e-book collection. Our method can identify the proper subset of
e-book content to read for a wide collection of diversely structured books and
can operate on hundreds of books in parallel. Our system allows users to
customize an audiobook's speaking speed, style, and emotional intonation, and
can even match a desired voice using a small amount of sample audio. This work
contributed over five thousand open-license audiobooks and an interactive demo
that allows users to quickly create their own customized audiobooks. To listen
to the audiobook collection visit \url{https://aka.ms/audiobook}.
Attention Is All You Need
https://arxiv.org/abs/1706.03762
Abstract:
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles, by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Authors:
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, Illia Polosukhin
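The paper's core operation, scaled dot-product attention, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k))V, can be sketched in a few lines of NumPy. This is a single head with no mask or learned projections, which the full Transformer adds:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)               # (n_q, n_k) similarity logits
    scores -= scores.max(axis=-1, keepdims=True)  # subtract row max for stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over keys
    return weights @ V                              # weighted mix of values
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with dimension, which would otherwise push the softmax into regions with vanishing gradients.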
Toolformer: Language Models Can Teach Themselves to Use Tools
Language models (LMs) exhibit remarkable abilities to solve new tasks from
just a few examples or textual instructions, especially at scale. They also,
paradoxically, struggle with basic functionality, such as arithmetic or factual
lookup, where much simpler and smaller models excel. In this paper, we show
that LMs can teach themselves to use external tools via simple APIs and achieve
the best of both worlds. We introduce Toolformer, a model trained to decide
which APIs to call, when to call them, what arguments to pass, and how to best
incorporate the results into future token prediction. This is done in a
self-supervised way, requiring nothing more than a handful of demonstrations
for each API. We incorporate a range of tools, including a calculator, a Q\&A
system, two different search engines, a translation system, and a calendar.
Toolformer achieves substantially improved zero-shot performance across a
variety of downstream tasks, often competitive with much larger models, without
sacrificing its core language modeling abilities.
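Toolformer represents API calls inline in the text stream: the model emits a call, an external tool is executed, and the result is spliced back before generation continues. The helper below is only a sketch of that splicing step; the exact surface syntax, the tool registry, and the Calculator example are assumptions, not the paper's implementation.

```python
import re

# Hypothetical tool registry; a real system would route to safe,
# dedicated backends rather than eval-style expression handling.
TOOLS = {
    "Calculator": lambda expr: f"{eval(expr, {'__builtins__': {}}):.2f}",
}

# Matches inline calls of the assumed form "[ToolName(args)]".
CALL_RE = re.compile(r"\[(\w+)\(([^)]*)\)\]")

def execute_api_calls(text):
    """Replace each "[Tool(args)]" span with "[Tool(args) -> result]",
    mimicking how tool results are spliced into the token stream."""
    def run(match):
        name, args = match.group(1), match.group(2)
        result = TOOLS[name](args)
        return f"[{name}({args}) -> {result}]"
    return CALL_RE.sub(run, text)
```

During training, calls whose results reduce the loss on future tokens are kept, which is how the model learns when calling a tool is worthwhile.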
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Despite the advancements of open-source large language models (LLMs) and
their variants, e.g., LLaMA and Vicuna, they remain significantly limited in
performing higher-level tasks, such as following human instructions to use
external tools (APIs). This is because current instruction tuning largely
focuses on basic language tasks instead of the tool-use domain. This is in
contrast to state-of-the-art (SOTA) LLMs, e.g., ChatGPT, which have
demonstrated excellent tool-use capabilities but are unfortunately closed
source. To facilitate tool-use capabilities within open-source LLMs, we
introduce ToolLLM, a general tool-use framework of data construction, model
training and evaluation. We first present ToolBench, an instruction-tuning
dataset for tool use, which is created automatically using ChatGPT.
Specifically, we collect 16,464 real-world RESTful APIs spanning 49 categories
from RapidAPI Hub, then prompt ChatGPT to generate diverse human instructions
involving these APIs, covering both single-tool and multi-tool scenarios.
Finally, we use ChatGPT to search for a valid solution path (chain of API
calls) for each instruction. To make the searching process more efficient, we
develop a novel depth-first search-based decision tree (DFSDT), enabling LLMs
to evaluate multiple reasoning traces and expand the search space. We show that
DFSDT significantly enhances the planning and reasoning capabilities of LLMs.
For efficient tool-use assessment, we develop an automatic evaluator: ToolEval.
We fine-tune LLaMA on ToolBench and obtain ToolLLaMA. Our ToolEval reveals that
ToolLLaMA demonstrates a remarkable ability to execute complex instructions and
generalize to unseen APIs, and exhibits comparable performance to ChatGPT. To
make the pipeline more practical, we devise a neural API retriever to recommend
appropriate APIs for each instruction, negating the need for manual API
selection.
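The idea behind DFSDT can be sketched as an ordinary depth-first search over candidate API-call steps: when one reasoning trace hits a dead end, the search backtracks and expands a sibling branch instead of restarting from scratch. The node interface below (`expand`, `is_solved`) is an assumption for illustration; in ToolLLM these roles are played by LLM-proposed API calls and their observed results.

```python
def dfs_solution_path(state, expand, is_solved, max_depth=10):
    """Return the first chain of API calls that solves the task, or None.

    expand(state)    -> list of (api_call, next_state) candidates
    is_solved(state) -> True when the instruction is fulfilled
    """
    if is_solved(state):
        return []
    if max_depth == 0:
        return None  # dead end: caller backtracks to a sibling branch
    for api_call, next_state in expand(state):
        path = dfs_solution_path(next_state, expand, is_solved, max_depth - 1)
        if path is not None:
            return [api_call] + path
    return None
```

Compared with a single greedy chain of calls, exploring siblings on failure lets the planner recover from bad early choices, which is the capability the abstract attributes to DFSDT.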
RTFM: Generalising to Novel Environment Dynamics via Reading
Obtaining policies that can generalise to new environments in reinforcement
learning is challenging. In this work, we demonstrate that language
understanding via a reading policy learner is a promising vehicle for
generalisation to new environments. We propose a grounded policy learning
problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason
over a language goal, relevant dynamics described in a document, and
environment observations. We procedurally generate environment dynamics and
corresponding language descriptions of the dynamics, such that agents must read
to understand new environment dynamics instead of memorising any particular
information. In addition, we propose txt2π, a model that captures three-way
interactions between the goal, document, and observations. On RTFM, txt2π
generalises to new environments with dynamics not seen during training via
reading. Furthermore, our model outperforms baselines such as FiLM and
language-conditioned CNNs on RTFM. Through curriculum learning, txt2π
produces policies that excel on complex RTFM tasks requiring several reasoning
and coreference steps.
Yoasi, a Generative LLM Based Universe Builder
Introduction I have always been drawn to the idea of building fantastic worlds with fractal complexity; the more you look, the more you see. Witnessing the advancement of LLM capabilities and reading papers like "Simulacra of Human Behavior"[1] truly piqued my interest in exploring the idea of creating generative worlds.
[SIGGRAPH 2023] DreamFace: Progressive Generation of Animatable 3D Faces under Text Guidance
Project: https://sites.google.com/view/dreamfaceArxiv: https://arxiv.org/pdf/2304.03117.pdfWeb demo: https://hyperhuman.deemos.comHuggingFace: https://huggin...
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
We introduce Graph of Thoughts (GoT): a framework that advances prompting
capabilities in large language models (LLMs) beyond those offered by paradigms
such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary
advantage of GoT is the ability to model the information generated by an LLM as
an arbitrary graph, where units of information ("LLM thoughts") are vertices,
and edges correspond to dependencies between these vertices. This approach
enables combining arbitrary LLM thoughts into synergistic outcomes, distilling
the essence of whole networks of thoughts, or enhancing thoughts using feedback
loops. We illustrate that GoT offers advantages over state of the art on
different tasks, for example increasing the quality of sorting by 62% over ToT,
while simultaneously reducing costs by 31%. We ensure that GoT is extensible
with new thought transformations and thus can be used to spearhead new
prompting schemes. This work brings the LLM reasoning closer to human thinking
or brain mechanisms such as recurrence, both of which form complex networks.
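The distinguishing operation of GoT over Chain- or Tree-of-Thoughts is that several thoughts can be merged into one, since the structure is an arbitrary graph rather than a tree. A minimal sketch of that abstraction follows; the class layout and the `combine` callback (which stands in for an LLM aggregation call) are illustrative assumptions, not the framework's actual API.

```python
class ThoughtGraph:
    """Vertices are LLM 'thoughts'; edges record which thoughts a new
    thought depends on. Purely a data-structure sketch of the GoT idea."""

    def __init__(self):
        self.thoughts = {}  # id -> thought content
        self.parents = {}   # id -> list of predecessor ids

    def add(self, tid, content, parents=()):
        self.thoughts[tid] = content
        self.parents[tid] = list(parents)
        return tid

    def aggregate(self, tid, parent_ids, combine):
        """Merge several thoughts into one new vertex -- the operation a
        chain or tree of thoughts cannot express, since trees never
        rejoin branches."""
        merged = combine([self.thoughts[p] for p in parent_ids])
        return self.add(tid, merged, parent_ids)
```

For instance, in the sorting task the abstract mentions, two independently sorted sublists (two thought vertices) can be aggregated into a single merged, sorted list.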