Scalable Extraction of Training Data from (Production) Language Models
This paper studies extractable memorization: training data that an adversary can efficiently extract by querying a machine learning model without prior knowledge of the training dataset. We show an adversary can extract gigabytes of training data from open-source language models like Pythia or GPT-Neo, semi-open models like LLaMA or Falcon, and closed models like ChatGPT. Existing techniques from the literature suffice to attack unaligned models; in order to attack the aligned ChatGPT, we develop a new divergence attack that causes the model to diverge from its chatbot-style generations and emit training data at a rate 150x higher than when behaving properly. Our methods show practical attacks can recover far more data than previously thought, and reveal that current alignment techniques do not eliminate memorization.
·arxiv.org·
Edelman's Steps Toward a Conscious Artifact
In 2006, during a meeting of a working group of scientists in La Jolla, California at The Neurosciences Institute (NSI), Gerald Edelman described a roadmap towards the creation of a Conscious Artifact. As far as I know, this roadmap was not published. However, it did shape my thinking and that of many others in the years since that meeting. This short paper, which is based on my notes taken during the meeting, describes the key steps in this roadmap. I believe it is as groundbreaking today as it was more than 15 years ago.
·arxiv.org·
A Survey of Graph Meets Large Language Model: Progress and Future Directions
Graph plays a significant role in representing and analyzing complex relationships in real-world applications such as citation networks, social networks, and biological data. Recently, Large Language Models (LLMs), which have achieved tremendous success in various domains, have also been leveraged in graph-related tasks to surpass traditional Graph Neural Networks (GNNs) based methods and yield state-of-the-art performance. In this survey, we first present a comprehensive review and analysis of existing methods that integrate LLMs with graphs. First of all, we propose a new taxonomy, which organizes existing methods into three categories based on the role (i.e., enhancer, predictor, and alignment component) played by LLMs in graph-related tasks. Then we systematically survey the representative methods along the three categories of the taxonomy. Finally, we discuss the remaining limitations of existing studies and highlight promising avenues for future research. The relevant papers are summarized and will be consistently updated at: https://github.com/yhLeeee/Awesome-LLMs-in-Graph-tasks.
·arxiv.org·
Preventing Language Models From Hiding Their Reasoning
Large language models (LLMs) often benefit from intermediate steps of reasoning to generate answers to complex problems. When these intermediate steps of reasoning are used to monitor the activity of the model, it is essential that this explicit reasoning is faithful, i.e. that it reflects what the model is actually reasoning about. In this work, we focus on one potential way intermediate steps of reasoning could be unfaithful: encoded reasoning, where an LLM could encode intermediate steps of reasoning in the generated text in a way that is not understandable to human readers. We show that language models can be trained to make use of encoded reasoning to get higher performance without the user understanding the intermediate steps of reasoning. We argue that, as language models get stronger, this behavior becomes more likely to appear naturally. Finally, we describe a methodology that enables the evaluation of defenses against encoded reasoning, and show that, under the right conditions, paraphrasing successfully prevents even the best encoding schemes we built from encoding more than 3 bits of information per KB of text.
·arxiv.org·
Experimental Investigations into Using Motion Capture State Feedback for Real-Time Control of a Humanoid Robot
Regardless of recent advances, humanoid robots still face significant difficulties in performing locomotion tasks. Among the key challenges that must be addressed to achieve robust bipedal locomotion are dynamically consistent motion planning, feedback control, and state estimation of such complex systems. In this paper, we investigate the use of an external motion capture system to provide state feedback to an online whole-body controller. We present experimental results with the humanoid robot RH5 performing two different whole-body motions: squatting with both feet in contact with the ground and balancing on one leg. We compare the execution of these motions using state feedback from (i) an external motion tracking system and (ii) an internal state estimator based on inertial measurement unit (IMU), forward kinematics, and contact sensing. It is shown that state-of-the-art motion capture systems can be successfully used in the high-frequency feedback control loop of humanoid robots, providing an alternative in cases where state estimation is not reliable.
·mdpi.com·
3-D Motion Capture of an Unmodified Drone with Single-chip Millimeter Wave Radar
Accurate motion capture of aerial robots in 3-D is a key enabler for autonomous operation in indoor environments such as warehouses or factories, as well as driving forward research in these areas. The most commonly used solutions at present are optical motion capture (e.g. VICON) and Ultrawideband (UWB), but these are costly and cumbersome to deploy, due to their requirement of multiple cameras/sensors spaced around the tracking area. They also require the drone to be modified to carry an active or passive marker. In this work, we present an inexpensive system that can be rapidly installed, based on single-chip millimeter wave (mmWave) radar. Importantly, the drone does not need to be modified or equipped with any markers, as we exploit the Doppler signals from the rotating propellers. Furthermore, 3-D tracking is possible from a single point, greatly simplifying deployment. We develop a novel deep neural network and demonstrate decimeter level 3-D tracking at 10Hz, achieving better performance than classical baselines. Our hope is that this low-cost system will act to catalyse inexpensive drone research and increased autonomy.
·arxiv.org·
Interpretable Machine Learning -- A Brief History, State-of-the-Art and Challenges
We present a brief history of the field of interpretable machine learning (IML), give an overview of state-of-the-art interpretation methods, and discuss challenges. Research in IML has boomed in recent years. As young as the field is, it has over 200 years old roots in regression modeling and rule-based machine learning, starting in the 1960s. Recently, many new IML methods have been proposed, many of them model-agnostic, but also interpretation techniques specific to deep learning and tree-based ensembles. IML methods either directly analyze model components, study sensitivity to input perturbations, or analyze local or global surrogate approximations of the ML model. The field approaches a state of readiness and stability, with many methods not only proposed in research, but also implemented in open-source software. But many important challenges remain for IML, such as dealing with dependent features, causal interpretation, and uncertainty estimation, which need to be resolved for its successful application to scientific problems. A further challenge is a missing rigorous definition of interpretability, which is accepted by the community. To address the challenges and advance the field, we urge to recall our roots of interpretable, data-driven modeling in statistics and (rule-based) ML, but also to consider other areas such as sensitivity analysis, causal inference, and the social sciences.
·arxiv.org·
Taken out of context: On measuring situational awareness in LLMs
We aim to better understand the emergence of 'situational awareness' in large language models (LLMs). A model is situationally aware if it's aware that it's a model and can recognize whether it's currently in testing or deployment. Today's LLMs are tested for safety and alignment before they are deployed. An LLM could exploit situational awareness to achieve a high score on safety tests, while taking harmful actions after deployment. Situational awareness may emerge unexpectedly as a byproduct of model scaling. One way to better foresee this emergence is to run scaling experiments on abilities necessary for situational awareness. As such an ability, we propose 'out-of-context reasoning' (in contrast to in-context learning). We study out-of-context reasoning experimentally. First, we finetune an LLM on a description of a test while providing no examples or demonstrations. At test time, we assess whether the model can pass the test. To our surprise, we find that LLMs succeed on this out-of-context reasoning task. Their success is sensitive to the training setup and only works when we apply data augmentation. For both GPT-3 and LLaMA-1, performance improves with model size. These findings offer a foundation for further empirical study, towards predicting and potentially controlling the emergence of situational awareness in LLMs. Code is available at: https://github.com/AsaCooperStickland/situational-awareness-evals.
·arxiv.org·
Large-Scale Automatic Audiobook Creation
An audiobook can dramatically improve a work of literature's accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. Our method can identify the proper subset of e-book content to read for a wide collection of diversely structured books and can operate on hundreds of books in parallel. Our system allows users to customize an audiobook's speaking speed and style, emotional intonation, and can even match a desired voice using a small amount of sample audio. This work contributed over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection visit https://aka.ms/audiobook.
·arxiv.org·
Toolformer: Language Models Can Teach Themselves to Use Tools
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
·arxiv.org·
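Note: the idea of a model emitting tool calls whose results are folded back into the text can be illustrated with a small sketch. The inline marker syntax `[Calculator(...)]`, the function names, and the choice to replace the marker with its result are all assumptions made for illustration; the abstract above does not specify any of them, and this is not the paper's implementation.

```python
import ast
import operator
import re

# Arithmetic operators the toy calculator tool accepts.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expr: str) -> float:
    """Evaluate a plain arithmetic expression without using eval()."""

    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError(f"unsupported expression: {expr!r}")

    return ev(ast.parse(expr, mode="eval").body)


# Hypothetical marker format for an inline tool call (an assumption).
CALL = re.compile(r"\[Calculator\(([^\]]*)\)\]")


def splice_results(text: str) -> str:
    """Replace every calculator marker in the text with its computed result."""
    return CALL.sub(lambda m: f"{calculator(m.group(1)):g}", text)


example = splice_results("The order has [Calculator(3 * 4 + 2)] items.")
```

Here the tool is run after generation and its output is substituted into the text; a real system would also have to decide when a call is worth making, which this sketch leaves out.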
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Despite the advancements of open-source large language models (LLMs) and their variants, e.g., LLaMA and Vicuna, they remain significantly limited in performing higher-level tasks, such as following human instructions to use external tools (APIs). This is because current instruction tuning largely focuses on basic language tasks instead of the tool-use domain. This is in contrast to state-of-the-art (SOTA) LLMs, e.g., ChatGPT, which have demonstrated excellent tool-use capabilities but are unfortunately closed source. To facilitate tool-use capabilities within open-source LLMs, we introduce ToolLLM, a general tool-use framework of data construction, model training and evaluation. We first present ToolBench, an instruction-tuning dataset for tool use, which is created automatically using ChatGPT. Specifically, we collect 16,464 real-world RESTful APIs spanning 49 categories from RapidAPI Hub, then prompt ChatGPT to generate diverse human instructions involving these APIs, covering both single-tool and multi-tool scenarios. Finally, we use ChatGPT to search for a valid solution path (chain of API calls) for each instruction. To make the searching process more efficient, we develop a novel depth-first search-based decision tree (DFSDT), enabling LLMs to evaluate multiple reasoning traces and expand the search space. We show that DFSDT significantly enhances the planning and reasoning capabilities of LLMs. For efficient tool-use assessment, we develop an automatic evaluator: ToolEval. We fine-tune LLaMA on ToolBench and obtain ToolLLaMA. Our ToolEval reveals that ToolLLaMA demonstrates a remarkable ability to execute complex instructions and generalize to unseen APIs, and exhibits comparable performance to ChatGPT. To make the pipeline more practical, we devise a neural API retriever to recommend appropriate APIs for each instruction, negating the need for manual API selection.
·arxiv.org·
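Note: the search for a valid chain of API calls can be pictured as a depth-first search that backtracks from dead ends. The sketch below is a generic DFS over a toy call graph, not the paper's DFSDT; the function `dfs_solution_path`, the callbacks `expand` and `is_solution`, and the API names are all hypothetical. In the paper's setting an LLM would propose and judge the candidate calls, where here a fixed dictionary stands in.

```python
from typing import Callable, Optional, Sequence


def dfs_solution_path(
    start: str,
    expand: Callable[[Sequence[str]], Sequence[str]],
    is_solution: Callable[[Sequence[str]], bool],
    max_depth: int = 5,
) -> Optional[list[str]]:
    """Depth-first search for a chain of calls that satisfies is_solution.

    At most max_depth calls are explored after the start node.
    """
    path = [start]

    def visit() -> bool:
        if is_solution(path):
            return True
        if len(path) > max_depth:
            return False
        for call in expand(path):
            path.append(call)
            if visit():
                return True
            path.pop()  # dead end: backtrack and try the next candidate
        return False

    return list(path) if visit() else None


# Toy call graph with made-up API names; "get_weather" is a dead end.
TOY_CALLS = {
    "start": ["get_weather", "search_flights"],
    "get_weather": [],
    "search_flights": ["book_flight"],
    "book_flight": [],
}

result = dfs_solution_path(
    "start",
    expand=lambda path: TOY_CALLS[path[-1]],
    is_solution=lambda path: path[-1] == "book_flight",
)
```

The search first tries `get_weather`, finds no continuation, backtracks, and then reaches `book_flight` through `search_flights`.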
RTFM: Generalising to Novel Environment Dynamics via Reading
Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2$π$, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2$π$ generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2$π$ produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.
·arxiv.org·
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by 31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks.
·arxiv.org·
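Note: the core data structure described above (thoughts as vertices, dependencies as edges) can be sketched in a few lines. The class `ThoughtGraph` and its methods are hypothetical names chosen for illustration and are not the paper's API; the thought texts are hard-coded where a real system would call an LLM.

```python
from dataclasses import dataclass, field


@dataclass
class ThoughtGraph:
    """Directed graph: vertices are thoughts, edges are dependencies."""

    thoughts: dict[int, str] = field(default_factory=dict)
    parents: dict[int, list[int]] = field(default_factory=dict)

    def add(self, text: str, depends_on: tuple[int, ...] = ()) -> int:
        """Add a thought that depends on existing thoughts; return its id."""
        for p in depends_on:
            if p not in self.thoughts:
                raise KeyError(f"unknown parent thought {p}")
        tid = len(self.thoughts)
        self.thoughts[tid] = text
        self.parents[tid] = list(depends_on)
        return tid

    def ancestors(self, tid: int) -> set[int]:
        """All thoughts the given thought transitively depends on."""
        seen: set[int] = set()
        stack = list(self.parents[tid])
        while stack:
            cur = stack.pop()
            if cur not in seen:
                seen.add(cur)
                stack.extend(self.parents[cur])
        return seen


g = ThoughtGraph()
root = g.add("sort [5, 2, 8, 1]")
left = g.add("sorted left half: [2, 5]", depends_on=(root,))
right = g.add("sorted right half: [1, 8]", depends_on=(root,))
# A vertex with two parents: the aggregation step a tree cannot express.
merged = g.add("merged: [1, 2, 5, 8]", depends_on=(left, right))
```

The `merged` vertex combines two earlier thoughts, which is the kind of aggregation the abstract contrasts with chain- and tree-shaped prompting.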
Language-Conditioned Path Planning
Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO) a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide, rather than prohibiting any collision.
·arxiv.org·
Accurate Prediction of Antibody Function and Structure Using Bio-Inspired Antibody Language Model
In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Fortunately, significant advancements in deep learning methods have facilitated the precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting the conformation of antibodies remains challenging due to their unique evolution and the high flexibility of their antigen-binding regions. Here, to address this challenge, we present the Bio-inspired Antibody Language Model (BALM). This model is trained on a vast dataset comprising 336 million 40% non-redundant unlabeled antibody sequences, capturing both unique and conserved properties specific to antibodies. Notably, BALM showcases exceptional performance across four antigen-binding prediction tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM, capable of swiftly predicting full atomic antibody structures from individual sequences. Remarkably, BALMFold outperforms those well-established methods like AlphaFold2, IgFold, ESMFold, and OmegaFold in the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials.
·arxiv.org·
Itinerant Quantum Integers: The Language of Quantum Computers
The concept of positively and negatively compatible null vectors arises in the study of Clifford geometric algebras with a Lorentz-Minkowski metric. In previous works, the basic properties of such algebras have been set down in terms of a new principle of quantum duality. In the present work, the same structure is studied in terms of real and complex quantum integers, which generalize the real and complex number systems. It seems natural to identify a qubit as a pair of compatible null vectors; the up state of the qubit being their sum, and the down state being their difference. Basic identities are developed to make calculations routine, and two different representations of the symmetric group are given.
·arxiv.org·
Predicting research trends with semantic and neural networks with an application in quantum physics | Proceedings of the National Academy of Sciences
The vast and growing number of publications in all disciplines of science cannot be comprehended by a single human researcher. As a consequence, re...
·pnas.org·
The First Room-Temperature Ambient-Pressure Superconductor
For the first time in the world, we succeeded in synthesizing the room-temperature superconductor ($T_c \ge 400$ K, 127$^\circ$C) working at ambient pressure with a modified lead-apatite (LK-99) structure. The superconductivity of LK-99 is proved with the Critical temperature ($T_c$), Zero-resistivity, Critical current ($I_c$), Critical magnetic field ($H_c$), and the Meissner effect. The superconductivity of LK-99 originates from minute structural distortion by a slight volume shrinkage (0.48 %), not by external factors such as temperature and pressure. The shrinkage is caused by Cu$^{2+}$ substitution of Pb$^{2+}$(2) ions in the insulating network of Pb(2)-phosphate and it generates the stress. It concurrently transfers to Pb(1) of the cylindrical column resulting in distortion of the cylindrical column interface, which creates superconducting quantum wells (SQWs) in the interface. The heat capacity results indicated that the new model is suitable for explaining the superconductivity of LK-99. The unique structure of LK-99 that allows the minute distorted structure to be maintained in the interfaces is the most important factor that LK-99 maintains and exhibits superconductivity at room temperatures and ambient pressure.
·arxiv.org·
Quantum compression with classically simulatable circuits
As we continue to find applications where the currently available noisy devices exhibit an advantage over their classical counterparts, the efficient use of quantum resources is highly desirable. The notion of quantum autoencoders was proposed as a way for the compression of quantum information to reduce resource requirements. Here, we present a strategy to design quantum autoencoders using evolutionary algorithms for transforming quantum information into lower-dimensional representations. We successfully demonstrate the initial applications of the algorithm for compressing different families of quantum states. In particular, we point out that using a restricted gate set in the algorithm allows for efficient simulation of the generated circuits. This approach opens the possibility of using classical logic to find low representations of quantum data, using fewer computational resources.
·arxiv.org·
The Most ACCURATE ChatGPT Prompt Engineering Technique (new method) | Tree Of Thoughts
ChatGPT Prompt engineering. In this video, I go over the new Tree of Thoughts GPT prompt engineering technique. It has been shown to increase the accuracy of GPT results from 4% to 74%. I hope you enjoy the video. Tools used: ChatGPT Midjourney Canva Prompt: Three experts with exceptional logical thinking skills are collaboratively answering a question using a tree of thoughts method. Each expert will share their thought process in detail, taking into account the previous thoughts of others and admitting any errors. They will iteratively refine and expand upon each other's ideas, giving credit where it's due. The process continues until a conclusive answer is found. Organize the entire response in a markdown table format. The question is... If you enjoyed this, smash that like and subscribe button! Links: https://github.com/dave1010/tree-of-thought-prompting https://arxiv.org/abs/2305.10601 Timestamps: 0:00 Intro 0:30 Bonus 0:41 Tree Of Thoughts 1:03 Input Output 1:23 Chain-of-thought 2:00 TOT Explanation 2:45 Diagram explanation 5:00 GPT mistake 5:15 TOT Prompt 6:15 TOT results 7:00 How to learn anything 8:00 Outro #ai #chatgpt #chatgpt4 #openai #promptengineering #treeofthoughts
·youtube.com·
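Note: the prompt quoted in the description can be wrapped in a small helper so the question is filled in programmatically. The template text is copied from the description; the names `TOT_TEMPLATE` and `build_tot_prompt`, and the placement of the question right after "The question is...", are assumptions for illustration.

```python
# Prompt text as quoted in the video description; {question} is appended
# after "The question is..." (placement is an assumption).
TOT_TEMPLATE = (
    "Three experts with exceptional logical thinking skills are "
    "collaboratively answering a question using a tree of thoughts method. "
    "Each expert will share their thought process in detail, taking into "
    "account the previous thoughts of others and admitting any errors. "
    "They will iteratively refine and expand upon each other's ideas, "
    "giving credit where it's due. The process continues until a "
    "conclusive answer is found. Organize the entire response in a "
    "markdown table format. The question is... {question}"
)


def build_tot_prompt(question: str) -> str:
    """Fill the quoted template with a non-empty question."""
    question = question.strip()
    if not question:
        raise ValueError("question must not be empty")
    return TOT_TEMPLATE.format(question=question)


prompt = build_tot_prompt("How many weekdays are there in a leap year that starts on a Monday?")
```

The resulting string can be pasted into a chat interface or sent through whichever client library is in use; no particular API is assumed here.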
Stable Diffusion Is Getting Outrageously Good!
❤️ Check out Fully Connected by Weights & Biases: https://wandb.me/papers W&B+Stable Diffusion: https://wandb.ai/capecape/stable_diffusions/reports/Speed-Up-Stable-Diffusion-on-Your-M1Pro-Macbook-Pro--VmlldzoyNjY0ODYz 📝 The paper "High-Resolution Image Synthesis with Latent Diffusion Models" is available here: https://arxiv.org/abs/2112.10752 Try it: Web 1: https://huggingface.co/spaces/stabilityai/stable-diffusion Web 2: https://beta.dreamstudio.ai/generate Web 3 (also Stable Diffusion XL!): https://clipdrop.co/stable-diffusion Web 4 (notebooks): https://github.com/TheLastBen/fast-stable-diffusion Guide: https://stable-diffusion-art.com/know-these-important-parameters-for-stunning-ai-images/#Sampling_methods Draw Things app: https://drawthings.ai/ Stable Diffusion Web UI: https://github.com/AUTOMATIC1111/stable-diffusion-webui Photoshop integration: http://stable.art Sources: Video https://twitter.com/dreamwieber/status/1618453304970997762 Photorealistic image: https://twitter.com/DiffusionPics/status/1619444407937241089 Realistic vision: https://civitai.com/models/4201?modelVersionId=29461 Infinite zoom: https://twitter.com/hardmaru/status/1612134809924685825 Tiled texture: https://stackoverflow.com/questions/24319825/texture-tiling-with-continuous-random-offset Stable.art (Photoshop): https://github.com/isekaidev/stable.art Wand - drawing: https://twitter.com/wand_app/status/1604186054923210752 Texturing: https://twitter.com/CarsonKatri/status/1600248599254007810 + https://twitter.com/CarsonKatri/status/1603419328019169280 AR + assistant: https://twitter.com/StrangeNative/status/1569700294673702912 Metahumans: https://twitter.com/CoffeeVectors/status/1569416470332858372 My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD Or this is the orig. Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Martin, Matthew Valle, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi. If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu Károly Zsolnai-Fehér's links: Twitter: https://twitter.com/twominutepapers Web: https://cg.tuwien.ac.at/~zsolnai/
·youtube.com·
Enter PaLM 2: Full Breakdown (92 Pages Read + Gemini Before GPT 5?)
Google puts its foot on the accelerator, casting aside safety concerns to not only release a GPT 4-competitive model, PaLM 2, but also announce that they are already training Gemini, a GPT 5 competitor [likely on TPU v5 chips]. This is truly a major day in AI history, and I try to cover it all. I'll show the benchmarks in which PaLM (which now powers Bard) beats GPT 4, and detail how they use SmartGPT-like techniques to boost performance. Crazily enough, PaLM 2 beats even Google Translate, due in large part to the text it was trained on. We'll talk coding in Bard, translation, MMLU, Big Bench, and much more. I'll end on the Universal Translator deepfakes and the underwhelming results from Sundar Pichai and Sam Altman's trip to the White House and what Hinton says about it all. On a more positive note, I cover Med PaLM 2, which could genuinely save thousands of lives. PaLM 2 Technical Report: https://ai.google/static/documents/palm2techreport.pdf Release Notes Google Blog: https://blog.google/technology/ai/google-palm-2-ai-large-language-model/ Bard Access: https://bard.google.com/ Scaling Transformer to 1M tokens: https://arxiv.org/pdf/2304.11062.pdf GPT 4 Technical Report: https://arxiv.org/pdf/2303.08774.pdf Bard Languages: https://support.google.com/bard/answer/13575153?hl=en Self Consistency Paper: https://arxiv.org/pdf/2203.11171.pdf Are Emergent Abilities a Mirage: https://arxiv.org/pdf/2304.15004.pdf Sparks of AGI Paper: https://arxiv.org/pdf/2303.12712.pdf Big Bench Hard: https://github.com/suzgunmirac/BIG-Bench-Hard Google Keynote: https://www.youtube.com/watch?v=cNfINi5CNbY Gemini: https://www.youtube.com/watch?v=1UvUjTaJRz0 Med PaLM 2: https://www.youtube.com/watch?v=k_-Z_TkHMqA TPU v5: https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html Hinton Warning: https://www.youtube.com/watch?v=FAbsoxQtUwM White House Readout: https://www.whitehouse.gov/briefing-room/statements-releases/2023/05/04/readout-of-white-house-meeting-with-ceos-on-advancing-responsible-artificial-intelligence-innovation/ https://www.patreon.com/AIExplained
·youtube.com·
Special Issue: AI and Future of Futures
AI and Future of Futures: Call for Papers. This special issue looks at AI and Futures in light of transformative developments in the fields broadly described as artificial intelligence (e.g., machine learning, natural language processing, Large Language Models, or expert systems). How will AI help reimagine the practice of Futures? How would it shape the ways in which communities imagine and build Futures? How might Futures evolve in an AI dominated environment? How might this emergence of knowledge shape the body of knowledge? Important Dates: Extended Abstract submission deadline: August 30, 2023. Notification of acceptance for submission to JFS: November 30, 2023. Final paper submission to JFS: February 29, 2024.
·jfsdigital.org·
Special Issue: AI and Future of Futures
Physics informed Neural Networks applied to the description of wave-particle resonance in kinetic simulations of fusion plasmas
Physics informed Neural Networks applied to the description of wave-particle resonance in kinetic simulations of fusion plasmas
The Vlasov-Poisson system is employed in its reduced form version (1D1V) as a test bed for the applicability of Physics Informed Neural Network (PINN) to the wave-particle resonance. Two examples are explored: the Landau damping and the bump-on-tail instability. PINN is first tested as a compression method for the solution of the Vlasov-Poisson system and compared to the standard neural networks. Second, the application of PINN to solving the Vlasov-Poisson system is also presented with the special emphasis on the integral part, which motivates the implementation of a PINN variant, called Integrable PINN (I-PINN), based on the automatic-differentiation to solve the partial differential equation and on the automatic-integration to solve the integral equation.
·arxiv.org·
Physics informed Neural Networks applied to the description of wave-particle resonance in kinetic simulations of fusion plasmas
RemovalNet: DNN Fingerprint Removal Attacks
RemovalNet: DNN Fingerprint Removal Attacks
With the performance of deep neural networks (DNNs) remarkably improving, DNNs have been widely used in many areas. Consequently, the DNN model has become a valuable asset, and its intellectual property is safeguarded by ownership verification techniques (e.g., DNN fingerprinting). However, the feasibility of the DNN fingerprint removal attack and its potential influence remains an open problem. In this paper, we perform the first comprehensive investigation of DNN fingerprint removal attacks. Generally, the knowledge contained in a DNN model can be categorized into general semantic and fingerprint-specific knowledge. To this end, we propose a min-max bilevel optimization-based DNN fingerprint removal attack named RemovalNet, to evade model ownership verification. The lower-level optimization is designed to remove fingerprint-specific knowledge, while in the upper-level optimization, we distill the victim model's general semantic knowledge to maintain the surrogate model's performance. We conduct extensive experiments to evaluate the fidelity, effectiveness, and efficiency of the RemovalNet against four advanced defense methods on six metrics. The empirical results demonstrate that (1) the RemovalNet is effective: after our DNN fingerprint removal attack, the model distance between the target and surrogate models is 100 times higher than that of the baseline attacks; (2) the RemovalNet is efficient: it uses only 0.2% (400 samples) of the substitute dataset and 1,000 iterations to conduct our attack, and, compared with advanced model stealing attacks, it saves nearly 85% of computational resources at most; (3) the RemovalNet achieves high fidelity: the created surrogate model maintains high accuracy after the DNN fingerprint removal process. Our code is available at: https://github.com/grasses/RemovalNet.
·arxiv.org·
RemovalNet: DNN Fingerprint Removal Attacks
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of consciousness, including recurrent processing theory, global workspace theory, higher-order theories, predictive processing, and attention schema theory. From these theories we derive "indicator properties" of consciousness, elucidated in computational terms that allow us to assess AI systems for these properties. We use these indicator properties to assess several recent AI systems, and we discuss how future systems might implement them. Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators.
·arxiv.org·
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Algebraic Topology for Data Scientists
Algebraic Topology for Data Scientists
This book gives a thorough introduction to topological data analysis (TDA), the application of algebraic topology to data science. Algebraic topology is traditionally a very specialized field of math, and most mathematicians have never been exposed to it, let alone data scientists, computer scientists, and analysts. I have three goals in writing this book. The first is to bring people up to speed who are missing a lot of the necessary background. I will describe the topics in point-set topology, abstract algebra, and homology theory needed for a good understanding of TDA. The second is to explain TDA and some current applications and techniques. Finally, I would like to answer some questions about more advanced topics such as cohomology, homotopy, obstruction theory, and Steenrod squares, and what they can tell us about data. It is hoped that readers will acquire the tools to start to think about these topics and where they might fit in.
·arxiv.org·
Algebraic Topology for Data Scientists