arxiv.org

187 bookmarks
Large-Scale Automatic Audiobook Creation
An audiobook can dramatically improve a work of literature's accessibility and improve reader engagement. However, audiobooks can take hundreds of hours of human effort to create, edit, and publish. In this work, we present a system that can automatically generate high-quality audiobooks from online e-books. In particular, we leverage recent advances in neural text-to-speech to create and release thousands of human-quality, open-license audiobooks from the Project Gutenberg e-book collection. Our method can identify the proper subset of e-book content to read for a wide collection of diversely structured books and can operate on hundreds of books in parallel. Our system allows users to customize an audiobook's speaking speed, style, and emotional intonation, and can even match a desired voice using a small amount of sample audio. This work contributed over five thousand open-license audiobooks and an interactive demo that allows users to quickly create their own customized audiobooks. To listen to the audiobook collection, visit https://aka.ms/audiobook.
·arxiv.org·
Toolformer: Language Models Can Teach Themselves to Use Tools
Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
·arxiv.org·
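To make the idea of incorporating API results into generated text concrete, here is a minimal sketch. It is not the paper's implementation: the `[Calculator(...)]` marker syntax, the `execute_calls` helper, and the two-decimal rounding are all assumptions chosen for illustration, and a plain function stands in for the calculator tool.

```python
import ast
import operator
import re

# Hypothetical inline call syntax (an assumption): [Calculator(<expression>)]
CALL_PATTERN = re.compile(r"\[Calculator\(([^)\]]+)\)\]")

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> float:
    """Evaluate +, -, *, / over numeric literals without calling eval()."""

    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {expression!r}")

    return walk(ast.parse(expression.strip(), mode="eval"))


def execute_calls(text: str) -> str:
    """Rewrite each call marker so that it also carries the tool's result."""

    def fill(match):
        result = calculator(match.group(1))
        return f"[Calculator({match.group(1)}) -> {result:.2f}]"

    return CALL_PATTERN.sub(fill, text)


if __name__ == "__main__":
    draft = "Out of 1400 participants, 400 [Calculator(400 / 1400)] passed."
    print(execute_calls(draft))
```

In this sketch the marker is written by hand; in the setting the abstract describes, the model itself decides where a call goes and what arguments it takes.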
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
Despite the advancements of open-source large language models (LLMs) and their variants, e.g., LLaMA and Vicuna, they remain significantly limited in performing higher-level tasks, such as following human instructions to use external tools (APIs). This is because current instruction tuning largely focuses on basic language tasks instead of the tool-use domain. This is in contrast to state-of-the-art (SOTA) LLMs, e.g., ChatGPT, which have demonstrated excellent tool-use capabilities but are unfortunately closed source. To facilitate tool-use capabilities within open-source LLMs, we introduce ToolLLM, a general tool-use framework of data construction, model training and evaluation. We first present ToolBench, an instruction-tuning dataset for tool use, which is created automatically using ChatGPT. Specifically, we collect 16,464 real-world RESTful APIs spanning 49 categories from RapidAPI Hub, then prompt ChatGPT to generate diverse human instructions involving these APIs, covering both single-tool and multi-tool scenarios. Finally, we use ChatGPT to search for a valid solution path (chain of API calls) for each instruction. To make the searching process more efficient, we develop a novel depth-first search-based decision tree (DFSDT), enabling LLMs to evaluate multiple reasoning traces and expand the search space. We show that DFSDT significantly enhances the planning and reasoning capabilities of LLMs. For efficient tool-use assessment, we develop an automatic evaluator: ToolEval. We fine-tune LLaMA on ToolBench and obtain ToolLLaMA. Our ToolEval reveals that ToolLLaMA demonstrates a remarkable ability to execute complex instructions and generalize to unseen APIs, and exhibits comparable performance to ChatGPT. To make the pipeline more practical, we devise a neural API retriever to recommend appropriate APIs for each instruction, negating the need for manual API selection.
·arxiv.org·
RTFM: Generalising to Novel Environment Dynamics via Reading
Obtaining policies that can generalise to new environments in reinforcement learning is challenging. In this work, we demonstrate that language understanding via a reading policy learner is a promising vehicle for generalisation to new environments. We propose a grounded policy learning problem, Read to Fight Monsters (RTFM), in which the agent must jointly reason over a language goal, relevant dynamics described in a document, and environment observations. We procedurally generate environment dynamics and corresponding language descriptions of the dynamics, such that agents must read to understand new environment dynamics instead of memorising any particular information. In addition, we propose txt2$π$, a model that captures three-way interactions between the goal, document, and observations. On RTFM, txt2$π$ generalises to new environments with dynamics not seen during training via reading. Furthermore, our model outperforms baselines such as FiLM and language-conditioned CNNs on RTFM. Through curriculum learning, txt2$π$ produces policies that excel on complex RTFM tasks requiring several reasoning and coreference steps.
·arxiv.org·
Graph of Thoughts: Solving Elaborate Problems with Large Language Models
We introduce Graph of Thoughts (GoT): a framework that advances prompting capabilities in large language models (LLMs) beyond those offered by paradigms such as Chain-of-Thought or Tree of Thoughts (ToT). The key idea and primary advantage of GoT is the ability to model the information generated by an LLM as an arbitrary graph, where units of information ("LLM thoughts") are vertices, and edges correspond to dependencies between these vertices. This approach enables combining arbitrary LLM thoughts into synergistic outcomes, distilling the essence of whole networks of thoughts, or enhancing thoughts using feedback loops. We illustrate that GoT offers advantages over state of the art on different tasks, for example increasing the quality of sorting by 62% over ToT, while simultaneously reducing costs by 31%. We ensure that GoT is extensible with new thought transformations and thus can be used to spearhead new prompting schemes. This work brings the LLM reasoning closer to human thinking or brain mechanisms such as recurrence, both of which form complex networks.
·arxiv.org·
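The graph structure described in the abstract (thoughts as vertices, dependencies as edges, several thoughts merged into one) can be sketched on the sorting task it mentions. This is an illustration under stated assumptions, not the GoT framework's API: the `Thought` class and the `generate`, `refine`, and `aggregate` names are hypothetical, and Python's `sorted` and `heapq.merge` stand in for the LLM calls.

```python
from dataclasses import dataclass, field
from heapq import merge


@dataclass
class Thought:
    """A vertex: one unit of information plus the thoughts it depends on."""

    content: list
    parents: list = field(default_factory=list)  # incoming dependency edges


def generate(parent: Thought, chunk_size: int) -> list:
    """Split one thought into several children (one vertex, many out-edges)."""
    items = parent.content
    return [
        Thought(items[i:i + chunk_size], [parent])
        for i in range(0, len(items), chunk_size)
    ]


def refine(thought: Thought) -> Thought:
    """Stand-in for an LLM call that improves a thought; here it sorts a chunk."""
    return Thought(sorted(thought.content), [thought])


def aggregate(thoughts: list) -> Thought:
    """Combine several thoughts into one vertex with many in-edges."""
    merged = list(merge(*(t.content for t in thoughts)))
    return Thought(merged, list(thoughts))


def solve(numbers, chunk_size: int = 4) -> Thought:
    root = Thought(list(numbers))
    sorted_chunks = [refine(t) for t in generate(root, chunk_size)]
    return aggregate(sorted_chunks)


if __name__ == "__main__":
    result = solve([5, 2, 9, 1, 7, 3, 8, 6, 4, 0])
    print(result.content, "built from", len(result.parents), "parent thoughts")
```

The aggregation vertex has more than one parent, which is what a chain or a tree of thoughts cannot express.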
Language-Conditioned Path Planning
Contact is at the core of robotic manipulation. At times, it is desired (e.g. manipulation and grasping), and at times, it is harmful (e.g. when avoiding obstacles). However, traditional path planning algorithms focus solely on collision-free paths, limiting their applicability in contact-rich tasks. To address this limitation, we propose the domain of Language-Conditioned Path Planning, where contact-awareness is incorporated into the path planning problem. As a first step in this domain, we propose Language-Conditioned Collision Functions (LACO), a novel approach that learns a collision function using only a single-view image, language prompt, and robot configuration. LACO predicts collisions between the robot and the environment, enabling flexible, conditional path planning without the need for manual object annotations, point cloud data, or ground-truth object meshes. In both simulation and the real world, we demonstrate that LACO can facilitate complex, nuanced path plans that allow for interaction with objects that are safe to collide with, rather than prohibiting any collision.
·arxiv.org·
Accurate Prediction of Antibody Function and Structure Using Bio-Inspired Antibody Language Model
In recent decades, antibodies have emerged as indispensable therapeutics for combating diseases, particularly viral infections. However, their development has been hindered by limited structural information and labor-intensive engineering processes. Fortunately, significant advancements in deep learning methods have facilitated the precise prediction of protein structure and function by leveraging co-evolution information from homologous proteins. Despite these advances, predicting the conformation of antibodies remains challenging due to their unique evolution and the high flexibility of their antigen-binding regions. Here, to address this challenge, we present the Bio-inspired Antibody Language Model (BALM). This model is trained on a vast dataset comprising 336 million 40% non-redundant unlabeled antibody sequences, capturing both unique and conserved properties specific to antibodies. Notably, BALM showcases exceptional performance across four antigen-binding prediction tasks. Moreover, we introduce BALMFold, an end-to-end method derived from BALM, capable of swiftly predicting full atomic antibody structures from individual sequences. Remarkably, BALMFold outperforms well-established methods such as AlphaFold2, IgFold, ESMFold, and OmegaFold in the antibody benchmark, demonstrating significant potential to advance innovative engineering and streamline therapeutic antibody development by reducing the need for unnecessary trials.
·arxiv.org·
Itinerant Quantum Integers: The Language of Quantum Computers
The concept of positively and negatively compatible null vectors arises in the study of Clifford geometric algebras with a Lorentz-Minkowski metric. In previous works, the basic properties of such algebras have been set down in terms of a new principle of quantum duality. In the present work, the same structure is studied in terms of real and complex quantum integers, which generalize the real and complex number systems. It seems natural to identify a qubit as a pair of compatible null vectors; the up state of the qubit being their sum, and the down state being their difference. Basic identities are developed to make calculations routine, and two different representations of the symmetric group are given.
·arxiv.org·
Predicting research trends with semantic and neural networks with an application in quantum physics | Proceedings of the National Academy of Sciences
The vast and growing number of publications in all disciplines of science cannot be comprehended by a single human researcher. As a consequence, re...
·pnas.org·
The First Room-Temperature Ambient-Pressure Superconductor
For the first time in the world, we succeeded in synthesizing the room-temperature superconductor ($T_c \ge 400$ K, 127$^\circ$C) working at ambient pressure with a modified lead-apatite (LK-99) structure. The superconductivity of LK-99 is proved with the Critical temperature ($T_c$), Zero-resistivity, Critical current ($I_c$), Critical magnetic field ($H_c$), and the Meissner effect. The superconductivity of LK-99 originates from minute structural distortion by a slight volume shrinkage (0.48 %), not by external factors such as temperature and pressure. The shrinkage is caused by Cu$^{2+}$ substitution of Pb$^{2+}$(2) ions in the insulating network of Pb(2)-phosphate and it generates the stress. It concurrently transfers to Pb(1) of the cylindrical column resulting in distortion of the cylindrical column interface, which creates superconducting quantum wells (SQWs) in the interface. The heat capacity results indicated that the new model is suitable for explaining the superconductivity of LK-99. The unique structure of LK-99 that allows the minute distorted structure to be maintained in the interfaces is the most important factor that LK-99 maintains and exhibits superconductivity at room temperatures and ambient pressure.
·arxiv.org·
Quantum compression with classically simulatable circuits
As we continue to find applications where the currently available noisy devices exhibit an advantage over their classical counterparts, the efficient use of quantum resources is highly desirable. The notion of quantum autoencoders was proposed as a way for the compression of quantum information to reduce resource requirements. Here, we present a strategy to design quantum autoencoders using evolutionary algorithms for transforming quantum information into lower-dimensional representations. We successfully demonstrate the initial applications of the algorithm for compressing different families of quantum states. In particular, we point out that using a restricted gate set in the algorithm allows for efficient simulation of the generated circuits. This approach opens the possibility of using classical logic to find low representations of quantum data, using fewer computational resources.
·arxiv.org·
The Most ACCURATE ChatGPT Prompt Engineering Technique (new method) | Tree Of Thoughts
ChatGPT Prompt engineering. In this video, I go over the new Tree of Thoughts GPT prompt engineering technique. It has been shown to increase the accuracy of GPT results from 4% to 74%. I hope you enjoy the video. Tools used: ChatGPT Midjourney Canva Prompt: Three experts with exceptional logical thinking skills are collaboratively answering a question using a tree of thoughts method. Each expert will share their thought process in detail, taking into account the previous thoughts of others and admitting any errors. They will iteratively refine and expand upon each other's ideas, giving credit where it's due. The process continues until a conclusive answer is found. Organize the entire response in a markdown table format. The question is... If you enjoyed this, smash that like and subscribe button! Links: https://github.com/dave1010/tree-of-thought-prompting https://arxiv.org/abs/2305.10601 Timestamps: 0:00 Intro 0:30 Bonus 0:41 Tree Of Thoughts 1:03 Input Output 1:23 Chain-of-thought 2:00 TOT Explanation 2:45 Diagram explanation 5:00 GPT mistake 5:15 TOT Prompt 6:15 TOT results 7:00 How to learn anything 8:00 Outro #ai #chatgpt #chatgpt4 #openai #promptengineering #treeofthoughts
·youtube.com·
Stable Diffusion Is Getting Outrageously Good!
❤️ Check out Fully Connected by Weights & Biases: https://wandb.me/papers W&B+Stable Diffusion: https://wandb.ai/capecape/stable_diffusions/reports/Speed-Up-Stable-Diffusion-on-Your-M1Pro-Macbook-Pro--VmlldzoyNjY0ODYz 📝 The paper "High-Resolution Image Synthesis with Latent Diffusion Models" is available here: https://arxiv.org/abs/2112.10752 Try it: Web 1: https://huggingface.co/spaces/stabilityai/stable-diffusion Web 2: https://beta.dreamstudio.ai/generate Web 3 (also Stable Diffusion XL!): https://clipdrop.co/stable-diffusion Web 4 (notebooks): https://github.com/TheLastBen/fast-stable-diffusion Guide: https://stable-diffusion-art.com/know-these-important-parameters-for-stunning-ai-images/#Sampling_methods Draw Things app: https://drawthings.ai/ Stable Diffusion Web UI: https://github.com/AUTOMATIC1111/stable-diffusion-webui Photoshop integration: http://stable.art Sources: Video https://twitter.com/dreamwieber/status/1618453304970997762 Photorealistic image: https://twitter.com/DiffusionPics/status/1619444407937241089 Realistic vision: https://civitai.com/models/4201?modelVersionId=29461 Infinite zoom: https://twitter.com/hardmaru/status/1612134809924685825 Tiled texture: https://stackoverflow.com/questions/24319825/texture-tiling-with-continuous-random-offset Stable.art (Photoshop): https://github.com/isekaidev/stable.art Wand - drawing: https://twitter.com/wand_app/status/1604186054923210752 Texturing: https://twitter.com/CarsonKatri/status/1600248599254007810 + https://twitter.com/CarsonKatri/status/1603419328019169280 AR + assistant: https://twitter.com/StrangeNative/status/1569700294673702912 Metahumans: https://twitter.com/CoffeeVectors/status/1569416470332858372 My latest paper on simulations that look almost like reality is available for free here: https://rdcu.be/cWPfD Or this is the orig. 
Nature Physics link with clickable citations: https://www.nature.com/articles/s41567-022-01788-5 🙏 We would like to thank our generous Patreon supporters who make Two Minute Papers possible: Aleksandr Mashrabov, Alex Balfanz, Alex Haro, Andrew Melnychuk, Benji Rabhan, Bryan Learn, B Shang, Christian Ahlin, Eric Martel, Geronimo Moralez, Gordon Child, Jace O'Brien, Jack Lukic, John Le, Jonathan, Kenneth Davis, Klaus Busse, Kyle Davis, Lorin Atzberger, Lukas Biewald, Martin, Matthew Valle, Michael Albrecht, Michael Tedder, Nikhil Velpanur, Owen Campbell-Moore, Owen Skarpness, Rajarshi Nigam, Ramsey Elbasheer, Richard Sundvall, Steef, Taras Bobrovytsky, Ted Johnson, Thomas Krcmar, Timothy Sum Hon Mun, Torsten Reil, Tybie Fitzhugh, Ueli Gallizzi. If you wish to appear here or pick up other perks, click here: https://www.patreon.com/TwoMinutePapers Thumbnail background design: Felícia Zsolnai-Fehér - http://felicia.hu Károly Zsolnai-Fehér's links: Twitter: https://twitter.com/twominutepapers Web: https://cg.tuwien.ac.at/~zsolnai/
·youtube.com·
Enter PaLM 2: Full Breakdown (92 Pages Read + Gemini Before GPT 5?)
Google puts its foot on the accelerator, casting aside safety concerns to not only release a GPT 4-competitive model, PaLM 2, but also announce that they are already training Gemini, a GPT 5 competitor [likely on TPU v5 chips]. This is truly a major day in AI history, and I try to cover it all. I'll show the benchmarks in which PaLM (which now powers Bard) beats GPT 4, and detail how they use SmartGPT-like techniques to boost performance. Crazily enough, PaLM 2 beats even Google Translate, due in large part to the text it was trained on. We'll talk coding in Bard, translation, MMLU, Big Bench, and much more. I'll end on the Universal Translator deepfakes and the underwhelming results from Sundar Pichai and Sam Altman's trip to the White House and what Hinton says about it all. On a more positive note, I cover Med PaLM 2, which could genuinely save thousands of lives. PaLM 2 Technical Report: https://ai.google/static/documents/palm2techreport.pdf Release Notes Google Blog: https://blog.google/technology/ai/google-palm-2-ai-large-language-model/ Bard Access: https://bard.google.com/ Scaling Transformer to 1M tokens: https://arxiv.org/pdf/2304.11062.pdf GPT 4 Technical Report: https://arxiv.org/pdf/2303.08774.pdf Bard Languages: https://support.google.com/bard/answer/13575153?hl=en Self Consistency Paper: https://arxiv.org/pdf/2203.11171.pdf Are Emergent Abilities a Mirage: https://arxiv.org/pdf/2304.15004.pdf Sparks of AGI Paper: https://arxiv.org/pdf/2303.12712.pdf Big Bench Hard: https://github.com/suzgunmirac/BIG-Bench-Hard Google Keynote: https://www.youtube.com/watch?v=cNfINi5CNbY Gemini: https://www.youtube.com/watch?v=1UvUjTaJRz0 Med PaLM 2: https://www.youtube.com/watch?v=k_-Z_TkHMqA TPU v5: https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html Hinton Warning: https://www.youtube.com/watch?v=FAbsoxQtUwM White House Readout: https://www.whitehouse.gov/briefing-room/statements-releases/2023/05/04/readout-of-white-house-meeting-with-ceos-on-advancing-responsible-artificial-intelligence-innovation/ https://www.patreon.com/AIExplained
·youtube.com·
NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis (ML Research Paper Explained)
#nerf #neuralrendering #deeplearning View Synthesis is a tricky problem, especially when only given a sparse set of images as an input. NeRF embeds an entire scene into the weights of a feedforward neural network, trained by backpropagation through a differential volume rendering procedure, and achieves state-of-the-art view synthesis. It includes directional dependence and is able to capture fine structural details, as well as reflection effects and transparency. OUTLINE: 0:00 - Intro & Overview 4:50 - View Synthesis Task Description 5:50 - The fundamental difference to classic Deep Learning 7:00 - NeRF Core Concept 15:30 - Training the NeRF from sparse views 20:50 - Radiance Field Volume Rendering 23:20 - Resulting View Dependence 24:00 - Positional Encoding 28:00 - Hierarchical Volume Sampling 30:15 - Experimental Results 33:30 - Comments & Conclusion Paper: https://arxiv.org/abs/2003.08934 Website & Code: https://www.matthewtancik.com/nerf My Video on SIREN: https://youtu.be/Q5g3p9Zwjrk Abstract: We present a method that achieves state-of-the-art results for synthesizing novel views of complex scenes by optimizing an underlying continuous volumetric scene function using a sparse set of input views. Our algorithm represents a scene using a fully-connected (non-convolutional) deep network, whose input is a single continuous 5D coordinate (spatial location (x,y,z) and viewing direction (θ,ϕ)) and whose output is the volume density and view-dependent emitted radiance at that spatial location. We synthesize views by querying 5D coordinates along camera rays and use classic volume rendering techniques to project the output colors and densities into an image. Because volume rendering is naturally differentiable, the only input required to optimize our representation is a set of images with known camera poses. 
We describe how to effectively optimize neural radiance fields to render photorealistic novel views of scenes with complicated geometry and appearance, and demonstrate results that outperform prior work on neural rendering and view synthesis. View synthesis results are best viewed as videos, so we urge readers to view our supplementary video for convincing comparisons. Authors: Ben Mildenhall, Pratul P. Srinivasan, Matthew Tancik, Jonathan T. Barron, Ravi Ramamoorthi, Ren Ng Links: TabNine Code Completion (Referral): http://bit.ly/tabnine-yannick YouTube: https://www.youtube.com/c/yannickilcher Twitter: https://twitter.com/ykilcher Discord: https://discord.gg/4H8xxDF BitChute: https://www.bitchute.com/channel/yannic-kilcher Minds: https://www.minds.com/ykilcher Parler: https://parler.com/profile/YannicKilcher LinkedIn: https://www.linkedin.com/in/yannic-kilcher-488534136/ BiliBili: https://space.bilibili.com/1824646584 If you want to support me, the best thing to do is to share out the content :) If you want to support me financially (completely optional and voluntary, but a lot of people have asked for this): SubscribeStar: https://www.subscribestar.com/yannickilcher Patreon: https://www.patreon.com/yannickilcher Bitcoin (BTC): bc1q49lsw3q325tr58ygf8sudx2dqfguclvngvy2cq Ethereum (ETH): 0x7ad3513E3B8f66799f507Aa7874b1B0eBC7F85e2 Litecoin (LTC): LQW2TRyKYetVC8WjFkhpPhtpbDM4Vw7r9m Monero (XMR): 4ACL8AGrEo5hAir8A9CeVrW8pEauWvnp1WnSDZxW7tziCDLhZAGsgzhRQABDnFy8yuM9fWJDviJPHKRjV4FWt19CJZN9D4n
·youtu.be·
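The "classic volume rendering" step mentioned in the abstract can be sketched as front-to-back compositing along a single ray: each sample contributes its color weighted by its opacity, alpha_i = 1 - exp(-sigma_i * delta_i), and by the transmittance accumulated in front of it. The sketch below assumes the densities and colors are already given; in NeRF they would come from the network queried at 5D coordinates. The function name `render_ray` and the sample values are illustrative.

```python
import math


def render_ray(densities, colors, deltas):
    """Composite samples along one camera ray, front to back.

    densities: volume density sigma_i >= 0 at each sample
    colors:    (r, g, b) emitted radiance at each sample
    deltas:    distance between adjacent samples along the ray
    Returns the pixel color and the transmittance left behind the last sample.
    """
    transmittance = 1.0  # fraction of light that has not been absorbed yet
    pixel = [0.0, 0.0, 0.0]
    for sigma, color, delta in zip(densities, colors, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)  # opacity of this segment
        weight = transmittance * alpha
        for k in range(3):
            pixel[k] += weight * color[k]
        transmittance *= 1.0 - alpha
    return tuple(pixel), transmittance


if __name__ == "__main__":
    # Empty space, then a nearly opaque red surface, then a blue one behind it.
    densities = [0.0, 50.0, 50.0]
    colors = [(0.0, 1.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
    deltas = [0.1, 0.1, 0.1]
    print(render_ray(densities, colors, deltas))
```

Every operation here is differentiable in the densities and colors, which is why the abstract notes that images with known camera poses are enough to optimize the representation.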
Special Issue: AI and Future of Futures
AI and Future of Futures: Call for Papers. This special issue looks at AI and Futures in light of transformative developments in the fields broadly described as artificial intelligence (e.g., machine learning, natural language processing, Large Language Models, or expert systems). How will AI help reimagine the practice of Futures? How would it shape the ways in which communities imagine and build Futures? How might Futures evolve in an AI dominated environment? How might this emergence of knowledge shape the body of knowledge? Important Dates: Extended Abstract submission deadline: August 30, 2023. Notification of acceptance for submission to JFS: November 30, 2023. Final paper submission to JFS: February 29, 2024
·jfsdigital.org·
Physics informed Neural Networks applied to the description of wave-particle resonance in kinetic simulations of fusion plasmas
The Vlasov-Poisson system is employed in its reduced form version (1D1V) as a test bed for the applicability of Physics Informed Neural Network (PINN) to the wave-particle resonance. Two examples are explored: the Landau damping and the bump-on-tail instability. PINN is first tested as a compression method for the solution of the Vlasov-Poisson system and compared to the standard neural networks. Second, the application of PINN to solving the Vlasov-Poisson system is also presented with the special emphasis on the integral part, which motivates the implementation of a PINN variant, called Integrable PINN (I-PINN), based on the automatic-differentiation to solve the partial differential equation and on the automatic-integration to solve the integral equation.
·arxiv.org·
RemovalNet: DNN Fingerprint Removal Attacks
With the performance of deep neural networks (DNNs) remarkably improving, DNNs have been widely used in many areas. Consequently, the DNN model has become a valuable asset, and its intellectual property is safeguarded by ownership verification techniques (e.g., DNN fingerprinting). However, the feasibility of the DNN fingerprint removal attack and its potential influence remains an open problem. In this paper, we perform the first comprehensive investigation of DNN fingerprint removal attacks. Generally, the knowledge contained in a DNN model can be categorized into general semantic and fingerprint-specific knowledge. To this end, we propose a min-max bilevel optimization-based DNN fingerprint removal attack named RemovalNet, to evade model ownership verification. The lower-level optimization is designed to remove fingerprint-specific knowledge. While in the upper-level optimization, we distill the victim model's general semantic knowledge to maintain the surrogate model's performance. We conduct extensive experiments to evaluate the fidelity, effectiveness, and efficiency of the RemovalNet against four advanced defense methods on six metrics. The empirical results demonstrate that (1) the RemovalNet is effective. After our DNN fingerprint removal attack, the model distance between the target and surrogate models is x100 times higher than that of the baseline attacks, (2) the RemovalNet is efficient. It uses only 0.2% (400 samples) of the substitute dataset and 1,000 iterations to conduct our attack. Besides, compared with advanced model stealing attacks, the RemovalNet saves nearly 85% of computational resources at most, (3) the RemovalNet achieves high fidelity that the created surrogate model maintains high accuracy after the DNN fingerprint removal process. Our code is available at: https://github.com/grasses/RemovalNet.
·arxiv.org·
Consciousness in Artificial Intelligence: Insights from the Science of Consciousness
Whether current or near-term AI systems could be conscious is a topic of scientific interest and increasing public concern. This report argues for, and exemplifies, a rigorous and empirically grounded approach to AI consciousness: assessing existing AI systems in detail, in light of our best-supported neuroscientific theories of consciousness. We survey several prominent scientific theories of consciousness, including recurrent processing theory, global workspace theory, higher-order theories, predictive processing, and attention schema theory. From these theories we derive "indicator properties" of consciousness, elucidated in computational terms that allow us to assess AI systems for these properties. We use these indicator properties to assess several recent AI systems, and we discuss how future systems might implement them. Our analysis suggests that no current AI systems are conscious, but also suggests that there are no obvious technical barriers to building AI systems which satisfy these indicators.
·arxiv.org·
Algebraic Topology for Data Scientists
This book gives a thorough introduction to topological data analysis (TDA), the application of algebraic topology to data science. Algebraic topology is traditionally a very specialized field of math, and most mathematicians have never been exposed to it, let alone data scientists, computer scientists, and analysts. I have three goals in writing this book. The first is to bring people up to speed who are missing a lot of the necessary background. I will describe the topics in point-set topology, abstract algebra, and homology theory needed for a good understanding of TDA. The second is to explain TDA and some current applications and techniques. Finally, I would like to answer some questions about more advanced topics such as cohomology, homotopy, obstruction theory, and Steenrod squares, and what they can tell us about data. It is hoped that readers will acquire the tools to start to think about these topics and where they might fit in.
·arxiv.org·
Why AI is Harder Than We Think
Since its beginning in the 1950s, the field of artificial intelligence has cycled several times between periods of optimistic predictions and massive investment ("AI spring") and periods of disappointment, loss of confidence, and reduced funding ("AI winter"). Even with today's seemingly fast pace of AI breakthroughs, the development of long-promised technologies such as self-driving cars, housekeeping robots, and conversational companions has turned out to be much harder than many people expected. One reason for these repeating cycles is our limited understanding of the nature and complexity of intelligence itself. In this paper I describe four fallacies in common assumptions made by AI researchers, which can lead to overconfident predictions about the field. I conclude by discussing the open questions spurred by these fallacies, including the age-old challenge of imbuing machines with humanlike common sense.
·arxiv.org·
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
We explore how generating a chain of thought -- a series of intermediate reasoning steps -- significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state of the art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
·arxiv.org·
An Empirical Study & Evaluation of Modern CAPTCHAs
For nearly two decades, CAPTCHAs have been widely used as a means of protection against bots. Throughout the years, as their use grew, techniques to defeat or bypass CAPTCHAs have continued to improve. Meanwhile, CAPTCHAs have also evolved in terms of sophistication and diversity, becoming increasingly difficult to solve for both bots (machines) and humans. Given this long-standing and still-ongoing arms race, it is critical to investigate how long it takes legitimate users to solve modern CAPTCHAs, and how they are perceived by those users. In this work, we explore CAPTCHAs in the wild by evaluating users' solving performance and perceptions of unmodified currently-deployed CAPTCHAs. We obtain this data through manual inspection of popular websites and user studies in which 1,400 participants collectively solved 14,000 CAPTCHAs. Results show significant differences between the most popular types of CAPTCHAs: surprisingly, solving time and user perception are not always correlated. We performed a comparative study to investigate the effect of experimental context -- specifically the difference between solving CAPTCHAs directly versus solving them as part of a more natural task, such as account creation. Whilst there were several potential confounding factors, our results show that experimental context could have an impact on this task, and must be taken into account in future CAPTCHA studies. Finally, we investigate CAPTCHA-induced user task abandonment by analyzing participants who start and do not complete the task.
·arxiv.org·
Delip Rao e/σ on X
“Drop everything you are doing!! Alex Graves pushed a paper on arXiv, so nothing could be more important than reading it. First thing I did was go look for any comments in the TeX file. Unfortunately, it’s all been scrubbed. https://t.co/FMwqm8OYzA”
·twitter.com·
Bayesian Flow Networks
This paper introduces Bayesian Flow Networks (BFNs), a new class of generative model in which the parameters of a set of independent distributions are modified with Bayesian inference in the light of noisy data samples, then passed as input to a neural network that outputs a second, interdependent distribution. Starting from a simple prior and iteratively updating the two distributions yields a generative procedure similar to the reverse process of diffusion models; however it is conceptually simpler in that no forward process is required. Discrete and continuous-time loss functions are derived for continuous, discretised and discrete data, along with sample generation procedures. Notably, the network inputs for discrete data lie on the probability simplex, and are therefore natively differentiable, paving the way for gradient-based sample guidance and few-step generation in discrete domains such as language modelling. The loss function directly optimises data compression and places no restrictions on the network architecture. In our experiments BFNs achieve competitive log-likelihoods for image modelling on dynamically binarized MNIST and CIFAR-10, and outperform all known discrete diffusion models on the text8 character-level language modelling task.
·arxiv.org·
AK on X
“CoDeF: Content Deformation Fields for Temporally Consistent Video Processing abs: https://t.co/A9atUrPPnA paper page: https://t.co/60zzBGt8R8 present the content deformation field CoDeF as a new type of video representation, which consists of a canonical content field…”
·twitter.com·