Public

451 bookmarks
Ecological pyramid - Wikipedia
An ecological pyramid (also trophic pyramid, Eltonian pyramid, energy pyramid, or sometimes food pyramid) is a graphical representation designed to show the biomass or bioproductivity at each trophic level in an ecosystem. A pyramid of energy shows how much energy is retained in the form of new biomass at each trophic level, while a pyramid of biomass shows how much biomass (the amount of living or organic matter present in an organism) is present in the organisms. There is also a pyramid of numbers representing the number of individual organisms at each trophic level. Pyramids of energy are normally upright, but other pyramids can be inverted (e.g., a pyramid of biomass for a marine region) or take other shapes (e.g., a spindle-shaped pyramid). Ecological pyramids begin with producers at the bottom (such as plants) and proceed through the various trophic levels (such as herbivores that eat plants, then carnivores that eat flesh, then omnivores that eat both plants and flesh, and so on). The highest level is the top of the food chain.
·en.wikipedia.org·
Unilateralist's curse - EA Forum
The unilateralist's curse is the phenomenon whereby, when each of many altruistic agents has the power to bring about some state of affairs whose net value is negative but unknown to these agents, the probability that the state will be realized grows with the number of agents who decide to act based on their own private judgment. Salient examples include decisions to leak information about weapons technologies, potential decisions by individual nations to use geoengineering to mitigate climate change, and the unilateral decision to introduce rabbits to Australia. To avoid the unilateralist's curse, members of a group might implement a group decision-making procedure, deliberate with others before taking action, or create a norm of deferring to the beliefs or actions of the other members of the group.

Further reading:
Bostrom, Nick, Thomas Douglas & Anders Sandberg (2016) The unilateralist's curse and the case for a principle of conformity, Social Epistemology, vol. 30, pp. 350-371.
Lewis, Gregory (2018) Horsepox synthesis: a case of the unilateralist's curse?, Bulletin of the Atomic Scientists, February 19. (Usefully connects the curse to other factors.)
Schubert, Stefan & Ben Garfinkel (2017) Hard-to-reverse decisions destroy option value, Centre for Effective Altruism, March 17.
Zhang, Linchuan (2020) Framing issues with the unilateralist's curse, Effective Altruism Forum, January 17.

Related entries: accidental harm | information hazard | optimizer's curse
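The opening claim can be made concrete with a short calculation: if each of n well-meaning agents independently misjudges a genuinely harmful action as good with probability p, the chance that at least one of them acts unilaterally is 1 − (1 − p)^n, which rises quickly with n. A minimal sketch (the probability and group sizes are illustrative assumptions, not figures from the entry):

```python
# Probability that at least one of n independent agents acts on an
# erroneous private judgment: 1 - (1 - p)^n.
def p_unilateral_action(n_agents: int, p_misjudge: float) -> float:
    return 1.0 - (1.0 - p_misjudge) ** n_agents

# Illustrative values only: even a 5% individual error rate makes
# unilateral action likely once many agents can act independently.
for n in (1, 5, 20, 100):
    print(n, round(p_unilateral_action(n, 0.05), 3))
```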
·forum.effectivealtruism.org·
Create and Curate
“It’s one thing to create. The other is you have to choose. 'What are we going to do, and what are we not going to do?’ This is a gigantic aspect of show-business survival. It’s kind of unseen, what’s picked and what is discarded, but mastering that is how you stay alive.”
Victim or survivor mindset
“I think that there are two ways of looking at things that have happened to you. You can be a victim or you can be a survivor. Those are two very different cognitive positions. You can’t control what happens to you in either circumstance, but one is very powerful. You have overcome. One is you have had something happen to you and you are under that thing for quite some period of time. For me, if I hear someone and I hear that helplessness, one is that I want to reframe that experience. I want to tell a different story. I want them to tell a different narrative to themselves. I want them to rewrite that. In some ways, you want them to rewrite that narrative to survivorship and overcoming and what it took. You ask the right questions to get them to see that their own throughway in that case is based on their strength and ability. You want them to see those things rather than seeing the helplessness and powerlessness.”
Moving forward
“When people are telling me that ‘I’m doing this because of my childhood’ or ‘I’m doing this because of this,’ I think you’re giving up some amount of power. You’re giving up a lot of power to something outside of yourself, and also, how you’re interpreting that event is not useful to you. There may be a lot of truth to the terrible things that have happened, but those terrible things—you have to shut the door at some point and say, ‘I am my own man or woman, and I move forward.’ … If you are somebody who uses other events as a reason to self-destruct, you’re ceding power… We see that even in companies—’I’m doing this because so-and-so made me angry. I’m doing this because…’—and you end up making some poor decisions and ceding power because of someone else. You’re willing to make a poor decision. You’re willing to give up. Sometimes people are willing to give up their entire future dreams because of X, Y, and Z, and it’s a tragedy. You want people to really understand the power they have to create their own lives at some point and that creation is not given to anyone else but you.”
·fs.blog·
Nonviolent Communication - Wikipedia
Nonviolent Communication (NVC) is an approach to communication that is claimed to be based on principles of nonviolence. It is not an attempt to end disagreements, but rather a method that is claimed to increase empathy and improve the quality of life of those who utilize the method and the people around them. Nonviolent Communication evolved from concepts used in person-centered therapy, and was developed by clinical psychologist Marshall Rosenberg beginning in the 1960s and 1970s. There are a large number of workshops and clinical materials about NVC, including Rosenberg's book Nonviolent Communication: A Language of Life.[1][2][3][4] Marshall Rosenberg also taught NVC in a number of video lectures available online; the workshop recorded in San Francisco is the best known.
·en.wikipedia.org·
Procrustes transformation - Wikipedia
A Procrustes transformation is a geometric transformation that involves only translation, rotation, uniform scaling, or a combination of these transformations. Hence, it may change the size or position, but not the shape, of a geometric object. It is named after the mythical Greek robber Procrustes, who made his victims fit his bed either by stretching their limbs or cutting them off.
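As an illustration of "size and position may change but shape may not", the sketch below applies a similarity (Procrustes-style) transformation, uniform scaling plus rotation plus translation, to a set of 2D points; angles and length ratios are preserved. The specific scale, angle, and offset are arbitrary assumptions.

```python
import numpy as np

def procrustes_transform(points, scale, angle_rad, translation):
    """Apply uniform scaling, rotation, and translation to Nx2 points."""
    c, s = np.cos(angle_rad), np.sin(angle_rad)
    rotation = np.array([[c, -s], [s, c]])
    return scale * points @ rotation.T + translation

# A right triangle; the transformed copy is larger and moved,
# but its angles (its shape) are unchanged.
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
print(procrustes_transform(triangle, scale=2.0, angle_rad=np.pi / 4,
                           translation=np.array([3.0, -1.0])))
```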
·en.wikipedia.org·
Procrustes analysis - Wikipedia
In statistics, Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. The name Procrustes (Greek: Προκρούστης) refers to a bandit from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off. In mathematics, the orthogonal Procrustes problem is a method for finding the optimal rotation and/or reflection (i.e., the optimal orthogonal linear transformation) for the Procrustes superimposition (PS) of an object with respect to another, while the constrained orthogonal Procrustes problem, subject to det(R) = 1 (where R is a rotation matrix), determines the optimal rotation for the PS of an object with respect to another (reflection is not allowed). In some contexts, the constrained method is called the Kabsch algorithm.
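A minimal sketch of the two problems just mentioned, using the standard SVD-based solution: the unconstrained version allows any orthogonal matrix (rotation or reflection), while the constrained (Kabsch-style) version forces det(R) = 1. The random test data is an assumption for illustration.

```python
import numpy as np

def orthogonal_procrustes(A, B, allow_reflection=True):
    """Find orthogonal R minimizing ||A @ R - B||_F (rows are points)."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    R = U @ Vt
    if not allow_reflection and np.linalg.det(R) < 0:
        # Kabsch-style correction: flip the smallest singular direction
        # so that R is a proper rotation (det(R) = +1).
        U[:, -1] *= -1
        R = U @ Vt
    return R

# Illustration: recover a known rotation from point correspondences.
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
theta = 0.3
R_true = np.array([[np.cos(theta), -np.sin(theta), 0],
                   [np.sin(theta),  np.cos(theta), 0],
                   [0, 0, 1]])
B = A @ R_true
print(np.allclose(orthogonal_procrustes(A, B, allow_reflection=False), R_true))
```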
·en.wikipedia.org·
The Metagame: Think One Step Ahead
Discover how legends like Warren Buffett and Bill Belichick outsmart the competition using the Metagame. Learn how to apply this timeless strategy in various life contexts for innovative solutions and personal success.
The metagame is the psychological game that exists among players, involving adjustments based on how an opponent is likely to interpret a given set of actions. Better players adjust their strategies and styles to those of particular opponents, always analyzing how the opponents are playing in terms of how the opponents believe they're playing. Maintaining a well-balanced strategy while deciphering your opponents' strategies is the key to the metagame. If you comprehend the concept of the metagame, accurately perceive the flow of your table and then the tournament, and stay alert to and aware of current strategy trends, you'll be able to successfully mix up your play when considering your image and that of your opponents. In turn, your game will be highly unpredictable and difficult to read, which should be your ultimate goal.
·fs.blog·
How we sped up transformer inference 100x for 🤗 API customers
We’re on a journey to advance and democratize artificial intelligence through open source and open science.
Tokenization is often a bottleneck for efficiency during inference. We use the most efficient methods from the 🤗 Tokenizers library, leveraging the Rust implementation of the model tokenizer in combination with smart caching to get up to 10x speedup for the overall latency.
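The post does not show code, but the combination it describes, a Rust-backed fast tokenizer plus caching of repeated inputs, can be sketched as follows. The checkpoint name and the use of a simple in-process lru_cache are illustrative assumptions, not the production setup.

```python
from functools import lru_cache

from tokenizers import Tokenizer  # Rust-backed 🤗 Tokenizers library

# Load a fast tokenizer once; the checkpoint name is just an example.
tokenizer = Tokenizer.from_pretrained("bert-base-uncased")

@lru_cache(maxsize=100_000)
def encode_cached(text: str):
    """Tokenize a string, memoizing results so repeated inputs are free."""
    return tuple(tokenizer.encode(text).ids)

print(encode_cached("Hello world"))
print(encode_cached("Hello world"))  # served from the cache
```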
Once the compute platform has been selected for the use case, we can go to work. Here are some CPU-specific techniques that can be applied with a static graph:
- Optimizing the graph (removing unused flow)
- Fusing layers (with specific CPU instructions)
- Quantizing the operations
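The post stays at this level of description. As one hedged illustration of the quantization step on CPU, dynamic int8 quantization of a transformer's linear layers can be applied in a few lines of PyTorch; the checkpoint is an arbitrary example, and this is not the pipeline used for the 🤗 API.

```python
import torch
from transformers import AutoModelForSequenceClassification

# Illustrative model; the post does not name a particular checkpoint.
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased-finetuned-sst-2-english"
)
model.eval()

# Dynamic quantization: weights of the linear layers are stored as int8
# and the corresponding matmuls run with integer kernels on CPU.
quantized = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)
```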
·huggingface.co·
Tree of Thoughts: Deliberate Problem Solving with Large Language Models
Language models are increasingly being deployed for general problem solving across a wide range of tasks, but are still confined to token-level, left-to-right decision-making processes during inference. This means they can fall short in tasks that require exploration, strategic lookahead, or where initial decisions play a pivotal role. To surmount these challenges, we introduce a new framework for language model inference, Tree of Thoughts (ToT), which generalizes over the popular Chain of Thought approach to prompting language models, and enables exploration over coherent units of text (thoughts) that serve as intermediate steps toward problem solving. ToT allows LMs to perform deliberate decision making by considering multiple different reasoning paths and self-evaluating choices to decide the next course of action, as well as looking ahead or backtracking when necessary to make global choices. Our experiments show that ToT significantly enhances language models' problem-solving abilities on three novel tasks requiring non-trivial planning or search: Game of 24, Creative Writing, and Mini Crosswords. For instance, in Game of 24, while GPT-4 with chain-of-thought prompting only solved 4% of tasks, our method achieved a success rate of 74%. Code repo with all prompts: https://github.com/ysymyth/tree-of-thought-llm.
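As a rough sketch of the search procedure the abstract describes (generate several candidate "thoughts", self-evaluate them, keep the most promising, and iterate), the function below performs a simple breadth-first search. It is not the authors' implementation; propose_thoughts and score_thought are hypothetical stand-ins for language-model calls, and the toy usage at the end is purely illustrative.

```python
def tree_of_thoughts(problem, propose_thoughts, score_thought,
                     depth=3, breadth=5, keep=2):
    """Breadth-first search over partial solutions ("thoughts").

    propose_thoughts(problem, state) -> list of extended states
    score_thought(problem, state)    -> float, higher is more promising
    """
    frontier = [""]  # start from an empty partial solution
    for _ in range(depth):
        candidates = []
        for state in frontier:
            candidates.extend(propose_thoughts(problem, state)[:breadth])
        # Self-evaluation step: keep only the most promising states.
        candidates.sort(key=lambda s: score_thought(problem, s), reverse=True)
        frontier = candidates[:keep]
    return frontier[0] if frontier else None

# Toy usage with deterministic stand-ins for the LM calls:
best = tree_of_thoughts(
    "count to 3",
    propose_thoughts=lambda p, s: [s + str(len(s) + 1)],
    score_thought=lambda p, s: len(s),
)
print(best)  # "123"
```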
·arxiv.org·
Quantization - Qdrant
Qdrant is an open-source vector database and vector search engine written in Rust. It provides a fast and scalable vector similarity search service with a convenient API.
Quantization is an optional feature in Qdrant that enables efficient storage and search of high-dimensional vectors. By transforming the original vectors into new representations, quantization compresses the data while keeping the relative distances between vectors close to the originals. Different quantization methods have different mechanics and tradeoffs; we cover them in this section.
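This is not Qdrant's internal implementation, but the idea behind one common method, scalar quantization, can be shown in a few lines: each float32 component is mapped to int8 with a shared scale, compressing the vectors roughly 4x while distances computed on the reconstructed vectors stay close to the originals.

```python
import numpy as np

def quantize_int8(vectors: np.ndarray):
    """Scalar-quantize float32 vectors to int8 with a single global scale."""
    scale = np.abs(vectors).max() / 127.0
    return np.round(vectors / scale).astype(np.int8), scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

rng = np.random.default_rng(0)
vecs = rng.normal(size=(1000, 128)).astype(np.float32)
q, scale = quantize_int8(vecs)

# Distances on the reconstructed vectors stay close to the originals.
orig = np.linalg.norm(vecs[0] - vecs[1])
approx = np.linalg.norm(dequantize(q[0], scale) - dequantize(q[1], scale))
print(orig, approx)
```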
·qdrant.tech·
Backpropagation through time - Wikipedia
Backpropagation through time (BPTT) is a gradient-based technique for training certain types of recurrent neural networks. It can be used to train Elman networks. The algorithm was independently derived by numerous researchers.
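A minimal sketch of what BPTT looks like in practice: unroll a small Elman-style RNN over a sequence and let reverse-mode autodiff carry the gradient back through every time step. PyTorch and all tensor sizes here are illustrative assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# An Elman network: nn.RNN is a plain tanh recurrent layer.
rnn = nn.RNN(input_size=4, hidden_size=8, batch_first=True)
readout = nn.Linear(8, 1)

x = torch.randn(2, 10, 4)        # batch of 2 sequences, 10 time steps
target = torch.randn(2, 1)

outputs, h_n = rnn(x)            # forward pass unrolls over all 10 steps
loss = nn.functional.mse_loss(readout(outputs[:, -1]), target)

# backward() performs backpropagation through time: the gradient of the
# loss flows from the last step back through every earlier time step.
loss.backward()
print(rnn.weight_hh_l0.grad.shape)  # recurrent weights receive gradients
```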
·en.wikipedia.org·
Nonviolent Communication - Wikipedia
Nonviolent Communication (NVC) is an approach to communication based on principles of nonviolence. It is not a technique to end disagreements, but rather a method designed to increase empathy and improve the quality of life of those who utilize the method and the people around them.
·en.m.wikipedia.org·
DNA and RNA codon tables - Wikipedia
A codon table can be used to translate a genetic code into a sequence of amino acids.[1][2] The standard genetic code is traditionally represented as an RNA codon table, because when proteins are made in a cell by ribosomes, it is messenger RNA (mRNA) that directs protein synthesis.
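As a toy illustration of reading a codon table, the snippet below translates a made-up mRNA string three bases at a time using a small subset of the standard table; a real table covers all 64 codons.

```python
# A small subset of the standard RNA codon table (codon -> amino acid).
CODON_TABLE = {
    "AUG": "Met",  # also the start codon
    "UUU": "Phe",
    "AAA": "Lys",
    "UGG": "Trp",
    "GGC": "Gly",
    "UAA": "Stop",
}

def translate(mrna: str) -> list:
    """Read an mRNA string three bases at a time and stop at a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "Stop":
            break
        protein.append(amino_acid)
    return protein

print(translate("AUGAAAGGCUGGUUUUAA"))  # ['Met', 'Lys', 'Gly', 'Trp', 'Phe']
```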
·en.wikipedia.org·
Emergent and Predictable Memorization in Large Language Models
Memorization, or the tendency of large language models (LLMs) to output entire sequences from their training data verbatim, is a key concern for safely deploying language models. In particular, it is vital to minimize a model's memorization of sensitive datapoints such as those containing personal identifiable information (PII). The prevalence of such undesirable memorization can pose issues for model trainers, and may even require discarding an otherwise functional model. We therefore seek to predict which sequences will be memorized before a large model's full train-time by extrapolating the memorization behavior of lower-compute trial runs. We measure memorization of the Pythia model suite, and find that intermediate checkpoints are better predictors of a model's memorization behavior than smaller fully-trained models. We additionally provide further novel discoveries on the distribution of memorization scores across models and data.
The paper "Emergent and Predictable Memorization in Large Language Models" by Stella Biderman et al. studies the problem of memorization in large language models and proposes a method to predict which sequences will be memorized before full training of the model, based on extrapolation of memorization behavior from lower-compute trial runs, and provides novel insights on the distribution of memorization scores across models and data.
Key insights and lessons learned from the paper:
- Memorization is a key concern for deploying large language models safely, particularly for sensitive datapoints such as PII.
- Intermediate checkpoints are better predictors of memorization behavior than smaller fully-trained models.
- Memorization scores follow a power-law distribution across models and data, with some datapoints being more prone to memorization than others.
- Fine-tuning can mitigate memorization to some extent, but not completely.
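The notion of memorization used here can be made concrete with a simple check: prompt a model with the first k tokens of a training sequence and test whether greedy decoding reproduces the true continuation verbatim. The sketch below illustrates that idea with a small Pythia checkpoint; the model name and the prefix/continuation lengths are assumptions, and this is not the authors' evaluation code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "EleutherAI/pythia-70m"   # small Pythia checkpoint, for illustration
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

def is_memorized(text: str, k: int = 32, m: int = 32) -> bool:
    """True if greedy decoding of the k-token prefix reproduces the next m tokens."""
    ids = tokenizer(text, return_tensors="pt").input_ids[0]
    if len(ids) < k + m:
        return False
    prefix, continuation = ids[:k], ids[k:k + m]
    with torch.no_grad():
        generated = model.generate(prefix.unsqueeze(0), max_new_tokens=m,
                                   do_sample=False)
    return torch.equal(generated[0, k:k + m], continuation)
```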
·arxiv.org·
Evidence of a predictive coding hierarchy in the human brain listening to speech - Nature Human Behaviour
Current machine learning language algorithms make adjacent word-level predictions. In this work, Caucheteux et al. show that the human brain probably uses long-range and hierarchical predictions, taking into account up to eight possible words into the future.
·nature.com·
Recurrent Memory Transformer
Transformer-based models show their effectiveness across multiple domains and tasks. Self-attention allows information from all sequence elements to be combined into context-aware representations. However, global and local information has to be stored mostly in the same element-wise representations, and the length of an input sequence is limited by the quadratic computational complexity of self-attention. In this work, we propose and study a memory-augmented segment-level recurrent Transformer (RMT). Memory allows the model to store and process local and global information, and to pass information between segments of a long sequence with the help of recurrence. We implement the memory mechanism with no changes to the Transformer model by adding special memory tokens to the input or output sequence. The model is then trained to control both memory operations and sequence-representation processing. Experimental results show that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it on tasks that require longer sequence processing. We also show that adding memory tokens to Transformer-XL improves its performance. This makes the Recurrent Memory Transformer a promising architecture for applications that require learning long-term dependencies and general-purpose memory processing, such as algorithmic tasks and reasoning.
The paper "Recurrent Memory Transformer" proposes a memory-augmented segment-level recurrent Transformer (RMT) model that stores and processes global and local information by adding memory tokens to the input or output sequence, and shows that RMT performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it for longer sequence processing tasks.
Key insights and lessons learned:
- The self-attention mechanism in Transformer-based models has quadratic computational complexity for long sequences and limits the amount of global and local information that can be stored and processed.
- Adding memory tokens to the input or output sequence of a Transformer-based model allows for memory augmentation and the storage and processing of global and local information, as well as the passing of information between segments of long sequences with the help of recurrence.
- The proposed RMT model performs on par with Transformer-XL on language modeling for smaller memory sizes and outperforms it for longer sequence processing tasks.
- The RMT model can be applied to a wide range of tasks and domains, including natural language processing and image recognition.
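The abstract describes the mechanism but not the code. Below is a rough sketch of the memory-token idea using a stock PyTorch Transformer encoder: a fixed number of learned memory embeddings are concatenated with each segment, processed together with it, and the updated memory slots are carried over to the next segment. All sizes, the class name, and the use of nn.TransformerEncoder are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn as nn

class RecurrentMemorySketch(nn.Module):
    """Toy segment-level recurrence with memory tokens (illustrative only)."""

    def __init__(self, d_model=64, n_memory=4):
        super().__init__()
        self.memory = nn.Parameter(torch.randn(n_memory, d_model))
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.n_memory = n_memory

    def forward(self, segments):
        """segments: list of (batch, seg_len, d_model) tensors."""
        batch = segments[0].shape[0]
        mem = self.memory.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for seg in segments:
            # Memory tokens are simply concatenated with the segment tokens.
            hidden = self.encoder(torch.cat([mem, seg], dim=1))
            mem = hidden[:, :self.n_memory]      # updated memory carried forward
            outputs.append(hidden[:, self.n_memory:])
        return torch.cat(outputs, dim=1), mem

model = RecurrentMemorySketch()
segments = [torch.randn(2, 16, 64) for _ in range(3)]
out, final_mem = model(segments)
print(out.shape, final_mem.shape)  # (2, 48, 64) (2, 4, 64)
```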
·arxiv.org·
Antiderivative - Wikipedia
In calculus, an antiderivative, inverse derivative, primitive function, primitive integral or indefinite integral[Note 1] of a function f is a differentiable function F whose derivative is equal to the original function f. This can be stated symbolically as F' = f.[1][2] The process of solving for antiderivatives is called antidifferentiation (or indefinite integration), and its opposite operation is called differentiation, which is the process of finding a derivative. Antiderivatives are often denoted by capital Roman letters such as F and G.
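For example, F(x) = x³/3 is an antiderivative of f(x) = x², since F′(x) = x²; so is x³/3 + 1, which is why the general antiderivative is written x³/3 + C for an arbitrary constant C.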
·en.wikipedia.org·
Fundamental theorem of calculus - Wikipedia
The fundamental theorem of calculus is a theorem that links the concept of differentiating a function (calculating its slopes, or rate of change at each time) with the concept of integrating a function (calculating the area under its graph, or the cumulative effect of small contributions). The two operations are inverses of each other apart from a constant value which depends on where one starts to compute area.
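Concretely, for f(x) = x² with antiderivative F(x) = x³/3, the theorem gives ∫₀¹ x² dx = F(1) − F(0) = 1/3: the area under the curve follows directly from the antiderivative evaluated at the endpoints.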
·en.wikipedia.org·
Stochastic variance reduction - Wikipedia
(Stochastic) variance reduction is an algorithmic approach to minimizing functions that can be decomposed into finite sums. By exploiting the finite sum structure, variance reduction techniques are able to achieve convergence rates that are impossible to achieve with methods that treat the objective as an infinite sum, as in the classical Stochastic approximation setting. Variance reduction approaches are widely used for training machine learning models such as logistic regression and support vector machines[1] as these problems have finite-sum structure and uniform conditioning that make them ideal candidates for variance reduction.
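As one concrete example of a variance-reduction method, the sketch below runs SVRG on a synthetic least-squares problem: a full gradient is computed at a periodic snapshot, and each stochastic step corrects a single-example gradient with the corresponding gradient at the snapshot. The data, step size, and iteration counts are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d) + 0.01 * rng.normal(size=n)

def grad_i(w, i):
    """Gradient of the i-th squared-error term (a_i^T w - b_i)^2."""
    return 2.0 * (A[i] @ w - b[i]) * A[i]

def svrg(epochs=20, inner_steps=200, lr=1e-3):
    """Minimize (1/n) * sum_i (a_i^T w - b_i)^2 with SVRG."""
    w = np.zeros(d)
    for _ in range(epochs):
        w_snapshot = w.copy()
        full_grad = 2.0 * A.T @ (A @ w_snapshot - b) / n   # gradient at snapshot
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Variance-reduced stochastic gradient estimate.
            g = grad_i(w, i) - grad_i(w_snapshot, i) + full_grad
            w -= lr * g
    return w

w_hat = svrg()
print(np.linalg.norm(A @ w_hat - b) / np.sqrt(n))  # residual shrinks toward the noise level
```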
·en.wikipedia.org·
Stochastic approximation - Wikipedia
Stochastic approximation methods are a family of iterative methods typically used for root-finding problems or for optimization problems. The recursive update rules of stochastic approximation methods can be used, among other things, for solving linear systems when the collected data is corrupted by noise, or for approximating extreme values of functions which cannot be computed directly, but only estimated via noisy observations.
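A small sketch of the classic Robbins-Monro scheme from this family: find the root of an unknown function from noisy evaluations by iterating x_{n+1} = x_n − a_n·y_n with decreasing step sizes a_n = a/n. The target function and noise level are made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_g(x):
    """Noisy observation of g(x) = x - 2, whose root is x* = 2."""
    return (x - 2.0) + rng.normal(scale=0.5)

def robbins_monro(x0=0.0, steps=10_000, a=1.0):
    x = x0
    for n in range(1, steps + 1):
        step = a / n   # step sizes satisfy sum a_n = inf, sum a_n^2 < inf
        x -= step * noisy_g(x)
    return x

print(robbins_monro())  # converges toward the root x* = 2
```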
·en.wikipedia.org·