Robust Preference Learning for Storytelling via Contrastive Reinforcement Learning
MAGMA -- Multimodal Augmentation of Generative Models through Adapter-based Finetuning
Large-scale pretraining is fast becoming the norm in Vision-Language (VL)
modeling. However, prevailing VL approaches are limited by the requirement for
labeled data and the use of complex multi-step pretraining objectives. We
present MAGMA - a simple method for augmenting generative language models with
additional modalities using adapter-based finetuning. Building on Frozen, we
train a series of VL models that autoregressively generate text from arbitrary
combinations of visual and textual input. The pretraining is entirely
end-to-end using a single language modeling objective, simplifying optimization
compared to previous approaches. Importantly, the language model weights remain
unchanged during training, allowing for transfer of encyclopedic knowledge and
in-context learning abilities from language pretraining. MAGMA outperforms
Frozen on open-ended generative tasks, achieving state-of-the-art results on
the OKVQA benchmark and competitive results on a range of other popular VL
benchmarks, while pretraining on 0.2% of the number of samples used to train
SimVLM.
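A minimal sketch of the recipe the abstract describes, assuming a frozen GPT-style language model; the class names and sizes below are illustrative, not MAGMA's actual code. Image features are projected to pseudo-token embeddings that are prepended to the text embeddings, and small residual adapters are added inside the frozen transformer blocks:

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Bottleneck adapter inserted into a frozen transformer block (illustrative)."""
    def __init__(self, dim, bottleneck=64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x):
        # Residual bottleneck: only these small layers receive gradients.
        return x + self.up(torch.relu(self.down(x)))

class VisualPrefix(nn.Module):
    """Projects image-encoder features to a short sequence of pseudo-token
    embeddings that are prepended to the text token embeddings."""
    def __init__(self, feat_dim, lm_dim, n_tokens=4):
        super().__init__()
        self.proj = nn.Linear(feat_dim, lm_dim * n_tokens)
        self.n_tokens, self.lm_dim = n_tokens, lm_dim

    def forward(self, img_feats):                      # (B, feat_dim)
        return self.proj(img_feats).view(-1, self.n_tokens, self.lm_dim)
```

Only the image encoder, the prefix projection, and the adapters would receive gradients; the language model itself stays frozen, and the whole stack is trained with the ordinary next-token cross-entropy, matching the single language modeling objective described above.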
Protein structure generation via folding diffusion
DreamFusion: Text-to-3D using 2D Diffusion
Adapting this approach to 3D synthesis would require large-scale datasets of labeled 3D assets and efficient architectures for denoising 3D data, neither of which currently exist. In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis. We introduce a loss based on probability density distillation that enables the use of a 2D diffusion model as a prior for optimization of a parametric image generator. Using this loss in a DeepDream-like procedure, we optimize a randomly-initialized 3D model (a Neural Radiance Field, or NeRF) via gradient descent such that its 2D renderings from random angles achieve a low loss.
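The probability density distillation loss is usually applied as a gradient that skips the diffusion model's Jacobian. A minimal sketch under that reading, where `render` is a differentiable NeRF rendering, `noise_pred_fn` a frozen epsilon-prediction text-to-image model, and `alpha_bar` the cumulative noise level at timestep `t` (all names are placeholders, not the DreamFusion codebase):

```python
import torch

def sds_pixel_grad(render, noise_pred_fn, text_emb, t, alpha_bar):
    """Score-distillation sketch: diffuse a NeRF rendering to noise level t,
    ask the frozen 2D model to denoise it, and use (eps_hat - eps) as the
    gradient with respect to the rendered pixels. A timestep weighting w(t)
    is omitted for brevity."""
    eps = torch.randn_like(render)
    x_t = alpha_bar ** 0.5 * render + (1.0 - alpha_bar) ** 0.5 * eps
    with torch.no_grad():
        eps_hat = noise_pred_fn(x_t, t, text_emb)  # frozen text-to-image prior
    return eps_hat - eps                           # the prior's Jacobian is skipped

# Backpropagating this through the renderer updates the NeRF parameters:
# render.backward(gradient=sds_pixel_grad(render, model, emb, t, a_bar))
```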
Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
In particular, (i) we combine visual inputs with the frozen BiLM using light trainable modules,
(ii) we train such modules using Web-scraped multi-modal data, and finally
(iii) we perform zero-shot VideoQA inference through masked language modeling, where the masked text is the answer to a given question.
Our proposed approach, FrozenBiLM, outperforms the state of the art in zero-shot VideoQA by a significant margin on a variety of datasets, including LSMDC-FiB, iVQA, MSRVTT-QA, MSVD-QA, ActivityNet-QA, TGIF-FrameQA, How2QA and TVQA.
It also demonstrates competitive performance in the few-shot and fully-supervised setting.
Our code and models will be made publicly available.
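Step (iii) can be sketched as scoring candidate answers at the masked position; the `bilm` call below is a hypothetical wrapper standing in for the frozen bidirectional LM plus its trained visual modules:

```python
import torch

def zero_shot_videoqa(bilm, tokenizer, video_feats, question, candidates):
    """Sketch of masked-language-model inference for zero-shot VideoQA: the
    answer slot is a [MASK] token, and each candidate answer is scored by its
    logit at that position. `bilm` is a hypothetical interface, not the
    released FrozenBiLM API."""
    prompt = f"Question: {question} Answer: {tokenizer.mask_token}."
    enc = tokenizer(prompt, return_tensors="pt")
    logits = bilm(video_feats, **enc).logits                     # (1, T, vocab)
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    ids = tokenizer.convert_tokens_to_ids(candidates)
    return candidates[int(logits[0, mask_pos, ids].argmax())]
```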
Highly accurate protein structure prediction with AlphaFold
AlphaFold reveals the structure of the protein universe
Today, in partnership with EMBL’s European Bioinformatics Institute (EMBL-EBI), we’re now releasing predicted structures for nearly all catalogued proteins known to science, which will expand the AlphaFold DB by over 200x - from nearly 1 million structures to over 200 million structures - with the potential to dramatically increase our understanding of biology.
Fine-Tuning Language Models from Human Preferences
Energy-based model
Potentials & challenges of cross-silo federated learning
Cross-Silo Federated Learning: Challenges and Opportunities
Federated learning (FL) is an emerging technology that enables the training
of machine learning models from multiple clients while keeping the data
distributed and private. Based on the participating clients and the model
training scale, federated learning can be classified into two types:
cross-device FL where clients are typically mobile devices and the client
number can reach up to a scale of millions; cross-silo FL where clients are
organizations or companies and the client number is usually small (e.g., within
a hundred). While existing studies mainly focus on cross-device FL, this paper
aims to provide an overview of cross-silo FL. More specifically, we first
discuss applications of cross-silo FL and outline its major challenges. We then
provide a systematic overview of the existing approaches to the challenges in
cross-silo FL by focusing on their connections and differences to cross-device
FL. Finally, we discuss future directions and open issues that merit research
efforts from the community.
SplitFed: When Federated Learning Meets Split Learning
THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption
As more and more pre-trained language models adopt on-cloud deployment,
privacy issues grow quickly, mainly due to the exposure of plain-text user
data (e.g., search history, medical records, bank accounts).
Privacy-preserving inference of transformer models is in demand among cloud
service users. To protect privacy, an attractive choice is to compute only on
ciphertext under homomorphic encryption (HE). However, enabling inference of
pre-trained models on ciphertext data is difficult due to the complex
computations in transformer blocks, which are not yet supported by current HE
tools. In this work, we introduce THE-X, an approximation approach for
transformers that enables privacy-preserving inference of pre-trained models
developed with popular frameworks. THE-X proposes a workflow to handle the
complex computation in transformer networks, including all the non-polynomial
functions such as GELU, softmax, and LayerNorm. Experiments show that THE-X
enables transformer inference on encrypted data for different downstream
tasks with negligible performance drop, while enjoying a theory-guaranteed
privacy-preserving advantage.
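HE schemes evaluate only additions and multiplications, which is why the non-polynomial functions are the sticking point. As a generic illustration of the kind of substitution involved (not THE-X's actual approximation), a low-degree polynomial can be fitted to GELU over the expected activation range and checked for error:

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU, evaluated in plaintext for fitting
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

# Fit a degree-6 polynomial on the range the activations are expected to
# occupy; HE can then evaluate the surrogate with only adds and multiplies.
xs = np.linspace(-4, 4, 2001)
poly = np.poly1d(np.polyfit(xs, gelu(xs), deg=6))

print("max |GELU - poly| on [-4, 4]:", np.abs(gelu(xs) - poly(xs)).max())
```

Softmax and LayerNorm need analogous treatment (the exponential, division, and square root are the obstacles there), which is what the workflow above organizes.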
Denoising Diffusion Probabilistic Models
We present high quality image synthesis results using diffusion probabilistic
models, a class of latent variable models inspired by considerations from
nonequilibrium thermodynamics. Our best results are obtained by training on a
weighted variational bound designed according to a novel connection between
diffusion probabilistic models and denoising score matching with Langevin
dynamics, and our models naturally admit a progressive lossy decompression
scheme that can be interpreted as a generalization of autoregressive decoding.
On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and
a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality
similar to ProgressiveGAN. Our implementation is available at
https://github.com/hojonathanho/diffusion
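In the paper's simplified form, that weighted variational bound reduces to a noise-prediction regression. A minimal sketch of the training objective, assuming `model(x_t, t)` is any network that predicts the added noise and `alpha_bars` holds the cumulative noise schedule:

```python
import torch

def ddpm_loss(model, x0, alpha_bars):
    """Simplified DDPM objective: diffuse x0 to a random timestep and regress
    the noise. `alpha_bars` is a 1-D tensor of cumulative products of
    (1 - beta_t)."""
    b = x0.shape[0]
    t = torch.randint(0, len(alpha_bars), (b,), device=x0.device)
    a = alpha_bars[t].view(b, *([1] * (x0.dim() - 1)))
    eps = torch.randn_like(x0)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps       # sample from q(x_t | x_0)
    return torch.mean((eps - model(x_t, t)) ** 2)    # epsilon-prediction loss
```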
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Denoising diffusion probabilistic models (DDPMs) have recently achieved
leading performances in many generative tasks. However, the cost of the
inherited iterative sampling process hinders their application to
text-to-speech deployment.
Through the preliminary study on diffusion model parameterization, we find that
previous gradient-based TTS models require hundreds or thousands of iterations
to guarantee high sample quality, which poses a challenge for accelerating
sampling. In this work, we propose ProDiff, a progressive fast diffusion model
for high-quality text-to-speech. Unlike previous work estimating the gradient
for data density, ProDiff parameterizes the denoising model by directly
predicting clean data to avoid distinct quality degradation in accelerating
sampling. To tackle the model convergence challenge with decreased diffusion
iterations, ProDiff reduces the data variance in the target site via knowledge
distillation. Specifically, the denoising model uses the generated
mel-spectrogram from an N-step DDIM teacher as the training target and distills
the behavior into a new model with N/2 steps. As such, it allows the TTS model
to make sharp predictions and further reduces the sampling time by orders of
magnitude. Our evaluation demonstrates that ProDiff needs only 2 iterations to
synthesize high-fidelity mel-spectrograms, while it maintains sample quality
and diversity competitive with state-of-the-art models using hundreds of steps.
ProDiff enables sampling 24x faster than real time on a single
NVIDIA 2080Ti GPU, making diffusion models practically applicable to
text-to-speech synthesis deployment for the first time. Our extensive ablation
studies demonstrate that each design in ProDiff is effective, and we further
show that ProDiff can be easily extended to the multi-speaker setting. Audio
samples are available at https://ProDiff.github.io/.
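A sketch of the two ingredients described above, with illustrative names: the denoising network regresses clean data directly, and its target is generated by a frozen DDIM teacher rather than taken from the dataset:

```python
import torch

def prodiff_style_loss(student, teacher_x0, x_t, t, cond):
    """Sketch of ProDiff-style training (names illustrative, not the released
    code): `teacher_x0` produces a clean-data target via a frozen N-step DDIM
    teacher, and the student learns to predict clean data x0 directly so that
    very few sampling iterations are needed."""
    with torch.no_grad():
        target = teacher_x0(x_t, t, cond)    # distillation target from the teacher
    pred = student(x_t, t, cond)             # direct clean-data (x0) prediction
    return torch.mean((pred - target) ** 2)
```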
A Generalist Neural Algorithmic Learner
The cornerstone of neural algorithmic reasoning is the ability to solve
algorithmic tasks, especially in a way that generalises out of distribution.
While recent years have seen a surge in methodological improvements in this
area, they mostly focused on building specialist models. Specialist models are
capable of learning to neurally execute either only one algorithm or a
collection of algorithms with identical control-flow backbone. Here, instead,
we focus on constructing a generalist neural algorithmic learner -- a single
graph neural network processor capable of learning to execute a wide range of
algorithms, such as sorting, searching, dynamic programming, path-finding and
geometry. We leverage the CLRS benchmark to empirically show that, much like
recent successes in the domain of perception, generalist algorithmic learners
can be built by "incorporating" knowledge. That is, it is possible to
effectively learn algorithms in a multi-task manner, so long as we can learn to
execute them well in a single-task regime. Motivated by this, we present a
series of improvements to the input representation, training regime and
processor architecture over CLRS, improving average single-task performance by
more than 20% over prior art. We then conduct a thorough ablation of multi-task
learners leveraging these improvements. Our results demonstrate a generalist
learner that effectively incorporates knowledge captured by specialist models.
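Structurally, "a single processor for many algorithms" follows the encode-process-decode pattern common in neural algorithmic reasoning. A sketch with illustrative module names, where `processor` (e.g. a message-passing GNN) is assumed to be supplied:

```python
import torch
import torch.nn as nn

class MultiTaskAlgorithmicLearner(nn.Module):
    """Encode-process-decode sketch of a generalist learner: per-algorithm
    linear encoders/decoders around one shared processor network."""
    def __init__(self, tasks, in_dims, hidden, out_dims, processor):
        super().__init__()
        self.encoders = nn.ModuleDict({t: nn.Linear(in_dims[t], hidden) for t in tasks})
        self.processor = processor  # one network shared across all algorithms
        self.decoders = nn.ModuleDict({t: nn.Linear(hidden, out_dims[t]) for t in tasks})

    def forward(self, task, node_feats, edge_index):
        h = self.encoders[task](node_feats)       # task-specific input encoding
        h = self.processor(h, edge_index)         # shared algorithmic reasoning
        return self.decoders[task](h)             # task-specific output decoding
```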
LLMs in business
Learning to Walk by Steering: Perceptive Quadrupedal Locomotion in Dynamic Environments
We present a hierarchical learning framework, named PRELUDE, which decomposes the problem of perceptive locomotion into high-level decision-making to predict navigation commands and low-level gait generation to realize the target commands. In this framework, we train the high-level navigation controller with imitation learning on human demonstrations collected on a steerable cart and the low-level gait controller with reinforcement learning (RL). Therefore, our method can acquire complex navigation behaviors from human supervision and discover versatile gaits from trial and error.
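The two-level decomposition can be sketched as follows; the observation keys and policy interfaces are assumptions for illustration, not PRELUDE's actual API:

```python
def control_step(obs, nav_policy, gait_policy):
    """Sketch of the hierarchy: a high-level navigation policy (trained by
    imitation) maps perception to a steering/velocity command, and a low-level
    gait policy (trained by RL) turns that command into joint targets."""
    command = nav_policy(obs["rgbd"])                     # e.g. (v, yaw_rate)
    joint_targets = gait_policy(obs["proprio"], command)  # gait that tracks it
    return joint_targets
```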
NLMap-SayCan
Project page for Open-vocabulary Queryable Scene Representations for Real World Planning
In this paper, we develop NLMap, an
open-vocabulary and queryable scene representation to address this
problem. NLMap serves as a framework to gather and integrate
contextual information into LLM planners, allowing them to see and
query available objects in the scene before generating a
context-conditioned plan. NLMap first establishes a natural language
queryable scene representation with Visual Language models (VLMs). An
LLM based object proposal module parses instructions and proposes
involved objects to query the scene representation for object
availability and location. An LLM planner then plans with such
information about the scene. NLMap allows robots to operate without a
fixed list of objects or executable options, enabling real-robot
operation unachievable by previous methods.
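The flow described above can be sketched in a few lines; all APIs here are hypothetical placeholders (an LLM callable, a VLM-backed `scene_map.query`), not NLMap's actual interface:

```python
def nlmap_plan(instruction, scene_map, llm):
    """Sketch of the NLMap pipeline: an LLM proposes relevant objects, the
    open-vocabulary scene map is queried for their availability and location,
    and the planner conditions on what is actually present."""
    proposed = llm(f"List the objects needed for: {instruction}").split(", ")
    found = {obj: scene_map.query(obj) for obj in proposed}   # VLM-backed query
    available = {obj: loc for obj, loc in found.items() if loc is not None}
    return llm(f"Plan '{instruction}' using only these objects: {available}")
```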
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Pure Transformers are Powerful Graph Learners
We show that standard Transformers without graph-specific modifications can
lead to promising results in graph learning both in theory and practice. Given
a graph, we simply treat all nodes and edges as independent tokens, augment
them with token embeddings, and feed them to a Transformer. With an appropriate
choice of token embeddings, we prove that this approach is theoretically at
least as expressive as an invariant graph network (2-IGN) composed of
equivariant linear layers, which is already more expressive than all
message-passing Graph Neural Networks (GNN). When trained on a large-scale
graph dataset (PCQM4Mv2), our method, coined Tokenized Graph Transformer
(TokenGT), achieves significantly better results than GNN baselines and
competitive results compared to Transformer variants with sophisticated
graph-specific inductive bias. Our implementation is available at
https://github.com/jw9730/tokengt.
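The tokenization described above is compact enough to sketch directly: every node and every edge becomes one token, and each token carries node-identifier embeddings ([id_u, id_u] for node u, [id_u, id_v] for edge (u, v)) so a plain Transformer can recover the incidence structure. Shapes and names below are illustrative:

```python
import torch

def graph_to_tokens(node_x, edge_x, edge_index, node_ids):
    """TokenGT-style tokenization sketch. node_x: (N, d) and edge_x: (E, d)
    features (assumed projected to a common width d), edge_index: (2, E) long
    tensor, node_ids: (N, d_id) node-identifier embeddings (e.g. orthonormal
    vectors)."""
    src, dst = edge_index
    node_tok = torch.cat([node_x, node_ids, node_ids], dim=-1)            # [id_u, id_u]
    edge_tok = torch.cat([edge_x, node_ids[src], node_ids[dst]], dim=-1)  # [id_u, id_v]
    # (N + E, d + 2*d_id): one token per node and per edge, fed to a standard
    # Transformer encoder with no graph-specific modification.
    return torch.cat([node_tok, edge_tok], dim=0)
```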
Test-Time Prompt Tuning for Zero-shot Generalization in Vision-Language Models
Pre-trained vision-language models (e.g., CLIP) have shown impressive zero-shot
generalization in various downstream tasks with properly designed text prompts. Instead of relying on hand-engineered prompts, recent works learn prompts using training data from downstream tasks, but this can be expensive and hard to generalize to new tasks and distributions. To this end, we propose test-time prompt tuning (TPT) as the first prompt tuning method that can learn adaptive prompts on the fly with a single test sample. TPT optimizes the prompt by minimizing the entropy with confidence selection so that the model has consistent predictions across different augmented views of each test sample.
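A sketch of that test-time objective, with illustrative names: `clip_logits_fn(views, prompt)` stands for CLIP scoring a batch of augmented views of one test image under learnable prompt tokens.

```python
import torch

def tpt_loss(clip_logits_fn, prompt, views, keep_frac=0.1):
    """Test-time prompt tuning sketch: keep the most confident (lowest-entropy)
    augmented views, average their predictions, and return the entropy of that
    marginal distribution. Names and the 10% cutoff are illustrative."""
    probs = torch.softmax(clip_logits_fn(views, prompt), dim=-1)   # (V, C)
    ent = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)      # per-view entropy
    keep = ent <= torch.quantile(ent, keep_frac)                   # confidence selection
    p_bar = probs[keep].mean(dim=0)                                # marginal prediction
    return -(p_bar * p_bar.clamp_min(1e-12).log()).sum()
```

One gradient step on this loss with respect to the prompt embeddings, followed by a prediction with the tuned prompt, matches the single-test-sample setting the abstract describes.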
What does "stationary" mean in the context of reinforcement learning?
I think I've seen the expressions "stationary data", "stationary dynamics" and "stationary policy", among others, in the context of reinforcement learning. What does it mean? I think stationary pol...
A stationary policy is a policy that does not change. Strictly speaking that is a statement about time-dependence, but that is not what the distinction usually refers to in reinforcement learning: there it generally means that the policy is not being updated by a learning algorithm.
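A toy sketch of the distinction:

```python
# A stationary policy is a fixed state -> action mapping, independent of the
# time step: it returns the same answer forever.
def stationary_policy(state):
    return "left" if state % 2 == 0 else "right"

# During training, the greedy policy derived from a value table changes
# whenever the table is updated, so it is *not* stationary until learning stops.
q = {(s, a): 0.0 for s in range(4) for a in ("left", "right")}
def greedy_policy(state):
    return max(("left", "right"), key=lambda a: q[(state, a)])
```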
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Adaptive gradient algorithms borrow the moving average idea of heavy ball acceleration to estimate accurate first- and second-order moments of gradient for accelerating convergence. However, Nesterov acceleration, which converges faster than heavy ball acceleration in theory and also in many empirical cases, is much less investigated under the adaptive gradient setting. In this work, we propose the ADAptive Nesterov momentum algorithm, Adan for short, to effectively speed up the training of deep neural networks. Adan first reformulates the vanilla Nesterov acceleration to develop a new Nesterov momentum estimation (NME) method, which avoids the extra computation and memory overhead of computing the gradient at the extrapolation point. Then Adan adopts NME to estimate the first- and second-order moments of the gradient in adaptive gradient algorithms for convergence acceleration.
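The update below is a sketch of the NME idea as I read the abstract: a moment of gradient differences stands in for re-evaluating the gradient at the extrapolation point. Treat it as illustrative rather than the reference implementation (weight decay and bias correction are omitted):

```python
import numpy as np

def adan_step(theta, grad, prev_grad, m, v, n, lr=1e-3,
              b1=0.02, b2=0.08, b3=0.01, eps=1e-8):
    """One illustrative Adan-style step on numpy arrays. The gradient
    difference d replaces an extra gradient evaluation at the Nesterov
    extrapolation point; constants follow my reading of the paper."""
    d = grad - prev_grad
    m = (1 - b1) * m + b1 * grad                         # first moment
    v = (1 - b2) * v + b2 * d                            # moment of gradient differences
    n = (1 - b3) * n + b3 * (grad + (1 - b2) * d) ** 2   # second moment of lookahead grad
    theta = theta - lr * (m + (1 - b2) * v) / (np.sqrt(n) + eps)
    return theta, m, v, n
```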