Cross-Silo Federated Learning: Challenges and Opportunities
Federated learning (FL) is an emerging technology that enables the training of machine learning models from multiple clients while keeping the data distributed and private. Based on the participating clients and the model training scale, federated learning can be classified into two types: cross-device FL, where clients are typically mobile devices and the client number can reach up to a scale of millions; and cross-silo FL, where clients are organizations or companies and the client number is usually small (e.g., within a hundred). While existing studies mainly focus on cross-device FL, this paper aims to provide an overview of cross-silo FL. More specifically, we first discuss applications of cross-silo FL and outline its major challenges. We then provide a systematic overview of the existing approaches to the challenges in cross-silo FL, focusing on their connections to and differences from cross-device FL. Finally, we discuss future directions and open issues that merit research efforts from the community.
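As a rough illustration of the setting, the sketch below runs generic federated averaging (FedAvg) over a handful of silo-style clients. It is a minimal toy with synthetic least-squares data in NumPy, not the paper's method; all names and numbers are illustrative assumptions.

```python
import numpy as np

def local_update(w, data, lr=0.1, epochs=5):
    """Placeholder local training at one silo: gradient steps on a least-squares objective."""
    X, y = data
    w = w.copy()
    for _ in range(epochs):
        w -= lr * X.T @ (X @ w - y) / len(y)
    return w

def fedavg_round(global_w, silos):
    """One FedAvg round: every silo trains locally, the server averages weighted by silo size."""
    sizes = [len(y) for _, y in silos]
    local_ws = [local_update(global_w, data) for data in silos]
    return np.average(local_ws, axis=0, weights=sizes)

# Toy cross-silo setting: a handful of organizations, each holding its own private partition.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
silos = []
for _ in range(5):
    X = rng.normal(size=(200, 3))
    silos.append((X, X @ true_w + 0.1 * rng.normal(size=200)))

w = np.zeros(3)
for _ in range(30):
    w = fedavg_round(w, silos)
print(w)   # approaches true_w without any silo sharing raw data
```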
·arxiv.org·
THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption
As more and more pre-trained language models adopt on-cloud deployment, privacy concerns grow quickly, mainly due to the exposure of plain-text user data (e.g., search history, medical records, bank accounts). Privacy-preserving inference of transformer models is in demand among cloud service users. To protect privacy, it is an attractive choice to compute only on ciphertext with homomorphic encryption (HE). However, enabling inference of pre-trained models on ciphertext data is difficult due to the complex computations in transformer blocks, which are not yet supported by current HE tools. In this work, we introduce THE-X, an approximation approach for transformers, which enables privacy-preserving inference of pre-trained models developed by popular frameworks. THE-X proposes a workflow to deal with complex computation in transformer networks, including all the non-polynomial functions such as GELU, softmax, and LayerNorm. Experiments reveal that THE-X can enable transformer inference on encrypted data for different downstream tasks, all with negligible performance drop while enjoying the theory-guaranteed privacy-preserving advantage.
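To make the general idea concrete, here is a minimal sketch of replacing a non-polynomial activation with a low-degree polynomial fit, since HE schemes natively support only additions and multiplications. The degree, fitting range, and NumPy implementation are illustrative assumptions; THE-X's actual approximations for GELU, softmax, and LayerNorm differ.

```python
import numpy as np

def gelu(x):
    """Reference GELU (tanh approximation)."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

# Fit a low-degree polynomial on a bounded input range; a polynomial needs only
# additions and multiplications, which HE schemes support natively.
xs = np.linspace(-4.0, 4.0, 2001)
coeffs = np.polyfit(xs, gelu(xs), deg=3)

def gelu_he_friendly(x):
    return np.polyval(coeffs, x)   # Horner evaluation: additions and multiplications only

print(np.max(np.abs(gelu_he_friendly(xs) - gelu(xs))))   # max approximation error on the fit range
```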
·arxiv.org·
Measuring human relationships and experiences
With the lines between enterprises' stakeholders—customers, workers, and partners—blurring rapidly, creating a good human experience could begin with putting in place a holistic strategy to measure this experience.
The lines are blurring between what constitutes a worker, a business partner, or a customer, and the door between these relationships is no longer closed; it is a revolving one.
·www2.deloitte.com·
How behavioral principles affect consumer loyalty | Deloitte Insights
Know thyself: Proactively implement relationship guardrails to avoid repeating self-induced lock-in traps
The best indicator of the future often comes from looking to the past. If you have had a history of staying in relationships longer than you should have, consider proactively—ideally, before a relationship commences—putting in place the following guardrails:
- Hesitate before acting upon referrals from friends or colleagues.
- Avoid entering into business relationships with friends or family.
- Set boundaries to prevent business relationships from evolving into personal friendships.
- Decline special perks, favors, or services from providers; instead, compensate providers for these additional services if they are something you truly desire.
- Establish explicit relationship agreements and exit clauses (for example, a “pre-nuptial” agreement or termination clause).
What got you here will not necessarily get you there: Beware of the lure of familiar, long-standing relationships
Relationship length is a powerful influence on our non-exit decisions. Just because something worked for you in the past, however, doesn’t mean it is the best solution moving forward. Sadly, the tendency to stick with the status quo—a tendency that gets stronger over time—legitimizes firms’ propensity to abuse existing relationships for the sake of new prospects. Many firms commonly allocate more resources toward new prospects and pull back on the resources allocated to existing relationships. Consumers can help themselves recognize when long-standing relationships turn sour by having a heightened awareness of this business tactic.
Carrots keep us in positive relationships; sticks keep us in negative relationships. While less prevalent overall as lock-in reasons, carrots represented many of the top reasons for study participants’ staying in positive relationships. As your organization allocates resources toward strategies that prevent consumers from leaving, consider the overall effect of sticks versus carrots on consumer attitude.
·www2.deloitte.com·
ARM vs RISC-V: What Are the Major Differences?
What are the major differences between RISC-V and ARM, and will one win over the other?
CISC allows a computer to do more in a single instruction cycle, while RISC allows for simpler programming. Generally speaking, RISC requires more clock cycles to complete the same task as CISC but can do so more efficiently (energy-wise), making RISC processors ideal for mobile applications. While x86/x64 remains the dominant architecture in the heavy processing market, ARM may face serious competition from a new processor architecture, RISC-V.
·electropages.com·
ARM vs. RISC-V: Is one better than the other? | Digital Trends
If you wanted to make a CPU, there are two obvious choices: ARM and RISC-V. But what are the differences between the two, and is one better than the other?
ARM and RISC-V are instruction set architectures, or ISAs. The ISA is the foundation of a processor and is the most fundamental and basic component of any CPU. Both ISAs are reduced instruction set computer (or RISC) designs, meaning the base instructions the CPU has access to are inherently simple but ideally fast to calculate. The ‘R’ in ARM actually stands for RISC (though ARM is no longer treated as an acronym), so in this sense the two ISAs are similar.
·digitaltrends.com·
Hamiltonian path - Wikipedia
In the mathematical field of graph theory, a Hamiltonian path (or traceable path) is a path in an undirected or directed graph that visits each vertex exactly once. A Hamiltonian cycle (or Hamiltonian circuit) is a cycle that visits each vertex exactly once. A Hamiltonian path that starts and ends at adjacent vertices can be completed by adding one more edge to form a Hamiltonian cycle, and removing any edge from a Hamiltonian cycle produces a Hamiltonian path. Determining whether such paths and cycles exist in graphs (the Hamiltonian path problem and the Hamiltonian cycle problem) is NP-complete.
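A brute-force backtracking search makes the definition concrete, and its exponential worst case hints at why the decision problems are NP-complete. This is a generic illustrative sketch, not an excerpt from the article; the graph is a made-up toy.

```python
def extend_path(adj, path):
    """Backtracking: try to extend `path` so it visits every vertex exactly once."""
    if len(path) == len(adj):
        return path
    for v in adj[path[-1]]:
        if v not in path:
            result = extend_path(adj, path + [v])
            if result is not None:
                return result
    return None

def hamiltonian_path(adj):
    """Return a Hamiltonian path of the graph `adj` (dict: vertex -> neighbors), or None."""
    for start in adj:
        path = extend_path(adj, [start])
        if path is not None:
            return path
    return None

# Toy undirected 5-cycle: a Hamiltonian path whose endpoints are adjacent
# can be closed into a Hamiltonian cycle by adding one more edge.
c5 = {0: [1, 4], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 0]}
path = hamiltonian_path(c5)
print(path, path[0] in c5[path[-1]])   # [0, 1, 2, 3, 4] True
```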
·en.wikipedia.org·
Zero-knowledge proof - Wikipedia
In cryptography, a zero-knowledge proof or zero-knowledge protocol is a method by which one party (the prover) can prove to another party (the verifier) that a given statement is true while the prover avoids conveying any additional information apart from the fact that the statement is indeed true. The essence of zero-knowledge proofs is that it is trivial to prove that one possesses knowledge of certain information by simply revealing it; the challenge is to prove such possession without revealing the information itself or any additional information
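A toy interactive example helps: in the Schnorr identification protocol, the prover convinces the verifier that it knows the discrete logarithm x of y = g^x mod p without revealing x. The sketch below uses tiny, insecure parameters purely for illustration and is only one classic instance of a zero-knowledge protocol, not a summary of the article.

```python
import random

# Toy Schnorr identification: insecure, tiny parameters chosen only for illustration.
p, g = 23, 5                    # small prime and a generator of the full multiplicative group
x = 7                           # prover's secret
y = pow(g, x, p)                # public value; the prover claims to know x = log_g(y)

def prove_once():
    r = random.randrange(p - 1)         # prover's random commitment exponent
    t = pow(g, r, p)                    # commitment sent to the verifier
    c = random.randrange(p - 1)         # verifier's random challenge
    s = (r + c * x) % (p - 1)           # prover's response (reveals nothing about x on its own)
    return pow(g, s, p) == (t * pow(y, c, p)) % p   # verifier's check: g^s == t * y^c (mod p)

print(all(prove_once() for _ in range(100)))        # an honest prover always passes
```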
·en.wikipedia.org·
Fermat's Library | Blue Zones: Lessons From the World's Longest Lived annotated/explained version.
Fermat's Library is a platform for illuminating academic papers.
This community of shepherds walks 5 mountainous miles a day or more. This natural movement provides all the positive cardiovascular benefits you might expect and also has a positive effect on muscle and bone metabolism without the joint pounding of running marathons.
·fermatslibrary.com·
Denoising Diffusion Probabilistic Models
We present high quality image synthesis results using diffusion probabilistic models, a class of latent variable models inspired by considerations from nonequilibrium thermodynamics. Our best results are obtained by training on a weighted variational bound designed according to a novel connection between diffusion probabilistic models and denoising score matching with Langevin dynamics, and our models naturally admit a progressive lossy decompression scheme that can be interpreted as a generalization of autoregressive decoding. On the unconditional CIFAR10 dataset, we obtain an Inception score of 9.46 and a state-of-the-art FID score of 3.17. On 256x256 LSUN, we obtain sample quality similar to ProgressiveGAN. Our implementation is available at https://github.com/hojonathanho/diffusion
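For intuition, here is a minimal sketch of the forward noising process and the simplified training objective (predict the injected noise) described in the paper. The linear schedule values follow the paper; `eps_model` is a hypothetical placeholder for the denoising network, and the data are random stand-ins.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # linear noise schedule from the paper
alphas_bar = np.cumprod(1.0 - betas)

def q_sample(x0, t, eps):
    """Forward process q(x_t | x_0): noise clean data x0 in a single shot."""
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * eps

def ddpm_loss(eps_model, x0):
    """Simplified training objective: predict the injected noise at a random timestep."""
    t = np.random.randint(T)
    eps = np.random.randn(*x0.shape)
    x_t = q_sample(x0, t, eps)
    return np.mean((eps_model(x_t, t) - eps) ** 2)

# Trivial stand-in for the denoising network, just to make the sketch runnable.
print(ddpm_loss(lambda x_t, t: np.zeros_like(x_t), np.random.randn(8, 32)))
```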
·arxiv.org·
The big boost: How incumbents successfully scale their new businesses
Corporations can help their new ventures scale up if they avoid these six actions that can undermine success.
Successful start-ups focus from day one on the projected lifetime value for each targeted customer segment, and they review critical leading and lagging indicators every day. Noteworthy among high performers is their fixation on a single “star metric” that is most indicative of success for their business
·mckinsey.com·
16 Startup Metrics | Andreessen Horowitz
We have the privilege of meeting with thousands of entrepreneurs every year, and in the course of those discussions are presented with all kinds of numbers, measures, and metrics that illustrate the promise and health of a particular company. Sometimes, however, the metrics may not be the best gauge of what’s actually happening in the business, or people may use different definitions of the same metric in a way that makes it hard to understand the health of the business. So, while some of this may be obvious to many of you who live and breathe these metrics all day long, we compiled a list of the most common or confusing metrics. Where appropriate, we tried to add some notes on why investors focus on those metrics. Ultimately, though, good metrics aren’t about raising money from VCs -- they’re about running the business in a way where you know how and why certain things are working (or not), and can address or adjust accordingly.
A common mistake is to use bookings and revenue interchangeably, but they aren’t the same thing. Bookings is the value of a contract between the company and the customer. It reflects a contractual obligation on the part of the customer to pay the company. Revenue is recognized when the service is actually provided or ratably over the life of the subscription agreement. How and when revenue is recognized is governed by GAAP.
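A tiny numerical sketch of that distinction follows, with illustrative values; real revenue recognition follows GAAP rules (e.g., ASC 606), and this simplified ratable spread is only meant to show why bookings and revenue diverge.

```python
def recognize_ratably(booking_value, term_months, start_month=0):
    """Spread a subscription booking evenly over its service term (simplified ratable recognition)."""
    monthly = booking_value / term_months
    return {start_month + m: monthly for m in range(term_months)}

# A $120,000 annual contract signed in month 0: bookings jump by 120,000 immediately,
# but only 10,000 of revenue is recognized in each month the service is provided.
schedule = recognize_ratably(120_000, 12)
print(schedule[0], sum(schedule.values()))   # 10000.0 120000.0
```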
Investors more highly value companies where the majority of total revenue comes from product revenue (vs. from services). Why? Services revenue is non-recurring, has much lower margins, and is less scalable. Product revenue is what you generate from the sale of the software or product itself.
ARR (annual recurring revenue) is a measure of revenue components that are recurring in nature. It should exclude one-time (non-recurring) fees and professional service fees.
ARR per customer: Is this flat or growing? If you are upselling or cross-selling your customers, then it should be growing, which is a positive indicator for a healthy business.
MRR (monthly recurring revenue): Often, people will multiply one month’s all-in bookings by 12 to get to ARR. Common mistakes with this method include: (1) counting non-recurring fees such as hardware, setup, installation, professional services/consulting agreements; (2) counting bookings (see #1).
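The sketch below shows the difference numerically, using hypothetical line items: MRR/ARR count only the recurring subscription components, while naively multiplying all of one month's bookings by 12 overstates ARR in exactly the way described above.

```python
# Hypothetical billing line items for one month (names and amounts are illustrative).
line_items = [
    {"kind": "subscription", "amount": 9_000},   # recurring
    {"kind": "subscription", "amount": 6_000},   # recurring
    {"kind": "setup_fee",    "amount": 2_000},   # one-time fee: excluded from MRR/ARR
    {"kind": "services",     "amount": 5_000},   # professional services: excluded from MRR/ARR
]

mrr = sum(i["amount"] for i in line_items if i["kind"] == "subscription")
arr = 12 * mrr
naive_arr = 12 * sum(i["amount"] for i in line_items)   # the mistake described above

print(mrr, arr, naive_arr)   # 15000 180000 264000
```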
While top-line bookings growth is super important, investors want to understand how profitable that revenue stream is. Gross profit provides that measure. What’s included in gross profit may vary by company, but in general all costs associated with the manufacturing, delivery, and support of a product/service should be included.
·a16z.com·
ACV (Annual Contract Value) vs ARR (Annual Recurring Revenue): How to use them?
ACV (Annual Contract Value) and ARR (Annual Recurring Revenue) are two of the most crucial revenue metrics. Learn how to apply them both in your subscription business.
ACV or Annual Contract Value is a revenue metric that describes the amount of revenue you receive from a given customer each year. ARR or Annual Recurring Revenue is also a revenue metric that describes the amount of revenue you can expect to receive from your existing clients in a given year.
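A rough numerical illustration (hypothetical contracts; exact definitions vary between companies): ACV annualizes each contract's total value, while ARR sums the recurring revenue expected from all active contracts in a given year.

```python
# Hypothetical contracts: (total contract value, term in years); numbers are illustrative.
contracts = [(300_000, 3), (50_000, 1), (120_000, 2)]

annualized = [tcv / years for tcv, years in contracts]
acv = sum(annualized) / len(contracts)   # average yearly value per contract
arr = sum(annualized)                    # yearly recurring revenue, assuming every contract is
                                         # fully recurring and active for the whole year
print(acv, arr)   # 70000.0 210000.0
```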
·chargebee.com·
ProDiff: Progressive Fast Diffusion Model For High-Quality Text-to-Speech
Denoising diffusion probabilistic models (DDPMs) have recently achieved leading performance in many generative tasks. However, the cost of their iterative sampling process hinders their application to text-to-speech deployment. Through a preliminary study on diffusion model parameterization, we find that previous gradient-based TTS models require hundreds or thousands of iterations to guarantee high sample quality, which poses a challenge for accelerating sampling. In this work, we propose ProDiff, a progressive fast diffusion model for high-quality text-to-speech. Unlike previous work estimating the gradient of the data density, ProDiff parameterizes the denoising model by directly predicting clean data, avoiding the distinct quality degradation seen when accelerating sampling. To tackle the model convergence challenge with decreased diffusion iterations, ProDiff reduces the data variance on the target side via knowledge distillation. Specifically, the denoising model uses the generated mel-spectrogram from an N-step DDIM teacher as the training target and distills the behavior into a new model with N/2 steps. As such, it allows the TTS model to make sharp predictions and further reduces the sampling time by orders of magnitude. Our evaluation demonstrates that ProDiff needs only 2 iterations to synthesize high-fidelity mel-spectrograms, while maintaining sample quality and diversity competitive with state-of-the-art models using hundreds of steps. ProDiff enables a sampling speed 24x faster than real-time on a single NVIDIA 2080Ti GPU, making diffusion models practically applicable to text-to-speech synthesis deployment for the first time. Our extensive ablation studies demonstrate that each design in ProDiff is effective, and we further show that ProDiff can be easily extended to the multi-speaker setting. Audio samples are available at https://ProDiff.github.io/.
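The teacher-to-student step-halving can be sketched as a small progressive-distillation loop: the student is trained so that one of its DDIM steps matches two of the teacher's. Everything below (the tiny MLP denoiser, the noise schedule, the random stand-in data, PyTorch itself) is an illustrative assumption, not ProDiff's actual architecture or training recipe.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    """Tiny x0-predicting denoiser (ProDiff likewise predicts clean data directly; this net is a stand-in)."""
    def __init__(self, dim=16):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim + 1, 64), nn.ReLU(), nn.Linear(64, dim))
    def forward(self, x_t, t):
        return self.net(torch.cat([x_t, t.unsqueeze(-1)], dim=-1))

T = 8                                                    # teacher steps; the student will use T // 2
alphas_bar = torch.cumprod(1.0 - torch.linspace(1e-4, 0.1, T + 1), dim=0)

def ddim_step(model, x_t, t_from, t_to):
    """Deterministic DDIM update from step t_from down to t_to, driven by the model's x0 prediction."""
    a_f, a_t = alphas_bar[t_from], alphas_bar[t_to]
    x0_hat = model(x_t, torch.full((x_t.shape[0],), t_from / T))
    eps_hat = (x_t - a_f.sqrt() * x0_hat) / (1.0 - a_f).sqrt()
    return a_t.sqrt() * x0_hat + (1.0 - a_t).sqrt() * eps_hat

teacher, student = Denoiser(), Denoiser()
student.load_state_dict(teacher.state_dict())            # warm-start the student from the teacher
opt = torch.optim.Adam(student.parameters(), lr=1e-3)

for _ in range(200):
    x0 = torch.randn(32, 16)                             # random stand-in for clean mel-spectrogram frames
    t = 2 * torch.randint(1, T // 2 + 1, (1,)).item()    # an even teacher timestep
    x_t = alphas_bar[t].sqrt() * x0 + (1.0 - alphas_bar[t]).sqrt() * torch.randn_like(x0)
    with torch.no_grad():                                 # two teacher DDIM steps define the target
        target = ddim_step(teacher, ddim_step(teacher, x_t, t, t - 1), t - 1, t - 2)
    loss = ((ddim_step(student, x_t, t, t - 2) - target) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```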
·arxiv.org·
A Generalist Neural Algorithmic Learner
The cornerstone of neural algorithmic reasoning is the ability to solve algorithmic tasks, especially in a way that generalises out of distribution. While recent years have seen a surge in methodological improvements in this area, they mostly focused on building specialist models. Specialist models are capable of learning to neurally execute either only one algorithm or a collection of algorithms with identical control-flow backbone. Here, instead, we focus on constructing a generalist neural algorithmic learner -- a single graph neural network processor capable of learning to execute a wide range of algorithms, such as sorting, searching, dynamic programming, path-finding and geometry. We leverage the CLRS benchmark to empirically show that, much like recent successes in the domain of perception, generalist algorithmic learners can be built by "incorporating" knowledge. That is, it is possible to effectively learn algorithms in a multi-task manner, so long as we can learn to execute them well in a single-task regime. Motivated by this, we present a series of improvements to the input representation, training regime and processor architecture over CLRS, improving average single-task performance by over 20% from prior art. We then conduct a thorough ablation of multi-task learners leveraging these improvements. Our results demonstrate a generalist learner that effectively incorporates knowledge captured by specialist models.
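A minimal sketch of the multi-task, shared-processor idea described here: per-task encoders and decoders wrapped around a single shared processor. The MLP processor, dimensions, and random stand-in data are illustrative assumptions; the actual work uses a graph neural network trained on CLRS benchmark traces.

```python
import torch
import torch.nn as nn

# Illustrative (input_dim, output_dim) per algorithmic task; real CLRS tasks use graph-structured traces.
tasks = {"sorting": (4, 1), "shortest_paths": (6, 1)}
hidden = 32

encoders = nn.ModuleDict({t: nn.Linear(d_in, hidden) for t, (d_in, _) in tasks.items()})
processor = nn.Sequential(nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, hidden))
decoders = nn.ModuleDict({t: nn.Linear(hidden, d_out) for t, (_, d_out) in tasks.items()})

params = list(encoders.parameters()) + list(processor.parameters()) + list(decoders.parameters())
opt = torch.optim.Adam(params, lr=1e-3)

def fake_batch(task):
    d_in, d_out = tasks[task]
    return torch.randn(16, d_in), torch.randn(16, d_out)   # random stand-in data

# Multi-task training: alternate over tasks while sharing one processor across all of them.
for step in range(100):
    for task in tasks:
        x, y = fake_batch(task)
        pred = decoders[task](processor(encoders[task](x)))
        loss = ((pred - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
```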
·arxiv.org·
Learning to Walk by Steering: Perceptive Quadrupedal Locomotion in Dynamic Environments
We present a hierarchical learning framework, named PRELUDE, which decomposes the problem of perceptive locomotion into high-level decision-making to predict navigation commands and low-level gait generation to realize the target commands. In this framework, we train the high-level navigation controller with imitation learning on human demonstrations collected on a steerable cart and the low-level gait controller with reinforcement learning (RL). Therefore, our method can acquire complex navigation behaviors from human supervision and discover versatile gaits from trial and error
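The hierarchy can be sketched as two controllers running at different rates: a high-level navigation policy that emits velocity commands and a low-level gait controller that tracks them. Everything below (dimensions, rates, the heuristic stand-in policies) is a placeholder illustrating the decomposition, not PRELUDE's trained networks.

```python
import numpy as np

class NavigationPolicy:
    """Stand-in for the high-level policy (trained with imitation learning in PRELUDE)."""
    def act(self, observation):
        return np.array([0.5, 0.0])               # [forward velocity, yaw rate] command

class GaitController:
    """Stand-in for the low-level gait controller (trained with RL in PRELUDE)."""
    def act(self, command, proprioception):
        return np.zeros(12) + 0.01 * command[0]   # placeholder joint-position targets

nav, gait = NavigationPolicy(), GaitController()
observation, proprioception = np.zeros(64), np.zeros(30)
command = nav.act(observation)
for step in range(1000):
    if step % 10 == 0:                            # the navigation policy runs at a lower rate
        command = nav.act(observation)
    joint_targets = gait.act(command, proprioception)   # the gait controller tracks the command
```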
·ut-austin-rpl.github.io·