Disasters I've seen in a microservices world, part II
When I first wrote about microservice disasters, I thought we'd eventually "solve" them, with better tooling, frameworks, and operational maturity. We didn't. We just learned to live with the chaos. Distributed systems will always surprise you: timeouts, retries, and fallacies don't disappear; they just shift shape. Maybe that's the re...
When Martin Fowler's post about microservices came out in 2014, the teams where I worked were already building service-oriented architectures. That post and the subsequent hype made their way into almost every software team in the world. The "Netflix OSS stack" was the coolest thing back then, allowing engineers worldwide to leverage N...
How to Design a Rate Limiter (A Complete Guide for System Design Interviews)
A real-world look at how to build scalable rate limiters — from simple array-based approaches to distributed systems design — and how to answer this popular interview question with confidence.
Idempotency in System Design: Full example - Lukas Niessen - Medium
Idempotence is a concept frequently mentioned in system design. I will explain what it means in simple terms, briefly address common misunderstandings, and finish with a full example. In other words…
Writing Load Balancer From Scratch In 250 Line of Code
Hey, everyone. It's another weekend, and I was exploring what to build. So I decided to build a simple yet completely functional load balancer. Let's discuss it in this post.
Techniques for handling failure scenarios in microservice architectures
This article explores the strategies for managing failure scenarios in microservice architectures. It covers techniques to address both technical glitches and business impacts. You will learn how organizations can build fault-tolerant systems that are capable of gracefully handling cascading failures while maintaining core functionalities, even in a degraded state.
The Illustrated Children’s Guide to Kubernetes | CNCF
Brought to you by… Written by: Matt Butcher and Karen Chu. Illustrated by: Bailey Beougher. Illustration of Goldie is based on the Go Gopher designed by Renee…
This page describes the lifecycle of a Pod. Pods follow a defined lifecycle, starting in the Pending phase, moving through Running if at least one of its primary containers starts OK, and then through either the Succeeded or Failed phases depending on whether any container in the Pod terminated in failure.
Like individual application containers, Pods are considered to be relatively ephemeral (rather than durable) entities. Pods are created, assigned a unique ID (UID), and scheduled to run on nodes where they remain until termination (according to restart policy) or deletion.
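The phase progression described above can be sketched as a small state model. This is only an illustration of the excerpt, not Kubernetes code: the names `PodPhase`, `ALLOWED`, `advance`, and `terminal_phase` are hypothetical, and the model covers only the four phases named in the text.

```python
from enum import Enum


class PodPhase(Enum):
    PENDING = "Pending"
    RUNNING = "Running"
    SUCCEEDED = "Succeeded"
    FAILED = "Failed"


# Simplified transitions, limited to the path the excerpt describes:
# Pending -> Running -> Succeeded or Failed.
ALLOWED = {
    PodPhase.PENDING: {PodPhase.RUNNING},
    PodPhase.RUNNING: {PodPhase.SUCCEEDED, PodPhase.FAILED},
    PodPhase.SUCCEEDED: set(),
    PodPhase.FAILED: set(),
}


def advance(current: PodPhase, target: PodPhase) -> PodPhase:
    """Move to `target` if this simplified model allows it, else raise."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target


def terminal_phase(exit_codes: list[int]) -> PodPhase:
    """Failed if any container terminated in failure (non-zero exit code)."""
    if any(code != 0 for code in exit_codes):
        return PodPhase.FAILED
    return PodPhase.SUCCEEDED


if __name__ == "__main__":
    phase = advance(PodPhase.PENDING, PodPhase.RUNNING)
    phase = advance(phase, terminal_phase([0, 1]))
    print(phase.value)
```

Treating a non-zero exit code as "terminated in failure" is an assumption made for the sketch; the real controller logic also depends on the restart policy mentioned above.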
SkyPilot uses the venerable SQLite for state management. SQLite can handle millions of QPS, and terabytes of data. However, our efforts to scale our Managed Jobs feature ran up against the one downfall of SQLite: many concurrent writers. Since SkyPilot typically runs as a CLI on your laptop, we wanted to stick with SQLite, so we decided to figure out how we can make it work. We were very surprised with some of our findings.
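The concurrent-writer limit mentioned above can be reproduced with Python's standard `sqlite3` module. This is a minimal sketch, not SkyPilot's code or a summary of their findings: the `jobs` table and the function name `second_writer_blocked` are hypothetical, and the second connection uses `timeout=0` so it fails immediately instead of waiting for the lock.

```python
import os
import sqlite3
import tempfile


def second_writer_blocked(db_path: str) -> bool:
    """Return True if a second connection cannot begin a write
    transaction while the first one holds the write lock."""
    # isolation_level=None puts the connection in autocommit mode,
    # so transactions are controlled explicitly with BEGIN/COMMIT.
    writer_a = sqlite3.connect(db_path, isolation_level=None)
    writer_b = sqlite3.connect(db_path, isolation_level=None, timeout=0)
    try:
        writer_a.execute(
            "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, name TEXT)"
        )
        # BEGIN IMMEDIATE takes the write lock right away.
        writer_a.execute("BEGIN IMMEDIATE")
        writer_a.execute("INSERT INTO jobs (name) VALUES ('a')")
        try:
            writer_b.execute("BEGIN IMMEDIATE")
        except sqlite3.OperationalError:
            blocked = True  # typically reported as "database is locked"
        else:
            writer_b.execute("ROLLBACK")
            blocked = False
        writer_a.execute("COMMIT")
        # Once the first writer commits, the second one can proceed.
        writer_b.execute("BEGIN IMMEDIATE")
        writer_b.execute("INSERT INTO jobs (name) VALUES ('b')")
        writer_b.execute("COMMIT")
        return blocked
    finally:
        writer_a.close()
        writer_b.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(second_writer_blocked(os.path.join(tmp, "state.db")))
```

A non-zero `timeout` would make the second writer wait for the lock rather than fail, which hides the contention at low concurrency but does not remove the one-writer-at-a-time constraint.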
#43 Oops, I Deployed It Again: Learning from Our Continuous Deployment Fails
While we have managed to make our deployments successful most of the time, there is still that little percentage where human nature kicks in. Learn about the most common mistakes we encountered.