Authenticated Transfer (atp)
Infrastructure
What Stranger Things Can Teach Us About API Architecture | Nordic APIs
Netflix’s Stranger Things outages show how distributed API architectures fail under synchronized demand and what API teams can learn.
Demystifying user journeys: Revolutionizing troubleshooting with auto tracking
In the dynamic realm of mobile development, understanding user journeys is key to effective troubleshooting. This blog delves into how Grab's innovative AutoTrack SDK has revolutionized session tracking. By addressing the challenges of incomplete user journey data, Grab has significantly reduced downtime, boosted customer satisfaction, and enhanced developer efficiency.
Instagram Architecture: 14 Million users, Terabytes of Photos, 100s of Instances, Dozens of Technologies - High Scalability
Instagram is a free photo sharing and social networking service for your iPhone that has been an instant success. Growing to 14 million users in just over a year, they reached 150 million photos in August while amassing several terabytes of photos, and they did this with just 3 Instaneers, all on the Amazon stack.
The Instagram team has written up what can be considered the canonical description of an early-stage startup in this era: What Powers Instagram: Hundreds of Instances, Dozens of Technologies.
The Architecture Twitter Uses to Deal with 150M Active Users, 300K QPS, a 22 MB/S Firehose, and Send Tweets in Under 5 Seconds - High Scalability
Toy solutions solving Twitter’s “problems” are a favorite scalability trope. Everybody has this idea that Twitter is easy. With a little architectural hand-waving, we have a scalable Twitter, just that simple. Well, it’s not that simple, as Raffi Krikorian, VP of Engineering at Twitter, describes in his superb and very detailed presentation on Timelines at Scale. If you want to know how Twitter works, start here.
It happened gradually, so you may have missed it, but Twitter has grown up. It...
Flickr Architecture - High Scalability
Update: Flickr hits 2 Billion photos served. That's a lot of hamburgers.
Flickr is both my favorite bird and the web's leading photo sharing site. Flickr has an amazing challenge: they must handle a vast sea of ever-expanding new content, ever-increasing legions of users, and a constant stream of new features, all while providing excellent performance. How do they do it?
Site: http://www.flickr.com
Information Sources
Flickr and PHP (an early document)
Capacity Planning for LAMP
Fede
EP190: Cloudflare vs. AWS vs. Azure
Cloudflare is much more than just a CDN and DDoS protection service. Let’s do a quick comparison of Cloudflare, AWS, and Azure.
Disasters I've seen in a microservices world, part II
When I first wrote about microservice disasters, I thought we'd eventually "solve" them, with better tooling, frameworks, and operational maturity. We didn't. We just learned to live with the chaos. Distributed systems will always surprise you: timeouts, retries, and fallacies don't disappear; they just shift shape. Maybe that's the re...
Disasters I've seen in a microservices world
When Martin Fowler's post about microservices came out in 2014, the teams where I worked were already building service-oriented architectures. That post and the subsequent hype made their way into almost every software team in the world. The "Netflix OSS stack" was the coolest thing back then, allowing engineers worldwide to leverage N...
How to Design a Rate Limiter (A Complete Guide for System Design Interviews)
A real-world look at how to build scalable rate limiters — from simple array-based approaches to distributed systems design — and how to answer this popular interview question with confidence.
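As a rough illustration of one common approach, here is a minimal single-process token bucket in Python. It is a sketch, not the guide's implementation: the class name, the `capacity`/`rate` parameters and the injectable `clock` are assumptions, and a distributed limiter would keep this state in a shared store rather than in process memory.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: holds up to `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self.clock = clock            # injectable so tests can fake time
        self.tokens = capacity        # start full, so an initial burst is allowed
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False                  # caller should reject the request (e.g. HTTP 429)
```

With a fake clock, `TokenBucket(capacity=3, rate=1.0, clock=lambda: t[0])` allows three immediate requests, rejects the fourth, and allows one more after one simulated second. The bucket size controls burst tolerance while the rate controls the sustained limit, which is why this scheme is often preferred over a fixed window.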
Why we moved from AWS to Vercel
Our motivation and experience migrating a compute-intensive financial planning engine from AWS to Vercel
Switching from Docker to Podman
Podman offers better security, uses fewer resources, and integrates seamlessly with Linux and Kubernetes, making it a superior Docker alternative
How I solved a distributed queue problem after 15 years | DBOS
Learn how queues make horizontal scaling, scheduling, and flow control easier in cloud systems, and how to make them durable and observable.
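To make "durable" concrete, here is a small sketch of a table-backed task queue using Python's built-in `sqlite3` so it runs anywhere. This is an assumption for illustration, not DBOS's design, and the table and function names are hypothetical. Enqueued work survives a crash because it lives in a database, and `BEGIN IMMEDIATE` takes the write lock before reading, so two workers cannot claim the same row.

```python
import sqlite3
from typing import Optional, Tuple


def open_queue(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path, isolation_level=None)  # we manage transactions explicitly
    conn.execute(
        "CREATE TABLE IF NOT EXISTS tasks ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " payload TEXT NOT NULL,"
        " status TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn: sqlite3.Connection, payload: str) -> int:
    cur = conn.execute("INSERT INTO tasks (payload) VALUES (?)", (payload,))
    return cur.lastrowid


def claim(conn: sqlite3.Connection) -> Optional[Tuple[int, str]]:
    # Take the write lock up front so select-then-update is atomic across workers.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id, payload FROM tasks WHERE status = 'pending' ORDER BY id LIMIT 1"
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE tasks SET status = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
        return row  # (id, payload), or None when the queue is empty
    except Exception:
        conn.execute("ROLLBACK")
        raise


def complete(conn: sqlite3.Connection, task_id: int) -> None:
    conn.execute("UPDATE tasks SET status = 'done' WHERE id = ?", (task_id,))
```

A production version would also need to re-queue tasks left in `running` by a crashed worker and expose queue depth for observability.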
Idempotency in System Design: Full example - Lukas Niessen - Medium
Idempotence is a concept frequently mentioned in system design. I will explain what it means in simple terms, briefly address common misunderstandings, and finish with a full example. In other words…
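Here is a minimal sketch of the idea, assuming a hypothetical payment endpoint. The client sends an idempotency key with each request, and the server stores the first response under that key so that a retry replays it instead of charging again. The class and method names are illustrative, and the in-memory dict stands in for what would need to be a durable, atomic store in a real system.

```python
import uuid
from typing import Any, Dict


class PaymentService:
    """Hypothetical endpoint that honors client-supplied idempotency keys."""

    def __init__(self) -> None:
        self._responses: Dict[str, Dict[str, Any]] = {}  # idempotency key -> first response
        self.charges = 0  # side effects actually performed

    def charge(self, idempotency_key: str, amount_cents: int) -> Dict[str, Any]:
        # A retry with a key we have already seen replays the stored response: no second charge.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges += 1
        response = {"charge_id": str(uuid.uuid4()), "amount_cents": amount_cents}
        self._responses[idempotency_key] = response
        return response
```

Because the retry returns the same `charge_id`, a client that timed out and resent the request cannot be billed twice. A fuller version would also reject a reused key whose payload differs, and would record the key and the side effect atomically.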
Reverse proxy deep dive
A Deep Dive into Reverse Proxy
Writing a Load Balancer from Scratch in 250 Lines of Code
Hey, everyone. It's another weekend, and I was exploring what to build. So I decided to build a simple yet completely functional load balancer. Let's discuss it in this post.
Techniques for handling failure scenarios in microservice architectures
This article explores the strategies for managing failure scenarios in microservice architectures. It covers techniques to address both technical glitches and business impacts. You will learn how organizations can build fault-tolerant systems that are capable of gracefully handling cascading failures while maintaining core functionalities, even in a degraded state.
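One widely used technique for keeping a failing dependency from cascading is a circuit breaker. The sketch below is a simplified, single-threaded version. The thresholds, the `fallback` hook and the injectable `clock` are illustrative choices, not taken from the article. After `failure_threshold` consecutive failures, the breaker stops calling the dependency and serves the fallback (a degraded response). Once `reset_timeout` seconds have passed, it lets one trial call through.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when the breaker is open and no fallback was given."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0                       # consecutive failures seen
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    def call(self, fn: Callable[[], T], fallback: Optional[Callable[[], T]] = None) -> T:
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast (or degrade) without touching the dependency.
                if fallback is not None:
                    return fallback()
                raise CircuitOpenError("dependency unavailable, circuit open")
            # Timeout elapsed: half-open, so this one trial call goes through.
        try:
            result = fn()
        except Exception:
            self.failures += 1
            # A failed trial call, or too many failures in a row, (re)opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open is what stops callers from piling up on timeouts, which is the usual path by which one slow service drags down its neighbors.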
How We Survived 10k Requests a Second: Switching to Signed Asset URLs in an Emergency - Hardcover Blog
A long day of debugging led to a better solution: Signed URLs.
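The post's exact signing scheme is not described here, so the following is a generic sketch of how signed URLs usually work. The server signs the asset path plus an expiry time with an HMAC, and whatever serves the asset recomputes the signature before responding. The secret, the `expires`/`sig` parameter names and the helper functions are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical secret shared by the app that signs URLs and the server that checks them.
SECRET = b"replace-with-a-real-secret"


def _signature(path: str, expires_at: int, secret: bytes) -> str:
    # Sign the path together with the expiry so neither can be changed independently.
    message = f"{path}:{expires_at}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def sign_url(path: str, expires_at: int, secret: bytes = SECRET) -> str:
    query = urlencode({"expires": expires_at, "sig": _signature(path, expires_at, secret)})
    return f"{path}?{query}"


def verify_url(url: str, now: Optional[float] = None, secret: bytes = SECRET) -> bool:
    parts = urlsplit(url)
    params = parse_qs(parts.query)
    try:
        expires_at = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, IndexError, ValueError):
        return False  # missing or malformed parameters
    if (time.time() if now is None else now) > expires_at:
        return False  # link has expired
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(_signature(parts.path, expires_at, secret), sig)
```

Because verification needs only the secret and the URL, the check can run at the edge without a database lookup, and expired or hot-linked URLs stop working on their own.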
OCI for Contracts | Decombine Blog
Ship contracts like software
The Illustrated Children’s Guide to Kubernetes | CNCF
Brought to you by… Written by: Matt Butcher and Karen Chu Illustrated by: Bailey Beougher Illustration of Goldie is based on the Go Gopher designed by Renee…
Cloud design patterns, architectures, and implementations - AWS Prescriptive Guidance
Technical documentation, architecture best practices, and reference implementations for commonly used cloud design patterns.
Use API Gateway Lambda authorizers - Amazon API Gateway
Enable an Amazon API Gateway Lambda authorizer to authenticate API requests.
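For orientation, a minimal TOKEN-type Lambda authorizer in Python is sketched below. It returns the IAM policy shape that API Gateway expects from such an authorizer: a `principalId` plus a `policyDocument` that allows or denies `execute-api:Invoke` on the incoming `methodArn`. The fixed token comparison and the `"user"` principal are placeholder assumptions. A real authorizer would validate a JWT or call an identity provider.

```python
from typing import Any, Dict

# Placeholder assumption: stands in for real token validation (JWT, OAuth introspection, ...).
EXPECTED_TOKEN = "allow-me"


def build_policy(principal_id: str, effect: str, resource: str) -> Dict[str, Any]:
    return {
        "principalId": principal_id,
        "policyDocument": {
            "Version": "2012-10-17",
            "Statement": [
                {"Action": "execute-api:Invoke", "Effect": effect, "Resource": resource}
            ],
        },
    }


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    # TOKEN authorizer input: the caller's token and the ARN of the method being invoked.
    token = event.get("authorizationToken", "")
    effect = "Allow" if token == EXPECTED_TOKEN else "Deny"
    return build_policy("user", effect, event["methodArn"])
```

A `Deny` policy makes API Gateway answer with 403. The AWS examples raise `Exception("Unauthorized")` from the handler when they want a 401 instead.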
Pod Lifecycle
This page describes the lifecycle of a Pod. Pods follow a defined lifecycle, starting in the Pending phase, moving through Running if at least one of its primary containers starts OK, and then through either the Succeeded or Failed phases depending on whether any container in the Pod terminated in failure.
Like individual application containers, Pods are considered to be relatively ephemeral (rather than durable) entities. Pods are created, assigned a unique ID (UID), and scheduled to run on nodes where they remain until termination (according to restart policy) or deletion.
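The phase progression described above can be approximated with a small function. This is a simplified model, not Kubernetes code. It assumes `restartPolicy: Never` and ignores the `Unknown` phase and init containers. Each container is represented as `None` (not started yet), `"running"`, or an integer exit code.

```python
from typing import Sequence, Union

# Simplified container state: None = not started yet, "running", or an int exit code.
ContainerState = Union[None, str, int]


def pod_phase(containers: Sequence[ContainerState]) -> str:
    """Approximate a Pod's phase, assuming restartPolicy: Never (no restarts)."""
    if any(state is None for state in containers):
        return "Pending"    # some container has not started yet
    if any(state == "running" for state in containers):
        return "Running"    # at least one container is still running
    if all(state == 0 for state in containers):
        return "Succeeded"  # every container exited with code 0
    return "Failed"         # all terminated, at least one with a non-zero code
```

With a restart policy of `OnFailure` or `Always`, a failed container is restarted and the Pod stays in `Running`, which is why this sketch pins the policy to `Never`.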
AWS Decision Guides
Abusing SQLite to Handle Concurrency
SkyPilot uses the venerable SQLite for state management. SQLite can handle millions of QPS, and terabytes of data. However, our efforts to scale our Managed Jobs feature ran up against the one downfall of SQLite: many concurrent writers. Since SkyPilot typically runs as a CLI on your laptop, we wanted to stick with SQLite, so we decided to figure out how we can make it work. We were very surprised by some of our findings.
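The article's own fixes are not reproduced here. As a general illustration, the sketch below shows two standard SQLite settings for many concurrent writers, using Python's `sqlite3`. WAL journal mode lets readers and the single writer stop blocking each other. A busy timeout makes a writer that finds the database locked wait instead of failing immediately with "database is locked". The table, file name and thread counts are assumptions chosen for the demo.

```python
import os
import sqlite3
import tempfile
import threading


def open_conn(path: str) -> sqlite3.Connection:
    # timeout: seconds a writer waits on a locked database before raising
    # "database is locked"; SQLite still allows only one writer at a time.
    conn = sqlite3.connect(path, timeout=30.0)
    conn.execute("PRAGMA journal_mode=WAL")  # readers and the writer no longer block each other
    return conn


def writer(path: str, worker_id: int, n: int) -> None:
    conn = open_conn(path)  # one connection per thread
    try:
        for i in range(n):
            with conn:  # short transaction per insert keeps the write lock held briefly
                conn.execute("INSERT INTO jobs (worker, seq) VALUES (?, ?)", (worker_id, i))
    finally:
        conn.close()


def run(n_workers: int = 4, per_worker: int = 50) -> int:
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "state.db")
        setup = open_conn(path)
        setup.execute("CREATE TABLE jobs (worker INTEGER, seq INTEGER)")
        setup.commit()
        threads = [threading.Thread(target=writer, args=(path, w, per_worker))
                   for w in range(n_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        count = setup.execute("SELECT COUNT(*) FROM jobs").fetchone()[0]
        setup.close()
        return count
```

Writes are still serialized. These settings only turn lock contention into waiting rather than errors, so throughput with many writers depends on keeping each write transaction short.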
Install MongoDB Community Edition on macOS - MongoDB Manual v8.0
Install MongoDB Community Edition on macOS using the Homebrew package manager.
An illustrated guide to Amazon VPCs
In this section, I talk about why VPCs were invented and how they work.
Pessimism-Driven Development: Embracing Murphy’s Law in Cloud Architecture
In the world of cloud computing, particularly with serverless architectures like AWS Lambda and distributed databases like DynamoDB, the…
Crash Course on Load Balancing Algorithms
A device or service that distributes traffic across multiple servers or microservices.
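Two of the simplest algorithms such a device can use are round robin and least connections. A minimal Python sketch of each follows. The backend names are hypothetical, and there is no health checking, which a real load balancer would need.

```python
import itertools
from typing import Dict, List


class RoundRobin:
    """Hand out backends in a fixed rotation."""

    def __init__(self, backends: List[str]) -> None:
        self._cycle = itertools.cycle(backends)

    def pick(self) -> str:
        return next(self._cycle)


class LeastConnections:
    """Send each request to the backend with the fewest in-flight requests."""

    def __init__(self, backends: List[str]) -> None:
        self.active: Dict[str, int] = {b: 0 for b in backends}

    def acquire(self) -> str:
        # min() returns the first minimum, so ties go to the earliest-listed backend.
        backend = min(self.active, key=self.active.__getitem__)
        self.active[backend] += 1
        return backend

    def release(self, backend: str) -> None:
        # Call when the request sent to `backend` finishes.
        self.active[backend] -= 1
```

Round robin works well when requests cost roughly the same. Least connections adapts when some requests are much slower, at the price of tracking in-flight counts.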
#43 Oops, I Deployed It Again: Learning from Our Continuous Deployment Fails
While we have managed to make our deployments successful most of the time, there is still that little percentage where human nature kicks in. Learn about the most common mistakes we encountered.