System Architecture

John Battelle's Search Blog Does Web3 Matter To Marketers?
Over at LinkedIn I’ve published a short piece on Web3 – a primer of sorts for the many marketing pals who’ve asked me “does this shit matter!?”. As I do with everythin…
·battellemedia.com·
What is Caching and Why Cache Invalidation is Hard
Phil Karlton, an accomplished engineer who was an Architect at Netscape, famously said the following, which also happens to be my favorite quote:

There are only two hard things in Computer Science: cache invalidation and naming things. - Phil Karlton

When I first heard this quote in college, I was reading Code Complete 2, which, if I remember correctly, had an entire chapter on naming variables. From my nascent programming knowledge and from helping my friends with their CS assignments, I knew that naming things is indeed hard. (I'm not good at it either; you can always find me refactoring and renaming things.) I wasn't sure what cache invalidation meant, but I knew then that there was at least half a truth to this quote. Later, through years of experience and running into cache problems, I'd learn how much truth there is to the entire quote.

Caching is present everywhere, from the lowest levels to the highest:

- Hardware caches inside your processor cores (L1, L2, L3)
- The page/disk cache maintained by our operating systems
- Caches for databases, such as Memcached, Redis, or DAX for DynamoDB
- API caches
- Layer-7 (application layer) HTTP caches, like edge-level caching in CDNs
- DNS caching
- The cache in your browser
- Internal caches inside microservices that improve their performance on complex, time-consuming operations
- The many intermediary caches you are reading this post through right now

I can keep going, but you get the point: caching is ubiquitous. Which begs the question: why do we need caching? Before you scroll down for the answer, take a few seconds to think about it.

What is Caching?

A cache is a fast data storage layer that stores a subset of data temporarily, for a duration of time. Caches are faster than the original sources of the data, so they speed up future retrievals: the data is read from the cache instead of being fetched from its actual storage location.
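That definition (a fast store that holds data only for a limited duration) can be sketched as a minimal in-memory TTL cache. The class name and API below are illustrative, not taken from the article:

```python
import time

class TTLCache:
    """A minimal in-memory cache where each entry expires after ttl_seconds."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None  # cache miss
        value, expires_at = entry
        if time.time() >= expires_at:
            del self._store[key]  # entry expired: treat as a miss
            return None
        return value

    def set(self, key, value):
        # Store the value along with the wall-clock time it stops being valid.
        self._store[key] = (value, time.time() + self.ttl)
```

After `cache = TTLCache(ttl_seconds=60)` and `cache.set("related:home", links)`, a `cache.get("related:home")` within the next minute returns the stored list; after that it quietly becomes a miss again.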
Caches also make data retrieval efficient by avoiding complex or resource-intensive operations to compute the data.

Why Do We Need Caching?

Typically, there are two main reasons to cache data:

- We cache things when the cost of generating some information is high (resource-intensive) and we don't need fresh information each time. We can compute the information once, store it for a period of time, and return the cached version to users.
- Arguably the top reason we use caching is to speed up data retrieval. Caches are faster than the original sources of data, and cached information can be retrieved quickly, resulting in faster responses to users.

Caching Example

Let's look at an example. Suppose we have a webpage that displays "Related Content" links in the sidebar. This related content is generated by machine learning algorithms that process large volumes of data in the main database, and it can take several seconds to compute. This is a complex and resource-intensive operation, and each user request has to calculate this information. That causes two problems:

- For popular pages on the website, a significant amount of time and resources is spent computing the same data over and over again. Impact: increased load on backend servers and databases, and higher cloud infrastructure costs.
- Generating the "Related Links" takes time and holds up the final response sent to users. Impact: response times increase, which hurts user experience and page performance metrics such as the Core Web Vitals that search engines use.

To address both issues, we can use a cache. We can compute the Related Links once, store the result in the cache, and return the cached copy for several hours or even days. The next time the data is requested, rather than performing a costly operation and waiting several seconds for it to complete, the result can be fetched from the cache and returned to users faster. (This type of caching strategy is called Cache-Aside.)

Why do we use caching? Because it speeds up information delivery and reduces the cost of calculating that information over and over again.

This is just one example illustrating how useful caching is. Caches save costs, scale heavy workloads, and reduce latency. But like all good things, there's a catch, or rather a trade-off.

Cache Invalidation and Why It is Hard

Cache invalidation is the process of marking data in the cache as invalid. When the next request arrives, the corresponding invalid data must be treated as a cache miss, forcing it to be regenerated from the original source (the database or a service).

Determining when to invalidate a cache is a hard problem. Caches are not the source of truth for your data; that would be your database (or a service). The problem arises when the data in your database (the source of truth) changes, leaving invalid data in the cache. If the data in the cache is not invalidated, you'll get inconsistent, conflicting, or incorrect information.

Let's revisit our earlier example of caching "Related Content" links (links to other related pages for a webpage). Suppose one of the linked pages is no longer present in the system (it was deleted by the user or removed by an admin). Because we were caching these links, some bad things can happen:

- Your users get an error (HTTP 404) when they click on a link.
- If your application processes those links, it will encounter errors, and that could break the system.

In distributed systems with several interconnected caches, invalidation becomes even more difficult thanks to dependencies, race conditions, and the need to invalidate every cache that must be updated. Distributed caching has its own challenges at scale, and some complex systems, like Facebook's TAO, use cache leaders to handle invalidations for all data under their shards. Heck, it is easy to run into cache issues in the course of normal software development.
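The Related Links scenario, including the invalidation step, can be sketched as a cache-aside pattern. The in-memory dict, `compute_related_links()`, and `invalidate()` below are illustrative stand-ins, not the article's actual implementation:

```python
# Cache-aside with explicit invalidation, sketched for the
# "Related Links" example above.

cache = {}  # page_id -> cached list of related links


def compute_related_links(page_id, database):
    """Stand-in for the slow ML computation over the main database."""
    return list(database.get(page_id, []))


def get_related_links(page_id, database):
    if page_id in cache:                 # cache hit: skip the expensive work
        return cache[page_id]
    links = compute_related_links(page_id, database)  # cache miss
    cache[page_id] = links               # populate the cache on the way out
    return links


def invalidate(page_id):
    """Call when the source of truth changes (e.g. a linked page is deleted)."""
    cache.pop(page_id, None)             # next request recomputes from the DB
```

Note what happens if `invalidate()` is never called: the database can change, but readers keep getting the stale cached list, which is exactly the invalidation problem described above.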
Modern CPUs have several cores, and each has its own cache (L1) that's periodically synced with main memory (RAM). In the absence of proper synchronization, values stored in variables by one thread may not be visible to other threads. For example:

foo = 2;

In Java, the JVM might update the value of foo in the core's local cache and not commit the result to main memory. A thread running on another core may then see a stale value for foo. (This is one of the primary reasons why writing multithreaded applications is hard.)

In summary, caching is a super useful technique, but it can easily go wrong if we are not careful. When using a cache, it's important to understand how and when to invalidate it and to build proper invalidation processes.

When Not to Use a Cache

Caches are not always the right choice. They may not add any value, and in some cases they may actually degrade performance. Here are some questions to answer to determine whether you need a cache:

- Is the original source of the data slow (e.g. a query that does complex JOINs in a relational database)?
- Is computing the data resource-intensive?
- Does the data stay the same across requests? (Caching real-time sensor data that your car needs in self-driving mode, or live medical data from patients, are not good ideas.)
- Is the operation that fetches the data free of side effects? (A relational DB transaction that fetches data and also updates KPI counters is not a good caching candidate, because of the side effect of updating the counters.)
- Is the data frequently accessed and needed more than once?
- Is the cache hit:miss ratio good, and what is the total cost of cache misses? For example, suppose I put a cache in front of user requests as they come in, and it takes 10 ms to check whether the data exists in the cache, versus the original time of 60 ms. If only 5% of requests are served from the cache, I'm adding an extra 10 ms to the 95% of requests that result in a cache miss.
Doing rough calculations, we can see that this cache is actually hurting performance:

Before cache: 1,000,000 requests * 60 milliseconds per request = 60,000,000 milliseconds total
After cache: (0.05 * 1,000,000 * 10) + (0.95 * 1,000,000 * (60 + 10)) = 67,000,000 milliseconds total (each cache miss costs 60 + 10 milliseconds)

That's worse than using no cache at all.
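The arithmetic can be checked in a few lines; all the numbers are the ones from the example:

```python
# Back-of-the-envelope cost of a cache with a poor hit rate:
# 1,000,000 requests, 60 ms to compute from the source,
# 10 ms to check the cache, and only a 5% hit rate.
requests = 1_000_000
origin_ms = 60
cache_check_ms = 10

before = requests * origin_ms            # no cache at all

hit_requests = requests * 5 // 100       # 5% of requests are cache hits
miss_requests = requests - hit_requests  # the other 95% are misses
after = (hit_requests * cache_check_ms
         + miss_requests * (origin_ms + cache_check_ms))  # a miss pays 60 + 10

# before = 60,000,000 ms; after = 67,000,000 ms -> the cache hurts here.
```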
·codeahoy.com·
What Good Security Looks Like in a Cloudy World
A look at why continuous security will ultimately be the enabler of high-velocity engineering for forward-thinking engineering teams.
·thenewstack.io·
Platform as a Product: True DevOps?
You need to build an internal developer platform, especially for complex brownfield setups, but you don't have to start from scratch.
·thenewstack.io·
Software Supply Chains Require Immutable Databases
An immutable database that makes it impossible to update code without the maintainers of an application code base knowing about it.
·thenewstack.io·
Fewer, happier incident heroes.
My wife was kind enough to review a piece I’m writing about incident response, and brought up a related topic that I couldn’t fit into that article but is infinitely fascinating: how should companies respond to a small group of engineers who become critical responders in almost all incidents? This happens at many companies, usually along the lines of: a few long-tenured engineers, who happen to have the explicit access credentials to all key systems and implicit permission to use them, help respond to almost all incidents.
·lethain.com·
Storage and the Supercloud – Blocks and Files
Analyst firm Wikibon has defined a new layer of technology – the Supercloud – and storage has a role within it.
·blocksandfiles.com·
Introduction | Yggdrasil
End-to-end encrypted IPv6 networking to connect worlds
·yggdrasil-network.github.io·
Key enablers for mass IoT adoption
At 'The Things Conference' in Amsterdam in September, Roman Nemish, Co-Founder & President of TEKTELIC, presented a critical view of differe...
·blog.3g4g.co.uk·
Fingerprinting systems with TCP source-port selection
Back in May 2022, a mysterious set of patches titled insufficient TCP source port randomness crossed the mailing lists and was subsequently merged (at -rc6) into the 5.18 kernel. Little information was available at the time about why significant changes to the networking stack needed to be made so late in the development cycle. That situation has finally changed with the publication of this paper by Moshe Kol, Amit Klein, and Yossi Gilad. It seems that the way the kernel chose port numbers for outgoing network connections made it possible to uniquely fingerprint users.
·lwn.net·
A Self-Authenticating Social Protocol
Bluesky’s mission is to drive the evolution from platforms to protocols. The conceptual framework we've adopted for meeting this objective is the "self-authenticating protocol."
·blueskyweb.xyz·
Bluesky
·blueskyweb.xyz·
Connect a Smart Contract to the Twitter API
Learn how to build smart contracts that interact with Twitter’s API and trigger tweets via Chainlink’s oracle network.
·blog.chain.link·
HTTPS Outcalls | Internet Computer Home
The HTTPS Outcalls feature allows the Internet Computer to make HTTPS requests in a distributed and secure manner, all approved by consensus. Oracles are now a thing of the past.
·internetcomputer.org·
Internet Computer Home | Internet Computer Home
Deploy smart contracts and build scalable dapps on the Internet Computer - the world’s fastest and most powerful open-source blockchain network
·internetcomputer.org·
5 Years of Postgres on Kubernetes
Building a highly available, self-healing Postgres cluster is hard. You have to think about things like backups, load balancing across databases, metrics, database hosts changing, storage and correctly sizing all of these services.
·thenewstack.io·
Runno
Make your code samples Runno.
·runno.dev·
Rust in the Linux Kernel
Why it's all happening for the Rust programming language, how it made it into the Linux kernel, and where it will go from here.
·thenewstack.io·
Why Traditional Logging and Observability Waste Developer Time
The ability to jump directly to a specific line of code that caused an error, without restarting, redeploying or adding more code, is where the magic happens in shift-left observability.
·thenewstack.io·