Using Prometheus to Avoid Disasters with Kubernetes CPU Limits | Amazon Web Services
“Sir, your application is continually getting throttled,” I repeated. The highly skilled team that I was brought in to help with an outage was in disbelief. They had been using the same limits configuration in production for over two years. Yet, the Grafana chart was definitive: CPU throttling was causing the outage they were currently […]
Creating Kubernetes Auto Scaling Groups for Multiple Availability Zones | Amazon Web Services
Kubernetes is a scalable container orchestrator that helps you build fault-tolerant, cloud native applications. It can handle automatic container placement, scale up and down, and provision resources for your containers to run. While Kubernetes can take care of many things, it can’t solve problems it doesn’t know about. Usually these are called unknown unknowns and […]
How we reduced 502 errors by caring about PID 1 in Kubernetes
On every deploy, scale-down event, or pod termination, users of GitLab's Pages service experienced 502 errors. This post explains how we found the root cause and rolled out a fix.
Seamlessly migrate workloads from EKS self-managed node group to EKS-managed node groups | Amazon Web Services
Amazon Elastic Kubernetes Service (Amazon EKS) is a managed service that makes it easy to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane. When Amazon EKS was made generally available in 2018, it supported self-managed node groups. With self-managed node groups, customers are responsible for configuring the Amazon Elastic Compute […]