Last Week in Kubernetes Development - Week Ending October 12 2025

Week Ending October 12, 2025

https://lwkd.info/2025/20251015

Developer News

The ballots for the Steering Committee Elections are due on October 24th. If you haven’t already, submit your Steering votes. If you have contributed to Kubernetes in the last year but haven’t met the eligibility requirements, you will need to submit an exception request to vote in the steering election, the deadline for which is October 22nd.

The CFP for Maintainer Summit: KubeCon + CloudNativeCon Europe 2026 is open. Please send in your submissions before 14th December 2025.

SIG-Testing is continuing to improve alpha/beta feature coverage, including moving kind-beta-features to release blocking and several other beta jobs to release-informing.

Release Schedule

Next Deadline: Docs Deadline for placeholder PRs, October 23

We are in PRR freeze. Enhancements Freeze will begin this week (16th October). If you are going to miss the deadline, please file an Exception.

Patch releases have been delayed until 22nd October.

Featured PRs

134433: kubeadm print errors during control-plane-wait retries

This PR improves troubleshooting during control plane startup by ensuring that errors encountered while waiting for control plane components are printed during each retry at log verbosity level 5. Previously, these errors were not shown, which made it harder to identify issues when components failed to become ready. With this change, administrators can now see the actual errors without additional steps, making failure causes more visible and debugging faster.

KEP of the Week

KEP-4622: New TopologyManager policy option which configures the value of maxAllowableNUMANodes

This KEP introduces a new TopologyManager policy option called max-allowable-numa-nodes, allowing users to configure the maximum number of NUMA nodes supported by the TopologyManager. Previously, this value was hardcoded to 8 as a temporary measure to prevent state explosion. By making it configurable, the KEP enables better support for high-end CPUs with more than 8 NUMA nodes, without changing existing TopologyManager policies or addressing broader resource management aspects.
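To make the "state explosion" concern concrete, here is a minimal sketch that counts the candidate NUMA affinity masks for a given node count. It assumes, for illustration only, that the cost scales with the number of non-empty subsets of NUMA nodes; the function names are hypothetical and this is not the kubelet's implementation.

```python
from itertools import combinations


def numa_affinity_masks(num_nodes):
    """Enumerate every non-empty subset of NUMA node IDs.

    Each subset stands in for one candidate affinity mask (illustrative model).
    """
    nodes = range(num_nodes)
    return [
        subset
        for size in range(1, num_nodes + 1)
        for subset in combinations(nodes, size)
    ]


def mask_count(num_nodes):
    """Closed form for the number of non-empty subsets: 2^n - 1."""
    return 2 ** num_nodes - 1


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n:>2} NUMA nodes -> {mask_count(n):>6} candidate masks")
```

Under this model, going from 8 to 16 NUMA nodes raises the count from 255 to 65535, which suggests why the limit was capped by default and why raising it is left to the operator.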

This KEP is tracked as stable in v1.35

Other Merges

Enforce valid label-key format in device tolerations

Add declarative validation and path normalization for ResourceClaim fields

Remove runtime gogo protobuf dependencies from Kubernetes API types

Fix IPv6 allocator for /64 CIDRs

Add -n shorthand flag for kubectl config set-context

Add k8s.update flag to enable validation rules just for updates

Prevent panic when creating an invalid CronJob schedule

Stop calling --chunk-size beta, it’s been around since 2017

Make sure that the eviction controller knows about NoExecute device tolerations

APIApprovalController can run with contextual logging

kubeadm: show control plane retry errors

ResourceClaim: ensure that fields don’t exceed list limits, that shareID is validated, and that it supports the immutable tag and long name format

Add test for endpoint/endpointslice headless label propagation

Maybe don’t let folks create ResourceQuotas with request > limit

kubectl gets -n shorthand for --namespace

Set FeatureGates simultaneously during tests to avoid dependency problems

DeviceRequests exactly and firstAvailable shortcut some logic

Refactor away most of the dependencies on the unmaintained gogo protobuf library

Allocate within IPv6 subnets correctly

resource.k8s.io v1 API is now the default

Prometheus client can handle deprecated/missing metrics

APIserver will abort startup due to invalid CA configuration

Subprojects and Dependency Updates

headlamp v0.36.0 adds EndpointSlice support, label-based search, and clipboard copy for resource names.

cloud-provider-openstack v1.34.1 updates test dependencies and fixes build-script issues across OCCM and CSI plugins. Multiple Helm charts were also updated.

csi-driver-nfs v4.12.1 updates CSI release tools and documentation for NFS volumes.

csi-driver-smb v1.19.1 updates CSI release tooling and improves maintenance scripts.

kubespray v2.29.0 adds new configuration options, supports Kubernetes v1.33.1 and Debian 13 Trixie, and upgrades major components.

prometheus v3.7.0 adds experimental anchored and smoothed rate functions, introduces NHCB, improves rule evaluation and TSDB logging, and deprecates several remote-write metrics.

via Last Week in Kubernetes Development https://lwkd.info/

October 15, 2025 at 06:00PM

·lwkd.info·
AWS Deprecates Two Dozen Services (Most of Which You've Never Heard Of)
AWS has done its quarterly housecleaning / "Googling" of its services, and deprecated what appears at first glance to be a startlingly long list. However, going through them put my mind at ease, and I'm hoping this post can do the same for you.
·lastweekinaws.com·
The Data Engineer's guide to optimizing Kubernetes with Niels Claeys

The Data Engineer's guide to optimizing Kubernetes, with Niels Claeys

https://ku.bz/hGRfkzDJW

Niels Claeys shares how his team at DataMinded built Conveyor, a data platform processing up to 1.5 million core hours monthly. He explains the specific optimizations they discovered through production experience, from scheduler changes that immediately reduce costs by 10-15% to achieving 97% spot instance usage without reliability issues.

You will learn:

Why the default Kubernetes scheduler wastes money on batch workloads and how switching from "least allocated" to "most allocated" scheduling enables faster scale-down and better resource utilization

How to achieve 97% spot instance adoption through strategic instance type diversification, region selection, and Spark-specific techniques

Node pool design principles that balance Kubernetes overhead with workload efficiency

Platform-specific gotchas like AWS cross-AZ data transfer costs that can spike bills unexpectedly
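The difference between "least allocated" and "most allocated" scoring can be sketched with a toy model. The following is a simplified illustration, not the kube-scheduler's actual scoring code: it considers CPU only, and the `Node` class, node names, and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity_cpu: float   # total allocatable cores (hypothetical values)
    requested_cpu: float  # cores already requested by scheduled pods


def least_allocated_score(node, pod_cpu):
    """Higher score for emptier nodes: spreads pods across the cluster."""
    free_after = node.capacity_cpu - node.requested_cpu - pod_cpu
    return 100 * free_after / node.capacity_cpu


def most_allocated_score(node, pod_cpu):
    """Higher score for fuller nodes: packs pods onto fewer nodes."""
    used_after = node.requested_cpu + pod_cpu
    return 100 * used_after / node.capacity_cpu


def pick(nodes, pod_cpu, score):
    """Return the node with the highest score among those that fit the pod."""
    feasible = [n for n in nodes if n.capacity_cpu - n.requested_cpu >= pod_cpu]
    return max(feasible, key=lambda n: score(n, pod_cpu))


if __name__ == "__main__":
    nodes = [Node("busy", 16, 12), Node("idle", 16, 1)]
    print(pick(nodes, 2, least_allocated_score).name)  # idle
    print(pick(nodes, 2, most_allocated_score).name)   # busy
```

With packing, the nearly idle node stays nearly idle, so it empties sooner once its last batch pod finishes and the autoscaler can remove it, which is the scale-down effect described above.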

Sponsor

This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/hGRfkzDJW

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

October 14, 2025 at 02:00AM

·kube.fm·
DevOps & AI Toolkit - Why Your Infrastructure AI Sucks (And How to Fix It) - https://www.youtube.com/watch?v=Ma3gKmuXahc

Why Your Infrastructure AI Sucks (And How to Fix It)

Discover why your AI agent is completely failing at infrastructure management and learn to build an AI-powered Internal Developer Platform that actually works. Most organizations are treating AI like a search engine, asking vague questions and getting generic answers that break in production. This video reveals the five critical components that transform useless AI into intelligent infrastructure automation.

You'll learn to build capabilities discovery using Vector databases for semantic search across Kubernetes resources, capture organizational patterns from tribal knowledge and documentation, create enforceable policies that guide AI toward compliance, implement proper context management to avoid the bloated mess most systems become, and design intelligent workflows that guide users to the right solutions instead of relying on guesswork. Watch as we demonstrate the complete transformation from a generic AI response to a fully functional PostgreSQL deployment that follows organizational patterns, enforces compliance policies, and deploys correctly the first time.
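As a rough sketch of what semantic search over cluster capabilities could look like, the snippet below ranks resource kinds by cosine similarity between embedding vectors. Everything here is an assumption for illustration: the three-dimensional "embeddings" are hand-written stand-ins for what an embedding model would produce, the resource names are hypothetical, and an in-memory dict replaces a real vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query_vec, index, top_k=1):
    """Return the names of the top_k entries most similar to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]


if __name__ == "__main__":
    # Hypothetical index: resource kind -> toy embedding of its description.
    index = {
        "postgresqls.example.org": [0.9, 0.1, 0.0],
        "deployments.apps": [0.1, 0.9, 0.2],
        "buckets.example.org": [0.0, 0.2, 0.9],
    }
    query = [0.8, 0.2, 0.1]  # stand-in for an embedded "I need a database" request
    print(search(query, index))
```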

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Tuple 🔗 https://tuple.app/DOT 👉 Promo code: DOT2025 ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#AIInfrastructure #InternalDeveloperPlatform #KubernetesAI

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Transcript and commands: https://devopstoolkit.live/internal-developer-platforms/why-your-infrastructure-ai-sucks-and-how-to-fix-it
🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai
🎬 Stop Blaming AI: Vector DBs + RAG = Game Changer: https://youtu.be/zqpJr1qZhTg
🎬 Why Kubernetes Discovery Sucks for AI (And How Vector DBs Fix It): https://youtu.be/MSNstHj4rmk

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 AI for Infrastructure Challenges
01:42 Tuple (sponsor)
03:16 Why Your AI Agent Is Useless
09:52 Kubernetes API Discovery That Actually Works
13:41 Organizational Knowledge AI Can Actually Use
17:49 Stop Breaking Production With AI
22:17 The Context Window Disaster Nobody Talks About
25:16 Smart Conversations That Get Results
29:34 Your Complete AI-Powered IDP Blueprint

via YouTube https://www.youtube.com/watch?v=Ma3gKmuXahc

·youtube.com·
The Making of Flux: The Scale a KubeFM Original Series

The Making of Flux: The Scale, a KubeFM Original Series

https://ku.bz/tWcHlJm7M

In this episode, Philippe Ensarguet, VP of Software Engineering at Orange, and Arnab Chatterjee, Global Head of Container & AI Platforms at Nomura, share how large enterprises are adopting Flux to drive reliable, compliant, and scalable platforms.

How Orange uses Flux to manage bare-metal Kubernetes through its SYLVR project.

Why Nomura relies on GitOps to balance agility with governance in financial services.

How Flux helps enterprises achieve resilience, compliance, and repeatability at scale.

Sponsor

Join the Flux maintainers and community at FluxCon, November 11th in Atlanta—register here

More info

Find all the links and info for this episode here: https://ku.bz/tWcHlJm7M

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

October 13, 2025 at 06:00AM

·kube.fm·
Last Week in Kubernetes Development - Week Ending October 5 2025

Week Ending October 5, 2025

https://lwkd.info/2025/20251010

Developer News

Joaquim Rocha has been nominated to be one of the new SIG UI leads. Congrats Joaquim!

Folks are discussing the deprecation of cgroups v1; the full discussion is on the mailing list.

Several jobs have been moved to release-informing and release-blocking status to improve alpha/beta coverage.

Release Schedule

Next Deadline: Enhancements Freeze, October 16

All enhancements are expected to have met the requirements by the freeze. Those that don’t meet the requirements will be removed from the milestone and will require an Exception.

Kubernetes v1.35.0-alpha.1 is out!

The cherry-pick deadline for patch releases is Oct 10.

Steering Committee Election

The Steering Committee Election voting ends on Friday, 24th October, AoE. You can check your eligibility to vote in the voting app, and file an exception request by October 22 if you need an exception. Don’t forget to cast your votes if you haven’t already!

Featured PRs

133697: Codify feature gate dependencies

With this PR, feature gate dependencies can be explicitly declared and enforced. This has been ad-hoc or implicit in the past. Components will now refuse to start if a feature is enabled without its required dependencies. Feature Owners should review the backfilled dependencies, while users who manually toggle feature gates must ensure dependent features are also enabled—especially noting that AllAlpha=true now requires AllBeta=true or equivalent beta features to be set.
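The startup check described above can be sketched as a small validation pass. This is an illustration under stated assumptions, not the Kubernetes implementation: the gate names `ChildFeature` and `ParentFeature` and the function name are hypothetical.

```python
def validate_feature_gates(enabled, dependencies):
    """Return one error per enabled gate that has a disabled dependency.

    enabled:      gate name -> bool (missing gates count as disabled)
    dependencies: gate name -> list of gate names it requires
    """
    errors = []
    for gate, deps in sorted(dependencies.items()):
        if not enabled.get(gate, False):
            continue  # a disabled gate imposes no requirements
        for dep in deps:
            if not enabled.get(dep, False):
                errors.append(f"{gate} requires {dep} to be enabled")
    return errors


if __name__ == "__main__":
    deps = {"ChildFeature": ["ParentFeature"]}  # hypothetical gates
    problems = validate_feature_gates(
        {"ChildFeature": True, "ParentFeature": False}, deps
    )
    if problems:
        # A component following the new behavior would refuse to start here.
        print("refusing to start:", "; ".join(problems))
```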

KEP of the Week

KEP 859: Include kubectl command metadata in http request headers

This KEP aims to add extra HTTP headers to kubectl requests sent to the Kubernetes apiserver. These headers would share details such as which kubectl command was used, the flags included, a session ID, and whether the command is deprecated. This would help cluster administrators understand how users interact with the cluster, making it easier to debug issues, track usage, and gather insights, without exposing any sensitive data.
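A minimal sketch of the kind of request metadata involved is shown below. The header names (`Kubectl-Command`, `Kubectl-Session`, `Kubectl-Deprecated`) and the helper function are assumptions for illustration; the KEP text remains the authority on the actual names and values.

```python
import uuid
from typing import Dict, Optional


def build_kubectl_headers(
    command_path: str,
    session_id: Optional[str] = None,
    deprecated: bool = False,
) -> Dict[str, str]:
    """Build illustrative metadata headers for one kubectl invocation.

    Header names are assumed, not taken from the source text.
    """
    headers = {
        # Which command was run, e.g. "kubectl apply".
        "Kubectl-Command": command_path,
        # One ID per invocation, so related requests can be grouped.
        "Kubectl-Session": session_id or str(uuid.uuid4()),
    }
    if deprecated:
        headers["Kubectl-Deprecated"] = "true"
    return headers


if __name__ == "__main__":
    for name, value in build_kubectl_headers("kubectl apply").items():
        print(f"{name}: {value}")
```

Only the command name and an opaque session ID are sent in this sketch, in keeping with the goal of not exposing sensitive data.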

This KEP is tracked for GA in v1.35

Other Merges

Disable SchedulerAsyncAPICalls feature gate to prevent scheduler performance issues under high API server load.

Add path normalization to error matcher for improved field validation.

DeviceClass now enforces a maximum of 32 selectors and configs via declarative validation.

Add declarative validation +k8s:maxItems tag to ResourceClaim

HPA controller now exposes desired_replicas metric to track scaling history.

Fix preemptor pod behavior to prevent endless scheduling loops during slow victim deletion.

Feature gate dependencies are now explicit and validated at startup, preventing enabling a feature if its dependencies are disabled.

kube-scheduler introduces lightweight AssumeCache in VolumeBinding plugin to fix occasional pod scheduling delays.

Version Updates

etcd to v3.6.5

Subprojects and Dependency Updates

cluster-api v1.11.2 extends Kubernetes support to v1.34 for both management and workload clusters, adds CoreDNS migration v1.0.28, and introduces Metal3 as an IPAM provider.

cluster-api v1.10.7 adds Kubernetes v1.33 compatibility and updates CoreDNS migration to v1.0.28.

coredns v1.13.1 updates Go to v1.25.2 to address security issues, improves performance, and enhances the sign plugin by rejecting invalid UTF-8 tokens.

coredns v1.13.0 introduces a new Nomad plugin, fixes Corefile loop and import issues, improves shutdown handling, and hardens gRPC and reload behavior.

containerd API v1.10.0-beta.1 adds a mount manager and aligns with containerd 2.2 APIs (pre-release).

kOps v1.34.0-beta.1 updates AWS and Azure components (VPC CNI v1.20.2, Cilium v1.18.2, Calico v3.30.3), upgrades etcd to v3.6.5, drops Canal support, and removes Kubernetes 1.28 compatibility.

autoscaler vertical-pod-autoscaler v1.5.1 updates the default VPA version and client-go dependency to improve stability.

autoscaler cluster-autoscaler-chart v0.1.1 introduces automatic resource adjustment for workloads through Helm.

csi-driver-nfs v4.12.0 updates Go to 1.24, fixes a goroutine leak, and adds support for creating multiple storage classes with Helm.

csi-driver-smb v1.19.0 improves secret handling with special characters, updates CSI sidecars and resizer to v1.14.0, and adds Helm support for multiple storage classes.

headlamp v0.36.0 adds support for EndpointSlice resources, label-based search, and clipboard copy for resource names. It improves table sorting memory, standardizes resource naming, and enhances Helm charts with optional PodDisruptionBudget, backend TLS termination, and security context updates. The release also fixes several UI issues, improves plugin management, and updates shipped Prometheus and App Catalog plugins.

Shoutouts

Drew Hagen – I’d like to take a moment to acknowledge @Matteo for the seriously impressive leadership of a newer release branch management shadow program for the 1.34 release, and all the amazing work putting together strong documentation for branch management!! I remember my experience releasing alpha 3 being very clear what to do and going really smooth. Very little tribal knowledge. And we did most releases async, which I think speaks to how strong this handbook is. I thank you for still being around to observe and help, even if it meant some later nights in your time zone. @xmudrii @jimangel Great work! Y’all have set the foundation for many more cycles to come. Thank you for all of your patience, guidance and support. It was really great learning and working with you all @Angelos Kolaitis @satyampsoni

via Last Week in Kubernetes Development https://lwkd.info/

October 10, 2025 at 09:03AM

·lwkd.info·
SYNOLOGY SUPPORT SEAGATE & WD AGAIN - TOO LITTLE, TOO LATE?
Synology (FINALLY) Gives In to 3rd Party HDD Support in 2025 PLUS Series NAS 7/10/25 - Updated with information supplied by Synology on how verifications and product ranges will support different HDD/SSD in DSM 7.3 Of all the stories of 2025, very few had the level of impact on the NAS industry th
·nascompares.com·
Web KAT Attack! Launch Trailer
Our first game built with Godot. Web-KAT Attack is a straightforward hi-score attack twin-stick shooter, available now on itch.io: https://thehungrybuppis.itch....
·youtube.com·
Red Hat GitLab Data Breach: The Crimson Collective's Attack
This breach exposed 570GB of data from 28,000 repositories, affecting 800+ organizations. Crimson Collective leaked Customer Engagement Reports containing credentials, API keys, and infrastructure details from major enterprises.
·blog.gitguardian.com·
CHAOSScast Episode 120: Practitioner Guides: #5 Demonstrating Organizational Value
In this episode of CHAOSScast, Harmony Elendu hosts Dawn Foster and Bob Killen, who draw on their extensive open source experience to explain the motivations behind the CHAOSS Practitioner Guides. These guides aim to help practitioners navigate the overwhelming amount of data related to open source projects and understand how to improve project health and sustainability. The discussion covers strategies for communicating the business value of open source efforts to leadership, framing contributions in a way that resonates with organizational priorities, and prioritizing investments in critical projects.
·podcast.chaoss.community·
DevOps & AI Toolkit - Ep36 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=iZoTwl8BWCI

Ep36 - Ask Me Anything About Anything

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

via YouTube https://www.youtube.com/watch?v=iZoTwl8BWCI

·youtube.com·
Asked to do something illegal at work? Here’s what these software engineers did
At FTX, Frank, and Pollen, software engineers were asked to do something potentially illegal, or to go along with what looked like fraud. They obliged in two out of three cases, landed in hot water, and now face jail time. A reminder why it’s never a good idea to go along with such requests.
·blog.pragmaticengineer.com·
How We Integrated Native macOS Workloads with Kubernetes with Vitalii Horbachov

How We Integrated Native macOS Workloads with Kubernetes, with Vitalii Horbachov

https://ku.bz/q_JS76SvM

Vitalii Horbachov explains how Agoda built macOS VZ Kubelet, a custom solution that registers macOS hosts as Kubernetes nodes and spins up macOS VMs using Apple's native virtualization framework. He details their journey from managing 200 Mac minis with bash scripts to a Kubernetes-native approach that handles 20,000 iOS tests at scale.

You will learn:

How to build hybrid runtime pods that combine macOS VMs with Docker sidecar containers for complex CI/CD workflows

Custom OCI image format implementation for managing 55-60GB macOS VM images with layered copy-on-write disks and digest validation

Networking and security challenges including Apple entitlements, direct NIC access, and implementing kubectl exec over SSH

Real-world adoption considerations including MDM-based host lifecycle management and the build vs. buy decision for Apple infrastructure at scale
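The digest-validation idea mentioned in the list above can be sketched as follows. This is a generic illustration of content-addressed layer checking, not Agoda's image format: the function names are hypothetical, and a real implementation would stream multi-gigabyte layers from disk rather than hold them in memory.

```python
import hashlib


def sha256_digest(data: bytes) -> str:
    """Return an OCI-style digest string for a blob."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def verify_layers(layers, expected_digests) -> bool:
    """Check that every layer blob matches its expected digest, in order."""
    if len(layers) != len(expected_digests):
        return False
    return all(
        sha256_digest(blob) == digest
        for blob, digest in zip(layers, expected_digests)
    )


if __name__ == "__main__":
    base, overlay = b"base disk layer", b"copy-on-write overlay"
    manifest = [sha256_digest(base), sha256_digest(overlay)]
    print(verify_layers([base, overlay], manifest))        # True
    print(verify_layers([base, b"tampered"], manifest))    # False
```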

Sponsor

This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/q_JS76SvM

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

October 07, 2025 at 06:00AM

·kube.fm·
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/

Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.

Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.

The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The Karpenter plugin was made as part of an LFX mentorship project.

The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. Now we will give a brief tour of the Headlamp plugin.

Map view of Karpenter Resources and how they relate to Kubernetes resources

Easily see how Karpenter resources like NodeClasses, NodePools, and NodeClaims connect with core Kubernetes resources like Pods and Nodes.

Visualization of Karpenter Metrics

Get instant insight into resource usage vs. limits, allowed disruptions, pending pods, provisioning latency, and more.

Scaling decisions

See which instances are being provisioned for your workloads and understand why Karpenter made those choices. Helpful while debugging.

Config editor with validation support

Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.

Real time view of Karpenter resources

View and track Karpenter-specific resources such as NodeClaims in real time as your cluster scales up and down.

Dashboard for Pending Pods

View all pending pods with unmet scheduling requirements or failed scheduling, with highlights showing why they couldn't be scheduled.

Karpenter Providers

This plugin should work with most Karpenter providers, but has only so far been tested on the ones listed in the table. Additionally, each provider gives some extra information, and the ones in the table below are displayed by the plugin.

Provider Name | Tested | Extra provider specific info supported
AWS
Azure
AlibabaCloud
Bizfly Cloud
Cluster API
GCP
Proxmox
Oracle Cloud Infrastructure (OCI)

Please submit an issue if you test one of the untested providers or if you want support for this provider (PRs also gladly accepted).

How to use

Please see plugins/karpenter/README.md for instructions on how to use it.

Feedback and Questions

Please submit an issue if you use Karpenter and have any other ideas or feedback, or come to the Headlamp channel on the Kubernetes Slack for a chat.

via Kubernetes Blog https://kubernetes.io/

October 05, 2025 at 08:00PM

·kubernetes.io·
DevOps & AI Toolkit - Kubernetes Controllers Deep Dive: How They Really Work - https://www.youtube.com/watch?v=kss081c8EqY

Kubernetes Controllers Deep Dive: How They Really Work

Most people using Kubernetes know how to write YAML and run kubectl apply, but when things break, they're completely lost. The secret they're missing? Understanding controllers - the beating heart that makes Kubernetes actually work. Controllers are what automatically restart your crashed pods, scale your applications, and make custom resources feel native to the platform.

This video dives deep into the real mechanics of how Kubernetes controllers operate. You'll discover how controllers consume and emit events to coordinate with each other, how the reconciliation loop continuously maintains your desired state, and how the Watch API efficiently streams changes without overwhelming the system. We'll explore custom resource definitions that extend Kubernetes, controller communication patterns, and the event-driven architecture that makes everything self-healing. Whether you're debugging cluster issues or building your own controllers, this knowledge will transform how you think about Kubernetes from just throwing YAML at the wall to truly understanding the orchestration engine underneath.
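The reconciliation loop described above can be sketched in a few lines. This is a toy in-memory model, not code from the video or from Kubernetes: the `Cluster` class and pod naming are hypothetical, and a real controller would react to Watch API events instead of being called by hand.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cluster:
    desired_replicas: int                          # the declared (desired) state
    pods: List[str] = field(default_factory=list)  # the observed (actual) state
    next_id: int = 0


def reconcile(cluster: Cluster) -> bool:
    """Run one pass that moves actual state toward desired state.

    Returns True if anything had to change.
    """
    diff = cluster.desired_replicas - len(cluster.pods)
    if diff > 0:
        for _ in range(diff):  # too few pods: create the missing ones
            cluster.pods.append(f"pod-{cluster.next_id}")
            cluster.next_id += 1
    elif diff < 0:             # too many pods: remove the surplus
        del cluster.pods[cluster.desired_replicas:]
    return diff != 0


if __name__ == "__main__":
    cluster = Cluster(desired_replicas=3)
    reconcile(cluster)
    print(cluster.pods)          # ['pod-0', 'pod-1', 'pod-2']
    cluster.pods.remove("pod-1")  # simulate a crashed pod
    reconcile(cluster)
    print(cluster.pods)          # ['pod-0', 'pod-2', 'pod-3']
```

The loop compares states rather than replaying individual events, so it converges to the desired count no matter how the drift happened.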

#KubernetesControllers #Kubernetes #DevOpsEngineering

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/kubernetes/kubernetes-controllers-deep-dive-how-they-really-work 🔗 Kubernetes: https://kubernetes.io

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Kubernetes Controllers Deep Dive
01:18 Kubernetes Control Loops Explained
04:12 How Kubernetes Controllers Watch Events
07:35 Kubernetes Event Emission
11:56 Kubernetes Reconciliation Loop
17:12 Kubernetes Watch API
21:01 Kubernetes Custom Resource Definitions (CRDs)
21:13 Kubernetes Controller Communication
25:22 Kubernetes Controllers Mastery

via YouTube https://www.youtube.com/watch?v=kss081c8EqY

·youtube.com·
The Making of Flux: The Rewrite a KubeFM Original Series

The Making of Flux: The Rewrite, a KubeFM Original Series

https://ku.bz/bgkgn227-

In this episode, Michael Bridgen (the engineer who wrote Flux's first lines) and Stefan Prodan (the maintainer who led the V2 rewrite) share how Flux grew from a fragile hack-day script into a production-grade GitOps toolkit.

How early Flux addressed the risks of manual, unsafe Kubernetes upgrades

Why the complete V2 rewrite was critical for stability, scalability, and adoption

What the maintainers learned about building a sustainable, community-driven open-source project

Sponsor

Join the Flux maintainers and community at FluxCon, November 11th in Atlanta (register here)

More info

Find all the links and info for this episode here: https://ku.bz/bgkgn227-

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

October 06, 2025 at 06:00AM

·kube.fm·
lasantosr/intelli-shell
Like IntelliSense, but for shells. Contribute to lasantosr/intelli-shell development by creating an account on GitHub.
·github.com·
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

https://kubernetes.io/blog/2025/09/23/introducing-headlamp-plugin-for-karpenter/

via Kubernetes Blog https://kubernetes.io/

September 22, 2025 at 08:00PM

·kubernetes.io·