The Data Engineer's guide to optimizing Kubernetes, with Niels Claeys

https://ku.bz/hGRfkzDJW

Niels Claeys shares how his team at DataMinded built Conveyor, a data platform processing up to 1.5 million core hours monthly. He explains the specific optimizations they discovered through production experience, from scheduler changes that immediately reduce costs by 10-15% to achieving 97% spot instance usage without reliability issues.

You will learn:

Why the default Kubernetes scheduler wastes money on batch workloads and how switching from "least allocated" to "most allocated" scheduling enables faster scale-down and better resource utilization

How to achieve 97% spot instance adoption through strategic instance type diversification, region selection, and Spark-specific techniques

Node pool design principles that balance Kubernetes overhead with workload efficiency

Platform-specific gotchas like AWS cross-AZ data transfer costs that can spike bills unexpectedly
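
The scheduling point above can be illustrated with a toy placement model. This is a minimal sketch, not the kube-scheduler's actual scoring code: the `Node` class, the 8-core node size, and the pod sizes are all hypothetical. In a real cluster the equivalent choice is, as far as the upstream docs describe it, the scoring strategy of the NodeResourcesFit plugin (LeastAllocated vs. MostAllocated).

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    capacity: int       # allocatable CPU in whole cores (hypothetical units)
    requested: int = 0  # sum of CPU requests already placed on this node

    def fits(self, cpu: int) -> bool:
        return self.requested + cpu <= self.capacity

    def utilization_after(self, cpu: int) -> float:
        return (self.requested + cpu) / self.capacity


def place(pods, nodes, strategy):
    """Place each pod on the feasible node with the best score.

    "most_allocated" prefers the fullest node (bin packing);
    anything else is treated as "least_allocated" (spreading).
    """
    for cpu in pods:
        feasible = [n for n in nodes if n.fits(cpu)]
        if not feasible:
            raise RuntimeError(f"no node can fit a pod requesting {cpu} cores")
        if strategy == "most_allocated":
            chosen = max(feasible, key=lambda n: n.utilization_after(cpu))
        else:
            chosen = min(feasible, key=lambda n: n.utilization_after(cpu))
        chosen.requested += cpu
    return nodes


def demo(strategy):
    # Four identical 8-core nodes and four 2-core batch pods.
    nodes = [Node(f"node-{i}", capacity=8) for i in range(4)]
    place([2, 2, 2, 2], nodes, strategy)
    # Nodes with no requests are candidates for scale-down.
    return [n.name for n in nodes if n.requested == 0]


print(demo("least_allocated"))  # every node holds one pod, nothing can scale down
print(demo("most_allocated"))   # one node is full, the other three are empty
```

With spreading, every node ends up holding a pod, so the autoscaler has nothing to remove; with packing, the same pods fill one node and leave three empty, which is the scale-down benefit the episode describes.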

Sponsor

This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/hGRfkzDJW

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

October 14, 2025 at 02:00AM

·kube.fm·
DevOps & AI Toolkit - Why Your Infrastructure AI Sucks (And How to Fix It) - https://www.youtube.com/watch?v=Ma3gKmuXahc

Why Your Infrastructure AI Sucks (And How to Fix It)

Discover why your AI agent is completely failing at infrastructure management and learn to build an AI-powered Internal Developer Platform that actually works. Most organizations are treating AI like a search engine, asking vague questions and getting generic answers that break in production. This video reveals the five critical components that transform useless AI into intelligent infrastructure automation.

You'll learn to build capabilities discovery using Vector databases for semantic search across Kubernetes resources, capture organizational patterns from tribal knowledge and documentation, create enforceable policies that guide AI toward compliance, implement proper context management to avoid the bloated mess most systems become, and design intelligent workflows that guide users to the right solutions instead of relying on guesswork. Watch as we demonstrate the complete transformation from a generic AI response to a fully functional PostgreSQL deployment that follows organizational patterns, enforces compliance policies, and deploys correctly the first time.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Tuple 🔗 https://tuple.app/DOT 👉 Promo code: DOT2025 ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#AIInfrastructure #InternalDeveloperPlatform #KubernetesAI

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Transcript and commands: https://devopstoolkit.live/internal-developer-platforms/why-your-infrastructure-ai-sucks-and-how-to-fix-it
🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai
🎬 Stop Blaming AI: Vector DBs + RAG = Game Changer: https://youtu.be/zqpJr1qZhTg
🎬 Why Kubernetes Discovery Sucks for AI (And How Vector DBs Fix It): https://youtu.be/MSNstHj4rmk

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 AI for Infrastructure Challenges
01:42 Tuple (sponsor)
03:16 Why Your AI Agent Is Useless
09:52 Kubernetes API Discovery That Actually Works
13:41 Organizational Knowledge AI Can Actually Use
17:49 Stop Breaking Production With AI
22:17 The Context Window Disaster Nobody Talks About
25:16 Smart Conversations That Get Results
29:34 Your Complete AI-Powered IDP Blueprint

via YouTube https://www.youtube.com/watch?v=Ma3gKmuXahc

·youtube.com·
The Making of Flux: The Scale, a KubeFM Original Series

https://ku.bz/tWcHlJm7M

In this episode, Philippe Ensarguet, VP of Software Engineering at Orange, and Arnab Chatterjee, Global Head of Container & AI Platforms at Nomura, share how large enterprises are adopting Flux to drive reliable, compliant, and scalable platforms.

How Orange uses Flux to manage bare-metal Kubernetes through its SYLVR project.

Why Nomura relies on GitOps to balance agility with governance in financial services.

How Flux helps enterprises achieve resilience, compliance, and repeatability at scale.

Sponsor

Join the Flux maintainers and community at FluxCon, November 11th in Atlanta—register here

via KubeFM https://kube.fm

October 13, 2025 at 06:00AM

·kube.fm·
Last Week in Kubernetes Development - Week Ending October 5 2025

Week Ending October 5, 2025

https://lwkd.info/2025/20251010

Developer News

Joaquim Rocha has been nominated to be one of the new SIG UI leads. Congrats Joaquim!

Folks are discussing the deprecation of cgroups v1; the whole discussion is on the mailing list.

There are some updates to the release informing and blocking jobs to improve alpha/beta coverage. The full list of jobs moved to release informing and blocking status is linked from the original post.

Release Schedule

Next Deadline: Enhancements Freeze, October 16

All enhancements are expected to have met the requirements by the freeze. Those that don’t meet the requirements will be removed from the milestone and will require an Exception.

Kubernetes v1.35.0-alpha.1 is out!

The cherry-pick deadline for patch releases is Oct 10.

Steering Committee Election

The Steering Committee Election voting ends on Friday, 24th October, AoE. You can check your eligibility to vote in the voting app, and file an exception request by October 22 if you need an exception. Don’t forget to cast your votes if you haven’t already!

Featured PRs

133697: Codify feature gate dependencies

With this PR, feature gate dependencies can be explicitly declared and enforced. This has been ad-hoc or implicit in the past. Components will now refuse to start if a feature is enabled without its required dependencies. Feature Owners should review the backfilled dependencies, while users who manually toggle feature gates must ensure dependent features are also enabled—especially noting that AllAlpha=true now requires AllBeta=true or equivalent beta features to be set.
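
The startup check described above can be sketched as follows. This is an illustration of the idea only, not the Kubernetes implementation: the gate names `FeatureA`, `FeatureB`, and `FeatureC` and the `DEPENDENCIES` table are hypothetical.

```python
# Hypothetical dependency table: each gate maps to the gates it requires.
DEPENDENCIES = {
    "FeatureB": {"FeatureA"},
    "FeatureC": {"FeatureB"},
}


def validate_feature_gates(enabled: dict) -> list:
    """Return one error per enabled gate whose dependency is not enabled."""
    errors = []
    for gate, deps in sorted(DEPENDENCIES.items()):
        if not enabled.get(gate, False):
            continue  # a disabled gate imposes no requirements
        for dep in sorted(deps):
            if not enabled.get(dep, False):
                errors.append(f"{gate} requires {dep} to be enabled")
    return errors


def start_component(enabled: dict) -> str:
    """Refuse to start when any declared dependency is unmet."""
    errors = validate_feature_gates(enabled)
    if errors:
        raise RuntimeError("refusing to start: " + "; ".join(errors))
    return "started"


print(start_component({"FeatureA": True, "FeatureB": True}))
print(validate_feature_gates({"FeatureC": True}))
```

Declaring the dependencies in one table, rather than checking them ad hoc in each component, is what makes the failure explicit at startup instead of surfacing later as odd runtime behavior.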

KEP of the Week

KEP 859: Include kubectl command metadata in http request headers

This KEP aims to add extra HTTP headers to kubectl requests sent to the Kubernetes apiserver. These headers would share details such as which kubectl command was used, the flags included, a session ID, and whether the command is deprecated. This would help cluster administrators understand how users interact with the cluster, making it easier to debug issues, track usage, and gather insights, without exposing any sensitive data.
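
As a rough picture of what such request metadata could look like, here is a minimal sketch. The header names and the `kubectl_metadata_headers` helper are assumptions made for illustration and may not match what kubectl actually sends. The one design point taken from the description above is that only flag names, never flag values, are included, so nothing sensitive is exposed.

```python
import uuid


def kubectl_metadata_headers(command, flags, deprecated=False, session_id=None):
    """Build illustrative metadata headers for one kubectl invocation.

    `flags` maps flag names to values; only the names are emitted so that
    values (file paths, tokens, and so on) never reach the headers.
    """
    headers = {
        "Kubectl-Command": command,  # assumed header name
        "Kubectl-Session": session_id or str(uuid.uuid4()),  # assumed header name
    }
    if flags:
        headers["Kubectl-Flags"] = ",".join(sorted(flags))  # assumed header name
    if deprecated:
        headers["Kubectl-Deprecated"] = "true"  # assumed header name
    return headers


example = kubectl_metadata_headers(
    "kubectl apply",
    {"filename": "app.yaml", "token": "s3cret"},
    session_id="abc",
)
print(example)
```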

This KEP is tracked for GA in v1.35

Other Merges

Disable SchedulerAsyncAPICalls feature gate to prevent scheduler performance issues under high API server load.

Add path normalization to error matcher for improved field validation.

DeviceClass now enforces a maximum of 32 selectors and configs via declarative validation.

Add declarative validation +k8s:maxItems tag to ResourceClaim

HPA controller now exposes desired_replicas metric to track scaling history.

Fix preemptor pod behavior to prevent endless scheduling loops during slow victim deletion.

Feature gate dependencies are now explicit and validated at startup, preventing enabling a feature if its dependencies are disabled.

kube-scheduler introduces lightweight AssumeCache in VolumeBinding plugin to fix occasional pod scheduling delays.

Version Updates

etcd to v3.6.5

Subprojects and Dependency Updates

cluster-api v1.11.2 extends Kubernetes support to v1.34 for both management and workload clusters, adds CoreDNS migration v1.0.28, and introduces Metal3 as an IPAM provider.

cluster-api v1.10.7 adds Kubernetes v1.33 compatibility and updates CoreDNS migration to v1.0.28.

coredns v1.13.1 updates Go to v1.25.2 to address security issues, improves performance, and enhances the sign plugin by rejecting invalid UTF-8 tokens.

coredns v1.13.0 introduces a new Nomad plugin, fixes Corefile loop and import issues, improves shutdown handling, and hardens gRPC and reload behavior.

containerd API v1.10.0-beta.1 adds a mount manager and aligns with containerd 2.2 APIs (pre-release).

kOps v1.34.0-beta.1 updates AWS and Azure components (VPC CNI v1.20.2, Cilium v1.18.2, Calico v3.30.3), upgrades etcd to v3.6.5, drops Canal support, and removes Kubernetes 1.28 compatibility.

autoscaler vertical-pod-autoscaler v1.5.1 updates the default VPA version and client-go dependency to improve stability.

autoscaler cluster-autoscaler-chart v0.1.1 introduces automatic resource adjustment for workloads through Helm.

csi-driver-nfs v4.12.0 updates Go to 1.24, fixes a goroutine leak, and adds support for creating multiple storage classes with Helm.

csi-driver-smb v1.19.0 improves secret handling with special characters, updates CSI sidecars and resizer to v1.14.0, and adds Helm support for multiple storage classes.

headlamp v0.36.0 adds support for EndpointSlice resources, label-based search, and clipboard copy for resource names. It improves table sorting memory, standardizes resource naming, and enhances Helm charts with optional PodDisruptionBudget, backend TLS termination, and security context updates. The release also fixes several UI issues, improves plugin management, and updates shipped Prometheus and App Catalog plugins.

Shoutouts

Drew Hagen – I’d like to take a moment to acknowledge @Matteo for the seriously impressive leadership of a newer release branch management shadow program for the 1.34 release, and all the amazing work putting together strong documentation for branch management!! I remember my experience releasing alpha 3 being very clear what to do and going really smooth. Very little tribal knowledge. And we did most releases async, which I think speaks to how strong this handbook is. I thank you for still being around to observe and help, even if it meant some later nights in your time zone. @xmudrii @jimangel Great work! Y’all have set the foundation for many more cycles to come. Thank you for all of your patience, guidance and support. It was really great learning and working with you all @Angelos Kolaitis @satyampsoni

via Last Week in Kubernetes Development https://lwkd.info/

October 10, 2025 at 09:03AM

·lwkd.info·
SYNOLOGY SUPPORT SEAGATE & WD AGAIN - TOO LITTLE, TOO LATE?
Synology (FINALLY) Gives In to 3rd Party HDD Support in 2025 PLUS Series NAS 7/10/25 - Updated with information supplied by Synology on how verifications and product ranges will support different HDD/SSD in DSM 7.3 Of all the stories of 2025, very few had the level of impact on the NAS industry th
·nascompares.com·
Web KAT Attack! Launch Trailer
Our first game built with Godot. Web-KAT Attack is a straightforward hi-score attack twin-stick shooter, available now on itch.io: https://thehungrybuppis.itch....
·youtube.com·
Red Hat GitLab Data Breach: The Crimson Collective's Attack
This breach exposed 570GB of data from 28,000 repositories, affecting 800+ organizations. Crimson Collective leaked Customer Engagement Reports containing credentials, API keys, and infrastructure details from major enterprises.
·blog.gitguardian.com·
CHAOSScast Episode 120: Practitioner Guides: #5 Demonstrating Organizational Value
In this episode of CHAOSScast, Harmony Elendu talks with Dawn Foster and Bob Killen about their extensive experience in open source and the motivations behind the creation of the CHAOSS Practitioner Guides. These guides aim to help practitioners navigate the overwhelming amount of data related to open source projects and understand how to improve project health and sustainability. The discussion covers strategies for communicating the business value of open source efforts to leadership, framing contributions in a way that resonates with organizational priorities, and prioritizing investments in critical projects.
·podcast.chaoss.community·
DevOps & AI Toolkit - Ep36 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=iZoTwl8BWCI

Ep36 - Ask Me Anything About Anything

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

via YouTube https://www.youtube.com/watch?v=iZoTwl8BWCI

·youtube.com·
Asked to do something illegal at work? Here’s what these software engineers did
At FTX, Frank, and Pollen, software engineers were asked to do something potentially illegal, or to go along with what looked like fraud. They obliged in two out of three cases, landed in hot water, and now face jail time. A reminder why it’s never a good idea to go along with such requests.
·blog.pragmaticengineer.com·
How We Integrated Native macOS Workloads with Kubernetes, with Vitalii Horbachov

https://ku.bz/q_JS76SvM

Vitalii Horbachov explains how Agoda built macOS VZ Kubelet, a custom solution that registers macOS hosts as Kubernetes nodes and spins up macOS VMs using Apple's native virtualization framework. He details their journey from managing 200 Mac minis with bash scripts to a Kubernetes-native approach that handles 20,000 iOS tests at scale.

You will learn:

How to build hybrid runtime pods that combine macOS VMs with Docker sidecar containers for complex CI/CD workflows

Custom OCI image format implementation for managing 55-60GB macOS VM images with layered copy-on-write disks and digest validation

Networking and security challenges including Apple entitlements, direct NIC access, and implementing kubectl exec over SSH

Real-world adoption considerations including MDM-based host lifecycle management and the build vs. buy decision for Apple infrastructure at scale
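
Digest validation for very large image layers can be pictured with the sketch below. It is not Agoda's implementation: the `verify_digest` helper and the chunk size are hypothetical, and it assumes the common OCI-style `sha256:<hex>` digest string. Streaming in chunks matters here because a 55-60GB image cannot sensibly be read into memory at once.

```python
import hashlib
import io


def verify_digest(stream, expected_digest, chunk_size=1024 * 1024):
    """Hash a binary stream in chunks and compare against 'sha256:<hex>'."""
    algorithm, _, expected_hex = expected_digest.partition(":")
    if algorithm != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algorithm}")
    hasher = hashlib.sha256()
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest() == expected_hex


layer = b"pretend this is a disk layer"
digest = "sha256:" + hashlib.sha256(layer).hexdigest()
print(verify_digest(io.BytesIO(layer), digest))        # matching content
print(verify_digest(io.BytesIO(b"tampered"), digest))  # altered content
```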

via KubeFM https://kube.fm

October 07, 2025 at 06:00AM

·kube.fm·
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/

Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.

Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.

The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The Karpenter plugin was made as part of an LFX mentor project.

The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. Now we will give a brief tour of the Headlamp plugin.

Map view of Karpenter Resources and how they relate to Kubernetes resources

Easily see how Karpenter resources such as NodeClasses, NodePools, and NodeClaims connect with core Kubernetes resources such as Pods and Nodes.

Visualization of Karpenter Metrics

Get instant insight into resource usage vs. limits, allowed disruptions, pending pods, provisioning latency, and more.

Scaling decisions

See which instances are being provisioned for your workloads and understand why Karpenter made those choices. Helpful when debugging.

Config editor with validation support

Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.

Real time view of Karpenter resources

View and track Karpenter-specific resources such as NodeClaims in real time as your cluster scales up and down.

Dashboard for Pending Pods

View all pending pods with unmet scheduling requirements or failed scheduling, with highlights showing why they couldn't be scheduled.

Karpenter Providers

This plugin should work with most Karpenter providers, but has only so far been tested on the ones listed in the table. Additionally, each provider gives some extra information, and the ones in the table below are displayed by the plugin.

Table columns: Provider Name, Tested, Extra provider-specific info supported.

Providers listed: AWS, Azure, AlibabaCloud, Bizfly Cloud, Cluster API, GCP, Proxmox, Oracle Cloud Infrastructure (OCI).

Please submit an issue if you test one of the untested providers or if you want support for this provider (PRs also gladly accepted).

How to use

Please see the plugins/karpenter/README.md for instructions on how to use.

Feedback and Questions

Please submit an issue if you use Karpenter and have any other ideas or feedback, or come to the Headlamp channel on the Kubernetes Slack for a chat.

via Kubernetes Blog https://kubernetes.io/

October 05, 2025 at 08:00PM

·kubernetes.io·
DevOps & AI Toolkit - Kubernetes Controllers Deep Dive: How They Really Work - https://www.youtube.com/watch?v=kss081c8EqY

Kubernetes Controllers Deep Dive: How They Really Work

Most people using Kubernetes know how to write YAML and run kubectl apply, but when things break, they're completely lost. The secret they're missing? Understanding controllers - the beating heart that makes Kubernetes actually work. Controllers are what automatically restart your crashed pods, scale your applications, and make custom resources feel native to the platform.

This video dives deep into the real mechanics of how Kubernetes controllers operate. You'll discover how controllers consume and emit events to coordinate with each other, how the reconciliation loop continuously maintains your desired state, and how the Watch API efficiently streams changes without overwhelming the system. We'll explore custom resource definitions that extend Kubernetes, controller communication patterns, and the event-driven architecture that makes everything self-healing. Whether you're debugging cluster issues or building your own controllers, this knowledge will transform how you think about Kubernetes from just throwing YAML at the wall to truly understanding the orchestration engine underneath.
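
The reconciliation loop mentioned above can be reduced to a few lines. This is a toy model under stated assumptions, not code from the video or from Kubernetes: pods are plain strings, the watch stream is a hypothetical list of `(event_type, pod_name)` tuples, and `create_pod` and `delete_pod` stand in for API calls.

```python
def reconcile(desired_replicas, running_pods, create_pod, delete_pod):
    """Drive the observed state (running_pods) toward the desired replica count."""
    diff = desired_replicas - len(running_pods)
    if diff > 0:
        for _ in range(diff):
            running_pods.append(create_pod())
    elif diff < 0:
        for _ in range(-diff):
            delete_pod(running_pods.pop())
    return running_pods


def run_controller(events, desired_replicas):
    """Apply each watch-style event to local state, then reconcile."""
    counter = 0
    pods = []
    deleted = []

    def create_pod():
        nonlocal counter
        counter += 1
        return f"pod-{counter}"

    # Initial reconcile brings an empty cluster up to the desired state.
    reconcile(desired_replicas, pods, create_pod, deleted.append)
    for kind, name in events:
        if kind == "DELETED" and name in pods:
            pods.remove(name)
        elif kind == "ADDED" and name not in pods:
            pods.append(name)
        # The controller does not care why state drifted, only that it did.
        reconcile(desired_replicas, pods, create_pod, deleted.append)
    return pods, deleted


print(run_controller([("DELETED", "pod-2"), ("ADDED", "extra")], 3))
```

A crashed pod (the `DELETED` event) is replaced and a surplus pod (the `ADDED` event) is removed by the same function, which is the self-healing behavior the description refers to.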

#KubernetesControllers #Kubernetes #DevOpsEngineering

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬
➡ Transcript and commands: https://devopstoolkit.live/kubernetes/kubernetes-controllers-deep-dive-how-they-really-work
🔗 Kubernetes: https://kubernetes.io

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Kubernetes Controllers Deep Dive
01:18 Kubernetes Control Loops Explained
04:12 How Kubernetes Controllers Watch Events
07:35 Kubernetes Event Emission
11:56 Kubernetes Reconciliation Loop
17:12 Kubernetes Watch API
21:01 Kubernetes Custom Resource Definitions (CRDs)
21:13 Kubernetes Controller Communication
25:22 Kubernetes Controllers Mastery

via YouTube https://www.youtube.com/watch?v=kss081c8EqY

·youtube.com·
The Making of Flux: The Rewrite, a KubeFM Original Series

https://ku.bz/bgkgn227-

In this episode, Michael Bridgen (the engineer who wrote Flux's first lines) and Stefan Prodan (the maintainer who led the V2 rewrite) share how Flux grew from a fragile hack-day script into a production-grade GitOps toolkit.

How early Flux addressed the risks of manual, unsafe Kubernetes upgrades

Why the complete V2 rewrite was critical for stability, scalability, and adoption

What the maintainers learned about building a sustainable, community-driven open-source project

Sponsor

Join the Flux maintainers and community at FluxCon, November 11th in Salt Lake City—register here

via KubeFM https://kube.fm

October 06, 2025 at 06:00AM

·kube.fm·
lasantosr/intelli-shell
Like IntelliSense, but for shells.
·github.com·
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

https://kubernetes.io/blog/2025/09/23/introducing-headlamp-plugin-for-karpenter/

via Kubernetes Blog https://kubernetes.io/

September 22, 2025 at 08:00PM

·kubernetes.io·
Last Week in Kubernetes Development - Week Ending September 28 2025

Week Ending September 28, 2025

https://lwkd.info/2025/20251002

Developer News

Instead of reviving the WG API Expression working group, a new SIG API Machinery subproject meeting on Declarative APIs and Linters was held on Sept 23, 2025, at 9 AM PST. The subproject carried the same goals as the proposed WG, and meeting details were shared in the Agenda & Notes document.

The WG AI Gateway has officially launched with a Slack channel, #wg-ai-gateway, and a mailing list. Meetings will begin next week, and the community is encouraged to join and participate.

Release Schedule

Next Deadline: PRR Freeze, October 9

Kubernetes v1.35 is moving along — APAC friendly meetings are running and enhancement opt ins are open.

Starting from v1.35, PRR Freeze is a hard deadline. No new KEPs may be opted in after the PRR Freeze deadline; the new PRR Freeze rules are linked from the original post. If your KEP misses the PRR Freeze deadline, you need to submit an exception for your KEP within 3 days after PRR Freeze, following the exception process linked from the same post. If you have any questions, feel free to reach out in the #sig-release or the #prod-readiness channels in Slack.

If you’re an enhancement owner, make sure your KEP is up to date (status: implementable, milestone: v1.35, test plan + PRR filled) before PRR Freeze on Oct 9 (AoE) / Oct 10, 12:00 UTC.

The next cherry-pick deadline for patch releases is Oct 10.

Featured PRs

134330: Add resource version comparison function in client-go along with conformance

This PR introduces a helper function for comparing Kubernetes resource versions. Resource versions are used for concurrency control and watch operations, but until now they could only be compared as opaque strings. The new function allows direct comparison of resource versions for objects of the same type. Alongside this, conformance tests have been added to ensure consistent handling across GA resources, making resource version behavior clearer and more reliable.
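
To see why plain string comparison is not enough for ordering, consider the sketch below. It is not the client-go helper: `compare_resource_versions` is a hypothetical name, and the sketch assumes that resource versions are decimal integer strings, which the summary above does not itself state.

```python
def compare_resource_versions(a: str, b: str) -> int:
    """Return -1, 0, or 1 when a is older than, equal to, or newer than b.

    Assumes both values are non-empty ASCII decimal strings and belong to
    objects of the same type.
    """
    for value in (a, b):
        if not (value.isascii() and value.isdigit()):
            raise ValueError(f"not a decimal resource version: {value!r}")
    x, y = int(a), int(b)
    return (x > y) - (x < y)


# Lexicographic comparison gets the order wrong once the digit count differs.
print("9" > "10")                            # string comparison says "9" is newer
print(compare_resource_versions("9", "10"))  # numeric comparison says it is older
```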

KEP of the Week

KEP-4412: Projected service account tokens for Kubelet image credential providers

This KEP proposes a secret-less image-pull flow that leverages ephemeral Kubernetes Service Account (KSA) tokens instead of long-lived ImagePullSecrets or node-wide kubelet credential providers. A pod-bound, short-lived KSA token would be used (or exchanged) to obtain transient, workload-scoped image-pull credentials before the pod starts, avoiding persisted secrets in the API or node and allowing external validators to rely on OIDC-like token semantics. This ties image-pull authorization to the workload identity, simplifies secret rotation and management, and reduces the security risk posed by long-lived, hard-to-rotate credentials.

This KEP is tracked for beta in v1.34.

Other Merges

Deallocate extended resource claims on pod completion

Introduce k8s:customUnique tag to control listmap uniqueness validation

Add +enum tag to DeviceAllocationMode type

kubeadm: wait for apiserver using a local client, not the control-plane endpoint

Revert async preemption corner-case fix — undoes prior change to scheduler preemption behavior

kubeadm removes the RootlessControlPlane feature gate as UserNamespacesSupport becomes the replacement

Enable SSATags linter to enforce +listType on lists in APIs

API Dispatcher drops goroutine limit to avoid throughput regression under high latency

Kubelet and controller: enable more asynchronous node status updates and improve tracing/logging

DRA: allocator selection uses correct “incubating” implementation by default

kube-proxy: list available endpoints in /statusz

Restore partial functionality of AuditEventFrom

Add explicit feature gate dependencies with validation

Kubernetes is now built with Go v1.24.7

Promotions

Graduate ControlPlaneKubeletLocalMode to GA

Version Updates

Update publishing rules to use Go v1.24.7

Subprojects and Dependency Updates

cluster-autoscaler v1.34.0 promotes In-Place Updates to Beta, adds Capacity Buffer CRD/controller, improves scale-up logic across multiple providers, and deprecates older flags/APIs

cluster-autoscaler-chart v0.1.0 automatically adjusts resources for workloads

gRPC v1.75.1 adds Python 3.14 support, fixes Python async shutdown race, and refines interpreter exit handling

helm-chart-aws-cloud-controller-manager v0.0.10 installs Cloud Controller Manager for AWS Cloud Provider

ingress-nginx helm-chart v4.13.3 updates Ingress-Nginx to controller v1.13.3

nerdctl v2.1.6 reserves ports in rootful mode to prevent conflicts

Shoutouts

No shoutouts this week. Want to thank someone for special efforts to improve Kubernetes? Tag them in the #shoutouts channel.

via Last Week in Kubernetes Development https://lwkd.info/

October 02, 2025 at 06:25AM

·lwkd.info·