
1_r/devopsish
Week Ending August 10, 2025
https://lwkd.info/2025/20250814
Developer News
Do you run multiple Kubernetes clusters in your organization? SIG-Multicluster would love to have you answer their survey so that they can decide the SIG’s priorities.
The Kubernetes SIG Release main meeting has been rescheduled, based on a community vote, to a fixed time slot of Thursdays from 2:30–3:15 pm UTC, starting August 21, 2025, and will now follow a bi-weekly cadence. The previously alternating meeting times are discontinued, and the next scheduled meeting before the change has been canceled. The time is tied to UTC, so participants in regions with Daylight Saving Time may see a ±1 hour shift. The calendar has been updated, and questions or feedback can be shared via email or the #sig-release Slack channel.
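To illustrate the Daylight Saving Time note above, here is a minimal sketch of how a slot pinned to UTC moves on a local clock. The region and its offsets (UTC+2 in summer, UTC+1 in winter) are hypothetical, and the winter date is picked only for illustration, not taken from the SIG calendar.

```python
from datetime import datetime, timedelta, timezone

# First meeting in the new slot: Thursday, August 21, 2025, 14:30 UTC.
meeting_utc = datetime(2025, 8, 21, 14, 30, tzinfo=timezone.utc)


def local_start(utc_time, offset_hours):
    """Return the local wall-clock time for a fixed UTC offset."""
    return utc_time.astimezone(timezone(timedelta(hours=offset_hours)))


# Hypothetical region: UTC+2 while DST is active, UTC+1 otherwise.
summer = local_start(meeting_utc, 2)
# Same UTC slot 16 weeks later (illustrative date, after DST has ended).
winter = local_start(meeting_utc + timedelta(weeks=16), 1)

print(summer.strftime("%H:%M"))  # 16:30
print(winter.strftime("%H:%M"))  # 15:30
```

The UTC time never changes; only the local reading shifts by an hour when the region's offset changes.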
Release Schedule
Next Deadline: Release day, 27 August
We are currently in Docs Freeze
Kubernetes v1.34.0-rc.0 was released, followed by v1.34.0-rc.1 to address a critical bug
Cherry-pick deadlines for the upcoming patch releases 1.33.4, 1.32.8, and 1.31.12 have passed. These patch releases are expected on August 12.
KEP of the Week
KEP 5080: Ordered Namespace Deletion
This KEP introduces a deterministic, security-aware ordered deletion of all resources within a namespace. Previously, namespace deletion could remove resources in a non-deterministic order, which could lead to awkward or risky gaps. When deleting a namespace with the OrderedNamespaceDeletion feature gate enabled, Kubernetes tears down namespace objects in waves, so that Pods go first and crucial resources like NetworkPolicy don’t disappear while Pods are still running.
This KEP is tracked for Stable in 1.34
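The wave idea can be sketched as follows. This is a simplified model, not the namespace controller's code: the function name, the tuple representation, and the sample objects are all hypothetical, and only the "Pods first, everything else afterwards" ordering comes from the description above.

```python
def deletion_waves(resources):
    """Split (kind, name) pairs into waves: Pods first, everything else after."""
    pods = [r for r in resources if r[0] == "Pod"]
    rest = [r for r in resources if r[0] != "Pod"]
    # Drop empty waves so callers only see work that actually exists.
    return [wave for wave in (pods, rest) if wave]


# Hypothetical namespace contents, listed in arbitrary order.
namespace_objects = [
    ("NetworkPolicy", "deny-all"),
    ("Pod", "web-0"),
    ("ConfigMap", "settings"),
    ("Pod", "web-1"),
]

for number, wave in enumerate(deletion_waves(namespace_objects), start=1):
    print(number, wave)
```

In this model the NetworkPolicy is only removed in the second wave, after both Pods are gone.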
Other Merges
NodeRestriction to prevent nodes from updating their OwnerReferences
Etcd metrics use Delete() instead of DeleteLabelValues()
Enable publishing-bot support for v1.34 branch
Prerelease lifecycle for PodCertificateRequest is fixed
Demote KEP-5278 feature gates for ClearingNominatedNodeNameAfterBinding and NominatedNodeNameForExpectation to Alpha
podcertificaterequestcleaner role is now behind a feature-gate
Deprecated
Deprecated Version is removed for api_server_storage_objects
Subprojects and Dependency Updates
coredns/coredns v1.12.3 improves plugin reliability, adds Kubernetes plugin startup timeout, updates route53 to AWS SDK v2, and fixes race conditions
cluster autoscaler chart v9.50.0 scales Kubernetes worker nodes within autoscaling groups
cluster-api v1.11.0-rc.0 for testing
cluster-api-provider-vsphere v1.14.0-rc.0 for testing
gRPC Core 1.74.1 (gee): patch release for grpc/ruby
kOps v1.34.0-alpha.1 introduces new features and significant bug fixes for Azure, adds experimental IPv6 support for bare-metal, and updates key dependencies like containerd and etcd
kompose 1.37.0 includes code refactoring for simplification and updates several core dependencies for improved stability and performance
Shoutouts
Want to thank someone in the community? Drop a note in #shoutouts on Slack.
via Last Week in Kubernetes Development https://lwkd.info/
August 14, 2025 at 03:00PM
Ep32 - Ask Me Anything About Anything
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=GjV3PtqVP9Q
How Policies Saved us a Thousand Headaches, with Alessandro Pomponio
Alessandro Pomponio from IBM Research explains how his team transformed their chaotic bare-metal clusters into a well-governed, self-service platform for AI and scientific workloads. He walks through their journey from manual cluster interventions to a fully automated GitOps-first architecture using ArgoCD, Kyverno, and Kueue to handle everything from policy enforcement to GPU scheduling.
You will learn:
How to implement GitOps workflows that reduce administrative burden while maintaining governance and visibility across multi-tenant research environments
Practical policy enforcement strategies using Kyverno to prevent GPU monopolization, block interactive pod usage, and automatically inject scheduling constraints
Fair resource sharing techniques with Kueue to manage scarce GPU resources across different hardware types while supporting both specific and flexible allocation requests
Organizational change management approaches for gaining stakeholder buy-in, upskilling admin teams, and communicating policy changes to research users
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/5sK7BFZ-8
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
August 12, 2025 at 06:00AM
AI Meets Kubernetes: Simplifying Developer and Ops Collaboration
Platform engineers and developers often struggle to align on infrastructure needs, leading to platforms that miss the mark and endless iteration loops. But what if AI could bridge this gap? This video explores a three-way collaboration where developers express their requirements in natural language, platform engineers establish guardrails and constraints, and AI intelligently translates developer intent into precise, infrastructure-compliant deployments.
Watch a live demo of the DevOps AI Toolkit (dot-ai) MCP, a project that leverages AI to match developer requests with platform-engineered building blocks, automatically generating optimized Kubernetes configurations. Witness how conversational deployment simplifies and accelerates the deployment process, ensuring deployments meet organizational standards while empowering developers to effortlessly deploy applications tailored to their exact needs.
#PlatformEngineering #Kubernetes #AI
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/app-management/ai-meets-kubernetes-simplifying-developer-and-ops-collaboration 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Introduction to Platform Engineering with AI
01:35 Blacksmith (sponsor)
02:43 The Platform Engineering Problem
08:27 AI Deployment Magic
19:06 When AI Hits Limits
21:02 Beyond Simple Abstractions
23:20 DevOps AI Toolkit Explained
via YouTube https://www.youtube.com/watch?v=8Yzn-9qQpQI
Introducing Headlamp AI Assistant
https://kubernetes.io/blog/2025/08/07/introducing-headlamp-ai-assistant/
This announcement originally appeared on the Headlamp blog.
To simplify Kubernetes management and troubleshooting, we're thrilled to introduce Headlamp AI Assistant: a powerful new plugin for Headlamp that helps you understand and operate your Kubernetes clusters and applications with greater clarity and ease.
Whether you're a seasoned engineer or just getting started, the AI Assistant offers:
Fast time to value: Ask questions like "Is my application healthy?" or "How can I fix this?" without needing deep Kubernetes knowledge.
Deep insights: Start with high-level queries and dig deeper with prompts like "List all the problematic pods" or "How can I fix this pod?"
Focused & relevant: Ask questions in the context of what you're viewing in the UI, such as "What's wrong here?"
Action-oriented: Let the AI take action for you, like "Restart that deployment", with your permission.
Here is a demo of the AI Assistant in action as it helps troubleshoot an application running with issues in a Kubernetes cluster:
Hopping on the AI train
Large Language Models (LLMs) have transformed not just how we access data but also how we interact with it. The rise of tools like ChatGPT opened a world of possibilities, inspiring a wave of new applications. Asking questions or giving commands in natural language is intuitive, especially for users who aren't deeply technical. Now everyone can quickly ask how to do X or Y, without feeling awkward or having to traverse pages and pages of documentation like before.
Therefore, Headlamp AI Assistant brings a conversational UI to Headlamp, powered by LLMs that Headlamp users can configure with their own API keys. It is available as a Headlamp plugin, making it easy to integrate into your existing setup. Users can enable it by installing the plugin and configuring it with their own LLM API keys, giving them control over which model powers the assistant. Once enabled, the assistant becomes part of the Headlamp UI, ready to respond to contextual queries and perform actions directly from the interface.
Context is everything
As expected, the AI Assistant is focused on helping users with Kubernetes concepts. Yet, while there is a lot of value in responding to Kubernetes-related questions from Headlamp's UI, we believe that the greatest benefit of such an integration comes when it can use the context of what the user is experiencing in an application. So, the Headlamp AI Assistant knows what you're currently viewing in Headlamp, and this makes the interaction feel more like working with a human assistant.
For example, if a pod is failing, users can simply ask "What's wrong here?" and the AI Assistant will respond with the root cause, like a missing environment variable or a typo in the image name. Follow-up prompts like "How can I fix this?" allow the AI Assistant to suggest a fix, streamlining what used to take multiple steps into a quick, conversational flow.
Sharing the context from Headlamp is not a trivial task though, so it's something we will keep working on perfecting.
Tools
Context from the UI is helpful, but sometimes additional capabilities are needed. If the user is viewing the pod list and wants to identify problematic deployments, switching views should not be necessary. To address this, the AI Assistant includes support for a Kubernetes tool. This allows asking questions like "Get me all deployments with problems" prompting the assistant to fetch and display relevant data from the current cluster. Likewise, if the user requests an action like "Restart that deployment" after the AI points out what deployment needs restarting, it can also do that. In case of "write" operations, the AI Assistant does check with the user for permission to run them.
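The permission check for "write" operations can be sketched as a small gate in front of the tool. This is not the plugin's actual code; the verb set, the function name, and the `confirm` callback are hypothetical and only illustrate the pattern of running reads directly and asking before anything else.

```python
# Hypothetical set of verbs treated as read-only in this sketch.
READ_VERBS = {"get", "list"}


def run_tool(verb, target, confirm):
    """Run read requests directly; ask `confirm` before any other verb."""
    if verb not in READ_VERBS and not confirm(f"{verb} {target}?"):
        return f"skipped: {verb} {target}"
    # A real tool would call the Kubernetes API here.
    return f"ran: {verb} {target}"


def deny(prompt):
    """Stand-in for a user who declines every request."""
    return False


print(run_tool("list", "deployments", deny))
print(run_tool("restart", "deployment/web", deny))
```

With a user who declines, the list request still runs while the restart is skipped.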
AI Plugins
Although the initial version of the AI Assistant is already useful for Kubernetes users, future iterations will expand its capabilities. Currently, the assistant supports only the Kubernetes tool, but further integration with Headlamp plugins is underway. Similarly, we could get richer insights for GitOps via the Flux plugin, monitoring through Prometheus, package management with Helm, and more.
And of course, as the popularity of MCP grows, we are looking into how to integrate it as well, for a more plug-and-play experience.
Try it out!
We hope this first version of the AI Assistant helps users manage Kubernetes clusters more effectively and assists newcomers in navigating the learning curve. We invite you to try out this early version and give us your feedback. The AI Assistant plugin can be installed from Headlamp's Plugin Catalog in the desktop version, or by using the container image when deploying Headlamp. Stay tuned for future versions of the Headlamp AI Assistant!
via Kubernetes Blog https://kubernetes.io/
August 07, 2025 at 03:00PM
Ep31 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
via YouTube https://www.youtube.com/watch?v=8TKvzwLIYSQ
MCP Servers Explained: Why Most Are Useless (And How to Fix It)
95% of MCP servers are essentially a waste of time. Many are slower and more complex than the terminal tools agents can already use effectively. But the remaining 5% are game-changers that unlock new, powerful AI capabilities. This video reveals exactly why most MCPs fail, how to identify and avoid redundant architectures, and the right way to build MCPs that truly matter.
Using clear analogies and real-world examples, you'll learn how to design MCP servers that directly reflect user intentions, provide access to otherwise inaccessible services, and combine deterministic code with intelligent agent-driven workflows. By the end, you'll understand the critical architecture and data flow patterns that separate revolutionary MCP servers from redundant ones, enabling you to create AI agents that accomplish complex tasks with ease.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Outskill 👉 Grab your free seat to the 2-Day AI Mastermind: https://link.outskill.com/AIDEVAG2 🔐 100% Discount for the first 1000 people 💥 Dive deep into AI and Learn Automations, Build AI Agents, Make videos & images – all for free! 🎁 Bonuses worth $5100+ if you join and attend ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#AI #MCP #SoftwareArchitecture
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Introduction to Model Context Protocol (MCP)
01:38 Outskill (sponsor)
03:09 What is the Model Context Protocol (MCP)?
07:06 Why Most MCPs Fail
13:24 MCP Architecture That Works
20:14 MCP Server Design Patterns
25:02 MCP Data Flow Patterns
37:24 Summary
via YouTube https://www.youtube.com/watch?v=7baGJ1bC9zE
Week Ending July 27, 2025
https://lwkd.info/2025/20250730
Developer News
Due to low attendance and frequent cancellations, SIG Release is seeking a better meeting time via a Doodle poll, open until August 3, 2025 (AOE). This applies only to the main SIG Release meeting, not Release Team meetings. Changes will begin the week of August 18, 2025.
A security vulnerability was found in Kubernetes where an unauthorized user may be able to SSH/RDP/WINRM to Windows VMs built with Kubernetes Image Builder. Clusters using Image Builder version v0.1.44 or earlier are affected, specifically when using Windows images built with Nutanix OVA. Images from other providers are not affected.
Release Schedule
Next Deadline: Docs freeze, August 6
Kubernetes v1.34 has entered Code Freeze as of July 25, 2025. Only release-blocking issues and PRs will be accepted into the v1.34 milestone. Enhancements that didn’t meet the criteria have been removed, but exceptions can be requested if necessary. Key deadlines include August 6 for the docs freeze. For concerns, contact the release team via email or the #sig-release Slack channel. Make sure to get your docs PRs reviewed and merged before the upcoming docs freeze deadline!
Featured PRs
133157: KEP 4033: Add metric for out of support CRI and bump feature to GA
This PR graduates the KubeletCgroupDriverFromCRI feature to GA in v1.34. It finalizes a multi-release effort that allows the kubelet to retrieve the cgroup driver configuration directly from the container runtime using the CRI API. This improves consistency between kubelet and container runtime settings and removes the need for manual configuration alignment. A new metric has been added to report when the runtime does not support the Status.cgroupDriver field in its CRI response, helping identify unsupported or outdated CRI implementations.
133136: feat: Add warnings for unrecognized formats in CRDs
This PR updates how Kubernetes handles custom resource definitions (CRDs) that include format values. When a CRD contains a format value that isn’t recognized, the API server now returns a warning during create or update. The CRD is still accepted, but the warning helps you identify issues such as typos or unsupported values.
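As a rough illustration of "warn but still accept", the sketch below walks a schema and collects warnings for unknown format values. It is not the API server's code: the function name, the warning text, and the small KNOWN_FORMATS set are assumptions made for the example, and the real list of recognized formats is longer.

```python
# Hypothetical subset of recognized formats, for illustration only.
KNOWN_FORMATS = {"date", "date-time", "email", "uuid", "int32", "int64"}


def check_formats(schema, path="spec"):
    """Walk an OpenAPI-style schema dict and collect warnings for unknown formats."""
    warnings = []
    fmt = schema.get("format")
    if fmt is not None and fmt not in KNOWN_FORMATS:
        warnings.append(f'{path}: unrecognized format "{fmt}"')
    # Recurse into nested properties so deeply nested fields are checked too.
    for name, child in schema.get("properties", {}).items():
        warnings.extend(check_formats(child, f"{path}.{name}"))
    return warnings


# "datetime" is a plausible typo for "date-time".
schema = {
    "type": "object",
    "properties": {
        "createdAt": {"type": "string", "format": "datetime"},
        "owner": {"type": "string", "format": "email"},
    },
}

# Warnings are informational; the schema is not rejected because of them.
for warning in check_formats(schema):
    print("Warning:", warning)
```

Only the mistyped `createdAt` format produces a warning here; the `owner` field passes silently.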
133105: KEP-5229: Run Unschedulable scheduler_perf test case with SchedulerAsyncAPICalls feature gate enabled
This PR adds new test configurations that specifically toggle SchedulerAsyncAPICalls for the _QueueingHintsEnabled scenarios within the Unschedulable test. These tests measure how the scheduler performs when pods cannot be scheduled, and toggling this feature gate helps validate behavior under different configurations.
KEP of the Week
KEP-961: Implement maxUnavailable in StatefulSet
This KEP enhances StatefulSet rolling updates by introducing the maxUnavailable setting, allowing multiple pods to be updated simultaneously instead of the default one-by-one strategy. It aims to speed up rollouts for large applications while respecting minReadySeconds to maintain availability. The StatefulSet controller is improved to better track pod readiness, and metrics like statefulset_unavailability_violation along with event logs help diagnose rollout issues.
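To show why maxUnavailable speeds up rollouts, here is a simplified model that groups pod ordinals into update batches, highest ordinal first. It is only a sketch: the real controller tracks pod readiness continuously rather than working in fixed batches, and the function name is hypothetical.

```python
def update_batches(replicas, max_unavailable=1):
    """Group pod ordinals into update batches, highest ordinal first."""
    if max_unavailable < 1:
        raise ValueError("max_unavailable must be at least 1")
    ordinals = list(range(replicas - 1, -1, -1))
    return [
        ordinals[i:i + max_unavailable]
        for i in range(0, len(ordinals), max_unavailable)
    ]


print(update_batches(5))     # [[4], [3], [2], [1], [0]]
print(update_batches(5, 2))  # [[4, 3], [2, 1], [0]]
```

In this model, five replicas need five rounds with the default of one pod at a time, but only three rounds when two pods may be unavailable at once.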
Other Merges
PSA added for blocking .host on pod probes
Aggregated API server discovery supports EndpointSlices
Kubelet monitors device health via DRA and reports it in pod.status.containerStatuses.allocatedResourcesStatus field
pkg/kubelet/winstats and pkg/kubelet/volumemanager migrated to contextual logging
PodLevelResources propagate Pod level hugepage cgroup to containers
Optional APIs in ResourceSlice.Basic and ResourceClaim.Status.AllocatedDeviceStatus added
pvc.spec.VolumeAttributesClassName goes from non-nil to nil
Pod availability checks at the correct time in ReplicaSets
Scheduler interfaces moved from pkg/scheduler/framework to staging repo
kube-apiserver allows white-spaced CABundle during webhook client creation and validation
APIVersion fields of the HPA are validated to ensure created API objects function properly
Allows setting any FQDN as the pod’s hostname
Useful endpoints added for kube-apiserver
Machine readable output options (JSON & YAML) added to kubectl api-resources
PodLevelResources updates Downward API defaulting for resource limits
RV check added on GC delete calls
Container restart policy rules implemented
DRA kubelet adds v1 gRPC
Removed deprecated gogo protocol definitions from k8s.io/kubelet/pkg/apis/pluginregistration in favor of protoc
Runtime cost estimation fix for IntOrString custom resource schemas with maximum length
Kubernetes to return an error if user namespaces are used with volumeDevices
API calls sent through dispatcher and cache
Kubelet: metrics for userns pod creations and failures
Pod rejected when attachment limit is exceeded
KYAML support added to kubectl
debug_redact added to cri api secrets
Metrics added for monitoring async API calls in the scheduler when the SchedulerAsyncAPICalls is enabled
Fix for handling corner cases in async preemption
Bumped DRA API version to “v1” in “deviceattribute” package in k8s.io/dynamic-resource-allocation
BoundedFrequencyRunner dropped from pkg/util/async
Promotions
VolumeAttributesClass to GA
DRAPrioritizedList to Beta
DRA API to GA
PSI metrics to Beta
kubeletPodResources to Beta
Windows graceful shutdown to Beta
DRAAdminAccess to Beta
Version Updates
Bumped external snapshotter for vgs tests
Bumped etcd sdk to v3.6.4
kustomize to v5.7.0
Subprojects and Dependency Updates
containerd/containerd 1.7.28: The twenty-eighth patch release for containerd 1.7 contains various fixes and updates.
kustomize kyaml/v0.20.1: drop shlex dependency.
cluster-api v1.11.0-beta.0: releases beta version for testing
Shoutouts
Patrick Ohly: Shoutout to @alaypatel07 for tackling the problem of setting up scale tests for DRA. He identified and resolved several bottlenecks, both in the cluster configuration and the Kubernetes source code. He presented at the WG Device Management meeting today and we were happy enough with the preliminary results that graduation to GA is no longer blocked, thanks to @alaypatel07! Also thanks to everyone who has supported him: @jackfrancis, @nojnhuh, @wojtekt and probably others that I don’t know about
Maciej Szulik: Huge shoutout to @Edwin Hernandez and @Heba for their help pushing KEP 961 forward, especially that this is one of the oldest and longest running features
Benjamin Elder: Thanks to @danwinship for quickly looking into and fixing a conformance test flake in SIG Network
Benjamin Elder: Thanks @jasonbraganza for tirelessly handling new member requests
via Last Week in Kubernetes Development https://lwkd.info/
July 30, 2025 at 03:17PM