
Week Ending August 17, 2025
https://lwkd.info/2025/20250820
Developer News
A medium-severity vulnerability (CVE-2025-5187, CVSS 6.7) affects Kubernetes clusters using the NodeRestriction admission controller without OwnerReferencesPermissionEnforcement. It allows a compromised node to delete its own Node object by patching OwnerReferences, then recreate it with altered taints or labels, bypassing normal delete restrictions. Update to the latest patch release (1.33.4, 1.32.8, or 1.31.12) to close this security hole.
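Per the description above, clusters are exposed only when OwnerReferencesPermissionEnforcement is absent, so enabling that admission plugin alongside NodeRestriction is the natural hardening step while you schedule the upgrade. A minimal sketch (flag per the Kubernetes admission-controller docs; the manifest path assumes a kubeadm-style static pod):

    # kube-apiserver flag excerpt, e.g. in /etc/kubernetes/manifests/kube-apiserver.yaml
    --enable-admission-plugins=NodeRestriction,OwnerReferencesPermissionEnforcement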
Release Schedule
Next Deadline: Release day, 27 August
We are in the final week before releasing 1.34. Make sure to respond quickly to any blocker issues or test failures your SIG is tagged on.
Patch releases 1.33.4, 1.32.8, and 1.31.12 were published this week, built with Go 1.24.5 and 1.23.11 respectively. These patch releases primarily address an exploitable security hole, so admins should update at the next available downtime. Kubernetes 1.31 enters maintenance mode on Aug 28, 2025; the End of Life date for Kubernetes 1.31 is Oct 28, 2025.
Featured PRs
133409: Make podcertificaterequestcleaner role feature-gated
This PR restricts creation of the RBAC permissions for the podcertificaterequestcleaner controller behind a feature gate. The ClusterRole and ClusterRoleBinding for this controller are now created only when the related feature is enabled. This reduces unnecessary permissions in clusters where the controller is not in use, and supports a more secure, minimal RBAC configuration by avoiding unused roles.
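Operationally this means the roles appear only once the gate is switched on. A hypothetical sketch (the PR summary above does not name the gate; PodCertificateRequest is an assumption based on the controller's name):

    # assumption: the gate is named PodCertificateRequest; check the PR for the real name
    kube-controller-manager --feature-gates=PodCertificateRequest=true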
KEP of the Week
KEP 2340: Consistent Reads from Cache
This KEP introduces a mechanism to serve most reads from the watch cache while maintaining the same consistency guarantees as serving them from etcd. Previously, Get and List requests were guaranteed to be consistent reads and were served from etcd using a "quorum read". Serving reads from the watch cache is more performant and scalable than reading them from etcd, deserializing them, applying selectors, converting them to the desired version, and then garbage collecting all the objects allocated during the whole process.
This KEP is tracked for Stable in 1.34
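The consistency mode of a read is driven by the request's resourceVersion parameter; a quick way to compare the two paths from the command line (standard Kubernetes API semantics, assuming kubectl access to a cluster):

    # resourceVersion unset: a consistent (quorum) read
    kubectl get --raw '/api/v1/namespaces/default/pods'
    # resourceVersion=0: served from the watch cache, possibly stale
    kubectl get --raw '/api/v1/namespaces/default/pods?resourceVersion=0'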
Other Merges
Prevent data race around claimsToAllocate
Clarify staging repository READMEs
Version Updates
Bumped Go Version to 1.23.12 for publishing bot rules.
Bumped dependencies and images to Go 1.24.6 and distroless iptables
Subprojects and Dependency Updates
Ingress-NGINX v1.13.1 updates NGINX to v2.2.1, Go to v1.24.6, and includes bug fixes and improvements; Helm Chart v4.13.1 adds helm-test target and includes the updated controller
Shoutouts
Want to thank someone in the community? Drop a note in #shoutouts on Slack.
via Last Week in Kubernetes Development https://lwkd.info/
August 20, 2025 at 06:00PM
Tuning Linux Swap for Kubernetes: A Deep Dive
https://kubernetes.io/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/
The Kubernetes NodeSwap feature, likely to graduate to stable in the upcoming Kubernetes v1.34 release, allows swap usage: a significant shift from the conventional practice of disabling swap for performance predictability. This article focuses exclusively on tuning swap on Linux nodes, where this feature is available. By allowing Linux nodes to use secondary storage for additional virtual memory when physical RAM is exhausted, node swap support aims to improve resource utilization and reduce out-of-memory (OOM) kills.
However, enabling swap is not a "turn-key" solution. The performance and stability of your nodes under memory pressure are critically dependent on a set of Linux kernel parameters. Misconfiguration can lead to performance degradation and interfere with Kubelet's eviction logic.
In this blog post, I'll dive into the critical Linux kernel parameters that govern swap behavior. I will explore how these parameters influence Kubernetes workload performance, swap utilization, and crucial eviction mechanisms. I will present test results showcasing the impact of different configurations, and share my findings on achieving optimal settings for stable and high-performing Kubernetes clusters.
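For reference, turning the feature on involves the kubelet's own configuration; a minimal sketch based on the upstream NodeSwap documentation (KubeletConfiguration field names):

    # /var/lib/kubelet/config.yaml (excerpt)
    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    failSwapOn: false            # let the kubelet start on a node with swap enabled
    memorySwap:
      swapBehavior: LimitedSwap  # only Burstable pods may swap, up to a computed limit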
Introduction to Linux swap
At a high level, the Linux kernel manages memory through pages, typically 4KiB in size. When physical memory becomes constrained, the kernel's page replacement algorithm decides which pages to move to swap space. While the exact logic is a sophisticated optimization, this decision-making process is influenced by certain key factors:
Page access patterns (how recently pages are accessed)
Page dirtiness (whether pages have been modified)
Memory pressure (how urgently the system needs free memory)
Anonymous vs File-backed memory
It is important to understand that not all memory pages are the same. The kernel distinguishes between anonymous and file-backed memory.
Anonymous memory: This is memory that is not backed by a specific file on the disk, such as a program's heap and stack. From the application's perspective this is private memory, and when the kernel needs to reclaim these pages, it must write them to a dedicated swap device.
File-backed memory: This memory is backed by a file on a filesystem. This includes a program's executable code, shared libraries, and filesystem caches. When the kernel needs to reclaim these pages, it can simply discard them if they have not been modified ("clean"). If a page has been modified ("dirty"), the kernel must first write the changes back to the file before it can be discarded.
While a system without swap can still reclaim clean file-backed pages under pressure by dropping them, it has no way to offload anonymous memory. Enabling swap provides this capability, allowing the kernel to move less-frequently accessed pages to disk, conserving memory and avoiding system OOM kills.
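On a live node you can watch the split between these two classes, and overall swap usage, through /proc/meminfo (standard Linux field names):

    # anonymous vs file-backed pages currently resident
    grep -E '^(Active|Inactive)\((anon|file)\)' /proc/meminfo
    # total vs free swap space
    grep -E '^Swap(Total|Free)' /proc/meminfo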
Key kernel parameters for swap tuning
To effectively tune swap behavior, Linux provides several kernel parameters that can be managed via sysctl.
vm.swappiness: This is the most well-known parameter. It is a value from 0 to 200 (100 in older kernels) that controls the kernel's preference for swapping anonymous memory pages versus reclaiming file-backed memory pages (page cache).
High value (e.g., 90+): The kernel will aggressively swap out less-used anonymous memory to make room for file cache.
Low value (e.g., < 10): The kernel will strongly prefer dropping file-cache pages over swapping anonymous memory.
vm.min_free_kbytes: This parameter tells the kernel to keep a minimum amount of memory free as a buffer. When the amount of free memory drops below this safety buffer, the kernel starts reclaiming pages more aggressively (swapping, and eventually resorting to OOM kills).
Function: It acts as a safety lever to ensure the kernel has enough memory for critical allocation requests that cannot be deferred.
Impact on swap: Setting a higher min_free_kbytes effectively raises the floor for free memory, causing the kernel to initiate swapping earlier under memory pressure.
vm.watermark_scale_factor: This setting controls the gap between different watermarks: min, low and high, which are calculated based on min_free_kbytes.
Watermarks explained:
low: When free memory is below this mark, the kswapd kernel process wakes up to reclaim pages in the background. This is when a swapping cycle begins.
min: When free memory hits this minimum level, allocations stall while the kernel performs aggressive, synchronous page reclamation. Failing to reclaim enough pages will cause OOM kills.
high: Memory reclamation stops once the free memory reaches this level.
Impact: A higher watermark_scale_factor creates a larger buffer between the low and min watermarks. This gives kswapd more time to reclaim memory gradually before the system hits a critical state.
In a typical server workload, you might have a long-running process whose memory becomes 'cold'. A higher swappiness value can free up RAM for other active processes by swapping out that cold memory, letting those processes keep the file cache they benefit from.
Tuning the min_free_kbytes and watermark_scale_factor parameters to open the swapping window earlier gives kswapd more room to offload memory to disk and prevents OOM kills during sudden memory spikes.
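Before tuning, it helps to record where a node currently stands; both interfaces below are standard sysctl/procfs:

    # current tunables
    sysctl vm.swappiness vm.min_free_kbytes vm.watermark_scale_factor
    # per-zone min/low/high watermarks (values are in 4KiB pages)
    awk '$1 == "Node" || $1 ~ /^(min|low|high)$/' /proc/zoneinfo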
Swap tests and results
To understand the real impact of these parameters, I designed a series of stress tests.
Test setup
Environment: GKE on Google Cloud
Kubernetes version: 1.33.2
Node configuration: n2-standard-2 (8GiB RAM, 50GB swap on a pd-balanced disk, without encryption), Ubuntu 22.04
Workload: A custom Go application designed to allocate memory at a configurable rate, generate file-cache pressure, and simulate different memory access patterns (random vs sequential).
Monitoring: A sidecar container capturing system metrics every second.
Protection: Critical system components (kubelet, container runtime, sshd) were prevented from swapping by setting memory.swap.max=0 in their respective cgroups (see the sketch after this list).
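That protection maps to the cgroup v2 memory.swap.max control; a sketch assuming a systemd-managed cgroup hierarchy (unit names vary by distro):

    # forbid swapping for node-critical services (cgroup v2)
    echo 0 > /sys/fs/cgroup/system.slice/kubelet.service/memory.swap.max
    echo 0 > /sys/fs/cgroup/system.slice/containerd.service/memory.swap.max
    echo 0 > /sys/fs/cgroup/system.slice/ssh.service/memory.swap.max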
Test methodology
I ran a stress-test pod on nodes with different swappiness settings (0, 60, and 90) and varied the min_free_kbytes and watermark_scale_factor parameters to observe the outcomes under heavy memory allocation and I/O pressure.
Visualizing swap in action
The graph below, from a 100MBps stress test, shows swap in action. As free memory (in the "Memory Usage" plot) decreases, swap usage (Swap Used (GiB)) and swap-out activity (Swap Out (MiB/s)) increase. Critically, as the system relies more on swap, I/O activity and the corresponding wait time (IO Wait % in the "CPU Usage" plot) also rise, indicating CPU stress.
Findings
My initial tests with default kernel parameters (swappiness=60, min_free_kbytes=68MB, watermark_scale_factor=10) quickly led to OOM kills and even unexpected node restarts under high memory pressure. Selecting appropriate kernel parameters achieves a good balance of node stability and performance.
The impact of swappiness
The swappiness parameter directly influences the kernel's choice between reclaiming anonymous memory (swapping) and dropping page cache. To observe the kernel's reclaim preference, I ran a test where one pod generated and held file-cache pressure, followed by a second pod allocating anonymous memory at 100MB/s.
My findings reveal a clear trade-off:
swappiness=90: The kernel proactively swapped out the inactive anonymous memory to keep the file cache. This resulted in high and sustained swap usage and significant I/O activity ("Blocks Out"), which in turn caused spikes in I/O wait on the CPU.
swappiness=0: The kernel favored dropping file-cache pages, delaying swap consumption. However, it's critical to understand that this does not disable swapping. When memory pressure was high, the kernel still swapped anonymous memory to disk.
The choice is workload-dependent. For workloads sensitive to I/O latency, a lower swappiness is preferable. For workloads that rely on a large and frequently accessed file cache, a higher swappiness may be beneficial, provided the underlying disk is fast enough to handle the load.
Tuning watermarks to prevent eviction and OOM kills
The most critical challenge I encountered was the interaction between rapid memory allocation and Kubelet's eviction mechanism. When my test pod, which was deliberately configured to overcommit memory, allocated it at a high rate (e.g., 300-500 MBps), the system quickly ran out of free memory.
With default watermarks, the buffer for reclamation was too small. Before kswapd could free up enough memory by swapping, the node would hit a critical state, leading to two potential outcomes:
Kubelet eviction: If kubelet's eviction manager detected memory.available below its threshold, it would evict the pod.
OOM killer: In some high-rate scenarios, the OOM killer would activate before eviction could complete, sometimes killing higher-priority pods that were not the source of the pressure.
To mitigate this, I tuned the watermarks:
Increased min_free_kbytes to 512MiB: This forces the kernel to start reclaiming memory much earlier, providing a larger safety buffer.
Increased watermark_scale_factor to 2000: This widened the gap between the low and high watermarks (from ≈337MB to ≈591MB in my test node's /proc/zoneinfo), effectively increasing the swapping window.
This combination gave kswapd a larger operational zone and more time to swap pages to disk during memory spikes, successfully preventing both premature evictions and OOM kills in my test runs.
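Persisted via sysctl, that combination looks like the following (values taken from the tests above; size them to your own nodes):

    # /etc/sysctl.d/99-swap-tuning.conf
    vm.min_free_kbytes = 524288       # 512MiB safety buffer
    vm.watermark_scale_factor = 2000  # widen the reclaim window
    # apply without a reboot:
    #   sysctl --system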
Comparing watermark levels from /proc/zoneinfo (non-NUMA node) for min_free_kbytes=67584KiB with watermark_scale_factor=10 against min_free_kbytes=524288KiB with watermark_scale_factor=2000 shows the min, low, and high watermarks of zone Normal rising substantially under the tuned settings.
Building a Carbon and Price-Aware Kubernetes Scheduler, with Dave Masselink
Data centers consume over 4% of global electricity, and this number is projected to triple in the next few years due to AI workloads.
Dave Masselink, founder of Compute Gardener, discusses how he built a Kubernetes scheduler that makes scheduling decisions based on real-time carbon intensity data from power grids.
You will learn:
How carbon-aware scheduling works - Using real-time grid data to shift workloads to periods when electricity generation has lower carbon intensity, without changing energy consumption
Technical implementation details - Building custom Kubernetes schedulers using the scheduler plugin framework, including pre-filter and filter stages for carbon and time-of-use pricing optimization
Energy measurement strategies - Approaches for tracking power consumption across CPUs, memory, and GPUs
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/zk2xM1lfW
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
August 19, 2025 at 06:00AM
AI Will Replace Coders - But Not the Way You Think
After three decades in tech, I've never seen developers this terrified, and for good reason. AI can already write code faster than us, and it's rapidly approaching the point where it might write better code too. But here's what's driving me crazy: everyone is panicking about the wrong thing. They're worried AI will steal their jobs because it can code, which is like a chef fearing unemployment because someone invented a better knife.
Your real value was never in typing syntax or executing commands; that's just the mechanical stuff that happens after all the important thinking is done. The developers who will thrive aren't trying to out-code AI; they're the architects, problem-solvers, and domain experts who understand what needs to be built and why. Your deep knowledge of your industry, your business context, and the messy realities of how things actually work? That's your moat. AI doesn't know why your healthcare platform needs that weird HIPAA workaround, or why your e-commerce flow accommodates that legacy client system. Stop being a code monkey and start being the expert AI needs to not screw everything up. The choice is yours, but the clock is ticking.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Readdy 🔗 https://readdy.ai ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#AIandDevelopers #FutureOfCoding #TechCareerAdvice
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-will-replace-coders---but-not-the-way-you-think
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to Coding with AI 01:24 Sponsor (Readdy) 02:51 The Fear: AI Replacing Developers 06:50 The Truth: What Developers Really Do 13:54 The Secret Weapon: Your Domain Knowledge is Your Moat 19:16 The Adaptation: Thriving with AI
via YouTube https://www.youtube.com/watch?v=qBp8d6yBPPg
Week Ending August 10, 2025
https://lwkd.info/2025/20250814
Developer News
Do you run multiple Kubernetes clusters in your organization? SIG-Multicluster would love to have you answer their survey so that they can decide the SIG’s priorities.
The Kubernetes SIG Release main meeting has been rescheduled, based on a community vote, to a fixed time slot of Thursdays from 2:30–3:15 pm UTC, starting August 21, 2025, and will now follow a bi-weekly cadence. The previously alternating meeting times are discontinued, and the next scheduled meeting before the change has been canceled. The time is tied to UTC, so participants in regions with Daylight Saving Time may see a ±1 hour shift. The calendar has been updated, and questions or feedback can be shared via email or the #sig-release Slack channel.
Release Schedule
Next Deadline: Release day, 27 August
We are currently in Docs Freeze
Kubernetes v1.34.0-rc.0 was released, followed by v1.34.0-rc.1 to address a critical bug fix
Cherry-pick deadlines for the upcoming patch releases 1.33.4, 1.32.8, and 1.31.12 have passed. These patch releases are expected on August 12.
KEP of the Week
KEP 5080: Ordered Namespace Deletion
This KEP introduces deterministic, security-aware ordered deletion of all resources within a namespace. Previously, namespace deletion could remove resources in a non-deterministic order, which could leave awkward or risky gaps. When deleting a namespace with the OrderedNamespaceDeletion feature gate enabled, Kubernetes tears down namespace objects in waves, so Pods go first and crucial resources like NetworkPolicy don't disappear while Pods are still running.
This KEP is tracked for Stable in 1.34
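On versions where the gate is not yet on by default, enabling it is a single flag per component; a sketch using the gate name from the KEP (which components need it is an assumption; check the KEP):

    kube-apiserver --feature-gates=OrderedNamespaceDeletion=true
    kube-controller-manager --feature-gates=OrderedNamespaceDeletion=true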
Other Merges
NodeRestriction to prevent nodes from updating their OwnerReferences
Etcd metrics use Delete() instead of DeleteLabelValues()
Enable publishing-bot support for v1.34 branch
Prerelease lifecycle for PodCertificateRequest is fixed
Demote KEP-5278 feature gates for ClearingNominatedNodeNameAfterBinding and NominatedNodeNameForExpectation to Alpha
podcertificaterequestcleaner role is now behind a feature-gate
Deprecated
Deprecated version is removed for apiserver_storage_objects
Subprojects and Dependency Updates
coredns/coredns v1.12.3 improves plugin reliability, adds Kubernetes plugin startup timeout, updates route53 to AWS SDK v2, and fixes race conditions
cluster autoscaler chart v9.50.0 scales Kubernetes worker nodes within autoscaling groups
cluster-api v1.11.0-rc.0 for testing
cluster-api-provider-vsphere v1.14.0-rc.0 for testing
gRPC Core 1.74.1 (gee): patch release for grpc/ruby
kOps v1.34.0-alpha.1 introduces new features and significant bug fixes for Azure, adds experimental IPv6 support for bare-metal, and updates key dependencies like containerd and etcd
kompose 1.37.0 includes code refactoring for simplification and updates several core dependencies for improved stability and performance
Shoutouts
Want to thank someone in the community? Drop a note in #shoutouts on Slack.
via Last Week in Kubernetes Development https://lwkd.info/
August 14, 2025 at 03:00PM
Ep32 - Ask Me Anything About Anything
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=GjV3PtqVP9Q
How Policies Saved us a Thousand Headaches, with Alessandro Pomponio
Alessandro Pomponio from IBM Research explains how his team transformed their chaotic bare-metal clusters into a well-governed, self-service platform for AI and scientific workloads. He walks through their journey from manual cluster interventions to a fully automated GitOps-first architecture using ArgoCD, Kyverno, and Kueue to handle everything from policy enforcement to GPU scheduling.
You will learn:
How to implement GitOps workflows that reduce administrative burden while maintaining governance and visibility across multi-tenant research environments
Practical policy enforcement strategies using Kyverno to prevent GPU monopolization, block interactive pod usage, and automatically inject scheduling constraints
Fair resource sharing techniques with Kueue to manage scarce GPU resources across different hardware types while supporting both specific and flexible allocation requests
Organizational change management approaches for gaining stakeholder buy-in, upskilling admin teams, and communicating policy changes to research users
Sponsor
This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/5sK7BFZ-8
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
August 12, 2025 at 06:00AM
AI Meets Kubernetes: Simplifying Developer and Ops Collaboration
Platform engineers and developers often struggle to align on infrastructure needs, leading to platforms that miss the mark and endless iteration loops. But what if AI could bridge this gap? This video explores a three-way collaboration where developers express their requirements in natural language, platform engineers establish guardrails and constraints, and AI intelligently translates developer intent into precise, infrastructure-compliant deployments.
Watch a live demo of the DevOps AI Toolkit (dot-ai) MCP, a project that leverages AI to match developer requests with platform-engineered building blocks, automatically generating optimized Kubernetes configurations. Witness how conversational deployment simplifies and accelerates the deployment process, ensuring deployments meet organizational standards while empowering developers to effortlessly deploy applications tailored to their exact needs.
#PlatformEngineering #Kubernetes #AI
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/app-management/ai-meets-kubernetes-simplifying-developer-and-ops-collaboration 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to Platform Engineering with AI 01:35 Blacksmith (sponsor) 02:43 The Platform Engineering Problem 08:27 AI Deployment Magic 19:06 When AI Hits Limits 21:02 Beyond Simple Abstractions 23:20 DevOps AI Toolkit Explained
via YouTube https://www.youtube.com/watch?v=8Yzn-9qQpQI
Introducing Headlamp AI Assistant
https://kubernetes.io/blog/2025/08/07/introducing-headlamp-ai-assistant/
This announcement originally appeared on the Headlamp blog.
To simplify Kubernetes management and troubleshooting, we're thrilled to introduce Headlamp AI Assistant: a powerful new plugin for Headlamp that helps you understand and operate your Kubernetes clusters and applications with greater clarity and ease.
Whether you're a seasoned engineer or just getting started, the AI Assistant offers:
Fast time to value: Ask questions like "Is my application healthy?" or "How can I fix this?" without needing deep Kubernetes knowledge.
Deep insights: Start with high-level queries and dig deeper with prompts like "List all the problematic pods" or "How can I fix this pod?"
Focused & relevant: Ask questions in the context of what you're viewing in the UI, such as "What's wrong here?"
Action-oriented: Let the AI take action for you, like "Restart that deployment", with your permission.
Here is a demo of the AI Assistant in action as it helps troubleshoot an application experiencing issues in a Kubernetes cluster:
Hopping on the AI train
Large Language Models (LLMs) have transformed not just how we access data but also how we interact with it. The rise of tools like ChatGPT opened a world of possibilities, inspiring a wave of new applications. Asking questions or giving commands in natural language is intuitive, especially for users who aren't deeply technical. Now everyone can quickly ask how to do X or Y, without feeling awkward or having to traverse pages and pages of documentation like before.
Headlamp AI Assistant thus brings a conversational UI to Headlamp. It is available as a Headlamp plugin, making it easy to integrate into your existing setup. Users enable it by installing the plugin and configuring it with their own LLM API keys, giving them control over which model powers the assistant. Once enabled, the assistant becomes part of the Headlamp UI, ready to respond to contextual queries and perform actions directly from the interface.
Context is everything
As expected, the AI Assistant is focused on helping users with Kubernetes concepts. Yet, while there is a lot of value in answering Kubernetes-related questions from Headlamp's UI, we believe the greatest benefit of such an integration comes when it can use the context of what the user is experiencing in the application. The Headlamp AI Assistant knows what you're currently viewing in Headlamp, which makes the interaction feel more like working with a human assistant.
For example, if a pod is failing, users can simply ask "What's wrong here?" and the AI Assistant will respond with the root cause, like a missing environment variable or a typo in the image name. Follow-up prompts like "How can I fix this?" allow the AI Assistant to suggest a fix, streamlining what used to take multiple steps into a quick, conversational flow.
Sharing the context from Headlamp is not a trivial task though, so it's something we will keep working on perfecting.
Tools
Context from the UI is helpful, but sometimes additional capabilities are needed. If the user is viewing the pod list and wants to identify problematic deployments, switching views should not be necessary. To address this, the AI Assistant includes support for a Kubernetes tool. This allows asking questions like "Get me all deployments with problems", prompting the assistant to fetch and display relevant data from the current cluster. Likewise, if the user requests an action like "Restart that deployment" after the AI points out which deployment needs restarting, it can do that too. For "write" operations, the AI Assistant checks with the user for permission before running them.
AI Plugins
Although the initial version of the AI Assistant is already useful for Kubernetes users, future iterations will expand its capabilities. Currently, the assistant supports only the Kubernetes tool, but further integration with Headlamp plugins is underway. For example, we could get richer insights for GitOps via the Flux plugin, monitoring through Prometheus, and package management with Helm.
And of course, as the popularity of MCP grows, we are looking into how to integrate it as well, in a more plug-and-play fashion.
Try it out!
We hope this first version of the AI Assistant helps users manage Kubernetes clusters more effectively and assists newcomers in navigating the learning curve. We invite you to try out this early version and give us your feedback. The AI Assistant plugin can be installed from Headlamp's Plugin Catalog in the desktop version, or by using the container image when deploying Headlamp. Stay tuned for future versions of the Headlamp AI Assistant!
via Kubernetes Blog https://kubernetes.io/
August 07, 2025 at 03:00PM
Ep31 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=8TKvzwLIYSQ
MCP Servers Explained: Why Most Are Useless (And How to Fix It)
95% of MCP servers are essentially a waste of time. Many are slower and more complex than the terminal tools agents can already use effectively. But the remaining 5% are game-changers that unlock new, powerful AI capabilities. This video reveals exactly why most MCPs fail, how to identify and avoid redundant architectures, and the right way to build MCPs that truly matter.
Using clear analogies and real-world examples, you'll learn how to design MCP servers that directly reflect user intentions, provide access to otherwise inaccessible services, and combine deterministic code with intelligent agent-driven workflows. By the end, you'll understand the critical architecture and data flow patterns that separate revolutionary MCP servers from redundant ones, enabling you to create AI agents that accomplish complex tasks with ease.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Outskill 👉 Grab your free seat to the 2-Day AI Mastermind: https://link.outskill.com/AIDEVAG2 🔐 100% Discount for the first 1000 people 💥 Dive deep into AI and Learn Automations, Build AI Agents, Make videos & images – all for free! 🎁 Bonuses worth $5100+ if you join and attend ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#AI #MCP #SoftwareArchitecture
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to Model Context Protocol (MCP) 01:38 Outskill (sponsor) 03:09 What is the Model Context Protocol (MCP)? 07:06 Why Most MCPs Fail 13:24 MCP Architecture That Works 20:14 MCP Server Design Patterns 25:02 MCP Data Flow Patterns 37:24 Summary
via YouTube https://www.youtube.com/watch?v=7baGJ1bC9zE