1_r/devopsish

Tuning Linux Swap for Kubernetes: A Deep Dive

https://kubernetes.io/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/

The Kubernetes NodeSwap feature, likely to graduate to stable in the upcoming Kubernetes v1.34 release, allows swap usage: a significant shift from the conventional practice of disabling swap for performance predictability. This article focuses exclusively on tuning swap on Linux nodes, where this feature is available. By allowing Linux nodes to use secondary storage for additional virtual memory when physical RAM is exhausted, node swap support aims to improve resource utilization and reduce out-of-memory (OOM) kills.

However, enabling swap is not a "turn-key" solution. The performance and stability of your nodes under memory pressure are critically dependent on a set of Linux kernel parameters. Misconfiguration can lead to performance degradation and interfere with Kubelet's eviction logic.

In this blog post, I'll dive into the critical Linux kernel parameters that govern swap behavior. I will explore how these parameters influence Kubernetes workload performance, swap utilization, and crucial eviction mechanisms. I will present various test results showcasing the impact of different configurations, and share my findings on achieving optimal settings for stable and high-performing Kubernetes clusters.
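Before tuning the kernel, swap must be permitted at the kubelet level. A minimal KubeletConfiguration sketch (field names per the kubelet.config.k8s.io/v1beta1 API) might look like this; the feature gate line is only needed on versions where NodeSwap is not enabled by default:

```yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Allow the kubelet to start on a node with swap enabled.
failSwapOn: false
memorySwap:
  # LimitedSwap restricts swap usage to Burstable pods,
  # proportionally to their memory requests.
  swapBehavior: LimitedSwap
# Only required where the NodeSwap gate is not on by default:
featureGates:
  NodeSwap: true
```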

Introduction to Linux swap

At a high level, the Linux kernel manages memory through pages, typically 4KiB in size. When physical memory becomes constrained, the kernel's page replacement algorithm decides which pages to move to swap space. While the exact logic is a sophisticated optimization, this decision-making process is influenced by certain key factors:

Page access patterns (how recently pages are accessed)

Page dirtiness (whether pages have been modified)

Memory pressure (how urgently the system needs free memory)

Anonymous vs File-backed memory

It is important to understand that not all memory pages are the same. The kernel distinguishes between anonymous and file-backed memory.

Anonymous memory: This is memory that is not backed by a specific file on the disk, such as a program's heap and stack. From the application's perspective this is private memory, and when the kernel needs to reclaim these pages, it must write them to a dedicated swap device.

File-backed memory: This memory is backed by a file on a filesystem. This includes a program's executable code, shared libraries, and filesystem caches. When the kernel needs to reclaim these pages, it can simply discard them if they have not been modified ("clean"). If a page has been modified ("dirty"), the kernel must first write the changes back to the file before it can be discarded.

While a system without swap can still reclaim clean file-backed pages under pressure by dropping them, it has no way to offload anonymous memory. Enabling swap provides this capability, allowing the kernel to move less-frequently accessed anonymous pages to disk and avoid system OOM kills.
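The two memory classes above can be demonstrated directly with mmap. This is an illustrative sketch, not from the article: an anonymous mapping has no backing file (only swap can absorb it under pressure), while a file-backed mapping can always be written back to, or re-read from, its file:

```python
import mmap
import tempfile

# Anonymous mapping: private memory with no backing file.
# Under memory pressure the kernel can reclaim these pages
# only by writing them out to a swap device.
anon = mmap.mmap(-1, 4096)
anon[:5] = b"hello"
data = bytes(anon[:5])
anon.close()

# File-backed mapping: pages are backed by a file on disk.
# Clean pages can simply be dropped; dirty pages are written
# back to the file first -- no swap required either way.
with tempfile.NamedTemporaryFile() as f:
    f.write(b"\0" * 4096)
    f.flush()
    filemap = mmap.mmap(f.fileno(), 4096)
    filemap[:5] = b"world"
    fdata = bytes(filemap[:5])
    filemap.flush()  # write dirty pages back to the file
    filemap.close()
```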

Key kernel parameters for swap tuning

To effectively tune swap behavior, Linux provides several kernel parameters that can be managed via sysctl.

vm.swappiness: This is the most well-known parameter. It is a value from 0 to 200 (100 in older kernels) that controls the kernel's preference for swapping anonymous memory pages versus reclaiming file-backed memory pages (page cache).

High value (e.g. 90+): The kernel will aggressively swap out less-used anonymous memory to make room for file cache.

Low value (e.g. < 10): The kernel will strongly prefer dropping file cache pages over swapping anonymous memory.

vm.min_free_kbytes: This parameter tells the kernel to keep a minimum amount of memory free as a buffer. When the amount of free memory drops below this safety buffer, the kernel starts reclaiming pages more aggressively (swapping, and eventually handling OOM kills).

Function: It acts as a safety lever to ensure the kernel has enough memory for critical allocation requests that cannot be deferred.

Impact on swap: Setting a higher min_free_kbytes effectively raises the floor for free memory, causing the kernel to initiate swap earlier under memory pressure.

vm.watermark_scale_factor: This setting controls the gap between different watermarks: min, low and high, which are calculated based on min_free_kbytes.

Watermarks explained:

low: When free memory is below this mark, the kswapd kernel process wakes up to reclaim pages in the background. This is when a swapping cycle begins.

min: When free memory hits this minimum level, aggressive page reclamation blocks process allocations (direct reclaim). If reclamation still fails, OOM kills follow.

high: Memory reclamation stops once the free memory reaches this level.

Impact: A higher watermark_scale_factor creates a larger buffer between the low and min watermarks. This gives kswapd more time to reclaim memory gradually before the system hits a critical state.
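The relationship between these two parameters and the watermarks can be sketched in a few lines. This is a deliberately simplified model of the per-zone calculation in the kernel's mm/page_alloc.c (it ignores per-zone proportional scaling and other adjustments), so treat the numbers as illustrative, not as what any given node will report in /proc/zoneinfo:

```python
def watermarks(managed_pages: int, min_pages: int, scale_factor: int):
    """Simplified per-zone watermark math, loosely following
    __setup_per_zone_wmarks() in mm/page_alloc.c."""
    # The gap above 'min' is the larger of min/4 and
    # managed_pages * watermark_scale_factor / 10000.
    gap = max(min_pages >> 2, managed_pages * scale_factor // 10000)
    low = min_pages + gap
    high = min_pages + 2 * gap
    return low, high

PAGE_KIB = 4                                # 4 KiB pages
managed = 8 * 1024 * 1024 // PAGE_KIB       # a hypothetical ~8 GiB zone

# Defaults: min_free_kbytes=67584, watermark_scale_factor=10
low_d, high_d = watermarks(managed, 67584 // PAGE_KIB, 10)
# Tuned: min_free_kbytes=524288, watermark_scale_factor=2000
low_t, high_t = watermarks(managed, 524288 // PAGE_KIB, 2000)

# The tuned settings widen the low -> high reclaim window,
# giving kswapd far more headroom before direct reclaim kicks in.
print((high_d - low_d) * PAGE_KIB // 1024, "MiB ->",
      (high_t - low_t) * PAGE_KIB // 1024, "MiB")
```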

In a typical server workload, you might have a long-running process with some memory that becomes 'cold'. A higher swappiness value can free up RAM by swapping out the cold memory, for other active processes that can benefit from keeping their file-cache.

Tuning the min_free_kbytes and watermark_scale_factor parameters to move the swapping window early will give more room for kswapd to offload memory to disk and prevent OOM kills during sudden memory spikes.
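All three parameters can be set persistently via a sysctl drop-in. A sketch follows, using the tuned values from the tests later in this article; the file path is arbitrary and the swappiness value is workload-dependent, so treat these as starting points rather than recommendations (apply with sysctl --system):

```ini
# /etc/sysctl.d/90-k8s-swap.conf (illustrative path and values)

# Kernel default; lower it for I/O-latency-sensitive workloads,
# raise it for workloads that depend on a large file cache.
vm.swappiness = 60

# Raise the free-memory floor so reclaim (and swapping) starts earlier.
vm.min_free_kbytes = 524288

# Widen the low -> high watermark window to give kswapd more headroom.
vm.watermark_scale_factor = 2000
```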

Swap tests and results

To understand the real impact of these parameters, I designed a series of stress tests.

Test setup

Environment: GKE on Google Cloud

Kubernetes version: 1.33.2

Node configuration: n2-standard-2 (8GiB RAM, 50GB swap on a pd-balanced disk, without encryption), Ubuntu 22.04

Workload: A custom Go application designed to allocate memory at a configurable rate, generate file-cache pressure, and simulate different memory access patterns (random vs sequential).

Monitoring: A sidecar container capturing system metrics every second.

Protection: Critical system components (kubelet, container runtime, sshd) were prevented from swapping by setting memory.swap.max=0 in their respective cgroups.
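One way to apply that protection, assuming the components run as systemd services on a cgroup v2 host, is a per-unit drop-in; the path below is illustrative, and systemd's MemorySwapMax=0 translates to memory.swap.max=0 in the unit's cgroup:

```ini
# /etc/systemd/system/kubelet.service.d/90-no-swap.conf (illustrative path)
# Pin the kubelet's cgroup to zero swap so the node agent itself
# is never swapped out, even under heavy memory pressure.
[Service]
MemorySwapMax=0
```

Repeat for the container runtime and sshd units, then run systemctl daemon-reload and restart the affected services.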

Test methodology

I ran a stress-test pod on nodes with different swappiness settings (0, 60, and 90) and varied the min_free_kbytes and watermark_scale_factor parameters to observe the outcomes under heavy memory allocation and I/O pressure.

Visualizing swap in action

The graph below, from a 100MBps stress test, shows swap in action. As free memory (in the "Memory Usage" plot) decreases, swap usage (Swap Used (GiB)) and swap-out activity (Swap Out (MiB/s)) increase. Critically, as the system relies more on swap, I/O activity and the corresponding wait time (IO Wait % in the "CPU Usage" plot) also rise, indicating CPU stress.

Findings

My initial tests with default kernel parameters (swappiness=60, min_free_kbytes=68MB, watermark_scale_factor=10) quickly led to OOM kills and even unexpected node restarts under high memory pressure. Selecting appropriate kernel parameters, however, can strike a good balance between node stability and performance.

The impact of swappiness

The swappiness parameter directly influences the kernel's choice between reclaiming anonymous memory (swapping) and dropping page cache. To observe this, I ran a test where one pod generated and held file-cache pressure, followed by a second pod allocating anonymous memory at 100MB/s, and watched which class of memory the kernel preferred to reclaim.

My findings reveal a clear trade-off:

swappiness=90: The kernel proactively swapped out the inactive anonymous memory to keep the file cache. This resulted in high and sustained swap usage and significant I/O activity ("Blocks Out"), which in turn caused spikes in I/O wait on the CPU.

swappiness=0: The kernel favored dropping file-cache pages, delaying swap consumption. However, it's critical to understand that this does not disable swapping. When memory pressure was high, the kernel still swapped anonymous memory to disk.

The choice is workload-dependent. For workloads sensitive to I/O latency, a lower swappiness is preferable. For workloads that rely on a large and frequently accessed file cache, a higher swappiness may be beneficial, provided the underlying disk is fast enough to handle the load.

Tuning watermarks to prevent eviction and OOM kills

The most critical challenge I encountered was the interaction between rapid memory allocation and Kubelet's eviction mechanism. When my test pod, which was deliberately configured to overcommit memory, allocated it at a high rate (e.g., 300-500 MBps), the system quickly ran out of free memory.

With default watermarks, the buffer for reclamation was too small. Before kswapd could free up enough memory by swapping, the node would hit a critical state, leading to two potential outcomes:

Kubelet eviction: If the kubelet's eviction manager detected that memory.available was below its threshold, it would evict the pod.

OOM killer: In some high-rate scenarios, the OOM killer would activate before eviction could complete, sometimes killing higher-priority pods that were not the source of the pressure.

To mitigate this, I tuned the watermarks:

Increased min_free_kbytes to 512MiB: This forces the kernel to start reclaiming memory much earlier, providing a larger safety buffer.

Increased watermark_scale_factor to 2000: This widened the gap between the low and high watermarks (from ≈337MB to ≈591MB in my test node's /proc/zoneinfo), effectively increasing the swapping window.

This combination gave kswapd a larger operational zone and more time to swap pages to disk during memory spikes, successfully preventing both premature evictions and OOM kills in my test runs.

The table below compares watermark levels from /proc/zoneinfo (non-NUMA node):

min_free_kbytes=67584KiB and watermark_scale_factor=10

min_free_kbytes=524288KiB and watermark_scale_factor=2000

Node 0, zone Normal   pages free 583273   boost 0   min 10504   low

·kubernetes.io·
Tuning Linux Swap for Kubernetes: A Deep Dive
Building a Carbon and Price-Aware Kubernetes Scheduler with Dave Masselink
Building a Carbon and Price-Aware Kubernetes Scheduler with Dave Masselink

Building a Carbon and Price-Aware Kubernetes Scheduler, with Dave Masselink

https://ku.bz/zk2xM1lfW

Data centers consume over 4% of global electricity and this number is projected to triple in the next few years due to AI workloads.

Dave Masselink, founder of Compute Gardener, discusses how he built a Kubernetes scheduler that makes scheduling decisions based on real-time carbon intensity data from power grids.

You will learn:

How carbon-aware scheduling works - Using real-time grid data to shift workloads to periods when electricity generation has lower carbon intensity, without changing energy consumption

Technical implementation details - Building custom Kubernetes schedulers using the scheduler plugin framework, including pre-filter and filter stages for carbon and time-of-use pricing optimization

Energy measurement strategies - Approaches for tracking power consumption across CPUs, memory, and GPUs

Sponsor

This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/zk2xM1lfW

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

August 19, 2025 at 06:00AM

·kube.fm·
Building a Carbon and Price-Aware Kubernetes Scheduler with Dave Masselink
The real danger of systemd-coredump CVE-2025-4598 | CIQ
The real danger of systemd-coredump CVE-2025-4598 | CIQ
TL;DR: A critical vulnerability in systemd-coredump remains unfixed in Enterprise Linux 9, allowing attackers to steal password hashes and cryptographic keys within seconds - but Rocky Linux from CIQ…
·ciq.com·
The real danger of systemd-coredump CVE-2025-4598 | CIQ
AI & DevOps Toolkit - AI Will Replace Coders - But Not the Way You Think - https://www.youtube.com/watch?v=qBp8d6yBPPg
AI & DevOps Toolkit - AI Will Replace Coders - But Not the Way You Think - https://www.youtube.com/watch?v=qBp8d6yBPPg

AI Will Replace Coders - But Not the Way You Think

After three decades in tech, I've never seen developers this terrified, and for good reason. AI can already write code faster than us, and it's rapidly approaching the point where it might write better code too. But here's what's driving me crazy: everyone is panicking about the wrong thing. They're worried AI will steal their jobs because it can code, which is like a chef fearing unemployment because someone invented a better knife.

Your real value was never in typing syntax or executing commands; that's just the mechanical stuff that happens after all the important thinking is done. The developers who will thrive aren't trying to out-code AI; they're the architects, problem-solvers, and domain experts who understand what needs to be built and why. Your deep knowledge of your industry, your business context, and the messy realities of how things actually work? That's your moat. AI doesn't know why your healthcare platform needs that weird HIPAA workaround, or why your e-commerce flow accommodates that legacy client system. Stop being a code monkey and start being the expert AI needs to not screw everything up. The choice is yours, but the clock is ticking.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Readdy 🔗 https://readdy.ai ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

AIandDevelopers #FutureOfCoding #TechCareerAdvice

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-will-replace-coders---but-not-the-way-you-think

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to Coding with AI 01:24 Sponsor (Readdy) 02:51 The Fear: AI Replacing Developers 06:50 The Truth: What Developers Really Do 13:54 The Secret Weapon: Your Domain Knowledge is Your Moat 19:16 The Adaptation: Thriving with AI

via YouTube https://www.youtube.com/watch?v=qBp8d6yBPPg

·youtube.com·
AI & DevOps Toolkit - AI Will Replace Coders - But Not the Way You Think - https://www.youtube.com/watch?v=qBp8d6yBPPg
Last Week in Kubernetes Development - Week Ending August 10 2025
Last Week in Kubernetes Development - Week Ending August 10 2025

Week Ending August 10, 2025

https://lwkd.info/2025/20250814

Developer News

Do you run multiple Kubernetes clusters in your organization? SIG-Multicluster would love to have you answer their survey so that they can decide the SIG’s priorities.

The Kubernetes SIG Release main meeting has been rescheduled, based on a community vote, to a fixed time slot of Thursdays from 2:30–3:15 pm UTC, starting August 21, 2025, and will now follow a bi-weekly cadence. The previously alternating meeting times are discontinued, and the next scheduled meeting before the change has been canceled. The time is tied to UTC, so participants in regions with Daylight Saving Time may see a ±1 hour shift. The calendar has been updated, and questions or feedback can be shared via email or the #sig-release Slack channel.

Release Schedule

Next Deadline: Release day, 27 August

We are currently in Docs Freeze

Kubernetes v1.34.0-rc.0 was released, followed by v1.34.0-rc.1 to address a critical bug fix

Cherry-pick deadlines for the upcoming patch releases 1.33.4, 1.32.8, and 1.31.12 have passed. These patch releases are expected on August 12.

KEP of the Week

KEP 5080: Ordered Namespace Deletion

This KEP introduces a deterministic, security-aware ordered deletion of all resources within a namespace. Previously, namespace deletion could remove resources in a non-deterministic order that could lead to awkward or risky gaps. When deleting a namespace with the OrderedNamespaceDeletion feature gate enabled, Kubernetes tears down namespace objects in waves, so that Pods are deleted first and crucial resources like NetworkPolicy don't disappear while Pods are still running.

This KEP is tracked for Stable in 1.34

Other Merges

NodeRestriction to prevent nodes from updating their OwnerReferences

Etcd metrics uses Delete() instead of DeleteLabelValues()

Enable publishing-bot support for v1.34 branch

Prerelease lifecycle for PodCertificateRequest is fixed

Demote KEP-5278 feature gates for ClearingNominatedNodeNameAfterBinding and NominatedNodeNameForExpectation to Alpha

podcertificaterequestcleaner role is now behind a feature-gate

Deprecated

Deprecated Version is removed for api_server_storage_objects

Subprojects and Dependency Updates

coredns/coredns v1.12.3 improves plugin reliability, adds Kubernetes plugin startup timeout, updates route53 to AWS SDK v2, and fixes race conditions

cluster autoscaler chart v9.50.0 scales Kubernetes worker nodes within autoscaling groups

cluster-api v1.11.0-rc.0 for testing

cluster-api-provider-vsphere v1.14.0-rc.0 for testing

gRPC Core 1.74.1 (gee): patch release for grpc/ruby

kOps v1.34.0-alpha.1 introduces new features and significant bug fixes for Azure, adds experimental IPv6 support for bare-metal, and updates key dependencies like containerd and etcd

kompose 1.37.0 includes code refactoring for simplification and updates several core dependencies for improved stability and performance

Shoutouts

Want to thank someone in the community? Drop a note in #shoutouts on Slack.

via Last Week in Kubernetes Development https://lwkd.info/

August 14, 2025 at 03:00PM

·lwkd.info·
Last Week in Kubernetes Development - Week Ending August 10 2025
Introducing gpt-oss | OpenAI
Introducing gpt-oss | OpenAI
gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models
·openai.com·
Introducing gpt-oss | OpenAI
TIL Kylie Robison has a newsletter
TIL Kylie Robison has a newsletter
WIRED's premium newsletters will showcase top-quality reporting and analysis, written by correspondents who are deeply sourced experts in their field.
·wired.com·
TIL Kylie Robison has a newsletter
Storage news round-up - 11 August – Blocks and Files
Storage news round-up - 11 August – Blocks and Files
Open-source storage hardware supplier 45Drives has a strategic partnership with LINBIT, creators of DRBD and LINSTOR, to deliver fully-integrated, enterprise-grade high-availability (HA) storage systems built on open-source technologies. It unites “Drives’ radically transparent hardware model with LINBIT’s production-grade software stack that powers HA deployments for companies like Apple, IBM, and Amazon.” … Cyber data protector […]
·blocksandfiles.com·
Storage news round-up - 11 August – Blocks and Files
AI & DevOps Toolkit - Ep32 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=GjV3PtqVP9Q
AI & DevOps Toolkit - Ep32 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=GjV3PtqVP9Q

Ep32 - Ask Me Anything About Anything

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=GjV3PtqVP9Q

·youtube.com·
AI & DevOps Toolkit - Ep32 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=GjV3PtqVP9Q
How Policies Saved us a Thousand Headaches with Alessandro Pomponio
How Policies Saved us a Thousand Headaches with Alessandro Pomponio

How Policies Saved us a Thousand Headaches, with Alessandro Pomponio

https://ku.bz/5sK7BFZ-8

Alessandro Pomponio from IBM Research explains how his team transformed their chaotic bare-metal clusters into a well-governed, self-service platform for AI and scientific workloads. He walks through their journey from manual cluster interventions to a fully automated GitOps-first architecture using ArgoCD, Kyverno, and Kueue to handle everything from policy enforcement to GPU scheduling.

You will learn:

How to implement GitOps workflows that reduce administrative burden while maintaining governance and visibility across multi-tenant research environments

Practical policy enforcement strategies using Kyverno to prevent GPU monopolization, block interactive pod usage, and automatically inject scheduling constraints

Fair resource sharing techniques with Kueue to manage scarce GPU resources across different hardware types while supporting both specific and flexible allocation requests

Organizational change management approaches for gaining stakeholder buy-in, upskilling admin teams, and communicating policy changes to research users

Sponsor

This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/5sK7BFZ-8

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

August 12, 2025 at 06:00AM

·kube.fm·
How Policies Saved us a Thousand Headaches with Alessandro Pomponio
AI & DevOps Toolkit - AI Meets Kubernetes: Simplifying Developer and Ops Collaboration - https://www.youtube.com/watch?v=8Yzn-9qQpQI
AI & DevOps Toolkit - AI Meets Kubernetes: Simplifying Developer and Ops Collaboration - https://www.youtube.com/watch?v=8Yzn-9qQpQI

AI Meets Kubernetes: Simplifying Developer and Ops Collaboration

Platform engineers and developers often struggle to align on infrastructure needs, leading to platforms that miss the mark and endless iteration loops. But what if AI could bridge this gap? This video explores a three-way collaboration where developers express their requirements in natural language, platform engineers establish guardrails and constraints, and AI intelligently translates developer intent into precise, infrastructure-compliant deployments.

Watch a live demo of the DevOps AI Toolkit (dot-ai) MCP, a project that leverages AI to match developer requests with platform-engineered building blocks, automatically generating optimized Kubernetes configurations. Witness how conversational deployment simplifies and accelerates the deployment process, ensuring deployments meet organizational standards while empowering developers to effortlessly deploy applications tailored to their exact needs.

PlatformEngineering #Kubernetes #AI

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/app-management/ai-meets-kubernetes-simplifying-developer-and-ops-collaboration 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to Platform Engineering with AI 01:35 Blacksmith (sponsor) 02:43 The Platform Engineering Problem 08:27 AI Deployment Magic 19:06 When AI Hits Limits 21:02 Beyond Simple Abstractions 23:20 DevOps AI Toolkit Explained

via YouTube https://www.youtube.com/watch?v=8Yzn-9qQpQI

·youtube.com·
AI & DevOps Toolkit - AI Meets Kubernetes: Simplifying Developer and Ops Collaboration - https://www.youtube.com/watch?v=8Yzn-9qQpQI
Bits from the Debian Project
Bits from the Debian Project
Debian is a free operating system (OS) for your computer. An operating system is the set of basic programs and utilities that make your computer run.
·bits.debian.org·
Bits from the Debian Project
AI Engineers and the Hot Vibe Code Summer
AI Engineers and the Hot Vibe Code Summer
Once we were developers, software engineers, and data scientists; but no more! Today we’re all AI Engineers, right? Right!?! Swyx’s recent AI Engineer World’s Fair conference surfaced a lot of enthusiasm around the term by drawing 3,000+ founders and engineers to San Francisco. Microsoft offers an “AI Engineer using Microsoft Azure” course through Udacity. But
·redmonk.com·
AI Engineers and the Hot Vibe Code Summer
Was just wondering if this would become a thing | Introducing Headlamp AI Assistant
Was just wondering if this would become a thing | Introducing Headlamp AI Assistant

Introducing Headlamp AI Assistant

https://kubernetes.io/blog/2025/08/07/introducing-headlamp-ai-assistant/

This announcement originally appeared on the Headlamp blog.

To simplify Kubernetes management and troubleshooting, we're thrilled to introduce Headlamp AI Assistant: a powerful new plugin for Headlamp that helps you understand and operate your Kubernetes clusters and applications with greater clarity and ease.

Whether you're a seasoned engineer or just getting started, the AI Assistant offers:

Fast time to value: Ask questions like "Is my application healthy?" or "How can I fix this?" without needing deep Kubernetes knowledge.

Deep insights: Start with high-level queries and dig deeper with prompts like "List all the problematic pods" or "How can I fix this pod?"

Focused & relevant: Ask questions in the context of what you're viewing in the UI, such as "What's wrong here?"

Action-oriented: Let the AI take action for you, like "Restart that deployment", with your permission.

Here is a demo of the AI Assistant in action as it helps troubleshoot an application running with issues in a Kubernetes cluster:

Hopping on the AI train

Large Language Models (LLMs) have transformed not just how we access data but also how we interact with it. The rise of tools like ChatGPT opened a world of possibilities, inspiring a wave of new applications. Asking questions or giving commands in natural language is intuitive, especially for users who aren't deeply technical. Now everyone can quickly ask how to do X or Y, without feeling awkward or having to traverse pages and pages of documentation like before.

Therefore, Headlamp AI Assistant brings a conversational UI to Headlamp, powered by LLMs that Headlamp users can configure with their own API keys. It is available as a Headlamp plugin, making it easy to integrate into your existing setup. Users can enable it by installing the plugin and configuring it with their own LLM API keys, giving them control over which model powers the assistant. Once enabled, the assistant becomes part of the Headlamp UI, ready to respond to contextual queries and perform actions directly from the interface.

Context is everything

As expected, the AI Assistant is focused on helping users with Kubernetes concepts. Yet, while there is a lot of value in responding to Kubernetes related questions from Headlamp's UI, we believe that the great benefit of such an integration is when it can use the context of what the user is experiencing in an application. So, the Headlamp AI Assistant knows what you're currently viewing in Headlamp, and this makes the interaction feel more like working with a human assistant.

For example, if a pod is failing, users can simply ask "What's wrong here?" and the AI Assistant will respond with the root cause, like a missing environment variable or a typo in the image name. Follow-up prompts like "How can I fix this?" allow the AI Assistant to suggest a fix, streamlining what used to take multiple steps into a quick, conversational flow.

Sharing the context from Headlamp is not a trivial task though, so it's something we will keep working on perfecting.

Tools

Context from the UI is helpful, but sometimes additional capabilities are needed. If the user is viewing the pod list and wants to identify problematic deployments, switching views should not be necessary. To address this, the AI Assistant includes support for a Kubernetes tool. This allows asking questions like "Get me all deployments with problems" prompting the assistant to fetch and display relevant data from the current cluster. Likewise, if the user requests an action like "Restart that deployment" after the AI points out what deployment needs restarting, it can also do that. In case of "write" operations, the AI Assistant does check with the user for permission to run them.

AI Plugins

Although the initial version of the AI Assistant is already useful for Kubernetes users, future iterations will expand its capabilities. Currently, the assistant supports only the Kubernetes tool, but further integration with Headlamp plugins is underway. Similarly, we could get richer insights for GitOps via the Flux plugin, monitoring through Prometheus, package management with Helm, and more.

And of course, as the popularity of MCP grows, we are looking into how to integrate it as well, for a more plug-and-play fashion.

Try it out!

We hope this first version of the AI Assistant helps users manage Kubernetes clusters more effectively and assist newcomers in navigating the learning curve. We invite you to try out this early version and give us your feedback. The AI Assistant plugin can be installed from Headlamp's Plugin Catalog in the desktop version, or by using the container image when deploying Headlamp. Stay tuned for the future versions of the Headlamp AI Assistant!

via Kubernetes Blog https://kubernetes.io/

August 07, 2025 at 03:00PM

·kubernetes.io·
Was just wondering if this would become a thing | Introducing Headlamp AI Assistant
Desktop Devlog - Markdown, Vim, local files and more - Desktop - Atuin Community
Desktop Devlog - Markdown, Vim, local files and more - Desktop - Atuin Community
0.0.93 fix: sql query overflow in fullscreen mode feat: support multiple sql queries in a single sql block feat: add Vim support to most codemirror blocks (enable in settings, all blocks coming soon) feat: add markdown exporting feat: add URL copy button to top bar Vim in code editors We’ve added Vim mode to the core code editors - mainly the script and terminal blocks. It’ll be coming to all blocks in the next release! Markdown export Available via File → Export → Markdown with any runbook o...
·forum.atuin.sh·
Desktop Devlog - Markdown, Vim, local files and more - Desktop - Atuin Community
Who do I know that's working on Crossplane? | [PGSQL] Extend Grant kind to support more than database object · Issue #217 · crossplane-contrib/provider-sql
Who do I know that's working on Crossplane? | [PGSQL] Extend Grant kind to support more than database object · Issue #217 · crossplane-contrib/provider-sql
Support schema, objects, and objectType specifications on the Grant kind to grant permissions on objects other than the database. This feature request will cover the following issues: #161 #72 #145 What proble...
·github.com·
Who do I know that's working on Crossplane? | [PGSQL] Extend Grant kind to support more than database object · Issue #217 · crossplane-contrib/provider-sql
AI & DevOps Toolkit - Ep31 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=8TKvzwLIYSQ
AI & DevOps Toolkit - Ep31 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=8TKvzwLIYSQ

Ep31 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, regular guest, will be here to help us out.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=8TKvzwLIYSQ

·youtube.com·
AI & DevOps Toolkit - Ep31 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=8TKvzwLIYSQ
AI & DevOps Toolkit - MCP Servers Explained: Why Most Are Useless (And How to Fix It) - https://www.youtube.com/watch?v=7baGJ1bC9zE
AI & DevOps Toolkit - MCP Servers Explained: Why Most Are Useless (And How to Fix It) - https://www.youtube.com/watch?v=7baGJ1bC9zE

MCP Servers Explained: Why Most Are Useless (And How to Fix It)

95% of MCP servers are essentially a waste of time. Many are slower and more complex than the terminal tools agents can already use effectively. But the remaining 5% are game-changers that unlock new, powerful AI capabilities. This video reveals exactly why most MCPs fail, how to identify and avoid redundant architectures, and the right way to build MCPs that truly matter.

Using clear analogies and real-world examples, you'll learn how to design MCP servers that directly reflect user intentions, provide access to otherwise inaccessible services, and combine deterministic code with intelligent agent-driven workflows. By the end, you'll understand the critical architecture and data flow patterns that separate revolutionary MCP servers from redundant ones, enabling you to create AI agents that accomplish complex tasks with ease.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Outskill 👉 Grab your free seat to the 2-Day AI Mastermind: https://link.outskill.com/AIDEVAG2 🔐 100% Discount for the first 1000 people 💥 Dive deep into AI and Learn Automations, Build AI Agents, Make videos & images – all for free! 🎁 Bonuses worth $5100+ if you join and attend ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

AI #MCP #SoftwareArchitecture

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to Model Context Protocol (MCP) 01:38 Outskill (sponsor) 03:09 What is the Model Context Protocol (MCP)? 07:06 Why Most MCPs Fail 13:24 MCP Architecture That Works 20:14 MCP Server Design Patterns 25:02 MCP Data Flow Patterns 37:24 Summary

via YouTube https://www.youtube.com/watch?v=7baGJ1bC9zE

·youtube.com·
AI & DevOps Toolkit - MCP Servers Explained: Why Most Are Useless (And How to Fix It) - https://www.youtube.com/watch?v=7baGJ1bC9zE
Anthropic Revokes OpenAI's Access to Claude
Anthropic Revokes OpenAI's Access to Claude
OpenAI lost access to the Claude API this week after Anthropic claimed the company was violating its terms of service.
·wired.com·
Anthropic Revokes OpenAI's Access to Claude