Why wind farms attract so much misinformation and conspiracy theory

If you think climate change is a hoax, you might believe wind turbines poison groundwater.

1_r/devopsish

·arstechnica.com·Aug 24, 2025

Developer gets 4 years for activating network “kill switch” to avenge his firing

Disgruntled developer was caught after naming the “kill switch” after himself.

1_r/devopsish

·arstechnica.com·Aug 24, 2025

Why Did a $10 Billion Startup Let Me Vibe-Code for Them—and Why Did I Love It?

I spent two days at Notion and saw an industry in upheaval. I also shipped some actual code.

1_r/devopsish

·wired.com·Aug 24, 2025

To SSH is human, but that doesn’t mean we should - Sidero Labs

SSH is like opening the hood of your car while driving 70mph to adjust the engine. It works fine, until it doesn’t… Consider this: You go weeks with everything running smoothly. You follow the process, write great code, and don’t SSH into a node right before going to bed on Friday night. One day, an […]

1_r/devopsish

·siderolabs.com·Aug 21, 2025

Play stupid games, win stupid prizes | Elon Musk must face lawsuit claiming he ran illegal $1 million election lottery

Elon Musk was ordered on Wednesday by a federal judge to face a lawsuit accusing him of defrauding voters into signing a petition for a chance to win his giveaway.

2_News

·cnbc.com·Aug 21, 2025

Last Week in Kubernetes Development - Week Ending August 17 2025

Week Ending August 17, 2025

https://lwkd.info/2025/20250820

Developer News

A medium-severity vulnerability (CVE-2025-5187, CVSS 6.7) affects Kubernetes clusters using the NodeRestriction admission controller without OwnerReferencesPermissionEnforcement. It allows a compromised node to delete its own Node object by patching OwnerReferences, then recreate it with altered taints or labels, bypassing normal delete restrictions. Update to the latest patch release (1.33.4, 1.32.8, or 1.31.12) to close this security hole.

Release Schedule

Next Deadline: Release day, 27 August

We are in the final week before releasing 1.34. Make sure to respond quickly to any blocker issues or test failures your SIG is tagged on.

Patch releases 1.33.4, 1.32.8, and 1.31.12 were published this week, built with Go 1.24.5 or 1.23.11 depending on the release branch. These patch releases primarily address an exploitable security hole, so admins should update at the next available downtime. Kubernetes 1.31 enters maintenance mode on Aug 28, 2025; the End of Life date for Kubernetes 1.31 is Oct 28, 2025.

Featured PRs

133409: Make podcertificaterequestcleaner role feature-gated

This PR places the creation of RBAC permissions for the podcertificaterequestcleaner controller behind a feature gate. The ClusterRole and ClusterRoleBinding for this controller are now created only when the related feature is enabled. This reduces unnecessary permissions in clusters where the controller is not in use, and supports a more secure and minimal RBAC configuration by avoiding unused roles.

KEP of the Week

KEP 2340: Consistent Reads from Cache

This KEP introduces a mechanism to serve most reads from the watch cache while maintaining the same consistency guarantees as serving reads from etcd. Previously, the Get and List requests were guaranteed to be Consistent reads and were served from etcd using a “quorum read”. Serving reads from the watch cache is more performant and scalable than reading them from etcd, deserializing them, applying selectors, converting them to the desired version, and then garbage collecting all the objects that were allocated during the whole process.

This KEP is tracked for Stable in 1.34

Other Merges

Prevent data race around claimsToAllocate

Clarify staging repository READMEs

Version Updates

Bumped Go Version to 1.23.12 for publishing bot rules.

Bumped dependencies and images to Go 1.24.6 and distroless iptables

Subprojects and Dependency Updates

Ingress-NGINX v1.13.1 updates NGINX to v2.2.1, Go to v1.24.6, and includes bug fixes and improvements; Helm Chart v4.13.1 adds helm-test target and includes the updated controller

Shoutouts

Want to thank someone in the community? Drop a note in #shoutouts on Slack.

via Last Week in Kubernetes Development https://lwkd.info/

August 20, 2025 at 06:00PM

1_r/devopsish

·lwkd.info·Aug 20, 2025

Building a custom telescope mount with harmonic drives and ESP32

How I went from buying a €200 tracker to building a custom telescope mount with harmonic drives, ESP32, and way more engineering than necessary

1_r/devopsish

·svendewaerhert.com·Aug 20, 2025

Top pediatricians buck RFK Jr.’s anti-vaccine meddling on COVID shot guidance

The American Academy of Pediatrics recommends children under age 2 get vaccinated.

2_News

·arstechnica.com·Aug 20, 2025

Tuning Linux Swap for Kubernetes: A Deep Dive

https://kubernetes.io/blog/2025/08/19/tuning-linux-swap-for-kubernetes-a-deep-dive/

The Kubernetes NodeSwap feature, likely to graduate to stable in the upcoming Kubernetes v1.34 release, allows swap usage: a significant shift from the conventional practice of disabling swap for performance predictability. This article focuses exclusively on tuning swap on Linux nodes, where this feature is available. By allowing Linux nodes to use secondary storage for additional virtual memory when physical RAM is exhausted, node swap support aims to improve resource utilization and reduce out-of-memory (OOM) kills.

However, enabling swap is not a "turn-key" solution. The performance and stability of your nodes under memory pressure are critically dependent on a set of Linux kernel parameters. Misconfiguration can lead to performance degradation and interfere with Kubelet's eviction logic.

In this blogpost, I'll dive into critical Linux kernel parameters that govern swap behavior. I will explore how these parameters influence Kubernetes workload performance, swap utilization, and crucial eviction mechanisms. I will present various test results showcasing the impact of different configurations, and share my findings on achieving optimal settings for stable and high-performing Kubernetes clusters.

Introduction to Linux swap

At a high level, the Linux kernel manages memory through pages, typically 4KiB in size. When physical memory becomes constrained, the kernel's page replacement algorithm decides which pages to move to swap space. While the exact logic is a sophisticated optimization, this decision-making process is influenced by certain key factors:

Page access patterns (how recently pages are accessed)

Page dirtyness (whether pages have been modified)

Memory pressure (how urgently the system needs free memory)

Anonymous vs File-backed memory

It is important to understand that not all memory pages are the same. The kernel distinguishes between anonymous and file-backed memory.

Anonymous memory: This is memory that is not backed by a specific file on the disk, such as a program's heap and stack. From the application's perspective this is private memory, and when the kernel needs to reclaim these pages, it must write them to a dedicated swap device.

File-backed memory: This memory is backed by a file on a filesystem. This includes a program's executable code, shared libraries, and filesystem caches. When the kernel needs to reclaim these pages, it can simply discard them if they have not been modified ("clean"). If a page has been modified ("dirty"), the kernel must first write the changes back to the file before it can be discarded.

While a system without swap can still reclaim clean file-backed pages under memory pressure by dropping them, it has no way to offload anonymous memory. Enabling swap provides this capability, allowing the kernel to move less frequently accessed memory pages to disk, which conserves memory and helps avoid system OOM kills.
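
To see how a node's memory divides between these two categories, the counters in /proc/meminfo can be summed. The following is a minimal Python sketch and not part of the original article: the helper name memory_breakdown and the sample figures are illustrative assumptions, while the field names are the ones the kernel reports.

```python
def memory_breakdown(meminfo_text):
    """Summarise anonymous, file-backed and swap usage (KiB) from /proc/meminfo text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])  # the kernel reports these values in kB
    return {
        "anon_kib": fields.get("Active(anon)", 0) + fields.get("Inactive(anon)", 0),
        "file_kib": fields.get("Active(file)", 0) + fields.get("Inactive(file)", 0),
        "swap_used_kib": fields.get("SwapTotal", 0) - fields.get("SwapFree", 0),
    }


# Illustrative sample only; on a real node pass open("/proc/meminfo").read() instead.
SAMPLE = """\
MemTotal:        8000000 kB
Active(anon):    3000000 kB
Inactive(anon):  1000000 kB
Active(file):     500000 kB
Inactive(file):  1500000 kB
SwapTotal:       2000000 kB
SwapFree:        1500000 kB
"""

print(memory_breakdown(SAMPLE))
```

Without swap, only the file-backed share is reclaimable under pressure; the anonymous share has nowhere to go.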

Key kernel parameters for swap tuning

To effectively tune swap behavior, Linux provides several kernel parameters that can be managed via sysctl.

vm.swappiness: This is the most well-known parameter. It is a value from 0 to 200 (100 in older kernels) that controls the kernel's preference for swapping anonymous memory pages versus reclaiming file-backed memory pages (page cache).

High value (e.g., 90+): The kernel will be aggressive in swapping out less-used anonymous memory to make room for file cache.

Low value (e.g., < 10): The kernel will strongly prefer dropping file cache pages over swapping anonymous memory.

vm.min_free_kbytes: This parameter tells the kernel to keep a minimum amount of memory free as a buffer. When the amount of free memory drops below this safety buffer, the kernel starts reclaiming pages more aggressively (swapping, and eventually handling OOM kills).

Function: It acts as a safety lever to ensure the kernel has enough memory for critical allocation requests that cannot be deferred.

Impact on swap: Setting a higher min_free_kbytes effectively raises the floor for free memory, causing the kernel to initiate swap earlier under memory pressure.

vm.watermark_scale_factor: This setting controls the gap between different watermarks: min, low and high, which are calculated based on min_free_kbytes.

Watermarks explained:

low: When free memory is below this mark, the kswapd kernel process wakes up to reclaim pages in the background. This is when a swapping cycle begins.

min: When free memory hits this minimum level, then aggressive page reclamation will block process allocation. Failing to reclaim pages will cause OOM kills.

high: Memory reclamation stops once the free memory reaches this level.

Impact: A higher watermark_scale_factor creates a larger buffer between the low and min watermarks. This gives kswapd more time to reclaim memory gradually before the system hits a critical state.
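
The watermark behavior described above can be expressed as a small decision function. This is a simplified Python sketch of the description, not the kernel's actual logic; the function name reclaim_state, the state labels, and the page counts in the usage lines are hypothetical.

```python
def reclaim_state(free_pages, wm_min, wm_low, wm_high, kswapd_active=False):
    """Classify a zone's reclaim behavior from its free pages and watermarks.

    kswapd_active says whether background reclaim is already running, because
    once started it continues until free memory reaches the high watermark.
    """
    if free_pages <= wm_min:
        return "direct-reclaim"   # allocations block while pages are reclaimed
    if free_pages < wm_low:
        return "kswapd-reclaim"   # kswapd wakes up and reclaims in the background
    if free_pages < wm_high and kswapd_active:
        return "kswapd-reclaim"   # keeps going until the high watermark is reached
    return "idle"


# Hypothetical watermarks, in pages: min=100, low=200, high=300.
print(reclaim_state(250, 100, 200, 300))                      # idle
print(reclaim_state(150, 100, 200, 300))                      # kswapd-reclaim
print(reclaim_state(250, 100, 200, 300, kswapd_active=True))  # kswapd-reclaim
print(reclaim_state(100, 100, 200, 300))                      # direct-reclaim
```

The wider the distance between min and low, the longer the system stays in the background-reclaim state before allocations start to block.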

In a typical server workload, you might have a long-running process with some memory that becomes 'cold'. A higher swappiness value can free up RAM by swapping out that cold memory, leaving room for other active processes that benefit from keeping their file cache.

Tuning the min_free_kbytes and watermark_scale_factor parameters so that the swapping window starts earlier gives kswapd more room to offload memory to disk and helps prevent OOM kills during sudden memory spikes.
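
Before changing any of these parameters it helps to record what a node currently uses. A minimal Python sketch, assuming a Linux host that exposes the three sysctls as files under /proc/sys/vm; the helper name read_vm_params is hypothetical.

```python
from pathlib import Path

VM_PARAMS = ("swappiness", "min_free_kbytes", "watermark_scale_factor")


def read_vm_params(root="/proc/sys/vm"):
    """Return the current integer value of each swap-related sysctl under root."""
    return {name: int((Path(root) / name).read_text().strip()) for name in VM_PARAMS}


# Only meaningful on a Linux host; skipped elsewhere.
if all((Path("/proc/sys/vm") / name).exists() for name in VM_PARAMS):
    print(read_vm_params())
```

The root argument exists so the function can be pointed at a copy of the files, for example when comparing snapshots taken from several nodes.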

Swap tests and results

To understand the real impact of these parameters, I designed a series of stress tests.

Test setup

Environment: GKE on Google Cloud

Kubernetes version: 1.33.2

Node configuration: n2-standard-2 (8GiB RAM, 50GB swap on a pd-balanced disk, without encryption), Ubuntu 22.04

Workload: A custom Go application designed to allocate memory at a configurable rate, generate file-cache pressure, and simulate different memory access patterns (random vs sequential).

Monitoring: A sidecar container capturing system metrics every second.

Protection: Critical system components (kubelet, container runtime, sshd) were prevented from swapping by setting memory.swap.max=0 in their respective cgroups.
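
The protection step amounts to writing 0 into the memory.swap.max file of each relevant cgroup. A minimal Python sketch, assuming cgroup v2 and root privileges; the helper name and the example path in the comment are assumptions, since the location of the kubelet, container runtime, and sshd cgroups depends on the distribution and on how the services are started.

```python
from pathlib import Path


def disable_swap_for_cgroup(cgroup_dir):
    """Write 0 to memory.swap.max so that processes in this cgroup cannot use swap."""
    limit_file = Path(cgroup_dir) / "memory.swap.max"
    limit_file.write_text("0\n")
    return limit_file.read_text().strip()


# Hypothetical usage on a systemd-managed node (path is an assumption):
# disable_swap_for_cgroup("/sys/fs/cgroup/system.slice/kubelet.service")
```

A value written this way is lost when the cgroup is recreated, so a persistent setup would normally set the limit through the service manager instead.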

Test methodology

I ran a stress-test pod on nodes with different swappiness settings (0, 60, and 90) and varied the min_free_kbytes and watermark_scale_factor parameters to observe the outcomes under heavy memory allocation and I/O pressure.

Visualizing swap in action

The graph below, from a 100MBps stress test, shows swap in action. As free memory (in the "Memory Usage" plot) decreases, swap usage (Swap Used (GiB)) and swap-out activity (Swap Out (MiB/s)) increase. Critically, as the system relies more on swap, the I/O activity and corresponding wait time (IO Wait % in the "CPU Usage" plot) also rise, indicating CPU stress.

Findings

My initial tests with default kernel parameters (swappiness=60, min_free_kbytes=68MB, watermark_scale_factor=10) quickly led to OOM kills and even unexpected node restarts under high memory pressure. By selecting appropriate kernel parameters, a good balance between node stability and performance can be achieved.

The impact of swappiness

The swappiness parameter directly influences the kernel's choice between reclaiming anonymous memory (swapping) and dropping page cache. To observe this, I ran a test where one pod generated and held file-cache pressure, followed by a second pod allocating anonymous memory at 100MB/s, to observe the kernel preference on reclaim:

My findings reveal a clear trade-off:

swappiness=90: The kernel proactively swapped out the inactive anonymous memory to keep the file cache. This resulted in high and sustained swap usage and significant I/O activity ("Blocks Out"), which in turn caused spikes in I/O wait on the CPU.

swappiness=0: The kernel favored dropping file-cache pages, delaying swap consumption. However, it's critical to understand that this does not disable swapping. When memory pressure was high, the kernel still swapped anonymous memory to disk.

The choice is workload-dependent. For workloads sensitive to I/O latency, a lower swappiness is preferable. For workloads that rely on a large and frequently accessed file cache, a higher swappiness may be beneficial, provided the underlying disk is fast enough to handle the load.

Tuning watermarks to prevent eviction and OOM kills

The most critical challenge I encountered was the interaction between rapid memory allocation and Kubelet's eviction mechanism. When my test pod, which was deliberately configured to overcommit memory, allocated it at a high rate (e.g., 300-500 MBps), the system quickly ran out of free memory.

With default watermarks, the buffer for reclamation was too small. Before kswapd could free up enough memory by swapping, the node would hit a critical state, leading to two potential outcomes:

Kubelet eviction: If kubelet's eviction manager detected memory.available was below its threshold, it would evict the pod.

OOM killer: In some high-rate scenarios, the OOM Killer would activate before eviction could complete, sometimes killing higher priority pods that were not the source of the pressure.
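
For context on the first outcome, the memory.available signal is derived from cgroup statistics rather than from the kernel's free-memory counter. The Python sketch below follows the calculation described in the Kubernetes node-pressure eviction documentation (capacity minus working set, where the working set excludes inactive file pages); the function names are hypothetical, and the 100Mi default threshold is stated here as an assumption to check against your kubelet configuration.

```python
MIB = 1024 ** 2
GIB = 1024 ** 3


def memory_available(capacity_bytes, usage_bytes, inactive_file_bytes):
    """Approximate memory.available: capacity minus working set."""
    working_set = max(usage_bytes - inactive_file_bytes, 0)  # inactive file pages are reclaimable
    return capacity_bytes - working_set


def below_eviction_threshold(available_bytes, threshold_bytes=100 * MIB):
    """True when the signal has dropped under the hard eviction threshold."""
    return available_bytes < threshold_bytes


# Hypothetical 8 GiB node with 50 MiB of headroom left and no inactive file cache.
available = memory_available(8 * GIB, 8 * GIB - 50 * MIB, 0)
print(available // MIB, below_eviction_threshold(available))
```

Because the kubelet samples this signal periodically, a fast allocator can exhaust memory between two samples, which is how the OOM killer can win the race described above.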

To mitigate this I tuned the watermarks:

Increased min_free_kbytes to 512MiB: This forces the kernel to start reclaiming memory much earlier, providing a larger safety buffer.

Increased watermark_scale_factor to 2000: This widened the gap between the low and high watermarks (from ≈337MB to ≈591MB in my test node's /proc/zoneinfo), effectively increasing the swapping window.

This combination gave kswapd a larger operational zone and more time to swap pages to disk during memory spikes, successfully preventing both premature evictions and OOM kills in my test runs.
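
To keep such settings across reboots they are usually placed in a sysctl configuration file. The Python sketch below renders the two values used in this test as such a fragment; the helper name and the file name in the comment are hypothetical, and the values come from an 8GiB test node, so they are a starting point for experiments rather than a recommendation.

```python
# Values from the test above; they suit that node and should be re-validated elsewhere.
TUNED = {
    "vm.min_free_kbytes": 524288,       # 512MiB expressed in KiB
    "vm.watermark_scale_factor": 2000,
}


def render_sysctl_conf(params):
    """Render sysctl settings as 'key = value' lines, sorted by key."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(params.items()))


# The output could be saved as, for example, /etc/sysctl.d/90-swap-tuning.conf (hypothetical name).
print(render_sysctl_conf(TUNED), end="")
```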

Table compares watermark levels from /proc/zoneinfo (Non-NUMA node):

min_free_kbytes=67584KiB and watermark_scale_factor=10

min_free_kbytes=524288KiB and watermark_scale_factor=2000

Node 0, zone Normal pages free 583273 boost 0 min 10504 low

1_r/devopsish

·kubernetes.io·Aug 19, 2025

Building a Carbon and Price-Aware Kubernetes Scheduler with Dave Masselink

Building a Carbon and Price-Aware Kubernetes Scheduler, with Dave Masselink

https://ku.bz/zk2xM1lfW

Data centers consume over 4% of global electricity and this number is projected to triple in the next few years due to AI workloads.

Dave Masselink, founder of Compute Gardener, discusses how he built a Kubernetes scheduler that makes scheduling decisions based on real-time carbon intensity data from power grids.

You will learn:

How carbon-aware scheduling works - Using real-time grid data to shift workloads to periods when electricity generation has lower carbon intensity, without changing energy consumption

Technical implementation details - Building custom Kubernetes schedulers using the scheduler plugin framework, including pre-filter and filter stages for carbon and time-of-use pricing optimization

Energy measurement strategies - Approaches for tracking power consumption across CPUs, memory, and GPUs
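
The time-shifting idea in the first point can be illustrated with a few lines of Python. This is a hedged sketch of the general technique, not the scheduler discussed in the episode: the function name, the hourly forecast, and its gCO2/kWh figures are all made up for illustration.

```python
def best_start_hour(forecast, duration_hours):
    """Return the start index of the window with the lowest total carbon intensity."""
    if duration_hours <= 0 or duration_hours > len(forecast):
        raise ValueError("duration must be between 1 and the forecast length")
    best_start, best_total = 0, float("inf")
    for start in range(len(forecast) - duration_hours + 1):
        total = sum(forecast[start:start + duration_hours])
        if total < best_total:  # strict comparison keeps the earliest window on ties
            best_start, best_total = start, total
    return best_start


# Hypothetical hourly grid carbon intensity forecast, in gCO2/kWh.
FORECAST = [420, 390, 310, 180, 150, 200, 350, 450]

print(best_start_hour(FORECAST, 2))
```

The job uses the same amount of energy whichever window is chosen; only the carbon intensity of the electricity it draws changes.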

Sponsor

This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/zk2xM1lfW

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

August 19, 2025 at 06:00AM

1_r/devopsish

·kube.fm·Aug 19, 2025

The real danger of systemd-coredump CVE-2025-4598 | CIQ

TL;DR: A critical vulnerability in systemd-coredump remains unfixed in Enterprise Linux 9, allowing attackers to steal password hashes and cryptographic keys within seconds - but Rocky Linux from CIQ…

1_r/devopsish

·ciq.com·Aug 18, 2025

AI & DevOps Toolkit - AI Will Replace Coders - But Not the Way You Think - https://www.youtube.com/watch?v=qBp8d6yBPPg

AI Will Replace Coders - But Not the Way You Think

After three decades in tech, I've never seen developers this terrified, and for good reason. AI can already write code faster than us, and it's rapidly approaching the point where it might write better code too. But here's what's driving me crazy: everyone is panicking about the wrong thing. They're worried AI will steal their jobs because it can code, which is like a chef fearing unemployment because someone invented a better knife.

Your real value was never in typing syntax or executing commands; that's just the mechanical stuff that happens after all the important thinking is done. The developers who will thrive aren't trying to out-code AI; they're the architects, problem-solvers, and domain experts who understand what needs to be built and why. Your deep knowledge of your industry, your business context, and the messy realities of how things actually work? That's your moat. AI doesn't know why your healthcare platform needs that weird HIPAA workaround, or why your e-commerce flow accommodates that legacy client system. Stop being a code monkey and start being the expert AI needs to not screw everything up. The choice is yours, but the clock is ticking.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Readdy 🔗 https://readdy.ai ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#AIandDevelopers #FutureOfCoding #TechCareerAdvice

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-will-replace-coders---but-not-the-way-you-think

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to Coding with AI 01:24 Sponsor (Readdy) 02:51 The Fear: AI Replacing Developers 06:50 The Truth: What Developers Really Do 13:54 The Secret Weapon: Your Domain Knowledge is Your Moat 19:16 The Adaptation: Thriving with AI

via YouTube https://www.youtube.com/watch?v=qBp8d6yBPPg

1_r/devopsish

·youtube.com·Aug 18, 2025

Container Security: Techniques, Misconfigurations, and Attack Path

Investigate container security by exploring attack paths and misconfigurations in Docker and Kubernetes to improve security practices

1_r/devopsish

·offensivebytes.com·Aug 16, 2025

yokecd/yoke: Kubernetes Package Management as Code; infrastructure as code, but actually.

Kubernetes Package Management as Code; infrastructure as code, but actually. - yokecd/yoke

1_r/devopsish

·github.com·Aug 16, 2025

What we learned, and didn't learn, from the Michigan report - ESPN

The NCAA delivered its punishment to Michigan and Connor Stalions along with a 74-page explanation.

7_Sports

·espn.com·Aug 15, 2025

Hackers Hijacked Google’s Gemini AI With a Poisoned Calendar Invite to Take Over a Smart Home

For likely the first time ever, security researchers have shown how AI can be hacked to create real world havoc, allowing them to turn off lights, open smart shutters, and more.

1_r/devopsish

·wired.com·Aug 15, 2025

We Rewrote the Ghostty GTK Application

1_r/devopsish

·mitchellh.com·Aug 15, 2025

Last Week in Kubernetes Development - Week Ending August 10 2025

Week Ending August 10, 2025

https://lwkd.info/2025/20250814

Developer News

Do you run multiple Kubernetes clusters in your organization? SIG-Multicluster would love to have you answer their survey so that they can decide the SIG’s priorities.

The Kubernetes SIG Release main meeting has been rescheduled, based on a community vote, to a fixed time slot of Thursdays from 2:30–3:15 pm UTC, starting August 21, 2025, and will now follow a bi-weekly cadence. The previously alternating meeting times are discontinued, and the next scheduled meeting before the change has been canceled. The time is tied to UTC, so participants in regions with Daylight Saving Time may see a ±1 hour shift. The calendar has been updated, and questions or feedback can be shared via email or the #sig-release Slack channel.

Release Schedule

Next Deadline: Release day, 27 August

We are currently in Docs Freeze

Kubernetes v1.34.0-rc.0 was released, followed by v1.34.0-rc.1 to include a critical bug fix

Cherry-pick deadlines for the upcoming patch releases 1.33.4, 1.32.8, and 1.31.12 have passed. These patch releases are expected on August 12.

KEP of the Week

KEP 5080: Ordered Namespace Deletion

This KEP introduces a deterministic, security-aware ordered deletion of all resources within a namespace. Previously, namespace deletion could remove resources in a non-deterministic order, which could lead to awkward or risky gaps. When deleting a namespace with the OrderedNamespaceDeletion feature flag enabled, Kubernetes tears down namespace objects in waves, so that Pods go first and crucial resources like NetworkPolicy don't disappear while Pods are still running.

This KEP is tracked for Stable in 1.34
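
The wave ordering can be pictured with a short Python sketch. It is a simplified two-wave illustration of the idea summarized above, not the Kubernetes implementation; the function name, the dictionary shape, and the resource names are hypothetical.

```python
def deletion_waves(resources, first_kinds=("Pod",)):
    """Split namespace resources into ordered waves: workloads first, then the rest."""
    first = [r for r in resources if r["kind"] in first_kinds]
    rest = [r for r in resources if r["kind"] not in first_kinds]
    return [wave for wave in (first, rest) if wave]  # drop empty waves


# Hypothetical namespace contents.
RESOURCES = [
    {"kind": "NetworkPolicy", "name": "deny-all"},
    {"kind": "Pod", "name": "web-0"},
    {"kind": "ConfigMap", "name": "app-config"},
]

for number, wave in enumerate(deletion_waves(RESOURCES), start=1):
    print(number, [f"{r['kind']}/{r['name']}" for r in wave])
```

Each wave would need to finish before the next one starts, so the NetworkPolicy in this example stays in force until the Pod is gone.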

Other Merges

NodeRestriction to prevent nodes from updating their OwnerReferences

Etcd metrics uses Delete() instead of DeleteLabelValues()

Enable publishing-bot support for v1.34 branch

Prerelease lifecycle for PodCertificateRequest is fixed

Demote KEP-5278 feature gates for ClearingNominatedNodeNameAfterBinding and NominatedNodeNameForExpectation to Alpha

podcertificaterequestcleaner role is now behind a feature-gate

Deprecated

Deprecated Version is removed for api_server_storage_objects

Subprojects and Dependency Updates

coredns/coredns v1.12.3 improves plugin reliability, adds Kubernetes plugin startup timeout, updates route53 to AWS SDK v2, and fixes race conditions

cluster autoscaler chart v9.50.0 scales Kubernetes worker nodes within autoscaling groups

cluster-api v1.11.0-rc.0 for testing

cluster-api-provider-vsphere v1.14.0-rc.0 for testing

gRPC Core 1.74.1 (gee): patch release for grpc/ruby

kOps v1.34.0-alpha.1 introduces new features and significant bug fixes for Azure, adds experimental IPv6 support for bare-metal, and updates key dependencies like containerd and etcd

kompose 1.37.0 includes code refactoring for simplification and updates several core dependencies for improved stability and performance

Shoutouts

Want to thank someone in the community? Drop a note in #shoutouts on Slack.

via Last Week in Kubernetes Development https://lwkd.info/

August 14, 2025 at 03:00PM

1_r/devopsish

·lwkd.info·Aug 14, 2025

Introducing gpt-oss | OpenAI

gpt-oss-120b and gpt-oss-20b push the frontier of open-weight reasoning models

1_r/devopsish

·openai.com·Aug 14, 2025

TIL Kylie Robison has a newsletter

WIRED's premium newsletters will showcase top-quality reporting and analysis, written by correspondents who are deeply sourced experts in their field.

1_r/devopsish

·wired.com·Aug 14, 2025

Storage news round-up - 11 August – Blocks and Files

Open-source storage hardware supplier 45Drives has a strategic partnership with LINBIT, creators of DRBD and LINSTOR, to deliver fully-integrated, enterprise-grade high-availability (HA) storage systems built on open-source technologies. It unites “45Drives’ radically transparent hardware model with LINBIT’s production-grade software stack that powers HA deployments for companies like Apple, IBM, and Amazon.” … Cyber data protector […]

1_r/devopsish

·blocksandfiles.com·Aug 14, 2025

Governance Part 3: New Contributors and Pathways to Leadership | Fast Wonder

1_r/devopsish

·fastwonderblog.com·Aug 13, 2025

Three notorious cybercrime gangs appear to be collaborating

Scattered Spider, ShinyHunters, and Lapsus$ spent the weekend bragging to each other on a Telegram channel

1_r/devopsish

·theregister.com·Aug 12, 2025

AI & DevOps Toolkit - Ep32 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=GjV3PtqVP9Q

Ep32 - Ask Me Anything About Anything

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=GjV3PtqVP9Q

1_r/devopsish

·youtube.com·Aug 12, 2025

How Policies Saved us a Thousand Headaches with Alessandro Pomponio

How Policies Saved us a Thousand Headaches, with Alessandro Pomponio

https://ku.bz/5sK7BFZ-8

Alessandro Pomponio from IBM Research explains how his team transformed their chaotic bare-metal clusters into a well-governed, self-service platform for AI and scientific workloads. He walks through their journey from manual cluster interventions to a fully automated GitOps-first architecture using ArgoCD, Kyverno, and Kueue to handle everything from policy enforcement to GPU scheduling.

You will learn:

How to implement GitOps workflows that reduce administrative burden while maintaining governance and visibility across multi-tenant research environments

Practical policy enforcement strategies using Kyverno to prevent GPU monopolization, block interactive pod usage, and automatically inject scheduling constraints

Fair resource sharing techniques with Kueue to manage scarce GPU resources across different hardware types while supporting both specific and flexible allocation requests

Organizational change management approaches for gaining stakeholder buy-in, upskilling admin teams, and communicating policy changes to research users

Sponsor

This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/5sK7BFZ-8

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

August 12, 2025 at 06:00AM

1_r/devopsish

·kube.fm·Aug 12, 2025

AI & DevOps Toolkit - AI Meets Kubernetes: Simplifying Developer and Ops Collaboration - https://www.youtube.com/watch?v=8Yzn-9qQpQI

AI Meets Kubernetes: Simplifying Developer and Ops Collaboration

Platform engineers and developers often struggle to align on infrastructure needs, leading to platforms that miss the mark and endless iteration loops. But what if AI could bridge this gap? This video explores a three-way collaboration where developers express their requirements in natural language, platform engineers establish guardrails and constraints, and AI intelligently translates developer intent into precise, infrastructure-compliant deployments.

Watch a live demo of the DevOps AI Toolkit (dot-ai) MCP, a project that leverages AI to match developer requests with platform-engineered building blocks, automatically generating optimized Kubernetes configurations. Witness how conversational deployment simplifies and accelerates the deployment process, ensuring deployments meet organizational standards while empowering developers to effortlessly deploy applications tailored to their exact needs.

#PlatformEngineering #Kubernetes #AI

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/app-management/ai-meets-kubernetes-simplifying-developer-and-ops-collaboration 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to Platform Engineering with AI 01:35 Blacksmith (sponsor) 02:43 The Platform Engineering Problem 08:27 AI Deployment Magic 19:06 When AI Hits Limits 21:02 Beyond Simple Abstractions 23:20 DevOps AI Toolkit Explained

via YouTube https://www.youtube.com/watch?v=8Yzn-9qQpQI

1_r/devopsish

·youtube.com·Aug 11, 2025

Bits from the Debian Project

Debian is a free operating system (OS) for your computer. An operating system is the set of basic programs and utilities that make your computer run.

1_r/devopsish

·bits.debian.org·Aug 10, 2025

AI Engineers and the Hot Vibe Code Summer

Once we were developers, software engineers, and data scientists; but no more! Today we’re all AI Engineers, right? Right!?! Swyx’s recent AI Engineer World’s Fair conference surfaced a lot of enthusiasm around the term by drawing 3,000+ founders and engineers to San Francisco. Microsoft offers an “AI Engineer using Microsoft Azure” course through Udacity. But

1_r/devopsish

·redmonk.com·Aug 10, 2025

The Future of Product Management Is AI-Native

Takeaways from My Conversation with Marily Nika

1_r/devopsish

·oreilly.com·Aug 9, 2025

The Future of Product Management Is AI-Native

Was just wondering if this would become a thing | Introducing Headlamp AI Assistant

Introducing Headlamp AI Assistant

https://kubernetes.io/blog/2025/08/07/introducing-headlamp-ai-assistant/

This announcement originally appeared on the Headlamp blog.

To simplify Kubernetes management and troubleshooting, we're thrilled to introduce Headlamp AI Assistant: a powerful new plugin for Headlamp that helps you understand and operate your Kubernetes clusters and applications with greater clarity and ease.

Whether you're a seasoned engineer or just getting started, the AI Assistant offers:

Fast time to value: Ask questions like "Is my application healthy?" or "How can I fix this?" without needing deep Kubernetes knowledge.

Deep insights: Start with high-level queries and dig deeper with prompts like "List all the problematic pods" or "How can I fix this pod?"

Focused & relevant: Ask questions in the context of what you're viewing in the UI, such as "What's wrong here?"

Action-oriented: Let the AI take action for you, like "Restart that deployment", with your permission.

Here is a demo of the AI Assistant in action as it helps troubleshoot an application running with issues in a Kubernetes cluster:

Hopping on the AI train

Large Language Models (LLMs) have transformed not just how we access data but also how we interact with it. The rise of tools like ChatGPT opened a world of possibilities, inspiring a wave of new applications. Asking questions or giving commands in natural language is intuitive, especially for users who aren't deeply technical. Now everyone can quickly ask how to do X or Y, without feeling awkward or having to traverse pages and pages of documentation like before.

Therefore, Headlamp AI Assistant brings a conversational UI to Headlamp, powered by LLMs that Headlamp users can configure with their own API keys. It is available as a Headlamp plugin, making it easy to integrate into your existing setup. Users can enable it by installing the plugin and configuring it with their own LLM API keys, giving them control over which model powers the assistant. Once enabled, the assistant becomes part of the Headlamp UI, ready to respond to contextual queries and perform actions directly from the interface.

Context is everything

As expected, the AI Assistant is focused on helping users with Kubernetes concepts. Yet, while there is a lot of value in answering Kubernetes-related questions from Headlamp's UI, we believe the greatest benefit of such an integration comes when it can use the context of what the user is experiencing in an application. So, the Headlamp AI Assistant knows what you're currently viewing in Headlamp, and this makes the interaction feel more like working with a human assistant.

For example, if a pod is failing, users can simply ask "What's wrong here?" and the AI Assistant will respond with the root cause, like a missing environment variable or a typo in the image name. Follow-up prompts like "How can I fix this?" allow the AI Assistant to suggest a fix, streamlining what used to take multiple steps into a quick, conversational flow.

Sharing the context from Headlamp is not a trivial task though, so it's something we will keep working on perfecting.
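To make the failing-pod example above more concrete, here is a minimal sketch of the kind of signal a context-aware assistant could read from the Pod being viewed. It is not Headlamp's implementation: `diagnose_pod`, the `HINTS` table, and the sample `failing_pod` are hypothetical and exist only for illustration. The only real pieces are the `status.containerStatuses[].state.waiting.reason` field of a Pod and the reason strings Kubernetes reports there.

```python
# Hypothetical helper, not part of Headlamp or its AI Assistant.
# It maps container "waiting" reasons from a Pod status to plain-language hints.

HINTS = {
    "ImagePullBackOff": "The image could not be pulled; check the image name and tag for typos.",
    "ErrImagePull": "The image could not be pulled; check the image name and tag for typos.",
    "CreateContainerConfigError": "The container config is invalid; a referenced ConfigMap, Secret, or key may be missing.",
    "CrashLoopBackOff": "The container keeps crashing; check its logs and required environment variables.",
}


def diagnose_pod(pod: dict) -> list[str]:
    """Return one hint per container that is stuck in a waiting state."""
    findings = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for cs in statuses:
        waiting = cs.get("state", {}).get("waiting")
        if not waiting:
            continue  # running or terminated containers are skipped in this sketch
        reason = waiting.get("reason", "Unknown")
        hint = HINTS.get(reason, "No hint available for this reason.")
        findings.append(f"{cs.get('name', '?')}: {reason}: {hint}")
    return findings


# Illustrative Pod fragment (assumed data, trimmed to the fields used above).
failing_pod = {
    "status": {
        "containerStatuses": [
            {"name": "web", "state": {"waiting": {"reason": "ImagePullBackOff"}}},
            {"name": "sidecar", "state": {"running": {}}},
        ]
    }
}

for finding in diagnose_pod(failing_pod):
    print(finding)
```

In practice an LLM would receive this kind of structured context alongside the user's question rather than rely on a fixed lookup table; the table only stands in for that step.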

Tools

Context from the UI is helpful, but sometimes additional capabilities are needed. If the user is viewing the pod list and wants to identify problematic deployments, switching views should not be necessary. To address this, the AI Assistant includes support for a Kubernetes tool. This allows asking questions like "Get me all deployments with problems", which prompts the assistant to fetch and display relevant data from the current cluster. Likewise, if the user requests an action like "Restart that deployment" after the AI points out which deployment needs restarting, it can do that too. For "write" operations, the AI Assistant asks the user for permission before running them.
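The two behaviors described above, a read-only query for problematic deployments and a permission check before any write, can be sketched as follows. This is an illustration under stated assumptions, not the plugin's code: `problematic_deployments`, `run_action`, and the `WRITE_ACTIONS` set are hypothetical names, and "problematic" is assumed here to mean fewer available replicas than desired. The `spec.replicas` and `status.availableReplicas` fields are real Deployment fields.

```python
from typing import Callable

# Assumed set of verbs that change cluster state; a real tool may classify differently.
WRITE_ACTIONS = {"restart", "delete", "scale"}


def problematic_deployments(deployments: list[dict]) -> list[str]:
    """Names of deployments with fewer available replicas than desired."""
    names = []
    for d in deployments:
        desired = d.get("spec", {}).get("replicas", 1)  # Kubernetes defaults replicas to 1
        available = d.get("status", {}).get("availableReplicas", 0)
        if available < desired:
            names.append(d["metadata"]["name"])
    return names


def run_action(action: str, target: str, confirm: Callable[[str], bool]) -> str:
    """Run an action, asking for permission first if it changes the cluster."""
    if action in WRITE_ACTIONS and not confirm(f"{action} {target}?"):
        return f"skipped {action} on {target}"
    # A real tool would call the Kubernetes API here; this sketch only reports.
    return f"ran {action} on {target}"


# Illustrative data (assumed), trimmed to the fields used above.
deployments = [
    {"metadata": {"name": "api"}, "spec": {"replicas": 3}, "status": {"availableReplicas": 1}},
    {"metadata": {"name": "web"}, "spec": {"replicas": 2}, "status": {"availableReplicas": 2}},
    {"metadata": {"name": "worker"}, "spec": {}, "status": {}},
]

for name in problematic_deployments(deployments):
    # The confirm callback stands in for the user's answer in the UI.
    print(run_action("restart", name, confirm=lambda question: True))
```

Passing the confirmation as a callback keeps the permission step separate from the action itself, so read-only requests never trigger a prompt.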

AI Plugins

Although the initial version of the AI Assistant is already useful for Kubernetes users, future iterations will expand its capabilities. Currently, the assistant supports only the Kubernetes tool, but further integration with Headlamp plugins is underway. Through those plugins, we could get richer insights for GitOps via the Flux plugin, monitoring through Prometheus, package management with Helm, and more.

And of course, as the popularity of MCP grows, we are looking into how to integrate it as well, for a more plug-and-play experience.

Try it out!

We hope this first version of the AI Assistant helps users manage Kubernetes clusters more effectively and assists newcomers in navigating the learning curve. We invite you to try out this early version and give us your feedback. The AI Assistant plugin can be installed from Headlamp's Plugin Catalog in the desktop version, or by using the container image when deploying Headlamp. Stay tuned for future versions of the Headlamp AI Assistant!

via Kubernetes Blog https://kubernetes.io/

August 07, 2025 at 03:00PM

1_r/devopsish

·kubernetes.io·Aug 9, 2025

Was just wondering if this would become a thing | Introducing Headlamp AI Assistant