r/devopsish
Why Your Infrastructure AI Sucks (And How to Fix It)
Discover why your AI agent is completely failing at infrastructure management and learn to build an AI-powered Internal Developer Platform that actually works. Most organizations are treating AI like a search engine, asking vague questions and getting generic answers that break in production. This video reveals the five critical components that transform useless AI into intelligent infrastructure automation.
You'll learn to build capabilities discovery using vector databases for semantic search across Kubernetes resources, capture organizational patterns from tribal knowledge and documentation, create enforceable policies that guide AI toward compliance, implement proper context management to avoid the bloated mess most systems become, and design intelligent workflows that guide users to the right solutions instead of relying on guesswork. Watch as we demonstrate the complete transformation from a generic AI response to a fully functional PostgreSQL deployment that follows organizational patterns, enforces compliance policies, and deploys correctly the first time.
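The capabilities-discovery idea can be illustrated with a toy ranking step. This sketch assumes each Kubernetes resource kind has already been turned into an embedding vector (the three-dimensional vectors and the `example.com/PostgresCluster` kind are made up for illustration) and ranks kinds by cosine similarity to a query embedding. It is not the dot-ai implementation; a real setup would get embeddings from a model and let the vector database perform the search.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// capability pairs a resource kind with an embedding of its description.
// The vectors used below are hand-made toy values, not model output.
type capability struct {
	Kind      string
	Embedding []float64
}

// cosine returns the cosine similarity of two equal-length vectors,
// or 0 if either vector has zero length.
func cosine(a, b []float64) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += a[i] * b[i]
		na += a[i] * a[i]
		nb += b[i] * b[i]
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// rank returns a copy of caps ordered by similarity to query, best match first.
func rank(query []float64, caps []capability) []capability {
	out := append([]capability(nil), caps...)
	sort.SliceStable(out, func(i, j int) bool {
		return cosine(query, out[i].Embedding) > cosine(query, out[j].Embedding)
	})
	return out
}

func main() {
	caps := []capability{
		{"Deployment", []float64{0.9, 0.1, 0.0}},
		{"example.com/PostgresCluster", []float64{0.1, 0.9, 0.3}}, // hypothetical CRD
		{"Ingress", []float64{0.2, 0.0, 0.9}},
	}
	// Toy embedding standing in for a query such as "I need a PostgreSQL database".
	query := []float64{0.0, 1.0, 0.2}
	for _, c := range rank(query, caps) {
		fmt.Printf("%-28s %.3f\n", c.Kind, cosine(query, c.Embedding))
	}
}
```

Ranking by meaning rather than by exact kind name is what lets a vague request surface a database CRD the user never named.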
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Tuple 🔗 https://tuple.app/DOT 👉 Promo code: DOT2025 ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#AIInfrastructure #InternalDeveloperPlatform #KubernetesAI
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/internal-developer-platforms/why-your-infrastructure-ai-sucks-and-how-to-fix-it 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai 🎬 Stop Blaming AI: Vector DBs + RAG = Game Changer: https://youtu.be/zqpJr1qZhTg 🎬 Why Kubernetes Discovery Sucks for AI (And How Vector DBs Fix It): https://youtu.be/MSNstHj4rmk
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over BlueSky or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 AI for Infrastructure Challenges 01:42 Tuple (sponsor) 03:16 Why Your AI Agent Is Useless 09:52 Kubernetes API Discovery That Actually Works 13:41 Organizational Knowledge AI Can Actually Use 17:49 Stop Breaking Production With AI 22:17 The Context Window Disaster Nobody Talks About 25:16 Smart Conversations That Get Results 29:34 Your Complete AI-Powered IDP Blueprint
via YouTube https://www.youtube.com/watch?v=Ma3gKmuXahc
The Making of Flux: The Scale, a KubeFM Original Series
In this episode, Philippe Ensarguet, VP of Software Engineering at Orange, and Arnab Chatterjee, Global Head of Container & AI Platforms at Nomura, share how large enterprises are adopting Flux to drive reliable, compliant, and scalable platforms.
How Orange uses Flux to manage bare-metal Kubernetes through the Sylva project.
Why Nomura relies on GitOps to balance agility with governance in financial services.
How Flux helps enterprises achieve resilience, compliance, and repeatability at scale.
Sponsor
Join the Flux maintainers and community at FluxCon on November 11th in Atlanta.
More info
Find all the links and info for this episode here: https://ku.bz/tWcHlJm7M
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
October 13, 2025 at 06:00AM
Week Ending October 5, 2025
https://lwkd.info/2025/20251010
Developer News
Joaquim Rocha has been nominated to be one of the new SIG UI leads. Congrats Joaquim!
Folks are discussing the deprecation of cgroups v1; the full discussion is on the mailing list.
Several jobs have been moved to release-informing and release-blocking status to improve alpha/beta coverage.
Release Schedule
Next Deadline: Enhancements Freeze, October 16
All enhancements are expected to have met the requirements by the freeze. Those that don’t meet the requirements will be removed from the milestone and will require an Exception.
Kubernetes v1.35.0-alpha.1 is out!
The cherry-pick deadline for patch releases is Oct 10.
Steering Committee Election
The Steering Committee Election voting ends on Friday, 24th October, AoE. You can check your eligibility to vote in the voting app, and file an exception request by October 22 if you need an exception. Don’t forget to cast your votes if you haven’t already!
Featured PRs
133697: Codify feature gate dependencies
With this PR, feature gate dependencies can be explicitly declared and enforced; previously they were ad hoc or implicit. Components will now refuse to start if a feature is enabled without its required dependencies. Feature owners should review the backfilled dependencies, and users who manually toggle feature gates must ensure that dependent features are also enabled. Note in particular that AllAlpha=true now requires AllBeta=true or the equivalent beta features to be set.
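A minimal sketch of the kind of startup check the PR describes, assuming a simple map from each gate to its direct dependencies. The gate names `FeatureA`, `FeatureB`, and `FeatureC` are hypothetical, and this is not the actual `k8s.io/component-base/featuregate` code.

```go
package main

import (
	"fmt"
	"sort"
)

// deps maps a feature gate to the gates it directly requires.
// The names are hypothetical and only illustrate the shape of the check.
var deps = map[string][]string{
	"FeatureB": {"FeatureA"},
	"FeatureC": {"FeatureB"},
}

// validate returns one error per enabled gate whose dependency is not enabled.
// A component would refuse to start if the returned slice is non-empty.
func validate(enabled map[string]bool) []error {
	names := make([]string, 0, len(enabled))
	for name := range enabled {
		names = append(names, name)
	}
	sort.Strings(names) // deterministic error order

	var errs []error
	for _, name := range names {
		if !enabled[name] {
			continue
		}
		for _, dep := range deps[name] {
			if !enabled[dep] { // missing keys read as false, i.e. disabled
				errs = append(errs, fmt.Errorf("feature gate %s requires %s to be enabled", name, dep))
			}
		}
	}
	return errs
}

func main() {
	gates := map[string]bool{"FeatureA": false, "FeatureB": true}
	for _, err := range validate(gates) {
		fmt.Println(err)
	}
	// A real component would exit non-zero here instead of starting.
}
```

Checking at startup turns a subtle runtime misbehavior into an immediate, explicit configuration error.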
KEP of the Week
KEP 859: Include kubectl command metadata in http request headers
This KEP aims to add extra HTTP headers to kubectl requests sent to the Kubernetes apiserver. These headers would share details such as which kubectl command was used, the flags included, a session ID, and whether the command is deprecated. This would help cluster administrators understand how users interact with the cluster, making it easier to debug issues, track usage, and gather insights, without exposing any sensitive data.
This KEP is tracked for GA in v1.35
Other Merges
Disable SchedulerAsyncAPICalls feature gate to prevent scheduler performance issues under high API server load.
Add path normalization to error matcher for improved field validation.
DeviceClass now enforces a maximum of 32 selectors and configs via declarative validation.
Add declarative validation +k8s:maxItems tag to ResourceClaim
HPA controller now exposes desired_replicas metric to track scaling history.
Fix preemptor pod behavior to prevent endless scheduling loops during slow victim deletion.
Feature gate dependencies are now explicit and validated at startup, preventing enabling a feature if its dependencies are disabled.
kube-scheduler introduces lightweight AssumeCache in VolumeBinding plugin to fix occasional pod scheduling delays.
Version Updates
etcd to v3.6.5
Subprojects and Dependency Updates
cluster-api v1.11.2 extends Kubernetes support to v1.34 for both management and workload clusters, adds CoreDNS migration v1.0.28, and introduces Metal3 as an IPAM provider.
cluster-api v1.10.7 adds Kubernetes v1.33 compatibility and updates CoreDNS migration to v1.0.28.
coredns v1.13.1 updates Go to v1.25.2 to address security issues, improves performance, and enhances the sign plugin by rejecting invalid UTF-8 tokens.
coredns v1.13.0 introduces a new Nomad plugin, fixes Corefile loop and import issues, improves shutdown handling, and hardens gRPC and reload behavior.
containerd API v1.10.0-beta.1 adds a mount manager and aligns with containerd 2.2 APIs (pre-release).
kOps v1.34.0-beta.1 updates AWS and Azure components (VPC CNI v1.20.2, Cilium v1.18.2, Calico v3.30.3), upgrades etcd to v3.6.5, drops Canal support, and removes Kubernetes 1.28 compatibility.
autoscaler vertical-pod-autoscaler v1.5.1 updates the default VPA version and client-go dependency to improve stability.
autoscaler cluster-autoscaler-chart v0.1.1 introduces automatic resource adjustment for workloads through Helm.
csi-driver-nfs v4.12.0 updates Go to 1.24, fixes a goroutine leak, and adds support for creating multiple storage classes with Helm.
csi-driver-smb v1.19.0 improves secret handling with special characters, updates CSI sidecars and resizer to v1.14.0, and adds Helm support for multiple storage classes.
headlamp v0.36.0 adds support for EndpointSlice resources, label-based search, and clipboard copy for resource names. It improves table sorting memory, standardizes resource naming, and enhances Helm charts with optional PodDisruptionBudget, backend TLS termination, and security context updates. The release also fixes several UI issues, improves plugin management, and updates shipped Prometheus and App Catalog plugins.
Shoutouts
Drew Hagen – I’d like to take a moment to acknowledge @Matteo for the seriously impressive leadership of a newer release branch management shadow program for the 1.34 release, and all the amazing work putting together strong documentation for branch management!! I remember my experience releasing alpha 3 being very clear what to do and going really smooth. Very little tribal knowledge. And we did most releases async, which I think speaks to how strong this handbook is. I thank you for still being around to observe and help, even if it meant some later nights in your time zone. @xmudrii @jimangel Great work! Y’all have set the foundation for many more cycles to come. Thank you for all of your patience, guidance and support. It was really great learning and working with you all @Angelos Kolaitis @satyampsoni
via Last Week in Kubernetes Development https://lwkd.info/
October 10, 2025 at 09:03AM
Ep36 - Ask Me Anything About Anything
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
via YouTube https://www.youtube.com/watch?v=iZoTwl8BWCI
How We Integrated Native macOS Workloads with Kubernetes, with Vitalii Horbachov
Vitalii Horbachov explains how Agoda built macOS VZ Kubelet, a custom solution that registers macOS hosts as Kubernetes nodes and spins up macOS VMs using Apple's native virtualization framework. He details their journey from managing 200 Mac minis with bash scripts to a Kubernetes-native approach that handles 20,000 iOS tests at scale.
You will learn:
How to build hybrid runtime pods that combine macOS VMs with Docker sidecar containers for complex CI/CD workflows
Custom OCI image format implementation for managing 55-60GB macOS VM images with layered copy-on-write disks and digest validation
Networking and security challenges including Apple entitlements, direct NIC access, and implementing kubectl exec over SSH
Real-world adoption considerations including MDM-based host lifecycle management and the build vs. buy decision for Apple infrastructure at scale
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/q_JS76SvM
via KubeFM https://kube.fm
October 07, 2025 at 06:00AM
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility
https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/
Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.
Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.
The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The plugin was built as part of an LFX Mentorship project.
The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. Here is a brief tour of its features.
Map view of Karpenter Resources and how they relate to Kubernetes resources
See at a glance how Karpenter resources such as NodeClasses, NodePools, and NodeClaims connect with core Kubernetes resources such as Pods and Nodes.
Visualization of Karpenter Metrics
Get instant insight into resource usage vs. limits, allowed disruptions, pending pods, provisioning latency, and more.
Scaling decisions
See which instances are being provisioned for your workloads and understand why Karpenter made those choices, which is helpful while debugging.
Config editor with validation support
Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.
Real time view of Karpenter resources
View and track Karpenter-specific resources, such as NodeClaims, in real time as your cluster scales up and down.
Dashboard for Pending Pods
View all pending pods with unmet scheduling requirements or FailedScheduling events, with the reason each pod could not be scheduled highlighted.
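The pending-pods view boils down to finding pods the scheduler could not place. A sketch, assuming simplified stand-ins for the core/v1 Pod fields rather than the real `k8s.io/api` types; the scheduler message is an illustrative example, and this is not the plugin's own code.

```go
package main

import "fmt"

// Minimal stand-ins for the core/v1 Pod fields used here; a real tool would
// use k8s.io/api/core/v1 types read through a client-go lister.
type condition struct {
	Type, Status, Reason, Message string
}

type pod struct {
	Name       string
	Phase      string
	Conditions []condition
}

// unschedulable maps each Pending pod whose PodScheduled condition is False
// with reason Unschedulable to the scheduler's explanatory message.
func unschedulable(pods []pod) map[string]string {
	out := map[string]string{}
	for _, p := range pods {
		if p.Phase != "Pending" {
			continue
		}
		for _, c := range p.Conditions {
			if c.Type == "PodScheduled" && c.Status == "False" && c.Reason == "Unschedulable" {
				out[p.Name] = c.Message
			}
		}
	}
	return out
}

func main() {
	pods := []pod{
		{Name: "web-1", Phase: "Running"},
		{Name: "batch-7", Phase: "Pending", Conditions: []condition{{
			Type: "PodScheduled", Status: "False", Reason: "Unschedulable",
			Message: "0/3 nodes are available: 3 Insufficient cpu.", // illustrative
		}}},
	}
	for name, why := range unschedulable(pods) {
		fmt.Printf("%s: %s\n", name, why)
	}
}
```

Pods that show up here are exactly the ones Karpenter is expected to provision capacity for, which is why the dashboard pairs them with scaling decisions.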
Karpenter Providers
This plugin should work with most Karpenter providers, but so far it has only been tested on those listed in the table below. Each provider also exposes some extra provider-specific information; the table shows for which providers the plugin displays it.
| Provider Name | Tested | Extra provider-specific info supported |
|---|---|---|
| AWS | ✅ | ✅ |
| Azure | ✅ | ✅ |
| AlibabaCloud | ❌ | ❌ |
| Bizfly Cloud | ❌ | ❌ |
| Cluster API | ❌ | ❌ |
| GCP | ❌ | ❌ |
| Proxmox | ❌ | ❌ |
| Oracle Cloud Infrastructure (OCI) | ❌ | ❌ |
Please submit an issue if you test one of the untested providers or want support for a provider (PRs are also gladly accepted).
How to use
See plugins/karpenter/README.md for usage instructions.
Feedback and Questions
Please submit an issue if you use Karpenter and have other ideas or feedback, or come chat in the headlamp channel on the Kubernetes Slack.
via Kubernetes Blog https://kubernetes.io/
October 05, 2025 at 08:00PM
Kubernetes Controllers Deep Dive: How They Really Work
Most people using Kubernetes know how to write YAML and run kubectl apply, but when things break, they're completely lost. The secret they're missing? Understanding controllers - the beating heart that makes Kubernetes actually work. Controllers are what automatically restart your crashed pods, scale your applications, and make custom resources feel native to the platform.
This video dives deep into the real mechanics of how Kubernetes controllers operate. You'll discover how controllers consume and emit events to coordinate with each other, how the reconciliation loop continuously maintains your desired state, and how the Watch API efficiently streams changes without overwhelming the system. We'll explore custom resource definitions that extend Kubernetes, controller communication patterns, and the event-driven architecture that makes everything self-healing. Whether you're debugging cluster issues or building your own controllers, this knowledge will transform how you think about Kubernetes from just throwing YAML at the wall to truly understanding the orchestration engine underneath.
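The reconciliation loop described above can be sketched in a few lines. Everything here is a toy assumption: a map plays the role of the API server, a channel plays the role of the Watch API stream, and `reconcile` creates or deletes placeholder pod names. The point is the level-triggered design: the controller re-reads current state on every event instead of trusting the event's contents.

```go
package main

import "fmt"

// state is a toy stand-in for the cluster: desired replica counts per app
// (the "spec") and the pod names that actually exist (the observed state).
type state struct {
	desired map[string]int
	pods    map[string][]string
}

// reconcile drives one app toward its desired replica count. It only uses the
// event's key and re-reads current state, so missed or duplicated events
// cannot cause drift: running it again when nothing changed is a no-op.
func (s *state) reconcile(app string) {
	want := s.desired[app]
	have := s.pods[app]
	switch {
	case len(have) < want: // too few: create the missing pods
		for i := len(have); i < want; i++ {
			have = append(have, fmt.Sprintf("%s-%d", app, i))
		}
	case len(have) > want: // too many: delete the extras
		have = have[:want]
	}
	s.pods[app] = have
}

func main() {
	s := &state{
		desired: map[string]int{"web": 3},
		pods:    map[string][]string{"web": {"web-0"}},
	}
	// A buffered channel stands in for the watch stream; each event carries
	// only the key of the object that changed.
	events := make(chan string, 4)
	events <- "web"
	events <- "web" // duplicate event: reconciling again is harmless
	close(events)
	for key := range events {
		s.reconcile(key)
	}
	fmt.Println(s.pods["web"]) // [web-0 web-1 web-2]
}
```

Real controllers add a work queue, informer caches, and retries, but the core contract is the same: compare desired with actual and act on the difference.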
#KubernetesControllers #Kubernetes #DevOpsEngineering
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/kubernetes/kubernetes-controllers-deep-dive-how-they-really-work 🔗 Kubernetes: https://kubernetes.io
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Kubernetes Controllers Deep Dive 01:18 Kubernetes Control Loops Explained 04:12 How Kubernetes Controllers Watch Events 07:35 Kubernetes Event Emission 11:56 Kubernetes Reconciliation Loop 17:12 Kubernetes Watch API 21:01 Kubernetes Custom Resource Definitions (CRDs) 21:13 Kubernetes Controller Communication 25:22 Kubernetes Controllers Mastery
via YouTube https://www.youtube.com/watch?v=kss081c8EqY
The Making of Flux: The Rewrite, a KubeFM Original Series
In this episode, Michael Bridgen (the engineer who wrote Flux's first lines) and Stefan Prodan (the maintainer who led the V2 rewrite) share how Flux grew from a fragile hack-day script into a production-grade GitOps toolkit.
How early Flux addressed the risks of manual, unsafe Kubernetes upgrades
Why the complete V2 rewrite was critical for stability, scalability, and adoption
What the maintainers learned about building a sustainable, community-driven open-source project
Sponsor
Join the Flux maintainers and community at FluxCon on November 11th in Atlanta.
More info
Find all the links and info for this episode here: https://ku.bz/bgkgn227-
via KubeFM https://kube.fm
October 06, 2025 at 06:00AM
Week Ending September 28, 2025
https://lwkd.info/2025/20251002
Developer News
Instead of reviving the WG API Expression working group, a new SIG API Machinery subproject meeting on Declarative APIs and Linters was held on Sept 23, 2025, at 9 AM PT. The subproject carried the same goals as the proposed WG, and meeting details were shared in the Agenda & Notes document.
The WG AI Gateway has officially launched with a Slack channel, #wg-ai-gateway, and a mailing list. Meetings will begin next week, and the community is encouraged to join and participate.
Release Schedule
Next Deadline: PRR Freeze, October 9
Kubernetes v1.35 is moving along: APAC-friendly meetings are running and enhancement opt-ins are open.
Starting with v1.35, PRR Freeze is a hard deadline: no new KEPs may be opted in after it. If your KEP misses the PRR Freeze deadline, you need to submit an exception for it within 3 days after PRR Freeze. If you have any questions, feel free to reach out in the #sig-release or #prod-readiness channels on Slack.
If you’re an enhancement owner, make sure your KEP is up to date (status: implementable, milestone: v1.35, test plan and PRR filled in) before PRR Freeze on Oct 9 (AoE) / Oct 10, 12:00 UTC.
The next cherry-pick deadline for patch releases is Oct 10.
Featured PRs
134330: Add resource version comparison function in client-go along with conformance
This PR introduces a helper function for comparing Kubernetes resource versions. Resource versions are used for concurrency control and watch operations, but until now they could only be treated as opaque strings. The new function allows direct comparison of resource versions for objects of the same type. Conformance tests have also been added to ensure consistent handling across GA resources, making resource version behavior clearer and more reliable.
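A sketch of what such a comparison can look like, assuming resource versions are non-negative decimal integers without leading zeros. `compareRV` is a hypothetical helper, not the new client-go function, and it does not check that both values come from objects of the same type. It compares by length first and then lexicographically, which avoids parsing into a fixed-size integer.

```go
package main

import (
	"fmt"
	"strings"
)

// compareRV returns -1, 0, or 1 as a is older than, equal to, or newer than b.
// Hypothetical helper: assumes both values are decimal integers with no leading
// zeros, so a longer string is always the larger number.
func compareRV(a, b string) (int, error) {
	for _, rv := range []string{a, b} {
		if !isDecimal(rv) {
			return 0, fmt.Errorf("resource version %q is not comparable", rv)
		}
	}
	if len(a) != len(b) {
		if len(a) < len(b) {
			return -1, nil
		}
		return 1, nil
	}
	// Equal-length digit strings order the same way as the numbers they encode.
	return strings.Compare(a, b), nil
}

// isDecimal reports whether s is a non-empty run of ASCII digits with no
// leading zero (apart from the single value "0").
func isDecimal(s string) bool {
	if s == "" || (len(s) > 1 && s[0] == '0') {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

func main() {
	c, err := compareRV("999", "1000")
	fmt.Println(c, err) // -1 <nil>
}
```

A plain string comparison would wrongly report "999" as newer than "1000", which is exactly the kind of mistake an explicit helper prevents.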
KEP of the Week
KEP-4412: Projected service account tokens for Kubelet image credential providers
This KEP proposes a secret-less image-pull flow that leverages ephemeral Kubernetes Service Account (KSA) tokens instead of long-lived ImagePullSecrets or node-wide kubelet credential providers. A pod-bound, short-lived KSA token would be used (or exchanged) to obtain transient, workload-scoped image-pull credentials before the pod starts, avoiding persisted secrets in the API or node and allowing external validators to rely on OIDC-like token semantics. This ties image-pull authorization to the workload identity, simplifies secret rotation and management, and reduces the security risk posed by long-lived, hard-to-rotate credentials.
This KEP is tracked for beta in v1.34.
Other Merges
Deallocate extended resource claims on pod completion
Introduce k8s:customUnique tag to control listmap uniqueness validation
Add +enum tag to DeviceAllocationMode type
kubeadm: wait for apiserver using a local client, not the control-plane endpoint
Revert async preemption corner-case fix, undoing a prior change to scheduler preemption behavior
kubeadm removes the RootlessControlPlane feature gate as UserNamespacesSupport becomes the replacement
Enable SSATags linter to enforce +listType on lists in APIs
API Dispatcher drops goroutine limit to avoid throughput regression under high latency
Kubelet and controller: enable more asynchronous node status updates and improve tracing/logging
DRA: allocator selection uses correct “incubating” implementation by default
kube-proxy: list available endpoints in /statusz
Restore partial functionality of AuditEventFrom
Add explicit feature gate dependencies with validation
Kubernetes is now built with Go v1.24.7
Promotions
Graduate ControlPlaneKubeletLocalMode to GA
Version Updates
Update publishing rules to use Go v1.24.7
Subprojects and Dependency Updates
cluster-autoscaler v1.34.0 promotes In-Place Updates to Beta, adds Capacity Buffer CRD/controller, improves scale-up logic across multiple providers, and deprecates older flags/APIs
cluster-autoscaler-chart v0.1.0 automatically adjusts resources for workloads
gRPC v1.75.1 adds Python 3.14 support, fixes Python async shutdown race, and refines interpreter exit handling
helm-chart-aws-cloud-controller-manager v0.0.10 installs Cloud Controller Manager for AWS Cloud Provider
ingress-nginx helm-chart v4.13.3 updates Ingress-Nginx to controller v1.13.3
nerdctl v2.1.6 reserves ports in rootful mode to prevent conflicts
Shoutouts
No shoutouts this week. Want to thank someone for special efforts to improve Kubernetes? Tag them in the #shoutouts channel.
via Last Week in Kubernetes Development https://lwkd.info/
October 02, 2025 at 06:25AM