Link Sharing
Ep38 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=-nYVMVQosHc
Our Journey to GitOps: Migrating to ArgoCD with Zero Downtime, with Andrew Jeffree
Andrew Jeffree from SafetyCulture walks through their complete migration of 250+ microservices from a fragile Helm-based setup to GitOps with ArgoCD, all without any downtime. He explains how they replaced YAML configurations with a domain-specific language built in CUE, creating a better developer experience while adding stronger validation and reducing operational pain points.
You will learn:
Zero-downtime migration techniques using temporary deployments with prune-last sync options to ensure healthy services before removing legacy ones
How CUE lang improves on YAML by providing schema validation, early error detection, and a cleaner interface for developers
Human-centric platform engineering approaches that prioritize developer experience and reduce on-call burden through empathy-driven design decisions
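As a rough illustration of the prune-last idea mentioned above (not SafetyCulture's actual tooling), the sketch below adds Argo CD's `argocd.argoproj.io/sync-options: PruneLast=true` annotation to a manifest, so that pruning of that resource is deferred until the rest of the sync has been applied and is healthy. The helper name `mark_prune_last` and the `orders-legacy` Deployment are hypothetical.

```python
import copy

# Annotation key and value used by Argo CD for per-resource sync options.
SYNC_OPTIONS_KEY = "argocd.argoproj.io/sync-options"
PRUNE_LAST = "PruneLast=true"


def mark_prune_last(manifest: dict) -> dict:
    """Return a copy of a manifest annotated so Argo CD prunes it last."""
    result = copy.deepcopy(manifest)
    annotations = result.setdefault("metadata", {}).setdefault("annotations", {})
    # Sync options share one annotation as a comma-separated list.
    existing = annotations.get(SYNC_OPTIONS_KEY, "")
    options = [opt.strip() for opt in existing.split(",") if opt.strip()]
    if PRUNE_LAST not in options:
        options.append(PRUNE_LAST)
    annotations[SYNC_OPTIONS_KEY] = ",".join(options)
    return result


# Hypothetical legacy Deployment left over from the old Helm-based setup.
legacy = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "orders-legacy"},
}
annotated = mark_prune_last(legacy)
print(annotated["metadata"]["annotations"][SYNC_OPTIONS_KEY])
```

The function copies its input rather than mutating it, which keeps the original manifest available for comparison during a migration.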
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/Xvyp1_Qcv
via KubeFM https://kube.fm
October 28, 2025 at 06:00AM
Self-Healing Kubernetes: When to Use AI vs Traditional Automation
Tired of being woken up at 2 AM to manually troubleshoot Kubernetes incidents that could be fixed automatically? This video explores how to build intelligent self-healing systems that watch Kubernetes events, analyze problems, and remediate issues before they ruin your weekend. We'll break down the complete automation pipeline—from understanding how Kubernetes events work and what makes them ideal triggers, to implementing a maturity progression from manual firefighting through rule-based automation to AI-assisted remediation.
Learn when traditional automation works best (alerting and known patterns), where AI genuinely excels (analysis and unknown scenarios), and how to strategically combine both approaches. We'll cover the three phases of incident response—alerting, analysis, and remediation—and show you how to build systems that handle knowns with efficient controllers while leveraging AI for novel problems. The key is creating feedback loops that continuously graduate unknowns into automated knowns, progressively shrinking the surface area where human intervention is needed. Includes links to open-source projects demonstrating these principles in production.
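The split between knowns and unknowns described above could be sketched roughly as follows. This is a minimal illustration, not the code from the video or from the linked projects: the `Event` and `Remediator` classes, the handlers, and the chosen event reasons are all assumptions, and the handlers only return strings where a real controller would call the Kubernetes API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Event:
    reason: str       # mirrors the `reason` field of a Kubernetes Event
    obj: str          # the object the event refers to
    message: str = ""


@dataclass
class Remediator:
    # Known problems: event reason -> deterministic handler.
    known: Dict[str, Callable[[Event], str]] = field(default_factory=dict)
    # Unknown problems parked for AI-assisted (or human) analysis.
    needs_analysis: List[Event] = field(default_factory=list)

    def handle(self, event: Event) -> str:
        handler = self.known.get(event.reason)
        if handler is not None:
            return handler(event)
        self.needs_analysis.append(event)
        return f"escalated {event.reason} on {event.obj} for analysis"

    def graduate(self, reason: str, handler: Callable[[Event], str]) -> None:
        """Feedback loop: promote an analyzed unknown into a known rule."""
        self.known[reason] = handler
        self.needs_analysis = [e for e in self.needs_analysis if e.reason != reason]


def restart_workload(event: Event) -> str:
    # Placeholder action standing in for a real remediation.
    return f"restarted {event.obj}"


remediator = Remediator(known={"BackOff": restart_workload})
print(remediator.handle(Event("BackOff", "pod/api-1")))
print(remediator.handle(Event("FailedMount", "pod/db-0")))
remediator.graduate("FailedMount", lambda e: f"remounted volume for {e.obj}")
print(remediator.handle(Event("FailedMount", "pod/db-0")))
```

Each call to `graduate` shrinks the set of events that fall through to analysis, which is the feedback loop the description refers to.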
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: JFrog Fly 🔗 https://jfrog.com/fly_viktor ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#Kubernetes #SelfHealingSystems #AIAutomation
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/kubernetes/self-healing-kubernetes-when-to-use-ai-vs-traditional-automation 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn.
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Kubernetes Remediation
01:15 JFrog Fly (sponsor)
02:43 Kubernetes Events Explained
06:21 Kubernetes Automation Pipeline
12:46 AI-Powered Kubernetes Remediation
19:26 Building Self-Healing Systems
via YouTube https://www.youtube.com/watch?v=rIdcJYLtCdo
Week Ending October 19, 2025
https://lwkd.info/2025/20251022
Developer News
SIG-Etcd has found another potential upgrade failure that prevents some users from upgrading to etcd 3.6. The blog post gives instructions on how to avoid it, mainly by updating to 3.5.24 first.
Release Schedule
Next Deadline: Docs Deadline for placeholder PRs, October 23
The deadline for opening your placeholder docs PRs is coming up soon. If you have a KEP tracked for v1.35, make sure that you have a placeholder PR in k/website for your docs before the deadline.
The v1.35 Enhancements Freeze has been in effect since October 17th. Of the 101 KEPs opted in for the release, 75 made the cut for Enhancements Freeze.
Steering Committee Election
The Steering Committee Election voting ends later this week on Friday, 24th October, AoE. You can check your eligibility to vote in the voting app. Don’t forget to cast your votes if you haven’t already!
The deadline to file an exception request is 22nd October, AoE. Submit an exception request soon if you think you’re eligible!
KEP of the Week
KEP-4742: Expose Node Topology Labels via Downward API
This KEP introduces a built-in Kubernetes admission plugin that automatically copies node topology labels (like zone, region, or rack) onto Pods. It allows Pods to access this topology data through the Downward API without using privileged init containers or custom scripts. The change simplifies topology-aware workloads such as distributed AI/ML training, CNI optimizations, and sharded databases, making topology awareness a secure and native part of Kubernetes.
This KEP is tracked for beta in v1.35.
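To show what consuming this data might look like from inside a container, here is a minimal sketch. It assumes the Pod spec exposes the zone label through a Downward API `fieldRef` on `metadata.labels['topology.kubernetes.io/zone']` as an environment variable; the variable name `NODE_ZONE`, the peer names, and the zone values are hypothetical.

```python
import os
from typing import List, Mapping, Optional


def local_zone(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Read the zone injected via the Downward API, or None if it is absent."""
    env = os.environ if env is None else env
    # NODE_ZONE is an assumed variable name set in the Pod spec.
    return env.get("NODE_ZONE") or None


def prefer_same_zone(peers: Mapping[str, str], zone: Optional[str]) -> List[str]:
    """Order peer names so that peers in the caller's zone come first."""
    return sorted(peers, key=lambda name: (peers[name] != zone, name))


# Hypothetical peers of a sharded workload, mapped to their zones.
peers = {"shard-a": "us-east-1a", "shard-b": "us-east-1b", "shard-c": "us-east-1a"}
print(prefer_same_zone(peers, local_zone({"NODE_ZONE": "us-east-1b"})))
```

If the variable is missing, `local_zone` returns `None` and the peers are simply ordered by name, so the workload degrades to topology-unaware behavior instead of failing.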
Other Merges
Declarative validation tags have a StabilityLevel
Test external VolumeGroupSnapshots in 1.35
AllocationConfigSource is validated
APF properly counts legacy watches
Declarative Validation rollout: DeviceClassName, update, ResourceClaim, maxItems, DRA fields, DeviceAllocationMode
Simplify kube-cross builds
Promotions
ExecProbeTimeout to GA
max-allowable-numa-nodes to GA
Deprecated
storage.k8s.io/v1alpha1 is no longer served
Version Updates
Golang update: 1.24.9 in 1.31 through 1.34, 1.25.3 in 1.35
etcd to v3.5.23, just in time to replace it with 3.5.24
Shoutouts
Rayan Das – A big shout-out to the v1.35 Enhancements shadows (@dchan, @jmickey, @aibarbetta, @Subhasmita, @Faeka Ansari) for their hard work leading up to Enhancements Freeze yesterday.
via Last Week in Kubernetes Development https://lwkd.info/
October 22, 2025 at 07:55PM