1_r/devopsish
Ep36 - Ask Me Anything About Anything
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=iZoTwl8BWCI
How We Integrated Native macOS Workloads with Kubernetes, with Vitalii Horbachov
Vitalii Horbachov explains how Agoda built macOS VZ Kubelet, a custom solution that registers macOS hosts as Kubernetes nodes and spins up macOS VMs using Apple's native virtualization framework. He details their journey from managing 200 Mac minis with bash scripts to a Kubernetes-native approach that handles 20,000 iOS tests at scale.
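For orientation, here is a rough sketch (not Agoda's macOS VZ Kubelet code) of the node-registration half of such a custom kubelet: a host-side agent creates a Node object with client-go and taints it so only macOS workloads are scheduled there. The node name, labels, and taint key are illustrative assumptions.

```go
// Sketch only: register a macOS host as a Kubernetes Node with client-go.
// This is NOT Agoda's macOS VZ Kubelet; a real kubelet replacement must also
// report node status, maintain leases, and run the pod lifecycle.
package main

import (
	"context"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	node := &corev1.Node{
		ObjectMeta: metav1.ObjectMeta{
			Name: "mac-mini-001", // hypothetical host name
			Labels: map[string]string{
				"kubernetes.io/os":   "darwin",
				"kubernetes.io/arch": "arm64",
			},
		},
		Spec: corev1.NodeSpec{
			// Taint the node so only pods that explicitly tolerate macOS land here.
			Taints: []corev1.Taint{{
				Key:    "example.com/macos", // hypothetical taint key
				Value:  "true",
				Effect: corev1.TaintEffectNoSchedule,
			}},
		},
	}

	if _, err := client.CoreV1().Nodes().Create(context.TODO(), node, metav1.CreateOptions{}); err != nil {
		log.Fatal(err)
	}
	log.Println("registered node", node.Name)
}
```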
You will learn:
How to build hybrid runtime pods that combine macOS VMs with Docker sidecar containers for complex CI/CD workflows
Custom OCI image format implementation for managing 55-60GB macOS VM images with layered copy-on-write disks and digest validation
Networking and security challenges including Apple entitlements, direct NIC access, and implementing kubectl exec over SSH
Real-world adoption considerations including MDM-based host lifecycle management and the build vs. buy decision for Apple infrastructure at scale
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/q_JS76SvM
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
October 07, 2025 at 06:00AM
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility
https://kubernetes.io/blog/2025/10/06/introducing-headlamp-plugin-for-karpenter/
Headlamp is an open‑source, extensible Kubernetes SIG UI project designed to let you explore, manage, and debug cluster resources.
Karpenter is a Kubernetes Autoscaling SIG node provisioning project that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.
The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter’s activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The Karpenter plugin was built as part of an LFX mentorship project.
The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. What follows is a brief tour of the plugin.
Map view of Karpenter Resources and how they relate to Kubernetes resources
Easily see how Karpenter resources like NodeClasses, NodePools, and NodeClaims connect to core Kubernetes resources such as Pods and Nodes.
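For context, the relationships the map view draws can also be read directly from the cluster. Below is a minimal client-go sketch (an illustration, not the plugin's code) that lists NodeClaims and prints the NodePool and Node each one maps to, assuming Karpenter's karpenter.sh/v1 API group.

```go
// Sketch only: list Karpenter NodeClaims and the NodePool and Node each maps to,
// the same relationships the map view visualizes. Assumes the karpenter.sh/v1
// API group; label keys and field paths may differ between Karpenter releases.
package main

import (
	"context"
	"fmt"
	"log"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := dynamic.NewForConfigOrDie(cfg)

	nodeClaims := schema.GroupVersionResource{Group: "karpenter.sh", Version: "v1", Resource: "nodeclaims"}
	list, err := client.Resource(nodeClaims).List(context.TODO(), metav1.ListOptions{})
	if err != nil {
		log.Fatal(err)
	}
	for _, item := range list.Items {
		pool := item.GetLabels()["karpenter.sh/nodepool"]                          // owning NodePool
		node, _, _ := unstructured.NestedString(item.Object, "status", "nodeName") // backing Node, once launched
		fmt.Printf("NodeClaim %s -> NodePool %s -> Node %s\n", item.GetName(), pool, node)
	}
}
```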
Visualization of Karpenter Metrics
Get instant insight into Resource Usage vs. Limits, Allowed Disruptions, Pending Pods, Provisioning Latency, and more.
Scaling decisions
See which instances are being provisioned for your workloads and understand why Karpenter made those choices. Helpful when debugging.
Config editor with validation support
Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.
Real time view of Karpenter resources
View and track Karpenter-specific resources, such as NodeClaims, in real time as your cluster scales up and down.
Dashboard for Pending Pods
View all pending pods with unmet scheduling requirements or FailedScheduling events, with highlights explaining why they couldn't be scheduled.
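The data behind such a dashboard comes from the pods' PodScheduled conditions. Here is a minimal client-go sketch (illustrative, not the plugin's implementation) that lists pending pods and prints the scheduler's reason and message.

```go
// Sketch only: the data behind a pending-pods dashboard. List pods stuck in
// Pending and print the scheduler's reason and message from the PodScheduled
// condition (e.g. reason "Unschedulable" with the unmet requirements).
package main

import (
	"context"
	"fmt"
	"log"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	pods, err := client.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "status.phase=Pending",
	})
	if err != nil {
		log.Fatal(err)
	}
	for _, p := range pods.Items {
		for _, c := range p.Status.Conditions {
			if c.Type == corev1.PodScheduled && c.Status == corev1.ConditionFalse {
				fmt.Printf("%s/%s: %s - %s\n", p.Namespace, p.Name, c.Reason, c.Message)
			}
		}
	}
}
```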
Karpenter Providers
This plugin should work with most Karpenter providers, but so far it has only been tested with the ones listed in the table below. Each provider also exposes some extra provider-specific information; the table shows which of these the plugin currently displays.
Provider Name | Tested | Extra provider-specific info supported
AWS | ✅ | ✅
Azure | ✅ | ✅
AlibabaCloud | ❌ | ❌
Bizfly Cloud | ❌ | ❌
Cluster API | ❌ | ❌
GCP | ❌ | ❌
Proxmox | ❌ | ❌
Oracle Cloud Infrastructure (OCI) | ❌ | ❌
Please submit an issue if you test one of the untested providers or if you want support for a particular provider (PRs are also gladly accepted).
How to use
Please see plugins/karpenter/README.md for instructions on how to use the plugin.
Feedback and Questions
Please submit an issue if you use Karpenter and have any other ideas or feedback, or come to the #headlamp channel on Kubernetes Slack for a chat.
via Kubernetes Blog https://kubernetes.io/
October 05, 2025 at 08:00PM
Kubernetes Controllers Deep Dive: How They Really Work
Most people using Kubernetes know how to write YAML and run kubectl apply, but when things break, they're completely lost. The secret they're missing? Understanding controllers - the beating heart that makes Kubernetes actually work. Controllers are what automatically restart your crashed pods, scale your applications, and make custom resources feel native to the platform.
This video dives deep into the real mechanics of how Kubernetes controllers operate. You'll discover how controllers consume and emit events to coordinate with each other, how the reconciliation loop continuously maintains your desired state, and how the Watch API efficiently streams changes without overwhelming the system. We'll explore custom resource definitions that extend Kubernetes, controller communication patterns, and the event-driven architecture that makes everything self-healing. Whether you're debugging cluster issues or building your own controllers, this knowledge will transform how you think about Kubernetes from just throwing YAML at the wall to truly understanding the orchestration engine underneath.
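To make the pattern concrete, here is a compact client-go sketch of an informer-driven reconcile loop (illustrative only, not the code from the video): watch events enqueue object keys, and the loop re-reads each object and compares desired versus observed state.

```go
// Sketch of the controller pattern: an informer watches Deployments, events
// enqueue object keys, and a reconcile loop compares desired vs. observed state.
// Illustrative only; real controllers add rate-limited workqueues, requeueing,
// and leader election (often via controller-runtime).
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func enqueue(keys chan string, obj interface{}) {
	// DeletionHandlingMetaNamespaceKeyFunc also copes with tombstone objects on delete.
	if key, err := cache.DeletionHandlingMetaNamespaceKeyFunc(obj); err == nil {
		keys <- key
	}
}

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	informer := factory.Apps().V1().Deployments().Informer()

	keys := make(chan string, 100)
	informer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    func(obj interface{}) { enqueue(keys, obj) },
		UpdateFunc: func(_, obj interface{}) { enqueue(keys, obj) },
		DeleteFunc: func(obj interface{}) { enqueue(keys, obj) },
	})

	stop := make(chan struct{})
	factory.Start(stop)
	cache.WaitForCacheSync(stop, informer.HasSynced)

	// Reconciliation loop: re-read the object for each key and converge toward desired state.
	for key := range keys {
		ns, name, _ := cache.SplitMetaNamespaceKey(key)
		d, err := client.AppsV1().Deployments(ns).Get(context.TODO(), name, metav1.GetOptions{})
		if err != nil {
			continue // deleted or transient error; a real controller would requeue
		}
		desired := int32(1)
		if d.Spec.Replicas != nil {
			desired = *d.Spec.Replicas
		}
		fmt.Printf("reconcile %s: desired=%d ready=%d\n", key, desired, d.Status.ReadyReplicas)
	}
}
```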
#KubernetesControllers #Kubernetes #DevOpsEngineering
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/kubernetes/kubernetes-controllers-deep-dive-how-they-really-work 🔗 Kubernetes: https://kubernetes.io
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Kubernetes Controllers Deep Dive 01:18 Kubernetes Control Loops Explained 04:12 How Kubernetes Controllers Watch Events 07:35 Kubernetes Event Emission 11:56 Kubernetes Reconciliation Loop 17:12 Kubernetes Watch API 21:01 Kubernetes Custom Resource Definitions (CRDs) 21:13 Kubernetes Controller Communication 25:22 Kubernetes Controllers Mastery
via YouTube https://www.youtube.com/watch?v=kss081c8EqY
The Making of Flux: The Rewrite, a KubeFM Original Series
In this episode, Michael Bridgen (the engineer who wrote Flux's first lines) and Stefan Prodan (the maintainer who led the V2 rewrite) share how Flux grew from a fragile hack-day script into a production-grade GitOps toolkit.
How early Flux addressed the risks of manual, unsafe Kubernetes upgrades
Why the complete V2 rewrite was critical for stability, scalability, and adoption
What the maintainers learned about building a sustainable, community-driven open-source project
Sponsor
Join the Flux maintainers and community at FluxCon, November 11th in Salt Lake City—register here
More info
Find all the links and info for this episode here: https://ku.bz/bgkgn227-
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
October 06, 2025 at 06:00AM
Week Ending September 28, 2025
https://lwkd.info/2025/20251002
Developer News
Instead of reviving WG API Expression, a new SIG API Machinery subproject meeting on Declarative APIs and Linters was held on Sept 23, 2025, at 9 AM PST. The subproject carries the same goals as the proposed WG, and meeting details were shared in the Agenda & Notes document.
The WG AI Gateway has officially launched with a Slack channel, #wg-ai-gateway, and a mailing list. Meetings will begin next week, and the community is encouraged to join and participate.
Release Schedule
Next Deadline: PRR Freeze, October 9
Kubernetes v1.35 is moving along: APAC-friendly meetings are running and enhancement opt-ins are open.
Starting from v1.35, PRR Freeze is a hard deadline. No new KEPs may be opted in after the PRR Freeze deadline. Read more about the new PRR Freeze rules here. If your KEP misses the PRR Freeze deadline, you need to submit an exception for your KEP within 3 days after PRR Freeze. Read more about the exception process here. If you have any questions, feel free to reach out in the #sig-release or #prod-readiness channels in Slack.
If you’re an enhancement owner, make sure your KEP is up to date (status: implementable, milestone: v1.35, test plan + PRR filled) before PRR Freeze on Oct 9 (AoE) / Oct 10, 12:00 UTC.
The next cherry-pick deadline for patch releases is Oct 10.
Featured PRs
134330: Add resource version comparison function in client-go along with conformance
This PR introduces a helper function for comparing Kubernetes resource versions. Resource versions are used for concurrency control and watch operations, but until now they could only be compared as opaque strings. The new function allows direct comparison of resource versions for objects of the same type. Alongside this, conformance tests have been added to ensure consistent handling across GA resources, making resource version behavior clearer and more reliable.
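The exact name and signature of the new client-go helper are not reproduced here; the sketch below is only a hypothetical illustration of why lexicographic string comparison fails and what numeric comparison looks like, under the assumption (which holds for etcd-backed resources of the same type, but is not a documented API contract) that resource versions parse as unsigned integers.

```go
// Illustration only: why resource versions can't be compared as plain strings.
// This is NOT the new client-go helper from the PR; it parses the opaque string
// as an unsigned integer, an assumption rather than a documented guarantee.
package main

import (
	"fmt"
	"strconv"
)

// compareResourceVersions returns -1, 0, or 1. Hypothetical helper for illustration.
func compareResourceVersions(a, b string) (int, error) {
	ai, err := strconv.ParseUint(a, 10, 64)
	if err != nil {
		return 0, err
	}
	bi, err := strconv.ParseUint(b, 10, 64)
	if err != nil {
		return 0, err
	}
	switch {
	case ai < bi:
		return -1, nil
	case ai > bi:
		return 1, nil
	default:
		return 0, nil
	}
}

func main() {
	// Lexicographically "9" > "10", but as resource versions 9 is older than 10.
	fmt.Println("9" > "10") // true: string comparison gives the wrong answer
	c, _ := compareResourceVersions("9", "10")
	fmt.Println(c) // -1: numeric comparison gives the expected ordering
}
```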
KEP of the Week
KEP-4412: Projected service account tokens for Kubelet image credential providers
This KEP proposes a secret-less image-pull flow that leverages ephemeral Kubernetes Service Account (KSA) tokens instead of long-lived ImagePullSecrets or node-wide kubelet credential providers. A pod-bound, short-lived KSA token would be used (or exchanged) to obtain transient, workload-scoped image-pull credentials before the pod starts, avoiding persisted secrets in the API or node and allowing external validators to rely on OIDC-like token semantics. This ties image-pull authorization to the workload identity, simplifies secret rotation and management, and reduces the security risk posed by long-lived, hard-to-rotate credentials.
This KEP is tracked for beta in v1.34.
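For context on the token side of that flow (not the KEP's kubelet credential-provider wiring itself), the existing TokenRequest API is what issues short-lived, pod-bound service account tokens. A minimal client-go sketch follows, with the audience, namespace, service account, and pod name as illustrative assumptions.

```go
// Sketch only: request a short-lived, pod-bound ServiceAccount token via the
// TokenRequest API. This illustrates the kind of token the KEP builds on; it is
// not the kubelet image credential-provider flow itself. Audience, namespace,
// service account, and pod name are illustrative assumptions.
package main

import (
	"context"
	"fmt"
	"log"

	authenticationv1 "k8s.io/api/authentication/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		log.Fatal(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	expiry := int64(600) // short-lived: ten minutes
	req := &authenticationv1.TokenRequest{
		Spec: authenticationv1.TokenRequestSpec{
			Audiences:         []string{"registry.example.com"}, // hypothetical registry audience
			ExpirationSeconds: &expiry,
			BoundObjectRef: &authenticationv1.BoundObjectReference{
				// Binding to a Pod ties the token's lifetime to that Pod.
				Kind:       "Pod",
				APIVersion: "v1",
				Name:       "my-app-pod", // hypothetical pod name
			},
		},
	}

	tok, err := client.CoreV1().ServiceAccounts("default").
		CreateToken(context.TODO(), "my-app-sa", req, metav1.CreateOptions{})
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("token expires at:", tok.Status.ExpirationTimestamp)
	// tok.Status.Token would then be exchanged for transient, workload-scoped
	// registry credentials by an image credential provider.
}
```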
Other Merges
Deallocate extended resource claims on pod completion
Introduce k8s:customUnique tag to control listmap uniqueness validation
Add +enum tag to DeviceAllocationMode type
kubeadm: wait for apiserver using a local client, not the control-plane endpoint
Revert async preemption corner-case fix — undoes prior change to scheduler preemption behavior
kubeadm removes the RootlessControlPlane feature gate as UserNamespacesSupport becomes the replacement
Enable SSATags linter to enforce +listType on lists in APIs
API Dispatcher drops goroutine limit to avoid throughput regression under high latency
Kubelet and controller: enable more asynchronous node status updates and improve tracing/logging
DRA: allocator selection uses correct “incubating” implementation by default
kube-proxy: list available endpoints in /statusz
Restore partial functionality of AuditEventFrom
Add explicit feature gate dependencies with validation
Kubernetes is now built with Go v1.24.7
Promotions
Graduate ControlPlaneKubeletLocalMode to GA
Version Updates
Update publishing rules to use Go v1.24.7
Subprojects and Dependency Updates
cluster-autoscaler v1.34.0 promotes In-Place Updates to Beta, adds Capacity Buffer CRD/controller, improves scale-up logic across multiple providers, and deprecates older flags/APIs
cluster-autoscaler-chart v0.1.0 automatically adjusts resources for workloads
gRPC v1.75.1 adds Python 3.14 support, fixes Python async shutdown race, and refines interpreter exit handling
helm-chart-aws-cloud-controller-manager v0.0.10 installs Cloud Controller Manager for AWS Cloud Provider
ingress-nginx helm-chart v4.13.3 updates Ingress-Nginx to controller v1.13.3
nerdctl v2.1.6 reserves ports in rootful mode to prevent conflicts
Shoutouts
No shoutouts this week. Want to thank someone for special efforts to improve Kubernetes? Tag them in the #shoutouts channel.
via Last Week in Kubernetes Development https://lwkd.info/
October 02, 2025 at 06:25AM
Ep35 - Ask Me Anything About Anything
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=ym9AX4kEkss
Scaling CI horizontally with Buildkite, Kubernetes, and multiple pipelines, with Ben Poland
Ben Poland walks through Faire's complete CI transformation, from a single Jenkins instance struggling with thousands of lines of Groovy to a distributed Buildkite system running across multiple Kubernetes clusters.
He details the technical challenges of running CI workloads at scale, including API rate limiting, etcd pressure points, and the trade-offs of splitting monolithic pipelines into service-scoped ones.
You will learn:
How to architect CI systems that match team ownership and eliminate shared failure points across services
Kubernetes scaling patterns for CI workloads, including multi-cluster strategies, predictive node provisioning, and handling API throttling
Performance optimization techniques like Git mirroring, node-level caching, and spot instance management for variable CI demands
Migration strategies and lessons learned from moving away from monolithic CI, including proof-of-concept approaches and avoiding the sunk cost fallacy
Sponsor
This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io
More info
Find all the links and info for this episode here: https://ku.bz/klBmzMY5-
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
September 30, 2025 at 06:00AM