1_r/devopsish

Last Week in Kubernetes Development - Week Ending November 02 2025

Week Ending November 02, 2025

https://lwkd.info/2025/20251106

Developer News

The 2025 Steering Committee election results have been announced. Congratulations to Kat Cosgrove, Paco Xu, Rita Zhang, and Maciej Szulik on being elected to two-year terms on the Steering Committee. Maciej and Paco are returning members. Thank you to all the candidates and to every community member who voted. An election retro call will take place on 19th November at 8AM PT. If you have feedback about this year's Steering Committee election, please add it to the retro doc.

WG-LTS is winding down after its discussions concluded that a community-supported LTS within the Kubernetes project is probably not the right answer. The Compatibility Versions feature is cited as an alternative for safer upgrades.

The Kustomize project is seeking proposals for a new logo. If you have ideas, post them in the open issue!

Release Schedule

Next Deadline: Code Freeze, 7th November

The code freeze and test freeze deadline is on Friday 7th November 2025, 12:00 UTC. Please open an early exception for your KEP if you think you need more time!

Kubernetes v1.35.0-alpha.3 is live.

Featured PRs

[KEP-4330] add min-compatibility-version to control plane

This PR is part of a larger effort to introduce “compatibility versions” to control plane components and features, eventually permitting upgrades and rollbacks that span more than one Kubernetes version safely. This PR adds the field to apiserver, controller-manager, and scheduler.
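To make the relationship between these versions concrete, here is a minimal Python sketch. It assumes a component tracks three versions (its binary version, an emulation version, and the new minimum compatibility version) and that they must satisfy `min_compat <= emulation <= binary`. The function names and this ordering rule are illustrative assumptions, not code from the PR; KEP-4330 is the authority on the actual constraints.

```python
def parse_minor(version: str) -> tuple[int, int]:
    """Parse 'v1.35', '1.35', or '1.35.0-alpha.3' into (major, minor)."""
    major, minor = version.lstrip("v").split(".")[:2]
    return int(major), int(minor)


def versions_consistent(binary: str, emulation: str, min_compat: str) -> bool:
    """Assumed rule for illustration: min_compat <= emulation <= binary."""
    return parse_minor(min_compat) <= parse_minor(emulation) <= parse_minor(binary)


if __name__ == "__main__":
    # A 1.35 binary emulating 1.34 while staying compatible with 1.33.
    print(versions_consistent("1.35", "1.34", "1.33"))
    # An emulation version newer than the binary is rejected by this sketch.
    print(versions_consistent("1.35", "1.36", "1.34"))
```

Under these assumptions, a rollback target would have to be no older than the minimum compatibility version, which is what would make multi-version upgrades and rollbacks safe to reason about.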

KEP of the Week

KEP-4827: Component Statusz

As part of Kubernetes' march towards structured data for everything, this KEP introduces a structured, standardized endpoint for health and status checking. It will enhance observability and enable building new monitoring and performance tools. Statusz is kicking off with v1alpha1 in 1.35.

Other Merges

New k8s-resource-fully-qualified-name format for Declarative Validation

Enhance several different E2E tests (plus many more, kudos Lukasz Szaszkiewicz) to support EnableWatchListClient

CRD Conditions include an ObservedGeneration to deter race conditions

DRA APIs: migrate several DRA validations (https://github.com/kubernetes/kubernetes/pull/134963), use EachKey to map resources, and make DeviceAttribute a Union type

New tests to support Deployments terminating pods during Recreate and RollingUpdate

Shared Informers are now benchmarked

Allow some kubeadm functions to be exported

Support Declarative Validation for StorageClass

Use informer.RunWithContext in controller tests

Test stepwise volume expansion

Prevent AllocationMode: All failure

Allow DRA to process inactive workloads with Allocatable=0

ContextualLogging migrations: cpumanager

JWKS fetch metrics for structured authentication

Pod Generation E2E tests promoted to conformance
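The ObservedGeneration item in the merges above addresses a common race: a client reads a condition that was computed for an older version of the spec. The following Python sketch shows how a client might guard against that. The `Condition` dataclass and helper names are hypothetical stand-ins rather than Kubernetes client types; only the `Established` condition type and the generation comparison reflect real API conventions.

```python
from dataclasses import dataclass


@dataclass
class Condition:
    type: str
    status: str               # "True", "False", or "Unknown"
    observed_generation: int  # generation the controller saw when it set this


def condition_is_current(condition: Condition, metadata_generation: int) -> bool:
    """A condition is only trustworthy if it describes the current spec."""
    return condition.observed_generation == metadata_generation


def is_established(conditions: list[Condition], metadata_generation: int) -> bool:
    """True only if Established is 'True' and was computed for this generation."""
    for c in conditions:
        if c.type == "Established":
            return c.status == "True" and condition_is_current(c, metadata_generation)
    return False
```

A caller that has just updated the object (bumping its generation to 2) would treat an `Established=True` condition with `observed_generation=1` as stale and keep waiting.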

Promotions

KUBECTL_COMMAND_HEADERS to GA

InPlacePodVerticalScaling to GA

StorageVersionMigration to beta

SystemWatchdog to GA

MutableCSINodeAllocatableCount to beta

DeploymentReplicaSetTerminatingReplicas to beta

Deprecated

BlockOwnerDeletion is removed from resource claims

Stop providing taint keys in Pod statuses when scheduling fails

DynamicResourceAllocation feature gate locked on; will be removed in a few releases

Remove kubelet --pod-infra-container-image switch

Subprojects and Dependency Updates

containerd v2.2.0-rc.0 (pre-release) introduces a mount manager, adds conf.d include support in the default configuration, and supports back references in the garbage collector. It improves CRI with ListPodSandboxMetrics and image-volume subpaths, adds parallel image unpack and referrers fetcher, updates the EROFS snapshotter, enables OpenTelemetry traces and WASM plugin support in NRI, improves shim reload performance, and postpones v2.2 deprecations to v2.3.

nerdctl v2.2.0-rc.0 fixes a namestore directory regression, adds mount-manager support, and introduces new checkpoint commands (create, ls, rm). It adds a --estargz-gzip-helper flag for image conversion and updates bundled dependencies, including containerd v2.2.0-rc.0, runc v1.3.2, BuildKit v0.25.1, and Stargz Snapshotter v0.18.0.

cloud-provider-vsphere v1.33.1 updates CAPI to v1.10.1 and CAPV to v1.13.0, enables weekly security checks, updates API calls to use FQDN, and fixes Service deletion when VirtualMachineService is not found. It also bumps Kubernetes to v1.33.5 and refreshes documentation.

cloud-provider-vsphere v1.32.3 provides dependency updates across test suites, upgrades controller-runtime to v0.19.6 and govmomi to v0.46.3, introduces weekly security checks, and fixes VirtualMachineService deletion. It also adopts FQDN for Supervisor API calls and includes CVE patches.

vsphere-cpi-chart-1.33.1 and vsphere-cpi-chart-1.32.3 update Helm charts for vSphere CPI to align with recent vSphere provider releases.

ingress-nginx helm-chart-4.14.0, 4.13.4, and 4.12.8 deliver updated Helm charts for the NGINX ingress controller with alignment to current controller and Kubernetes versions.

cluster-autoscaler v1.30.7 backports the OCI CloudProvider feature to the v1.30 line and publishes multi-architecture images (v1.30.7).

prometheus v3.7.3 fixes a UI redirect regression involving -web.external-url and -web.route-prefix, resolves federation issues for some native histograms, corrects promtool check config failures when --lint=none is specified, and eliminates a remote-write queue resharding deadlock.

Shoutouts

Sreeram Venkitesh - Shoutout to everyone who helped run the 2025 steering committee elections smoothly - The EOs and alternate EOs: @cblecker @Nina Polshakova @Arujjwal @Rey Lejano, K8s infra liaison @mahamed, and @jberkus for all the support from the very beginning and also for helping us with Elekto. Also big thanks to all the previous (and continuing) steering committee members for their support in making the election a smooth and successful one. Thank you all!

via Last Week in Kubernetes Development https://lwkd.info/

November 06, 2025 at 02:37AM

·lwkd.info·
Graphs in your head or how to assess a Kubernetes workload with Oleksii Kolodiazhnyi

Graphs in your head, or how to assess a Kubernetes workload, with Oleksii Kolodiazhnyi

https://ku.bz/zDThxGQsP

Understanding what's actually happening inside a complex Kubernetes system is one of the biggest challenges architects face.

Oleksii Kolodiazhnyi, Senior Architect at Mirantis, shares his structured approach to Kubernetes workload assessment. He breaks down how to move from high-level business understanding to detailed technical analysis, using visualization tools and systematic documentation.

You will learn:

A top-down assessment methodology that starts with business cases and use cases before diving into technical details

Practical visualization techniques using tools like KubeView, K9s, and Helm dashboard to quickly understand resource interactions

Systematic resource discovery approaches for different scenarios, from well-documented Helm-based deployments to legacy applications with hard-coded configurations buried in containers

Documentation strategies for creating consumable artifacts that serve different audiences, from business stakeholders to new team members joining the project

Sponsor

This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization

More info

Find all the links and info for this episode here: https://ku.bz/zDThxGQsP

via KubeFM https://kube.fm

November 04, 2025 at 05:00AM

·kube.fm·
DevOps & AI Toolkit - Best AI Models for DevOps & SRE: Real-World Agent Testing - https://www.youtube.com/watch?v=r84kQ5IMIQM

Best AI Models for DevOps & SRE: Real-World Agent Testing

A comprehensive, data-driven comparison of 10 leading large language models (LLMs) from Google, Anthropic, OpenAI, xAI, DeepSeek, and Mistral, specifically tested for DevOps, SRE, and platform engineering workflows. Instead of relying on traditional benchmarks or marketing claims, this evaluation runs real agent workflows through production scenarios: Kubernetes operations, cluster analysis, policy generation, manifest creation, and systematic troubleshooting—all with actual timeout constraints. The results reveal shocking gaps between benchmark promises and production reality: 70% of models couldn't complete tasks in reasonable timeframes, premium "reasoning" models failed on tasks cheaper alternatives handled easily, and the most expensive model ($120 per million output tokens) failed more tests than it passed.

The evaluation measures five key dimensions: overall performance quality, reliability and completion rates, consistency across different tasks, cost-performance value, and context window efficiency. Five distinct test scenarios push models through endurance tests (100+ consecutive interactions), rapid pattern recognition (5-minute workflows), comprehensive policy compliance analysis, extreme context pressure (100,000+ token loads), and systematic investigation loops requiring intelligent troubleshooting. The rankings reveal clear performance tiers, with Claude Haiku emerging as the overall winner for its exceptional efficiency and price-performance ratio, while Claude Sonnet takes the reliability crown with 98% completion rates. The video provides specific recommendations on which models to use, which to avoid, and why cost doesn't always correlate with capability in production environments.

#LLMComparison #DevOps #AIforEngineers

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/best-ai-models-for-devops--sre-real-world-agent-testing 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai 🎬 Analysis report: https://github.com/vfarcic/dot-ai/blob/main/eval/analysis/platform/synthesis-report.md

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Large Language Models (LLMs) Compared
01:54 How I Compare Large Language Models
05:01 LLM Evaluation Criteria and Test Scenarios
13:23 AI Model Benchmark Results
27:34 AI Model Rankings and Recommendations

via YouTube https://www.youtube.com/watch?v=r84kQ5IMIQM

·youtube.com·
Whitmer: Multi-billion-dollar Saline Township data center ‘largest investment in Michigan history’ • Michigan Advance
A string of announcements from DTE Energy, OpenAI and Related Digital coalesced into another notice from Michigan Gov. Gretchen Whitmer late Thursday afternoon, confirming plans to move forward with development of a data center in Saline Township – following a legal settlement between the developer and the host community. Whitmer lauded the multi-billion dollar […]
·michiganadvance.com·
Last Week in Kubernetes Development - Week Ending October 26 2025

Week Ending October 26, 2025

https://lwkd.info/2025/20251031

Developer News

The steering committee election voting period closed last week. The results will be announced in the public steering meeting next Wednesday.

Some reminders for folks attending KubeCon NA 2025 about the Kubernetes Contributor Hour and the SIG/WG Meet and Greet

Release Schedule

Next Deadline: Code Freeze, 7th November

With the feature blog freeze in place, KEP assignees are expected to open placeholder PRs for their blogs. Please reach out to the Release Comms team for more information. We’re one week away from the v1.35 code freeze. Get your PRs ready and don’t forget to file an early exception if you anticipate any delays!

October patch releases have been skipped altogether.

KEP of the Week

KEP-5007: DRA: Device Binding Conditions

This KEP introduces BindingConditions, enabling the scheduler to delay Pod binding until external resources such as fabric-attached GPUs or FPGAs are confirmed ready. This improves scheduling reliability by preventing premature bindings that could lead to Pod failures or require manual intervention. The mechanism also supports asynchronous or failure-prone scenarios, including remote accelerators and FPGA reprogramming.

This KEP is tracked for beta in v1.35.
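The delayed-binding decision described above can be sketched as a small Python function. This is an illustration under stated assumptions, not scheduler code: the function name, the condition names (`FabricAttached`, `AttachFailed`), and the separate list of failure conditions are all hypothetical.

```python
def binding_decision(
    binding_conditions: list[str],
    failure_conditions: list[str],
    device_conditions: dict[str, bool],
) -> str:
    """Return 'bind', 'wait', or 'fail' for a Pod waiting on a device.

    device_conditions maps a condition type to True when that condition
    is currently reported as true for the device.
    """
    # Any reported failure condition aborts the attempt.
    if any(device_conditions.get(t, False) for t in failure_conditions):
        return "fail"
    # Bind only once every required condition has been confirmed.
    if all(device_conditions.get(t, False) for t in binding_conditions):
        return "bind"
    # Otherwise the external resource is still being prepared.
    return "wait"
```

For example, `binding_decision(["FabricAttached"], ["AttachFailed"], {})` returns `"wait"` until the external controller reports `FabricAttached`, at which point the result becomes `"bind"`.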

Other Merges

DRA resources use eachKey declarative validation to mirror map-key checks and keep generated DV in sync with handwritten rules

CSI NodePublishVolumeRequest now carries pod service account tokens in the gRPC secrets field instead of volume_context

DRA DeviceAttribute now declares its non-discriminated union with +k8s:unionMember, so declarative validation can enforce “exactly one value set”

Add +k8s:maxLength (and +k8s:optional) to NetworkDeviceData so generated DV can cap interfaceName / hardwareAddress lengths and match handwritten validation

Wire storage.k8s.io (StorageClass) into declarative validation and mark provisioner as +k8s:required, so generated DV now matches the old handwritten strategy on create/update

StorageVersionMigration (SVM) graduates to v1beta1 and drops the old v1alpha1/unused fields, so clusters must clean up any storage.k8s.io/v1alpha1 SVM objects before upgrading

kubectl finally drops support for the long-deprecated certificates.k8s.io/v1beta1 CertificateSigningRequest.

Add mtlsclient and mtlsserver for the mtls validations

apiserver cacher’s lister_watcher now exposes WatchList semantics

Enable declarative validation for resource.k8s.io ResourceSlice (v1/v1beta1/v1beta2)

Introduce pod queuing in endpoint/slice controllers

Add k8s-resource-fully-qualified-name format

Implement a synthetic create authz permission check for exec, attach, and portforward

Enable Declarative Validation (DV) support for ClusterRole and RoleBinding

Replace HandleCrash and HandleError calls to use context-aware alternative

Bump supported etcd version to v3.5.24 for release v1.32, v1.33, and v1.3
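The "exactly one value set" rule for DeviceAttribute mentioned in the merges above is a general union-validation pattern. Here is a minimal Python sketch of that check. The dataclass is a simplified stand-in for the Go type, with field names chosen for illustration, and the error messages are invented; the real enforcement is generated from the `+k8s:unionMember` tags.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class DeviceAttribute:
    # Each member is optional; None means "not set", so 0 and False still count as set.
    int_value: Optional[int] = None
    bool_value: Optional[bool] = None
    string_value: Optional[str] = None
    version_value: Optional[str] = None


def validate_union(attr: DeviceAttribute) -> list[str]:
    """Return a list of validation errors; empty when exactly one member is set."""
    set_members = [f.name for f in fields(attr) if getattr(attr, f.name) is not None]
    if len(set_members) == 1:
        return []
    if not set_members:
        return ["exactly one value must be set, got none"]
    return [f"exactly one value must be set, got {', '.join(set_members)}"]
```

Comparing against `None` rather than relying on truthiness matters here: `DeviceAttribute(int_value=0)` is a valid attribute with its integer member set.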

Promotions

Pod Generation to GA

ContainerRestartRules to beta

RelaxedServiceNameValidation to beta

PreferSameTrafficDistribution to GA

Version Updates

etcd sdk to v3.6.5

system-validators to v1.12.1

Subprojects and Dependency Updates

containerd v2.2.0-rc.0 (pre-release) adds a mount manager, supports conf.d includes in the default config, and adds back-references in the garbage collector. It improves CRI with ListPodSandboxMetrics and image-volume subpaths, adds parallel image unpack and a referrers fetcher, updates EROFS snapshotter, enables OTEL traces and WASM plugin support in NRI, speeds shim reloads, and postpones some deprecations to 2.3.

containerd API v1.10.0-rc.0 (pre-release) aligns with containerd 2.2, introducing the mount manager and parallel unpack support in the API.

prometheus v3.7.3 fixes a UI redirect regression with -web.external-url and -web.route-prefix, corrects federation for some native histograms, fixes a promtool check config failure when --lint=none is set, and resolves a remote-write queue resharding deadlock.

via Last Week in Kubernetes Development https://lwkd.info/

October 31, 2025 at 02:41PM

·lwkd.info·
US company with access to biggest telecom firms uncovers breach by nation-state hackers
Hackers working for an unnamed nation-state breached networks at Ribbon Communications, a key U.S. telecommunications services company, and remained within the firm’s systems for nearly a year without being detected, a company spokesperson confirmed in a statement on Wednesday.
·reuters.com·
Fedora Linux 43 is here! - Fedora Magazine
I’m excited to announce my very first Fedora Linux release as the new Fedora Project Leader. Fedora Linux 43 is here! 43 releases! Wow that’s a lot. I was thinking about proposing special tetracontakaitrigon stickers to celebrate this release, but I’m not sure anyone would notice they weren’t circles. Thank you and congrats to everyone […]
·fedoramagazine.org·
DevOps & AI Toolkit - Ep38 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=-nYVMVQosHc

Ep38 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=-nYVMVQosHc

·youtube.com·
Sales 101: What Your Sales Team Does (And How DevRel Fits In)
Stop treating sales like the enemy. Learn what your sales team actually does, decode pipeline and ARR, and discover how DevRel impacts deals without selling out.
·dev.to·
Open Source Initiative now accepting your application for Executive Director
The Open Source Initiative is seeking its next Executive Director (ED), the chief executive and strategic leader of the OSI, responsible for advancing its mission, growing and diversifying its funding base, and fostering a global, inclusive community of stakeholders. The ED will be a visible ambassador for OSI to build consensus around key initiatives, including the next version of the Open Source AI Definition.
·opensource.org·
Amazon laying off about 14,000 corporate workers
The company said it's cutting roles in order to help make the company leaner and less bureaucratic, while it looks to invest in generative AI.
·cnbc.com·
Our Journey to GitOps: Migrating to ArgoCD with Zero Downtime with Andrew Jeffree

Our Journey to GitOps: Migrating to ArgoCD with Zero Downtime, with Andrew Jeffree

https://ku.bz/Xvyp1_Qcv

Andrew Jeffree from SafetyCulture walks through their complete migration of 250+ microservices from a fragile Helm-based setup to GitOps with ArgoCD, all without any downtime. He explains how they replaced YAML configurations with a domain-specific language built in CUE, creating a better developer experience while adding stronger validation and reducing operational pain points.

You will learn:

Zero-downtime migration techniques using temporary deployments with prune-last sync options to ensure healthy services before removing legacy ones

How CUE lang improves on YAML by providing schema validation, early error detection, and a cleaner interface for developers

Human-centric platform engineering approaches that prioritize developer experience and reduce on-call burden through empathy-driven design decisions

Sponsor

This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/Xvyp1_Qcv

via KubeFM https://kube.fm

October 28, 2025 at 06:00AM

·kube.fm·
DevOps & AI Toolkit - Self-Healing Kubernetes: When to Use AI vs Traditional Automation - https://www.youtube.com/watch?v=rIdcJYLtCdo

Self-Healing Kubernetes: When to Use AI vs Traditional Automation

Tired of being woken up at 2 AM to manually troubleshoot Kubernetes incidents that could be fixed automatically? This video explores how to build intelligent self-healing systems that watch Kubernetes events, analyze problems, and remediate issues before they ruin your weekend. We'll break down the complete automation pipeline—from understanding how Kubernetes events work and what makes them ideal triggers, to implementing a maturity progression from manual firefighting through rule-based automation to AI-assisted remediation.

Learn when traditional automation works best (alerting and known patterns), where AI genuinely excels (analysis and unknown scenarios), and how to strategically combine both approaches. We'll cover the three phases of incident response—alerting, analysis, and remediation—and show you how to build systems that handle knowns with efficient controllers while leveraging AI for novel problems. The key is creating feedback loops that continuously graduate unknowns into automated knowns, progressively shrinking the surface area where human intervention is needed. Includes links to open-source projects demonstrating these principles in production.
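The split between known and unknown incidents, and the feedback loop that graduates unknowns into knowns, can be sketched in a few lines of Python. This is an illustration of the idea only, not code from the video or its linked projects: `route_event`, `graduate`, the rule table, and the `analyze` callback standing in for an AI-backed analysis step are all hypothetical.

```python
from typing import Callable

Event = dict[str, str]
Handler = Callable[[Event], str]


def route_event(event: Event, rules: dict[str, Handler], analyze: Handler) -> tuple[str, str]:
    """Send known event reasons to a rule; fall back to analysis for unknowns."""
    handler = rules.get(event["reason"])
    if handler is not None:
        return ("rule", handler(event))
    return ("ai", analyze(event))


def graduate(rules: dict[str, Handler], reason: str, handler: Handler) -> None:
    """Feedback loop: once a fix for an unknown has been reviewed, make it a rule."""
    rules[reason] = handler


if __name__ == "__main__":
    rules: dict[str, Handler] = {"FailedScheduling": lambda e: "add-node"}
    analyze: Handler = lambda e: "investigate " + e["reason"]

    print(route_event({"reason": "FailedScheduling"}, rules, analyze))
    print(route_event({"reason": "Mystery"}, rules, analyze))
    graduate(rules, "Mystery", lambda e: "restart")
    print(route_event({"reason": "Mystery"}, rules, analyze))
```

Each call to `graduate` shrinks the set of events that reach the analysis fallback, which mirrors the progressive reduction in human intervention described above.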

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: JFrog Fly 🔗 https://jfrog.com/fly_viktor ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#Kubernetes #SelfHealingSystems #AIAutomation

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/kubernetes/self-healing-kubernetes-when-to-use-ai-vs-traditional-automation 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Kubernetes Remediation
01:15 JFrog fly (sponsor)
02:43 Kubernetes Events Explained
06:21 Kubernetes Automation Pipeline
12:46 AI-Powered Kubernetes Remediation
19:26 Building Self-Healing Systems

via YouTube https://www.youtube.com/watch?v=rIdcJYLtCdo

·youtube.com·