1_r/devopsish

The Making of Flux: The Rewrite a KubeFM Original Series
The Making of Flux: The Rewrite a KubeFM Original Series

The Making of Flux: The Rewrite, a KubeFM Original Series

https://ku.bz/bgkgn227-

In this episode, Michael Bridgen (the engineer who wrote Flux's first lines) and Stefan Prodan (the maintainer who led the V2 rewrite) share how Flux grew from a fragile hack-day script into a production-grade GitOps toolkit.

How early Flux addressed the risks of manual, unsafe Kubernetes upgrades

Why the complete V2 rewrite was critical for stability, scalability, and adoption

What the maintainers learned about building a sustainable, community-driven open-source project

Sponsor

Join the Flux maintainers and community at FluxCon, November 11th in Salt Lake City—register here

More info

Find all the links and info for this episode here: https://ku.bz/bgkgn227-

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

October 06, 2025 at 06:00AM

·kube.fm·
The Making of Flux: The Rewrite a KubeFM Original Series
lasantosr/intelli-shell
lasantosr/intelli-shell
Like IntelliSense, but for shells. Contribute to lasantosr/intelli-shell development by creating an account on GitHub.
·github.com·
lasantosr/intelli-shell
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

Introducing Headlamp Plugin for Karpenter - Scaling and Visibility

https://kubernetes.io/blog/2025/09/23/introducing-headlamp-plugin-for-karpenter/

Headlamp is an open-source, extensible UI from the Kubernetes SIG UI project, designed to let you explore, manage, and debug cluster resources.

Karpenter is a node provisioning project from the Kubernetes SIG Autoscaling that helps clusters scale quickly and efficiently. It launches new nodes in seconds, selects appropriate instance types for workloads, and manages the full node lifecycle, including scale-down.

The new Headlamp Karpenter Plugin adds real-time visibility into Karpenter's activity directly from the Headlamp UI. It shows how Karpenter resources relate to Kubernetes objects, displays live metrics, and surfaces scaling events as they happen. You can inspect pending pods during provisioning, review scaling decisions, and edit Karpenter-managed resources with built-in validation. The plugin was built as part of an LFX mentorship project.

The Karpenter plugin for Headlamp aims to make it easier for Kubernetes users and operators to understand, debug, and fine-tune autoscaling behavior in their clusters. Below is a brief tour of the plugin.

Map view of Karpenter Resources and how they relate to Kubernetes resources

Easily see how Karpenter resources like NodeClasses, NodePools, and NodeClaims connect with core Kubernetes resources like Pods and Nodes.

Visualization of Karpenter Metrics

Get instant insight into resource usage vs. limits, allowed disruptions, pending pods, provisioning latency, and more.

Scaling decisions

See which instances are being provisioned for your workloads and understand why Karpenter made those choices. Helpful when debugging.

Config editor with validation support

Make live edits to Karpenter configurations. The editor includes diff previews and resource validation for safer adjustments.

Real time view of Karpenter resources

View and track Karpenter-specific resources, such as NodeClaims, in real time as your cluster scales up and down.

Dashboard for Pending Pods

View all pending pods with unmet scheduling requirements or FailedScheduling events, with highlights explaining why they couldn't be scheduled.

Karpenter Providers

This plugin should work with most Karpenter providers, but has so far been tested with only some of those listed below. Each provider can also expose extra provider-specific information, which the plugin displays where supported.

Providers covered by the plugin's compatibility table (columns: provider name, tested, extra provider-specific info supported):

AWS

Azure

AlibabaCloud

Bizfly Cloud

Cluster API

GCP

Proxmox

Oracle Cloud Infrastructure (OCI)

Please submit an issue if you test one of the untested providers or if you want support for a particular provider (PRs also gladly accepted).

How to use

Please see the plugins/karpenter/README.md for instructions on how to use.

Feedback and Questions

Please submit an issue if you use Karpenter and have any other ideas or feedback, or come to the #headlamp channel on the Kubernetes Slack for a chat.

via Kubernetes Blog https://kubernetes.io/

September 22, 2025 at 08:00PM

·kubernetes.io·
Introducing Headlamp Plugin for Karpenter - Scaling and Visibility
Last Week in Kubernetes Development - Week Ending September 28 2025
Last Week in Kubernetes Development - Week Ending September 28 2025

Week Ending September 28, 2025

https://lwkd.info/2025/20251002

Developer News

Instead of reviving the WG API Expression working group, a new SIG API Machinery subproject meeting on Declarative APIs and Linters was held on Sept 23, 2025, at 9 AM PST. The subproject carried the same goals as the proposed WG, and meeting details were shared in the Agenda & Notes document.

The WG AI Gateway has officially launched with a Slack channel, #wg-ai-gateway, and a mailing list. Meetings will begin next week, and the community is encouraged to join and participate.

Release Schedule

Next Deadline: PRR Freeze, October 9

Kubernetes v1.35 is moving along — APAC-friendly meetings are running and enhancement opt-ins are open.

Starting from v1.35, PRR Freeze is a hard deadline. No new KEPs may be opted in after the PRR Freeze deadline. Read more about the new PRR Freeze rules here. If your KEP misses the PRR Freeze deadline, you need to submit an exception for your KEP within 3 days after PRR Freeze. Read more about the exception process here. If you have any questions, feel free to reach out in the #sig-release or the #prod-readiness channels in Slack.

If you're an enhancement owner, make sure your KEP is up to date (status: implementable, milestone: v1.35, test plan + PRR filled) before PRR Freeze on Oct 9 (AoE) / Oct 10, 12:00 UTC.
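
Those fields live in each enhancement's kep.yaml in the k/enhancements repository. A minimal sketch of what the relevant metadata for an opted-in v1.35 KEP might look like (the title, number, and stage below are placeholders; field names follow the kep.yaml template):

title: My Example Enhancement   # placeholder
kep-number: 0000                # placeholder
status: implementable           # required before PRR Freeze
stage: beta                     # placeholder
latest-milestone: "v1.35"       # must match the current release cycle
milestone:
  alpha: "v1.34"
  beta: "v1.35"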

The next cherry-pick deadline for patch releases is Oct 10.

Featured PRs

134330: Add resource version comparison function in client-go along with conformance

This PR introduces a helper function for comparing Kubernetes resource versions. Resource versions are used for concurrency control and watch operations, but until now they could only be compared as opaque strings. The new function allows direct comparison of resource versions for objects of the same type. Alongside this, conformance tests have been added to ensure consistent handling across GA resources, making resource version behavior clearer and more reliable.

KEP of the Week

KEP-4412: Projected service account tokens for Kubelet image credential providers

This KEP proposes a secret-less image-pull flow that leverages ephemeral Kubernetes Service Account (KSA) tokens instead of long-lived ImagePullSecrets or node-wide kubelet credential providers. A pod-bound, short-lived KSA token would be used (or exchanged) to obtain transient, workload-scoped image-pull credentials before the pod starts, avoiding persisted secrets in the API or node and allowing external validators to rely on OIDC-like token semantics. This ties image-pull authorization to the workload identity, simplifies secret rotation and management, and reduces the security risk posed by long-lived, hard-to-rotate credentials.

This KEP is tracked for beta in v1.34.
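
Kubelet image credential providers are configured through a CredentialProviderConfig file; under this KEP, a provider entry can declare token attributes so the kubelet hands the plugin a pod-scoped, audience-bound service account token instead of relying on long-lived secrets. A hedged sketch of such a config (the provider name, registry pattern, and audience are made up for illustration):

apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: example-registry-provider       # hypothetical credential provider binary
  matchImages:
  - "*.registry.example.com"            # hypothetical registry pattern
  defaultCacheDuration: "5m"
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  tokenAttributes:                      # the KEP-4412 addition
    serviceAccountTokenAudience: "registry.example.com"  # audience for the projected KSA token
    requireServiceAccount: true         # pods without a service account cannot use this provider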

Other Merges

Deallocate extended resource claims on pod completion

Introduce k8s:customUnique tag to control listmap uniqueness validation

Add +enum tag to DeviceAllocationMode type

kubeadm: wait for apiserver using a local client, not the control-plane endpoint

Revert async preemption corner-case fix — undoes prior change to scheduler preemption behavior

kubeadm removes the RootlessControlPlane feature gate as UserNamespacesSupport becomes the replacement

Enable SSATags linter to enforce +listType on lists in APIs

API Dispatcher drops goroutine limit to avoid throughput regression under high latency

Kubelet and controller: enable more asynchronous node status updates and improve tracing/logging

DRA: allocator selection uses correct “incubating” implementation by default

kube-proxy: list available endpoints in /statusz

Restore partial functionality of AuditEventFrom

Add explicit feature gate dependencies with validation

Kubernetes is now built with Go v1.24.7

Promotions

Graduate ControlPlaneKubeletLocalMode to GA

Version Updates

Update publishing rules to use Go v1.24.7

Subprojects and Dependency Updates

cluster-autoscaler v1.34.0 promotes In-Place Updates to Beta, adds Capacity Buffer CRD/controller, improves scale-up logic across multiple providers, and deprecates older flags/APIs

cluster-autoscaler-chart v0.1.0 automatically adjusts resources for workloads

gRPC v1.75.1 adds Python 3.14 support, fixes Python async shutdown race, and refines interpreter exit handling

helm-chart-aws-cloud-controller-manager v0.0.10 installs Cloud Controller Manager for AWS Cloud Provider

ingress-nginx helm-chart v4.13.3 updates Ingress-Nginx to controller v1.13.3

nerdctl v2.1.6 reserves ports in rootful mode to prevent conflicts

Shoutouts

No shoutouts this week. Want to thank someone for special efforts to improve Kubernetes? Tag them in the #shoutouts channel.

via Last Week in Kubernetes Development https://lwkd.info/

October 02, 2025 at 06:25AM

·lwkd.info·
Last Week in Kubernetes Development - Week Ending September 28 2025
francoismichel/ssh3: SSH3: faster and rich secure shell using HTTP/3, checkout our article here: https://arxiv.org/abs/2312.08396 and our Internet-Draft: https://datatracker.ietf.org/doc/draft-michel-ssh3/
francoismichel/ssh3: SSH3: faster and rich secure shell using HTTP/3, checkout our article here: https://arxiv.org/abs/2312.08396 and our Internet-Draft: https://datatracker.ietf.org/doc/draft-michel-ssh3/
SSH3: faster and rich secure shell using HTTP/3, checkout our article here: https://arxiv.org/abs/2312.08396 and our Internet-Draft: https://datatracker.ietf.org/doc/draft-michel-ssh3/ - francoismi...
·github.com·
francoismichel/ssh3: SSH3: faster and rich secure shell using HTTP/3, checkout our article here: https://arxiv.org/abs/2312.08396 and our Internet-Draft: https://datatracker.ietf.org/doc/draft-michel-ssh3/
atuinsh/desktop - Runbooks that run
atuinsh/desktop - Runbooks that run
📖 Runbooks that run. Contribute to atuinsh/desktop development by creating an account on GitHub.
Runbooks that run
·github.com·
atuinsh/desktop - Runbooks that run
Atuin Desktop: Runbooks that Run — Now Open Source
Atuin Desktop: Runbooks that Run — Now Open Source
Atuin Desktop looks like a doc, but runs like your terminal. Script blocks, embedded terminals, database clients and prometheus charts - all in one place.
·blog.atuin.sh·
Atuin Desktop: Runbooks that Run — Now Open Source
AI & DevOps Toolkit - Ep35 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=ym9AX4kEkss
AI & DevOps Toolkit - Ep35 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=ym9AX4kEkss

Ep35 - Ask Me Anything About Anything

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=ym9AX4kEkss

·youtube.com·
AI & DevOps Toolkit - Ep35 - Ask Me Anything About Anything - https://www.youtube.com/watch?v=ym9AX4kEkss
Claude Code for VS Code - Visual Studio Marketplace
Claude Code for VS Code - Visual Studio Marketplace
Extension for Visual Studio Code - Claude Code for VS Code: Harness the power of Claude Code without leaving your IDE
·marketplace.visualstudio.com·
Claude Code for VS Code - Visual Studio Marketplace
Enabling Claude Code to work more autonomously \ Anthropic
Enabling Claude Code to work more autonomously \ Anthropic
Introducing Claude Code upgrades: native VS Code extension, terminal UX updates, and checkpoints for autonomous development. Handle complex tasks with confidence.
·anthropic.com·
Enabling Claude Code to work more autonomously \ Anthropic
Scaling CI horizontally with Buildkite Kubernetes and multiple pipelines with Ben Poland
Scaling CI horizontally with Buildkite Kubernetes and multiple pipelines with Ben Poland

Scaling CI horizontally with Buildkite, Kubernetes, and multiple pipelines, with Ben Poland

https://ku.bz/klBmzMY5-

Ben Poland walks through Faire's complete CI transformation, from a single Jenkins instance struggling with thousands of lines of Groovy to a distributed Buildkite system running across multiple Kubernetes clusters.

He details the technical challenges of running CI workloads at scale, including API rate limiting, etcd pressure points, and the trade-offs of splitting monolithic pipelines into service-scoped ones.

You will learn:

How to architect CI systems that match team ownership and eliminate shared failure points across services

Kubernetes scaling patterns for CI workloads, including multi-cluster strategies, predictive node provisioning, and handling API throttling

Performance optimization techniques like Git mirroring, node-level caching, and spot instance management for variable CI demands

Migration strategies and lessons learned from moving away from monolithic CI, including proof-of-concept approaches and avoiding the sunk cost fallacy

Sponsor

This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/klBmzMY5-

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

September 30, 2025 at 06:00AM

·kube.fm·
Scaling CI horizontally with Buildkite Kubernetes and multiple pipelines with Ben Poland
AI & DevOps Toolkit - How I Tamed Chaotic AI Coding with Simple Workflow Commands - https://www.youtube.com/watch?v=LUFJuj1yIik
AI & DevOps Toolkit - How I Tamed Chaotic AI Coding with Simple Workflow Commands - https://www.youtube.com/watch?v=LUFJuj1yIik

How I Tamed Chaotic AI Coding with Simple Workflow Commands

Tired of AI coding agents that jump between tasks chaotically and lose track of context? This video demonstrates a complete systematic workflow for AI-assisted development that keeps both you and your AI agent focused and organized from initial idea through production deployment.

I'll walk you through my entire PRD-based development system, showing real implementation of a complex feature from start to finish. You'll see how to create comprehensive technical requirements with AI analysis, track progress systematically, handle inevitable plan changes, prioritize tasks intelligently, and complete features with full traceability. The workflow uses simple MCP commands like /prd-create, /prd-next, /prd-update-progress, and /prd-done to guide systematic development without requiring complex external tools. By the end, you'll understand how to transform chaotic AI coding sessions into structured, professional development workflows that actually ship reliable software.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: OutSkill 👉 Grab your free seat to the 2-Day AI Mastermind: https://link.outskill.com/AIDOS2 🔐 100% Discount for the first 1000 people 💥 Dive deep into AI and Learn Automations, Build AI Agents, Make videos & images – all for free! 🎁 Bonuses worth $5100+ if you join and attend ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#AICoding #PRDWorkflow #ClaudeCode

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/development/how-i-tamed-chaotic-ai-coding-with-simple-workflow-commands 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai 🎬 Stop Wasting Time: Turn AI Prompts and Context Into Production Code: https://youtu.be/XwWCFINXIoU

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction 01:50 AI Development Workflow 05:03 Outskill (sponsor) 06:25 Create PRDs with AI 12:27 Find Active PRDs with AI 14:17 Start PRD Implementation with AI 18:21 Track Development Progress with AI 20:50 AI Task Prioritization 22:41 Update PRD Decisions with AI 24:56 Complete PRD Workflow with AI 28:44 Key Takeaways

via YouTube https://www.youtube.com/watch?v=LUFJuj1yIik

·youtube.com·
AI & DevOps Toolkit - How I Tamed Chaotic AI Coding with Simple Workflow Commands - https://www.youtube.com/watch?v=LUFJuj1yIik
week one
week one
It’s officially been one week of unemployment. I’m floored by how many people subscribed, so I just wanted to start by saying thank you truly from the bottom of my heart. A lot of the people who subscribed have been watching me grow for the better part of
·kyliebytes.com·
week one
Last Week in Kubernetes Development - Week Ending September 21 2025
Last Week in Kubernetes Development - Week Ending September 21 2025

Week Ending September 21, 2025

https://lwkd.info/2025/20250925

Developer News

Ray Wainman shared that he is stepping down as co-lead of SIG Autoscaling, and Adrian Moisey will step into the role alongside Jack Francis.

From SIG K8s Infra: leaders Davanum Srinivas (@dims) and Benjamin Elder (@bentheelder) are stepping down and nominating Ciprian Hacman (@hakman) and Dylan Page (@GenPage) as new chairs.

Release Schedule

Next Deadline: PRR Freeze, October 9

The Kubernetes v1.35 release cycle has officially started and we are now collecting enhancements. Work with your SIG leads to get a lead-opted-in label for your KEPs to get them added to the v1.35 cycle.

Please note that the PRR Freeze is a hard deadline starting v1.35. You can read more about the PRR Freeze deadline here and the exception process here.

Other Merges

Replace HandleCrash with HandleCrashWithContext in apiserver — adds contextual logging

Add case-insensitive DNS subdomain validation via k8s-long-name-caseless format — lets long names be validated without forcing lower case

Enable declarative validation for the DeviceClass type in the resource APIs (v1, v1beta1, v1beta2) — validation-gen tags + tests.

Ensure cacher and etcd3 use consistent key schema requirements

Add RunWithContext variant to EstablishingController — enables context-aware cancellation and richer logging for controller actions

Use iifname in kube-proxy’s nftables mode for interface matching — ensures correct filtering by interface name.

Add k8s-label-key & k8s-label-value formats for declarative validation — enables using those formats in +k8s:format= tags so label keys/values are validated automatically

Honor KUBEADM_UPGRADE_DRYRUN_DIR during kubeadm upgrades

Replace WaitForNamedCacheSync with WaitForNamedCacheSyncWithContext in pkg/controller/ and pkg/controller/garbagecollector

Add fine-grained metrics to distinguish declarative validation mismatches & panics — includes a validation_identifier label for better diagnostics.

Add metric for StatefulSet MaxUnavailable violations — tracks when availability drops below spec’s threshold

Enforce API conventions for Conditions fields — ensures metav1.Condition is used and markers/tags follow standard format

Make admission & pod-security admission checks respect emulation version

Add proper goroutine management in kube-controller-manager to prevent leaks

Update MutatingAdmissionPolicy storage version to use v1beta1

Promotions

Graduate ControlPlaneKubeletLocalMode to GA in kubeadm

Deprecated

Set the deprecated version to 1.34.0 for apiserver_storage_objects metric

Remove automaxprocs workaround now that Go 1.25 manages GOMAXPROCS automatically

Version Updates

golangci-lint to v2.4.0

go language version upgraded to v1.25

system-validators to v1.11.1

Bump Go to 1.25.1, update dependencies & distroless iptables images

Subprojects and Dependency Updates

etcd v3.6.5 fixes lease renewals, snapshot/defrag corruption, removes a flag, builds with Go 1.24.7

kubebuilder v4.9.0 upgrades deps, updates Helm CRDs, fixes Docker builds and CRD handling

prometheus v3.6.0 adds PromQL duration funcs, new TSDB blocks API, OTLP/tracing tweaks, bug fixes

vertical-pod-autoscaler v1.5.0 makes In-Place Updates Beta, deprecates Auto mode, adds metrics, supports K8s 1.34

via Last Week in Kubernetes Development https://lwkd.info/

September 25, 2025 at 07:19PM

·lwkd.info·
Last Week in Kubernetes Development - Week Ending September 21 2025
Announcing Changed Block Tracking API support (alpha)
Announcing Changed Block Tracking API support (alpha)

Announcing Changed Block Tracking API support (alpha)

https://kubernetes.io/blog/2025/09/25/csi-changed-block-tracking/

We're excited to announce the alpha support for a changed block tracking mechanism. This enhances the Kubernetes storage ecosystem by providing an efficient way for CSI storage drivers to identify changed blocks in PersistentVolume snapshots. With a driver that can use the feature, you could benefit from faster and more resource-efficient backup operations.

If you're eager to try this feature, you can skip to the Getting Started section.

What is changed block tracking?

Changed block tracking enables storage systems to identify and track modifications at the block level between snapshots, eliminating the need to scan entire volumes during backup operations. The improvement is a change to the Container Storage Interface (CSI), and also to the storage support in Kubernetes itself. With the alpha feature enabled, your cluster can:

Identify allocated blocks within a CSI volume snapshot

Determine changed blocks between two snapshots of the same volume

Streamline backup operations by focusing only on changed data blocks

For Kubernetes users managing large datasets, this API enables significantly more efficient backup processes. Backup applications can now focus only on the blocks that have changed, rather than processing entire volumes.

Note: As of now, the Changed Block Tracking API is supported only for block volumes and not for file volumes. CSI drivers that manage file-based storage systems will not be able to implement this capability.

Benefits of changed block tracking support in Kubernetes

As Kubernetes adoption grows for stateful workloads managing critical data, the need for efficient backup solutions becomes increasingly important. Traditional full backup approaches face challenges with:

Long backup windows: Full volume backups can take hours for large datasets, making it difficult to complete within maintenance windows.

High resource utilization: Backup operations consume substantial network bandwidth and I/O resources, especially for large data volumes and data-intensive applications.

Increased storage costs: Repetitive full backups store redundant data, causing storage requirements to grow linearly even when only a small percentage of data actually changes between backups.

The Changed Block Tracking API addresses these challenges by providing native Kubernetes support for incremental backup capabilities through the CSI interface.

Key components

The implementation consists of three primary components:

CSI SnapshotMetadata Service API: An API, served over gRPC, that provides volume snapshot and changed block data.

SnapshotMetadataService API: A Kubernetes CustomResourceDefinition (CRD) that advertises CSI driver metadata service availability and connection details to cluster clients.

External Snapshot Metadata Sidecar: An intermediary component that connects CSI drivers to backup applications via a standardized gRPC interface.

Implementation requirements

Storage provider responsibilities

If you're an author of a storage integration with Kubernetes and want to support the changed block tracking feature, you must implement specific requirements:

Implement CSI RPCs: Storage providers need to implement the SnapshotMetadata service as defined in the CSI specification's protobuf. This service requires server-side streaming implementations for the following RPCs:

GetMetadataAllocated: For identifying allocated blocks in a snapshot

GetMetadataDelta: For determining changed blocks between two snapshots

Storage backend capabilities: Ensure the storage backend has the capability to track and report block-level changes.

Deploy external components: Integrate with the external-snapshot-metadata sidecar to expose the snapshot metadata service.

Register custom resource: Register the SnapshotMetadataService resource using a CustomResourceDefinition and create a SnapshotMetadataService custom resource that advertises the availability of the metadata service and provides connection details (see the sketch after this list).

Support error handling: Implement proper error handling for these RPCs according to the CSI specification requirements.
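
For a sense of what that advertisement looks like, here is a hedged sketch of a SnapshotMetadataService custom resource, loosely based on the external-snapshot-metadata project; the driver name, endpoint, and audience below are placeholders, and the repository is the authoritative source for the schema:

apiVersion: cbt.storage.k8s.io/v1alpha1
kind: SnapshotMetadataService
metadata:
  name: hostpath.csi.k8s.io   # conventionally matches the CSI driver name
spec:
  address: snapshot-metadata.csi-driver-namespace:6443   # gRPC endpoint of the sidecar (placeholder)
  caCert: <base64-encoded CA bundle>                     # lets clients verify the TLS connection
  audience: snapshot-metadata-audience                   # audience expected in client ServiceAccount tokens (placeholder)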

Backup solution responsibilities

A backup solution looking to leverage this feature must:

Set up authentication: The backup application must provide a Kubernetes ServiceAccount token when using the Kubernetes SnapshotMetadataService API. Appropriate access grants, such as RBAC RoleBindings, must be established to authorize the backup application ServiceAccount to obtain such tokens (see the RBAC sketch after this list).

Implement streaming client-side code: Develop clients that implement the streaming gRPC APIs defined in the schema.proto file. Specifically:

Implement streaming client code for GetMetadataAllocated and GetMetadataDelta methods

Handle server-side streaming responses efficiently as the metadata comes in chunks

Process the SnapshotMetadataResponse message format with proper error handling

The external-snapshot-metadata GitHub repository provides a convenient iterator support package to simplify client implementation.

Handle large dataset streaming: Design clients to efficiently handle large streams of block metadata that could be returned for volumes with significant changes.

Optimize backup processes: Modify backup workflows to use the changed block metadata to identify and only transfer changed blocks to make backups more efficient, reducing both backup duration and resource consumption.
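
To make the authentication step concrete, here is a minimal, hedged RBAC sketch of the kind of grant a cluster administrator might give a backup application's ServiceAccount; all names are placeholders, and the exact resources and verbs a given driver needs may differ:

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: backup-app-snapshot-metadata        # placeholder
rules:
- apiGroups: ["cbt.storage.k8s.io"]
  resources: ["snapshotmetadataservices"]
  verbs: ["get", "list"]                    # discover the driver's metadata service
- apiGroups: ["snapshot.storage.k8s.io"]
  resources: ["volumesnapshots", "volumesnapshotcontents"]
  verbs: ["get", "list"]                    # resolve the snapshots being compared
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: backup-app-snapshot-metadata        # placeholder
subjects:
- kind: ServiceAccount
  name: backup-app                          # placeholder ServiceAccount
  namespace: backup                         # placeholder namespace
roleRef:
  kind: ClusterRole
  name: backup-app-snapshot-metadata
  apiGroup: rbac.authorization.k8s.io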

Getting started

To use changed block tracking in your cluster:

Ensure your CSI driver supports volume snapshots and implements the snapshot metadata capabilities with the required external-snapshot-metadata sidecar

Make sure the SnapshotMetadataService custom resource is registered using CRD

Verify the presence of a SnapshotMetadataService custom resource for your CSI driver

Create clients that can access the API using appropriate authentication (via Kubernetes ServiceAccount tokens)

The API provides two main functions:

GetMetadataAllocated: Lists blocks allocated in a single snapshot

GetMetadataDelta: Lists blocks changed between two snapshots

What’s next?

Depending on feedback and adoption, the Kubernetes developers hope to promote the CSI Snapshot Metadata implementation to Beta in a future release.

Where can I learn more?

For those interested in trying out this new feature:

Official Kubernetes CSI Developer Documentation

The enhancement proposal for the snapshot metadata feature.

GitHub repository for implementation and release status of external-snapshot-metadata

Complete gRPC protocol definitions for snapshot metadata API: schema.proto

Example snapshot metadata client implementation: snapshot-metadata-lister

End-to-end example with csi-hostpath-driver: example documentation

How do I get involved?

This project, like all of Kubernetes, is the result of hard work by many contributors from diverse backgrounds working together. On behalf of SIG Storage, I would like to offer a huge thank you to the contributors who helped review the design and implementation of the project, including but not limited to the following:

Ben Swartzlander (bswartz)

Carl Braganza (carlbraganza)

Daniil Fedotov (hairyhum)

Ivan Sim (ihcsim)

Nikhil Ladha (Nikhil-Ladha)

Prasad Ghangal (PrasadG193)

Praveen M (iPraveenParihar)

Rakshith R (Rakshith-R)

Xing Yang (xing-yang)

Thanks also to everyone who has contributed to the project, including others who helped review the KEP and the CSI spec PR.

For those interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). We always welcome new contributors.

The SIG also holds regular Data Protection Working Group meetings. New attendees are welcome to join our discussions.

via Kubernetes Blog https://kubernetes.io/

September 25, 2025 at 09:00AM

·kubernetes.io·
Announcing Changed Block Tracking API support (alpha)
CHAOSScon Africa 2025 Recap - CHAOSS
CHAOSScon Africa 2025 Recap - CHAOSS
CHAOSScon Africa 2025 brought together open source enthusiasts in Africa and beyond to share insights on community health, metrics, data, and inclusivity in open source.
·chaoss.community·
CHAOSScon Africa 2025 Recap - CHAOSS
Cloud models · Ollama Blog
Cloud models · Ollama Blog
Cloud models are now in preview, letting you run larger models with fast, datacenter-grade hardware. You can keep using your local tools while running larger models that wouldn’t fit on a personal computer.
·ollama.com·
Cloud models · Ollama Blog
Not Every Problem Needs Kubernetes with Danyl Novhorodov
Not Every Problem Needs Kubernetes with Danyl Novhorodov

Not Every Problem Needs Kubernetes, with Danyl Novhorodov

https://ku.bz/BYhFw8RwW

Danyl Novhorodov, a veteran .NET engineer and architect at Eneco, presents his controversial thesis that 90% of teams don't actually need Kubernetes. He walks through practical decision-making frameworks, explores powerful alternatives like BEAM runtimes and Actor models, and explains why starting with modular monoliths often beats premature microservices adoption.

You will learn:

The COST decision framework - How to evaluate infrastructure choices based on Complexity, Ownership, Skills, and Time rather than industry hype

Platform engineering vs. managed services - How to honestly assess whether your team can compete with AWS, Azure, and Google's managed container platforms

Evolutionary architecture approach - Why modular monoliths with clear boundaries often provide better foundations than distributed systems from day one

Sponsor

This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/BYhFw8RwW

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

September 23, 2025 at 06:00AM

·kube.fm·
Not Every Problem Needs Kubernetes with Danyl Novhorodov
Kubernetes v1.34: Pod Level Resources Graduated to Beta
Kubernetes v1.34: Pod Level Resources Graduated to Beta

Kubernetes v1.34: Pod Level Resources Graduated to Beta

https://kubernetes.io/blog/2025/09/22/kubernetes-v1-34-pod-level-resources/

On behalf of the Kubernetes community, I am thrilled to announce that the Pod Level Resources feature has graduated to Beta in the Kubernetes v1.34 release and is enabled by default! This significant milestone introduces a new layer of flexibility for defining and managing resource allocation for your Pods. This flexibility stems from the ability to specify CPU and memory resources for the Pod as a whole. Pod level resources can be combined with the container-level specifications to express the exact resource requirements and limits your application needs.

Pod-level specification for resources

Until recently, resource specifications that applied to Pods were primarily defined at the individual container level. While effective, this approach sometimes required duplicating or meticulously calculating resource needs across multiple containers within a single Pod. With this beta feature, Kubernetes allows you to specify CPU, memory, and hugepages resources at the Pod level. This means you can now define resource requests and limits for an entire Pod, enabling easier resource sharing without requiring granular, per-container management of these resources where it's not needed.

Why does Pod-level specification matter?

This feature enhances resource management in Kubernetes by offering flexible resource management at both the Pod and container levels.

It provides a consolidated approach to resource declaration, reducing the need for meticulous, per-container management, especially for Pods with multiple containers.

Pod-level resources enable containers within a pod to share unused resources amongst themselves, promoting efficient utilization within the pod. For example, it prevents sidecar containers from becoming performance bottlenecks. Previously, a sidecar (e.g., a logging agent or service mesh proxy) hitting its individual CPU limit could be throttled and slow down the entire Pod, even if the main application container had plenty of spare CPU. With pod-level resources, the sidecar and the main container can share the Pod's resource budget, ensuring smooth operation during traffic spikes: either the whole Pod is throttled, or all containers keep working.

When both pod-level and container-level resources are specified, pod-level requests and limits take precedence. This gives you, and cluster administrators, a powerful way to enforce overall resource boundaries for your Pods.

For scheduling, if a pod-level request is explicitly defined, the scheduler uses that specific value to find a suitable node, instead of the aggregated requests of the individual containers. At runtime, the pod-level limit acts as a hard ceiling for the combined resource usage of all containers. Crucially, this pod-level limit is the absolute enforcer; even if the sum of the individual container limits is higher, the total resource consumption can never exceed the pod-level limit.

Pod-level resources are prioritized in influencing the Quality of Service (QoS) class of the Pod.

For Pods running on Linux nodes, the Out-Of-Memory (OOM) score adjustment calculation considers both pod-level and container-level resources requests.

Pod-level resources are designed to be compatible with existing Kubernetes functionalities, ensuring a smooth integration into your workflows.

How to specify resources for an entire Pod

Using the PodLevelResources feature gate requires Kubernetes v1.34 or newer for all cluster components, including the control plane and every node. This feature gate is in beta and enabled by default in v1.34.
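
On nodes, one way to control the gate is through the kubelet configuration file, sketched below; control-plane components take the equivalent --feature-gates=PodLevelResources=true command-line flag. Since the gate is on by default in v1.34, this is only needed where defaults have been overridden:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  PodLevelResources: true   # beta, enabled by default in v1.34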

Example manifest

You can specify CPU, memory and hugepages resources directly in the Pod spec manifest at the resources field for the entire Pod.

Here’s an example demonstrating a Pod with both CPU and memory requests and limits defined at the Pod level:

apiVersion: v1
kind: Pod
metadata:
  name: pod-resources-demo
  namespace: pod-resources-example
spec:
  # The 'resources' field at the Pod specification level defines the overall
  # resource budget for all containers within this Pod combined.
  resources: # Pod-level resources
    # 'limits' specifies the maximum amount of resources the Pod is allowed to use.
    # The sum of the limits of all containers in the Pod cannot exceed these values.
    limits:
      cpu: "1"        # The entire Pod cannot use more than 1 CPU core.
      memory: "200Mi" # The entire Pod cannot use more than 200 MiB of memory.
    # 'requests' specifies the minimum amount of resources guaranteed to the Pod.
    # This value is used by the Kubernetes scheduler to find a node with enough capacity.
    requests:
      cpu: "1"        # The Pod is guaranteed 1 CPU core when scheduled.
      memory: "100Mi" # The Pod is guaranteed 100 MiB of memory when scheduled.
  containers:
  - name: main-app-container
    image: nginx
    # ... This container has no resource requests or limits specified.
  - name: auxiliary-container
    image: fedora
    command: ["sleep", "inf"]
    # ... This container has no resource requests or limits specified.

In this example, the pod-resources-demo Pod as a whole requests 1 CPU and 100 MiB of memory, and is limited to 1 CPU and 200 MiB of memory. The containers within will operate under these overall Pod-level constraints, as explained in the next section.

Interaction with container-level resource requests or limits

When both pod-level and container-level resources are specified, pod-level requests and limits take precedence. This means the node allocates resources based on the pod-level specifications.

Consider a Pod with two containers where pod-level CPU and memory requests and limits are defined, and only one container has its own explicit resource definitions:

apiVersion: v1
kind: Pod
metadata:
  name: pod-resources-demo
  namespace: pod-resources-example
spec:
  resources:
    limits:
      cpu: "1"
      memory: "200Mi"
    requests:
      cpu: "1"
      memory: "100Mi"
  containers:
  - name: main-app-container
    image: nginx
    resources:
      requests:
        cpu: "0.5"
        memory: "50Mi"
  - name: auxiliary-container
    image: fedora
    command: ["sleep", "inf"]
    # This container has no resource requests or limits specified.

Pod-Level Limits: The pod-level limits (cpu: "1", memory: "200Mi") establish an absolute boundary for the entire Pod. The sum of resources consumed by all its containers is enforced at this ceiling and cannot be surpassed.

Resource Sharing and Bursting: Containers can dynamically borrow any unused capacity, allowing them to burst as needed, so long as the Pod's aggregate usage stays within the overall limit.

Pod-Level Requests: The pod-level requests (cpu: "1", memory: "100Mi") serve as the foundational resource guarantee for the entire Pod. This value informs the scheduler's placement decision and represents the minimum resources the Pod can rely on during node-level contention.

Container-Level Requests: Container-level requests create a priority system within the Pod's guaranteed budget. Because main-app-container has an explicit request (cpu: "0.5", memory: "50Mi"), it is given precedence for its share of resources under resource pressure over the auxiliary-container, which has no such explicit claim.

Limitations

First of all, in-place resize of pod-level resources is not supported in Kubernetes v1.34 (or earlier). Attempting to modify the pod-level resource limits or requests on a running Pod results in an error: the resize is rejected. The v1.34 implementation of pod-level resources focuses on allowing initial declaration of an overall resource envelope that applies to the entire Pod. That is distinct from in-place pod resize, which (despite what the name might suggest) allows you to make dynamic adjustments to container resource requests and limits within a running Pod, potentially without a container restart. In-place resizing is also not yet a stable feature; it graduated to Beta in the v1.33 release.

Only CPU, memory, and hugepages resources can be specified at pod-level.

Pod-level resources are not supported for Windows pods. If the Pod specification explicitly targets Windows (e.g., by setting spec.os.name: "windows"), the API server will reject the Pod during the validation step. If the Pod is not explicitly marked for Windows but is scheduled to a Windows node (e.g., via a nodeSelector), the Kubelet on that Windows node will reject the Pod during its admission process.
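
For example, a manifest like the following hedged sketch (the name and image are placeholders) would be rejected by the API server during validation, because it combines pod-level resources with an explicit Windows OS declaration:

apiVersion: v1
kind: Pod
metadata:
  name: windows-pod-level-demo    # placeholder
spec:
  os:
    name: windows                 # explicitly targets Windows
  resources:                      # pod-level resources: not supported for Windows pods
    limits:
      cpu: "1"
      memory: "200Mi"
  containers:
  - name: app
    image: mcr.microsoft.com/windows/nanoserver:ltsc2022  # placeholder image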

The Topology Manager, Memory Manager and CPU Manager do not align pods and containers based on pod-level resources as these resource managers don't currently support pod-level resources.

Getting started and providing feedback

Ready to explore the Pod Level Resources feature? You'll need a Kubernetes cluster running version 1.34 or later. Remember to enable the PodLevelResources feature gate across your control plane and all nodes.

As this feature moves through Beta, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels:

Slack: #sig-node

Mailing list

Open Community Issues/PRs

via Kubernetes Blog https://kubernetes.io/

September 22, 2025 at 02:30PM

·kubernetes.io·
Kubernetes v1.34: Pod Level Resources Graduated to Beta
Blog: Spotlight on the Kubernetes Steering Committee
Blog: Spotlight on the Kubernetes Steering Committee

Blog: Spotlight on the Kubernetes Steering Committee

https://www.kubernetes.dev/blog/2025/09/22/k8s-steering-spotlight-2025/

This interview was conducted in August 2024, and due to the dynamic nature of the Steering Committee membership and election process, it might not accurately represent the committee's current composition. The topics covered are, however, highly relevant to understanding its scope of work. As we approach the Steering Committee elections, it provides useful insights into the workings of the Committee.

The Kubernetes Steering Committee is the backbone of the Kubernetes project, ensuring that its vibrant community and governance structures operate smoothly and effectively. While the technical brilliance of Kubernetes is often spotlighted through its Special Interest Groups (SIGs) and Working Groups (WGs), the unsung heroes quietly steering the ship are the members of the Steering Committee. They tackle complex organizational challenges, empower contributors, and foster the thriving open source ecosystem that Kubernetes is celebrated for.

But what does it really take to lead one of the world’s largest open source communities? What are the hidden challenges, and what drives these individuals to dedicate their time and effort to such an impactful role? In this exclusive conversation, we sit down with current Steering Committee (SC) members — Ben, Nabarun, Paco, Patrick, and Maciej — to uncover the rewarding, and sometimes demanding, realities of steering Kubernetes. From their personal journeys and motivations to the committee’s vital responsibilities and future outlook, this Spotlight offers a rare behind-the-scenes glimpse into the people who keep Kubernetes on course.

Introductions

Sandipan: Can you tell us a little bit about yourself?

Ben: Hi, I’m Benjamin Elder, also known as BenTheElder. I started in Kubernetes as a Google Summer of Code student in 2015 and have been working at Google in the space since 2017. I have contributed a lot to many areas but especially build, CI, test tooling, etc. My favorite project so far was building KIND. I have been on the release team, a chair of SIG Testing, and currently a tech lead of SIG Testing and SIG K8s Infra.

Nabarun: Hi, I am Nabarun from India. I have been working on Kubernetes since 2019. I have been contributing across multiple areas in Kubernetes: SIG ContribEx (where I am also a chair), API Machinery, Architecture, and SIG Release, where I contributed to several releases including being the Release Team Lead of Kubernetes 1.21.

Paco: I am Paco from China. I worked as an open source team lead at DaoCloud, Shanghai. In the community, I participate mainly in kubeadm, SIG Node, and SIG Testing. Besides that, I helped with KCD China and was co-chair of the recent KubeCon+CloudNativeCon China 2024 in Hong Kong.

Patrick: Hello! I’m Patrick. I’ve contributed to Kubernetes since 2018. I started in SIG Storage and then got involved in more and more areas. Nowadays, I am a SIG Testing tech lead, logging infrastructure maintainer, organizer of the Structured Logging and Device Management working groups, contributor in SIG Scheduling, and of course member of the Steering Committee. My main focus area currently is Dynamic Resource Allocation (DRA), a new API for accelerators.

Maciej: Hey, my name is Maciej and I’ve been working on Kubernetes since late 2014 in various areas, including controllers, apiserver and kubectl. Aside from being part of the Steering Committee, I’m also helping guide SIG CLI, SIG Apps and WG Batch.

About the Steering Committee

Sandipan: What does Steering do?

Ben: The charter is the definitive answer, but I see Steering as helping resolve Kubernetes-organization-level “people problems” (as opposed to technical problems), such as clarifying project governance and liaising with the Cloud Native Computing Foundation (for example, to request additional resources and support) and other CNCF projects.

Maciej: Our charter nicely describes all the responsibilities. In short, we make sure the project runs smoothly by supporting our maintainers and contributors in their daily tasks.

Patrick: Ideally, we don’t do anything 😀 All of the day-to-day business has been delegated to SIGs and WGs. Steering gets involved when something pops up where it isn’t obvious who should handle it or when conflicts need to be resolved.

Sandipan: And how is Steering different from SIGs?

Ben: From a governance perspective: Steering delegates all of the ownership of subprojects to the SIGs and/or committees (Security Response, Code Of Conduct, etc.). They’re very different. The SIGs own pieces of the project, and Steering handles some of the overarching people and policy issues. You’ll find all of the software development, releasing, communications and documentation work happening in the SIGs and committees.

Maciej: SIGs or WGs are primarily concerned with the technical direction of a particular area in Kubernetes. Steering, on the other hand, is primarily concerned with ensuring all the SIGs, WGs, and most importantly maintainers have everything they need to run the project smoothly. This includes anything from ensuring financing of our CI systems, through governance structures and policies all the way to supporting individual maintainers in various inquiries.

Sandipan: You’ve mentioned projects, could you give us an example of a project Steering has worked on recently?

Ben: We’ve been discussing the logistics to sync a better definition of the project’s official maintainers to the CNCF, which are used, for example, to vote for the Technical Oversight Committee (TOC). Currently that list is the Steering Committee, with SIG Contributor Experience and Infra + Release leads having access to the CNCF service desk. This isn’t well standardized yet across CNCF projects but I think it’s important.

Maciej: In the year that I’ve been sitting on the SC, I believe the majority of tasks we’ve been involved in were around providing letters supporting visa applications. Also, like every year, we’ve been helping all the SIGs and WGs with their annual reports.

Patrick: Apparently it has been a quiet year since Maciej and I joined the Steering Committee at the end of 2023. That’s exactly how it should be.

Sandipan: Do you have any examples of projects that came to Steering, which you then redirected to SIGs?

Ben: We often get requests for test/build related resources that we redirect to SIG K8s Infra + SIG Testing, or more specifically about releasing for subprojects that we redirect to SIG K8s Infra / SIG Release.

The road to the Steering Committee

Sandipan: What motivated you to be part of the Steering Committee? What has your journey been like?

Ben: I had a few people reach out and prompt me to run, but I was motivated by my passion for this community and the project. I think we have something really special going here and I care deeply about the ongoing success. I’ve been involved in this space my whole career and while there’s always rough edges, this community has been really supportive and I hope we can keep it that way.

Paco: At the Kubernetes Contributor Summit EU 2023, I met and chatted with many maintainers and members, and attended the Steering AMA for the first time. Since there hadn’t been a contributor summit in China since 2019, I started to connect with contributors in China to make one happen later that year. Through conversations at KCS EU and with local contributors, I realized that it is quite important to make it easy for APAC contributors to start a contributor journey, and I want to attract more contributors to the community. I was elected just after KCS CN 2023.

Patrick: I had done a lot of technical work, of which some affects and (hopefully) benefits all contributors to Kubernetes (linting and testing) and users (better log output). I saw joining the Steering Committee as an opportunity to help also with the organizational aspects of running a big open source project.

Maciej: I’d been mulling over the idea of running for the SC for a while. My biggest drive was conversations with various members of our community. Eventually, last year, I decided to follow their advice, and got elected :-)

Sandipan: What is your favorite part of being part of Steering?

Ben: When we get to help contributors directly. For example, sometimes prolific contributors reach out for an official letter from Steering explaining their contribution and its value for visa support. When we get to just purely help out Kubernetes contributors, that’s my favorite part.

Patrick: It’s a good place to learn more about how the project is actually run, directly from the other great people who are doing it.

Maciej: The same thing as with the project — it’s always the people that surround us, that give us opportunities to collaborate and create something interesting and exciting.

Sandipan: What do you think is most challenging about being part of Steering?

Ben: I think we’ve all spent a lot of time grappling with the sustainability issues in the project and not having a single great answer to solve them. A lot of people are working on these problems but we have limited time and resources. We’ve officially delegated most of this (for example, to SIGs Contributor Experience and K8s Infra), but I think we all still consider it very important and deserving of more time and energy, yet we only have so much and the answers are not obvious. The balancing act is hard.

Paco: Sustainability of contributors and maintainers is one of the most challenging aspects to me. I am constantly advocating for OSS users and employers to join the community. Community is a place where developers can learn from each other, discuss issues they encounter, and share their experiences or solutions. Ensuring that everyone in the community feels supported and valued is crucial for the long-term health of the project.

Patrick: There is documentation about how things are done,

·kubernetes.dev·
Blog: Spotlight on the Kubernetes Steering Committee