Suggested Reads

Kubernetes v1.33: From Secrets to Service Accounts: Kubernetes Image Pulls Evolved

https://kubernetes.io/blog/2025/05/07/kubernetes-v1-33-wi-for-image-pulls/

Kubernetes has steadily evolved to reduce reliance on long-lived credentials stored in the API. A prime example of this shift is the transition of Kubernetes Service Account (KSA) tokens from long-lived, static tokens to ephemeral, automatically rotated tokens with OpenID Connect (OIDC)-compliant semantics. This advancement enables workloads to securely authenticate with external services without needing persistent secrets.

However, one major gap remains: image pull authentication. Today, Kubernetes clusters rely on image pull secrets stored in the API, which are long-lived and difficult to rotate, or on node-level kubelet credential providers, which allow any pod running on a node to access the same credentials. This presents security and operational challenges.

To address this, Kubernetes is introducing Service Account Token Integration for Kubelet Credential Providers, now available in alpha. This enhancement allows credential providers to use pod-specific service account tokens to obtain registry credentials, which kubelet can then use for image pulls — eliminating the need for long-lived image pull secrets.

The problem with image pull secrets

Currently, Kubernetes administrators have two primary options for handling private container image pulls:

Image pull secrets stored in the Kubernetes API

These secrets are often long-lived because they are hard to rotate.

They must be explicitly attached to a service account or pod.

Compromise of a pull secret can lead to unauthorized image access.

Kubelet credential providers

These providers fetch credentials dynamically at the node level.

Any pod running on the node can access the same credentials.

There’s no per-workload isolation, increasing security risks.

Neither approach aligns with the principles of least privilege or ephemeral authentication, leaving Kubernetes with a security gap.
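
To make the first option above concrete, here is a minimal sketch of the traditional workflow this feature aims to replace; the registry address, secret name, and credentials are placeholders for illustration.

kubectl create secret docker-registry regcred \
  --docker-server=registry.example.com \
  --docker-username=ci-bot \
  --docker-password='long-lived-token'

The secret is then referenced from a Pod (or attached to a ServiceAccount) and must be rotated by hand whenever the underlying credential changes:

apiVersion: v1
kind: Pod
metadata:
  name: private-image-pod
spec:
  imagePullSecrets:
  - name: regcred
  containers:
  - name: app
    image: registry.example.com/team/app:1.0.0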

The solution: Service Account token integration for Kubelet credential providers

This new enhancement enables kubelet credential providers to use workload identity when fetching image registry credentials. Instead of relying on long-lived secrets, credential providers can use service account tokens to request short-lived credentials tied to a specific pod’s identity.

This approach provides:

Workload-specific authentication: Image pull credentials are scoped to a particular workload.

Ephemeral credentials: Tokens are automatically rotated, eliminating the risks of long-lived secrets.

Seamless integration: Works with existing Kubernetes authentication mechanisms, aligning with cloud-native security best practices.

How it works

  1. Service Account tokens for credential providers

Kubelet generates short-lived, automatically rotated tokens for service accounts if the credential provider it communicates with has opted into receiving a service account token for image pulls. These tokens conform to OIDC ID token semantics and are provided to the credential provider as part of the CredentialProviderRequest. The credential provider can then use this token to authenticate with an external service.
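
As a rough sketch, the request delivered to the credential provider plugin (over stdin, per the kubelet credential provider protocol) might look like the JSON below; the image and the exact name of the token field in the alpha API are assumptions for illustration.

{
  "apiVersion": "credentialprovider.kubelet.k8s.io/v1",
  "kind": "CredentialProviderRequest",
  "image": "registry.example.com/team/app:1.0.0",
  "serviceAccountToken": "<short-lived, pod-scoped OIDC token>"
}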

  2. Image registry authentication flow

When a pod starts, the kubelet requests credentials from a credential provider.

If the credential provider has opted in, the kubelet generates a service account token for the pod.

The service account token is included in the CredentialProviderRequest, allowing the credential provider to authenticate and exchange it for temporary image pull credentials from a registry (e.g. AWS ECR, GCP Artifact Registry, Azure ACR).

The kubelet then uses these credentials to pull images on behalf of the pod.
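
For illustration, after exchanging the token for temporary registry credentials, the provider replies to the kubelet with a standard CredentialProviderResponse; the registry name, username convention, and cache duration below are placeholders.

{
  "apiVersion": "credentialprovider.kubelet.k8s.io/v1",
  "kind": "CredentialProviderResponse",
  "cacheKeyType": "Registry",
  "cacheDuration": "10m",
  "auth": {
    "registry.example.com": {
      "username": "oauth2accesstoken",
      "password": "<temporary registry credential>"
    }
  }
}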

Benefits of this approach

Security: Eliminates long-lived image pull secrets, reducing attack surfaces.

Granular Access Control: Credentials are tied to individual workloads rather than entire nodes or clusters.

Operational Simplicity: No need for administrators to manage and rotate image pull secrets manually.

Improved Compliance: Helps organizations meet security policies that prohibit persistent credentials in the cluster.

What's next?

For Kubernetes v1.34, we expect to ship this feature in beta while continuing to gather feedback from users.

In the coming releases, we will focus on:

Implementing caching mechanisms to improve performance for token generation.

Giving more flexibility to credential providers to decide how the registry credentials returned to the kubelet are cached.

Making the feature work with Ensure Secret Pulled Images to ensure pods that use an image are authorized to access that image when service account tokens are used for authentication.

You can learn more about this feature on the service account token for image pulls page in the Kubernetes documentation.

You can also follow along on the KEP-4412 to track progress across the coming Kubernetes releases.

Try it out

To try out this feature:

Ensure you are running Kubernetes v1.33 or later.

Enable the ServiceAccountTokenForKubeletCredentialProviders feature gate on the kubelet.

Ensure credential provider support: Modify or update your credential provider to use service account tokens for authentication.

Update the credential provider configuration to opt into receiving service account tokens by configuring the tokenAttributes field (see the configuration sketch after this list).

Deploy a pod that uses the credential provider to pull images from a private registry.
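
As a rough sketch of the configuration step mentioned above, the kubelet's credential provider configuration might look like the following; the provider name, image pattern, and the exact tokenAttributes sub-fields are assumptions based on the alpha API.

# passed to the kubelet via --image-credential-provider-config
# (the kubelet also needs the feature gate from the step above enabled)
apiVersion: kubelet.config.k8s.io/v1
kind: CredentialProviderConfig
providers:
- name: example-registry-credential-provider
  matchImages:
  - "registry.example.com"
  defaultCacheDuration: "10m"
  apiVersion: credentialprovider.kubelet.k8s.io/v1
  tokenAttributes:
    serviceAccountTokenAudience: "registry.example.com"
    requireServiceAccount: true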

We would love to hear your feedback on this feature. Please reach out to us on the sig-auth-authenticators-dev channel on Kubernetes Slack (for an invitation, visit https://slack.k8s.io/).

How to get involved

If you are interested in getting involved in the development of this feature, sharing feedback, or participating in any other ongoing SIG Auth projects, please reach out on the sig-auth channel on Kubernetes Slack.

You are also welcome to join the bi-weekly SIG Auth meetings, held every other Wednesday.

via Kubernetes Blog https://kubernetes.io/

May 07, 2025 at 02:30PM

·kubernetes.io·
DevOps Toolkit - Ep21 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=TWKmRwBaEEU

Ep21 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, Cloud, Kubernetes, Platform Engineering, containers, or anything else. We'll have special guests Scott Rosenberg and Ramiro Berrelleza to help us out.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 Codefresh GitOps Cloud: https://codefresh.io ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=TWKmRwBaEEU

·youtube.com·
Kubernetes v1.33: Fine-grained SupplementalGroups Control Graduates to Beta

https://kubernetes.io/blog/2025/05/06/kubernetes-v1-33-fine-grained-supplementalgroups-control-beta/

The new field, supplementalGroupsPolicy, was introduced as an opt-in alpha feature for Kubernetes v1.31 and has graduated to beta in v1.33; the corresponding feature gate (SupplementalGroupsPolicy) is now enabled by default. This feature enables more precise control over supplemental groups in containers, which can strengthen the security posture, particularly when accessing volumes. It also improves the transparency of UID/GID details in containers, offering better security oversight.

Please be aware that this beta release contains a behavioral breaking change. See The Behavioral Changes Introduced In Beta and Upgrade Considerations sections for details.

Motivation: Implicit group memberships defined in /etc/group in the container image

Although the majority of Kubernetes cluster admins and users may not be aware of it, Kubernetes, by default, merges group information from the Pod with information defined in /etc/group in the container image.

Let's look at an example. The Pod manifest below specifies runAsUser=1000, runAsGroup=3000 and supplementalGroups=4000 in the Pod's security context.

apiVersion: v1
kind: Pod
metadata:
  name: implicit-groups
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
  containers:
  - name: ctr
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false

What is the result of the id command in the ctr container? The output should be similar to this:

uid=1000 gid=3000 groups=3000,4000,50000

Where does group ID 50000 in the supplementary groups (groups field) come from, even though 50000 is not defined in the Pod's manifest at all? The answer is the /etc/group file in the container image.

Checking the contents of /etc/group in the container image should show the following:

user-defined-in-image:x:1000: group-defined-in-image:x:50000:user-defined-in-image

The last entry shows that the container's primary user 1000 belongs to the group 50000.

Thus, the group membership defined in /etc/group in the container image for the container's primary user is implicitly merged with the information from the Pod. Please note that this was a design decision the current CRI implementations inherited from Docker, and the community never really reconsidered it until now.

What's wrong with it?

The implicitly merged group information from /etc/group in the container image poses a security risk. These implicit GIDs can't be detected or validated by policy engines because there's no record of them in the Pod manifest. This can lead to unexpected access control issues, particularly when accessing volumes (see kubernetes/kubernetes#112879 for details) because file permission is controlled by UID/GIDs in Linux.

Fine-grained supplemental groups control in a Pod: supplementalGroupsPolicy

To tackle the above problem, the Pod's .spec.securityContext now includes a supplementalGroupsPolicy field.

This field lets you control how Kubernetes calculates the supplementary groups for container processes within a Pod. The available policies are:

Merge: The group membership defined in /etc/group for the container's primary user will be merged. If no policy is specified, Merge is applied (i.e. the existing behavior, for backward compatibility).

Strict: Only the group IDs specified in fsGroup, supplementalGroups, or runAsGroup are attached as supplementary groups to the container processes. Group memberships defined in /etc/group for the container's primary user are ignored.

Let's see how the Strict policy works. The Pod manifest below specifies supplementalGroupsPolicy: Strict:

apiVersion: v1
kind: Pod
metadata:
  name: strict-supplementalgroups-policy
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
    supplementalGroupsPolicy: Strict
  containers:
  - name: ctr
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false

The result of the id command in the ctr container should be similar to this:

uid=1000 gid=3000 groups=3000,4000

You can see that the Strict policy excludes group 50000 from groups!

Thus, ensuring supplementalGroupsPolicy: Strict (enforced by some policy mechanism) helps prevent implicit supplementary groups from being attached to a Pod.
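
For example, one way to enforce this cluster-wide (a sketch using Kyverno, which the post does not prescribe; any admission policy engine would work) is a validation policy requiring the field on every Pod:

apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-strict-supplemental-groups
spec:
  validationFailureAction: Enforce
  rules:
  - name: require-strict-policy
    match:
      any:
      - resources:
          kinds:
          - Pod
    validate:
      message: "spec.securityContext.supplementalGroupsPolicy must be Strict"
      pattern:
        spec:
          securityContext:
            supplementalGroupsPolicy: Strict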

Note: A container with sufficient privileges can change its process identity. The supplementalGroupsPolicy only affects the initial process identity. See the following section for details.

Attached process identity in Pod status

This feature also exposes the process identity attached to the first container process of the container via the .status.containerStatuses[].user.linux field. This is helpful for checking whether implicit group IDs are attached.

...
status:
  containerStatuses:
  - name: ctr
    user:
      linux:
        gid: 3000
        supplementalGroups:
        - 3000
        - 4000
        uid: 1000
...

Note: The value in the .status.containerStatuses[].user.linux field is the identity initially attached to the first container process in the container. If the container has sufficient privilege to call process-identity-related system calls (e.g. setuid(2), setgid(2) or setgroups(2)), the container process can change its identity. Thus, the actual process identity can change at runtime.

Strict Policy requires newer CRI versions

The CRI runtime (e.g. containerd, CRI-O) plays a core role in calculating the supplementary group IDs attached to containers. Thus, supplementalGroupsPolicy: Strict requires a CRI runtime that supports this feature (supplementalGroupsPolicy: Merge works even with CRI runtimes that do not support it, because that policy is fully backward compatible).

Here are some CRI runtimes that support this feature, and the versions you need to be running:

containerd: v2.0 or later

CRI-O: v1.31 or later

You can check whether the feature is supported on a given node via the Node's .status.features.supplementalGroupsPolicy field.

apiVersion: v1
kind: Node
...
status:
  features:
    supplementalGroupsPolicy: true
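
A quick way to check this across all nodes (the column name is arbitrary):

kubectl get nodes -o custom-columns=NAME:.metadata.name,SUPPLEMENTAL_GROUPS_POLICY:.status.features.supplementalGroupsPolicy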

The behavioral changes introduced in beta

In the alpha release, when a Pod with supplementalGroupsPolicy: Strict was scheduled to a node that did not support the feature (i.e., .status.features.supplementalGroupsPolicy=false), the Pod's supplemental groups policy silently fell back to Merge.

In v1.33, now that the feature is beta, the policy is enforced more strictly: the kubelet rejects pods whose nodes cannot ensure the specified policy. If a pod is rejected, you will see warning events with reason=SupplementalGroupsPolicyNotSupported like the one below:

apiVersion: v1
kind: Event
...
type: Warning
reason: SupplementalGroupsPolicyNotSupported
message: "SupplementalGroupsPolicy=Strict is not supported in this node"
involvedObject:
  apiVersion: v1
  kind: Pod
...

Upgrade consideration

If you're already using this feature, especially the supplementalGroupsPolicy: Strict policy, we assume that your cluster's CRI runtimes already support this feature. In that case, you don't need to worry about the pod rejections described above.

However, if your cluster:

uses the supplementalGroupsPolicy: Strict policy, but

its CRI runtimes do NOT yet support the feature (i.e., .status.features.supplementalGroupsPolicy=false),

you need to prepare for the behavioral change (pod rejection) when upgrading your cluster.

We recommend several ways to avoid unexpected pod rejections:

Upgrading your cluster's CRI runtimes together with Kubernetes, or before the Kubernetes upgrade

Labeling your nodes according to whether their CRI runtime supports this feature, and adding a node selector to pods with the Strict policy so that they are scheduled only onto supporting nodes (see the sketch below); in this case, you will need to monitor the number of Pending pods instead of pod rejections
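
A minimal sketch of that second approach, assuming you pick your own label key (the label name below is made up for illustration):

# mark nodes whose CRI runtime supports the feature
kubectl label node worker-1 example.com/supplemental-groups-policy=strict-capable

# constrain Strict pods to those nodes
apiVersion: v1
kind: Pod
metadata:
  name: strict-pod
spec:
  nodeSelector:
    example.com/supplemental-groups-policy: strict-capable
  securityContext:
    supplementalGroupsPolicy: Strict
  containers:
  - name: ctr
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]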

Getting involved

This feature is driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you!

How can I learn more?

Configure a Security Context for a Pod or Container, for further details on supplementalGroupsPolicy

KEP-3619: Fine-grained SupplementalGroups control

via Kubernetes Blog https://kubernetes.io/

May 06, 2025 at 02:30PM

·kubernetes.io·
Beyond the Repository: Best practices for open source ecosystems researchers
Much of the existing research about open source elects to study software repositories instead of ecosystems. An open source repository most often refers to the artifacts recorded in a version control system and occasionally includes interactions around ...
·dl.acm.org·
Marcus Noble's tips on giving technical talks
I've been giving talks at meetups and conferences for a few years now. I started off after the encouragement of my friends giving their own talks and looking so cool doing it! It's taken a while but I think I'm at a stage now where I'm not only good at it (at least I hope so 😅) but I feel confident and comfortable while doing it. I want everyone to have that same confidence and I want to hear ALL OF YOU giving talks too! You have stories to tell, lessons to share and experience to pass on. So here are my learnings on how I approach giving a talk in front of a crowd of techies, mainly focussed on technical talks, but most of this should apply to most public speaking.
·marcusnoble.co.uk·
The valley of engineering despair
I have delivered a lot of successful engineering projects. When I start on a project, I’m now very (perhaps unreasonably) confident that I will ship it…
·seangoedecke.com·
A New Kali Linux Archive Signing Key | Kali Linux Blog
TL;DR Bad news for Kali Linux users! In the coming day(s), apt update is going to fail for pretty much everyone out there: Missing key 827C8569F2518CC677FECA1AED65462EC8D5E4C5, which is needed to verify signature. Reason is, we had to roll a new signing key for the Kali repository. You need to download and install the new key manually, here’s the one-liner:
·kali.org·
CNCF and Synadia Reach an Agreement on NATS
For a minute there, it looked like we were in for an ugly, legal fight over control of the NATS messaging system. But Synadia has backed off, and all's well now.
·thenewstack.io·
Kubernetes upgrades: beyond the one-click update with Tanat Lokejaroenlarb

https://ku.bz/VVHFfXGl_

Discover how Adevinta manages Kubernetes upgrades at scale in this episode with Tanat Lokejaroenlarb. Tanat shares his team's journey from time-consuming blue-green deployments to efficient in-place upgrades for their multi-tenant Kubernetes platform SHIP, detailing the engineering decisions and operational challenges they overcame.

You will learn:

How to transition from blue-green to in-place Kubernetes upgrades while maintaining service reliability

Techniques for tracking and addressing API deprecations using tools like Pluto and Kube-no-trouble (a brief example follows this list)

Strategies for minimizing SLO impact during node rebuilds through serialized approaches and proper PDB configuration

Why a phased upgrade approach with "cluster waves" provides safer production deployments even with thorough testing
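
As a small illustration of the deprecation-scanning step mentioned above (assuming the tools are installed and your kubeconfig points at the cluster):

# scan local manifests and Helm releases for deprecated or removed APIs
pluto detect-files -d ./manifests
pluto detect-helm -o wide

# scan the live cluster with kube-no-trouble
kubent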

Sponsor

This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.

More info

Find all the links and info for this episode here: https://ku.bz/VVHFfXGl_

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

May 06, 2025 at 06:00AM

·kube.fm·
Stop the madness! Weather radar is not a weapon. It saves lives every day! | A militarized conspiracy theorist group believes radars are ‘weather weapons’ and is trying to destroy them | CNN
National Weather Service offices around the country are on guard after recent threats to agency infrastructure — specifically Doppler weather radars — from a violent militia-style group, emails from the National Oceanic and Atmospheric Administration’s security office show.
·cnn.com·
Kubernetes v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA

https://kubernetes.io/blog/2025/05/05/kubernetes-v1-33-prevent-persistentvolume-leaks-when-deleting-out-of-order-graduate-to-ga/

I am thrilled to announce that the feature to prevent PersistentVolume (or PVs for short) leaks when deleting out of order has graduated to General Availability (GA) in Kubernetes v1.33! This improvement, initially introduced as a beta feature in Kubernetes v1.31, ensures that your storage resources are properly reclaimed, preventing unwanted leaks.

How did reclaim work in previous Kubernetes releases?

A PersistentVolumeClaim (or PVC for short) is a user's request for storage. A PV and a PVC are considered Bound when a newly created PV or a matching existing PV is found. The PVs themselves are backed by volumes allocated by the storage backend.

Normally, if the volume is to be deleted, then the expectation is to delete the PVC for a bound PV-PVC pair. However, there are no restrictions on deleting a PV before deleting a PVC.

For a Bound PV-PVC pair, the ordering of PV-PVC deletion determines whether the PV reclaim policy is honored. The reclaim policy is honored if the PVC is deleted first; however, if the PV is deleted prior to deleting the PVC, then the reclaim policy is not exercised. As a result of this behavior, the associated storage asset in the external infrastructure is not removed.

PV reclaim policy with Kubernetes v1.33

With the graduation to GA in Kubernetes v1.33, this issue is now resolved. Kubernetes now reliably honors the configured Delete reclaim policy, even when PVs are deleted before their bound PVCs. This is achieved through the use of finalizers, ensuring that the storage backend releases the allocated storage resource as intended.

How does it work?

For CSI volumes, the new behavior is achieved by adding the finalizer external-provisioner.volume.kubernetes.io/finalizer to new and existing PVs. The finalizer is only removed after the storage from the backend is deleted. Addition and removal of the finalizer are handled by the external-provisioner.

Here is an example of a PV with the finalizer; notice the new finalizer in the finalizers list:

kubectl get pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 -o yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi.example.driver.com
  creationTimestamp: "2021-11-17T19:28:56Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-provisioner.volume.kubernetes.io/finalizer
  name: pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53
  resourceVersion: "194711"
  uid: 087f14f2-4157-4e95-8a70-8294b039d30e
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: example-vanilla-block-pvc
    namespace: default
    resourceVersion: "194677"
    uid: a7b7e3ba-f837-45ba-b243-dec7d8aaed53
  csi:
    driver: csi.example.driver.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1637110610497-8081-csi.example.driver.com
      type: CNS Block Volume
    volumeHandle: 2dacf297-803f-4ccc-afc7-3d3c3f02051e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: example-vanilla-block-sc
  volumeMode: Filesystem
status:
  phase: Bound

The finalizer prevents this PersistentVolume from being removed from the cluster. As stated previously, the finalizer is only removed from the PV object after it is successfully deleted from the storage backend. To learn more about finalizers, please refer to Using Finalizers to Control Deletion.

Similarly, the finalizer kubernetes.io/pv-controller is added to dynamically provisioned in-tree plugin volumes.
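
If you want to check which PVs in your cluster already carry these finalizers, a quick read-only sketch:

kubectl get pv -o custom-columns=NAME:.metadata.name,RECLAIM:.spec.persistentVolumeReclaimPolicy,FINALIZERS:.metadata.finalizers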

Important note

The fix does not apply to statically provisioned in-tree plugin volumes.

How to enable new behavior?

To take advantage of the new behavior, you must have upgraded your cluster to the v1.33 release of Kubernetes and run the CSI external-provisioner version 5.0.1 or later. The feature was released as beta in the v1.31 release of Kubernetes, where it was enabled by default.

References

KEP-2644

Volume leak issue

Beta Release Blog

How do I get involved?

The SIG Storage communication channels on Kubernetes Slack are a great way to reach the SIG Storage and migration working group teams.

Special thanks to the following people for the insightful reviews, thorough consideration and valuable contribution:

Fan Baofa (carlory)

Jan Šafránek (jsafrane)

Xing Yang (xing-yang)

Matthew Wong (wongma7)

Join the Kubernetes Storage Special Interest Group (SIG) if you're interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system. We’re rapidly growing and always welcome new contributors.

via Kubernetes Blog https://kubernetes.io/

May 05, 2025 at 02:30PM

·kubernetes.io·
It's a Trap! The Two Generals' Problem
In distributed systems, coordination is hard—really hard—especially when both parties depend on mutual confirmation to proceed, but there’s no guarantee their messages will arrive. This classic…
·particular.net·
We the builders - we find the truth and tell the truth
For decades, we've done our jobs in the background. We helped with filing taxes, getting veterans' benefits, applying for financial aid, refugees navigating immigration, everyone find vaccines, parents find baby formula
·wethebuilders.org·
Kubernetes v1.33: Mutable CSI Node Allocatable Count

https://kubernetes.io/blog/2025/05/02/kubernetes-1-33-mutable-csi-node-allocatable-count/

Scheduling stateful applications reliably depends heavily on accurate information about resource availability on nodes. Kubernetes v1.33 introduces an alpha feature called mutable CSI node allocatable count, allowing Container Storage Interface (CSI) drivers to dynamically update the reported maximum number of volumes that a node can handle. This capability significantly enhances the accuracy of pod scheduling decisions and reduces scheduling failures caused by outdated volume capacity information.

Background

Traditionally, Kubernetes CSI drivers report a static maximum volume attachment limit when initializing. However, actual attachment capacities can change during a node's lifecycle for various reasons, such as:

Manual or external operations attaching/detaching volumes outside of Kubernetes control.

Dynamically attached network interfaces or specialized hardware (GPUs, NICs, etc.) consuming available slots.

Multi-driver scenarios, where one CSI driver’s operations affect available capacity reported by another.

Static reporting can cause Kubernetes to schedule pods onto nodes that appear to have capacity but don't, leading to pods stuck in a ContainerCreating state.

Dynamically adapting CSI volume limits

With the new feature gate MutableCSINodeAllocatableCount, Kubernetes enables CSI drivers to dynamically adjust and report node attachment capacities at runtime. This ensures that the scheduler has the most accurate, up-to-date view of node capacity.

How it works

When this feature is enabled, Kubernetes supports two mechanisms for updating the reported node volume limits:

Periodic Updates: CSI drivers specify an interval to periodically refresh the node's allocatable capacity.

Reactive Updates: An immediate update triggered when a volume attachment fails due to exhausted resources (ResourceExhausted error).

Enabling the feature

To use this alpha feature, you must enable the MutableCSINodeAllocatableCount feature gate in these components:

kube-apiserver

kubelet
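
For a self-managed cluster, a minimal sketch of enabling the gate on both components (managed distributions expose this differently) could be:

# kube-apiserver: command-line flag
kube-apiserver --feature-gates=MutableCSINodeAllocatableCount=true ...

# kubelet: via its configuration file
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MutableCSINodeAllocatableCount: true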

Example CSI driver configuration

Below is an example of configuring a CSI driver to enable periodic updates every 60 seconds:

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.k8s.io
spec:
  nodeAllocatableUpdatePeriodSeconds: 60

This configuration directs the kubelet to call the CSI driver's NodeGetInfo method every 60 seconds, updating the node's allocatable volume count. Kubernetes enforces a minimum update interval of 10 seconds to balance accuracy and resource usage.
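
To observe the effect, you can inspect the per-driver volume limit that the kubelet publishes on the CSINode object; the driver name and count below are illustrative.

kubectl get csinode <node-name> -o yaml

# relevant part of the output:
# spec:
#   drivers:
#   - name: example.csi.k8s.io
#     allocatable:
#       count: 39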

Immediate updates on attachment failures

In addition to periodic updates, Kubernetes now reacts to attachment failures. Specifically, if a volume attachment fails with a ResourceExhausted error (gRPC code 8), an immediate update is triggered to correct the allocatable count promptly.

This proactive correction prevents repeated scheduling errors and helps maintain cluster health.

Getting started

To experiment with mutable CSI node allocatable count in your Kubernetes v1.33 cluster:

Enable the feature gate MutableCSINodeAllocatableCount on the kube-apiserver and kubelet components.

Update your CSI driver configuration by setting nodeAllocatableUpdatePeriodSeconds.

Monitor and observe improvements in scheduling accuracy and pod placement reliability.

Next steps

This feature is currently in alpha and the Kubernetes community welcomes your feedback. Test it, share your experiences, and help guide its evolution toward beta and GA stability.

Join discussions in the Kubernetes Storage Special Interest Group (SIG-Storage) to shape the future of Kubernetes storage capabilities.

via Kubernetes Blog https://kubernetes.io/

May 02, 2025 at 02:30PM

·kubernetes.io·
FIPS 140: The Best Explanation Ever (Hopefully)
Cryptography = modern cyber security. Full stop. It is at the core of everyone’s lives from buying a latte with Apple/Google Pay, messaging our friends, and even just checking the online news…
·itnext.io·
Simplifying HPC: CIQ Releases User-Friendly UI and API for Warewulf - TFiR
CIQ has announced the tech preview release of a user-friendly management interface for the Warewulf cluster provisioning system. This new web interface, built on Cockpit and backed by a new, open source API, is designed to simplify and streamline management of high-performance computing (HPC) clusters. This new capability simplifies cluster administration for new and existing
·tfir.io·