1_r/devopsish

54498 bookmarks
Kubernetes 1.31: Pod Failure Policy for Jobs Goes GA

https://kubernetes.io/blog/2024/08/19/kubernetes-1-31-pod-failure-policy-for-jobs-goes-ga/

This post describes Pod failure policy, which graduates to stable in Kubernetes 1.31, and how to use it in your Jobs.

About Pod failure policy

When you run workloads on Kubernetes, Pods might fail for a variety of reasons. Ideally, workloads like Jobs should be able to ignore transient, retriable failures and continue running to completion.

To allow for these transient failures, Kubernetes Jobs include the backoffLimit field, which lets you specify a number of Pod failures that you're willing to tolerate during Job execution. However, if you set a large value for the backoffLimit field and rely solely on this field, you might notice unnecessary increases in operating costs as Pods restart excessively until the backoffLimit is met.

This becomes particularly problematic when running large-scale Jobs with thousands of long-running Pods across thousands of nodes.

The Pod failure policy extends the backoff limit mechanism to help you reduce costs in the following ways:

Gives you control to fail the Job as soon as a non-retriable Pod failure occurs.

Allows you to ignore retriable errors without increasing the backoffLimit field.

For example, you can use a Pod failure policy to run your workload on more affordable spot machines by ignoring Pod failures caused by graceful node shutdown.

The policy allows you to distinguish between retriable and non-retriable Pod failures based on container exit codes or Pod conditions in a failed Pod.

How it works

You specify a Pod failure policy in the Job specification, represented as a list of rules.

For each rule you define match requirements based on one of the following properties:

Container exit codes: the onExitCodes property.

Pod conditions: the onPodConditions property.

Additionally, for each rule, you specify one of the following actions to take when a Pod matches the rule:

Ignore: Do not count the failure towards the backoffLimit or backoffLimitPerIndex.

FailJob: Fail the entire Job and terminate all running Pods.

FailIndex: Fail the index corresponding to the failed Pod. This action works with the Backoff limit per index feature.

Count: Count the failure towards the backoffLimit or backoffLimitPerIndex. This is the default behavior.

When Pod failures occur in a running Job, Kubernetes matches the failed Pod status against the list of Pod failure policy rules, in the specified order, and takes the corresponding actions for the first matched rule.

Note that when specifying the Pod failure policy, you must also set the Job's Pod template with restartPolicy: Never. This prevents race conditions between the kubelet and the Job controller when counting Pod failures.
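The first-match evaluation described above can be sketched in a few lines of Python. This is an illustrative model only, not the Job controller's code: the Rule class, the decide function, the sample RULES list, and the simplification that each rule carries either one condition type or one exit-code requirement are all assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Set


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified stand-in for one podFailurePolicy rule."""
    action: str                            # "Ignore", "FailJob", "FailIndex" or "Count"
    condition_type: Optional[str] = None   # models onPodConditions[].type
    exit_operator: Optional[str] = None    # models onExitCodes.operator: "In" or "NotIn"
    exit_values: Sequence[int] = ()        # models onExitCodes.values


def rule_matches(rule: Rule, exit_codes: Sequence[int], conditions: Set[str]) -> bool:
    """Return True if the failed Pod satisfies the rule's single requirement."""
    if rule.condition_type is not None:
        return rule.condition_type in conditions
    # Assumption of this sketch: only non-zero exit codes count as failures.
    failed = [code for code in exit_codes if code != 0]
    if rule.exit_operator == "In":
        return any(code in rule.exit_values for code in failed)
    if rule.exit_operator == "NotIn":
        return any(code not in rule.exit_values for code in failed)
    return False


def decide(rules: Sequence[Rule], exit_codes: Sequence[int], conditions: Set[str]) -> str:
    """Walk the rules in order and return the action of the first match."""
    for rule in rules:
        if rule_matches(rule, exit_codes, conditions):
            return rule.action
    return "Count"  # no rule matched: the failure counts towards the backoff limit


# Hypothetical rules, for illustration only.
RULES = [
    Rule(action="Ignore", condition_type="DisruptionTarget"),
    Rule(action="FailJob", exit_operator="In", exit_values=(42,)),
]

print(decide(RULES, [42], {"DisruptionTarget"}))  # first rule wins
print(decide(RULES, [42], set()))
print(decide(RULES, [1], set()))
```

Because the rules are evaluated in order, a Pod that exits with code 42 while also carrying the DisruptionTarget condition is ignored rather than failing the Job. Reordering the rules would change that outcome.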

Kubernetes-initiated Pod disruptions

To allow matching Pod failure policy rules against failures caused by disruptions initiated by Kubernetes, this feature introduces the DisruptionTarget Pod condition.

Kubernetes adds this condition to any Pod, regardless of whether it's managed by a Job controller, that fails because of a retriable disruption scenario. The DisruptionTarget condition contains one of the following reasons that corresponds to these disruption scenarios:

PreemptionByKubeScheduler: Preemption by kube-scheduler to accommodate a new Pod that has a higher priority.

DeletionByTaintManager: The Pod is due to be deleted by kube-controller-manager due to a NoExecute taint that the Pod doesn't tolerate.

EvictionByEvictionAPI: The Pod is due to be deleted by an API-initiated eviction.

DeletionByPodGC: The Pod is bound to a node that no longer exists, and is due to be deleted by Pod garbage collection.

TerminationByKubelet: The Pod was terminated by graceful node shutdown, node pressure eviction or preemption for system critical pods.

In all other disruption scenarios, like eviction due to exceeding Pod container limits, Pods don't receive the DisruptionTarget condition because the disruptions were likely caused by the Pod and would reoccur on retry.

Example

The Pod failure policy snippet below demonstrates an example use:

podFailurePolicy:
  rules:
  - action: Ignore
    onPodConditions:
    - type: DisruptionTarget
  - action: FailJob
    onPodConditions:
    - type: ConfigIssue
  - action: FailJob
    onExitCodes:
      operator: In
      values: [ 42 ]

In this example, the Pod failure policy does the following:

Ignores any failed Pods that have the built-in DisruptionTarget condition. These Pods don't count towards Job backoff limits.

Fails the Job if any failed Pods have the custom user-supplied ConfigIssue condition, which was added either by a custom controller or webhook.

Fails the Job if any containers exited with the exit code 42.

Counts all other Pod failures towards the default backoffLimit (or backoffLimitPerIndex if used).

Learn more

For a hands-on guide to using Pod failure policy, see Handling retriable and non-retriable pod failures with Pod failure policy

Read the documentation for Pod failure policy and Backoff limit per index

Read the documentation for Pod disruption conditions

Read the KEP for Pod failure policy

Related work

Based on the concepts introduced by Pod failure policy, the following additional work is in progress:

JobSet integration: Configurable Failure Policy API

Pod failure policy extension to add more granular failure reasons

Support for Pod failure policy via JobSet in Kubeflow Training v2

Proposal: Disrupted Pods should be removed from endpoints

Get involved

This work was sponsored by the batch working group in close collaboration with the SIG Apps, SIG Node, and SIG Scheduling communities.

If you are interested in working on new features in this space, we recommend subscribing to our Slack channel and attending the regular community meetings.

Acknowledgments

I would love to thank everyone who was involved in this project over the years - it's been a journey and a joint community effort! The list below is my best-effort attempt to remember and recognize people who made an impact. Thank you!

Aldo Culquicondor for guidance and reviews throughout the process

Jordan Liggitt for KEP and API reviews

David Eads for API reviews

Maciej Szulik for KEP reviews from SIG Apps PoV

Clayton Coleman for guidance and SIG Node reviews

Sergey Kanzhelev for KEP reviews from SIG Node PoV

Dawn Chen for KEP reviews from SIG Node PoV

Daniel Smith for reviews from SIG API machinery PoV

Antoine Pelisse for reviews from SIG API machinery PoV

John Belamaric for PRR reviews

Filip Křepinský for thorough reviews from SIG Apps PoV and bug-fixing

David Porter for thorough reviews from SIG Node PoV

Jensen Lo for early requirements discussions, testing and reporting issues

Daniel Vega-Myhre for advancing JobSet integration and reporting issues

Abdullah Gharaibeh for early design discussions and guidance

Antonio Ojea for test reviews

Yuki Iwai for reviews and aligning implementation of the closely related Job features

Kevin Hannon for reviews and aligning implementation of the closely related Job features

Tim Bannister for docs reviews

Shannon Kularathna for docs reviews

Paola Cortés for docs reviews

via Kubernetes Blog https://kubernetes.io/

August 18, 2024 at 08:00PM

·kubernetes.io·
.@juliemshort massively reorganized Max’s playroom a couple days ago into more of a LEGO builder space since that’s really all Max does in here. Max asked me to come see his battlefield that he’d set up. The organizer is new to help keep minifigs sorted as the previous solution was overflowing. I went to grab a droid and had no idea where they were. Max tells me and I immediately forget. So this is what I’m doing right now. Labeling drawers after Julie came through and made sure everything was in the right place (it wasn’t; hence the labeling). Happy Sunday! #LEGO #organization #legominifigs

.@juliemshort massively reorganized Max’s playroom a couple days ago into more of a LEGO builder space since that’s really all Max does in here.

Max asked me to come see his battlefield that he’d set up. The organizer is new to help keep minifigs sorted as the previous solution was overflowing. I went to grab a droid and had no idea where they were. Max tells me and I immediately forget.

So this is what I’m doing right now. Labeling drawers after Julie came through and made sure everything was in the right place (it wasn’t; hence the labeling). Happy Sunday! #LEGO #organization #legominifigs

August 18, 2024 at 12:43PM

via Instagram https://instagr.am/p/C-0YI-gvPmH/

·instagr.am·
Kubernetes 1.31: MatchLabelKeys in PodAffinity graduates to beta

https://kubernetes.io/blog/2024/08/16/matchlabelkeys-podaffinity/

Kubernetes 1.29 introduced new fields MatchLabelKeys and MismatchLabelKeys in PodAffinity and PodAntiAffinity.

In Kubernetes 1.31, this feature moves to beta and the corresponding feature gate (MatchLabelKeysInPodAffinity) gets enabled by default.

MatchLabelKeys - Enhanced scheduling for versatile rolling updates

During a workload's (e.g., Deployment) rolling update, a cluster may have Pods from multiple versions at the same time. However, the scheduler cannot distinguish between old and new versions based on the LabelSelector specified in PodAffinity or PodAntiAffinity. As a result, it will co-locate or disperse Pods regardless of their versions.

This can lead to sub-optimal scheduling outcomes, for example:

New version Pods are co-located with old version Pods (PodAffinity), which will eventually be removed after rolling updates.

Old version Pods are distributed across all available topologies, preventing new version Pods from finding nodes due to PodAntiAffinity.

MatchLabelKeys is a set of Pod label keys and addresses this problem. The scheduler looks up the values of these keys from the new Pod's labels and combines them with LabelSelector so that PodAffinity matches Pods that have the same key-value in labels.

By using label pod-template-hash in MatchLabelKeys, you can ensure that only Pods of the same version are evaluated for PodAffinity or PodAntiAffinity.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: application-server
...
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - database
            topologyKey: topology.kubernetes.io/zone
            matchLabelKeys:
            - pod-template-hash

The above matchLabelKeys will be translated in Pods like:

kind: Pod
metadata:
  name: application-server
  labels:
    pod-template-hash: xyz
...
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - database
          - key: pod-template-hash # Added from matchLabelKeys; Only Pods from the same replicaset will match this affinity.
            operator: In
            values:
            - xyz
        topologyKey: topology.kubernetes.io/zone
        matchLabelKeys:
        - pod-template-hash

MismatchLabelKeys - Service isolation

MismatchLabelKeys is a set of Pod label keys, like MatchLabelKeys. The scheduler looks up the values of these keys from the new Pod's labels and merges them with LabelSelector as key notin (value), so that PodAffinity does not match Pods that have the same key-value in labels.

Suppose all Pods for each tenant get a tenant label via a controller or a manifest management tool like Helm.

Although the value of the tenant label is unknown when composing each workload's manifest, the cluster admin wants to achieve exclusive 1:1 tenant to domain placement for tenant isolation.

MismatchLabelKeys works for this use case. By applying the following affinity globally using a mutating webhook, the cluster admin can ensure that the Pods from the same tenant will land on the same domain exclusively, meaning Pods from other tenants won't land on the same domain.

affinity:
  podAffinity:      # ensures the pods of this tenant land on the same node pool
    requiredDuringSchedulingIgnoredDuringExecution:
    - matchLabelKeys:
        - tenant
      topologyKey: node-pool
  podAntiAffinity:  # ensures only Pods from this tenant lands on the same node pool
    requiredDuringSchedulingIgnoredDuringExecution:
    - mismatchLabelKeys:
        - tenant
      labelSelector:
        matchExpressions:
        - key: tenant
          operator: Exists
      topologyKey: node-pool

The above matchLabelKeys and mismatchLabelKeys will be translated to something like:

kind: Pod
metadata:
  name: application-server
  labels:
    tenant: service-a
spec:
  affinity:
    podAffinity:      # ensures the pods of this tenant land on the same node pool
      requiredDuringSchedulingIgnoredDuringExecution:
      - matchLabelKeys:
          - tenant
        topologyKey: node-pool
        labelSelector:
          matchExpressions:
          - key: tenant
            operator: In
            values:
            - service-a
    podAntiAffinity:  # ensures only Pods from this tenant lands on the same node pool
      requiredDuringSchedulingIgnoredDuringExecution:
      - mismatchLabelKeys:
          - tenant
        labelSelector:
          matchExpressions:
          - key: tenant
            operator: Exists
          - key: tenant
            operator: NotIn
            values:
            - service-a
        topologyKey: node-pool
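The translation of matchLabelKeys and mismatchLabelKeys into extra selector expressions can be modelled roughly as follows. This is a sketch of the merge rule as described in this post, not the actual API server or scheduler code; the function name merge_label_keys is hypothetical, and skipping keys that are missing from the Pod's labels is an assumption of this example.

```python
from typing import Dict, Iterable, List


def merge_label_keys(
    pod_labels: Dict[str, str],
    match_expressions: List[dict],
    match_label_keys: Iterable[str] = (),
    mismatch_label_keys: Iterable[str] = (),
) -> List[dict]:
    """Return the selector expressions after merging in the Pod's own label values."""
    merged = [dict(expr) for expr in match_expressions]  # leave the input untouched
    for key in match_label_keys:
        if key in pod_labels:  # assumption: keys absent from the Pod's labels are skipped
            merged.append({"key": key, "operator": "In", "values": [pod_labels[key]]})
    for key in mismatch_label_keys:
        if key in pod_labels:
            merged.append({"key": key, "operator": "NotIn", "values": [pod_labels[key]]})
    return merged


labels = {"tenant": "service-a"}

# podAffinity term: only matchLabelKeys, no explicit labelSelector.
affinity = merge_label_keys(labels, [], match_label_keys=["tenant"])

# podAntiAffinity term: an Exists expression plus mismatchLabelKeys.
anti_affinity = merge_label_keys(
    labels,
    [{"key": "tenant", "operator": "Exists"}],
    mismatch_label_keys=["tenant"],
)

print(affinity)
print(anti_affinity)
```

The two results mirror the labelSelector entries in the translated Pod: an In expression for the affinity term, and Exists plus NotIn for the anti-affinity term.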

Getting involved

These features are managed by Kubernetes SIG Scheduling.

Please join us and share your feedback. We look forward to hearing from you!

How can I learn more?

The official document of PodAffinity

KEP-3633: Introduce MatchLabelKeys and MismatchLabelKeys to PodAffinity and PodAntiAffinity

via Kubernetes Blog https://kubernetes.io/

August 15, 2024 at 08:00PM

·kubernetes.io·
Kubernetes 1.31: Prevent PersistentVolume Leaks When Deleting out of Order

https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-prevent-persistentvolume-leaks-when-deleting-out-of-order/

PersistentVolumes (or PVs for short) are associated with a reclaim policy. The reclaim policy determines the actions that the storage backend needs to take when the PVC bound to a PV is deleted. When the reclaim policy is Delete, the expectation is that the storage backend releases the storage resource allocated for the PV. In essence, the reclaim policy needs to be honored on PV deletion.

With the recent Kubernetes v1.31 release, a beta feature lets you configure your cluster to behave that way and honor the configured reclaim policy.

How did reclaim work in previous Kubernetes releases?

PersistentVolumeClaim (or PVC for short) is a user's request for storage. A PV and PVC are considered Bound if a newly created PV or a matching PV is found. The PVs themselves are backed by volumes allocated by the storage backend.

Normally, if the volume is to be deleted, then the expectation is to delete the PVC for a bound PV-PVC pair. However, there are no restrictions on deleting a PV before deleting a PVC.

First, I'll demonstrate the behavior for clusters running an older version of Kubernetes.

Retrieve a PVC that is bound to a PV

Retrieve an existing PVC example-vanilla-block-pvc

kubectl get pvc example-vanilla-block-pvc

The following output shows the PVC and its bound PV; the PV is shown under the VOLUME column:

NAME                        STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS               AGE
example-vanilla-block-pvc   Bound    pvc-6791fdd4-5fad-438e-a7fb-16410363e3da   5Gi        RWO            example-vanilla-block-sc   19s

Delete PV

When I try to delete a bound PV, the kubectl session blocks and kubectl does not return control to the shell; for example:

kubectl delete pv pvc-6791fdd4-5fad-438e-a7fb-16410363e3da

persistentvolume "pvc-6791fdd4-5fad-438e-a7fb-16410363e3da" deleted
^C

Retrieving the PV

kubectl get pv pvc-6791fdd4-5fad-438e-a7fb-16410363e3da

It can be observed that the PV is in a Terminating state

NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS        CLAIM                               STORAGECLASS               REASON   AGE
pvc-6791fdd4-5fad-438e-a7fb-16410363e3da   5Gi        RWO            Delete           Terminating   default/example-vanilla-block-pvc   example-vanilla-block-sc            2m23s

Delete PVC

kubectl delete pvc example-vanilla-block-pvc

The following output is seen if the PVC gets successfully deleted:

persistentvolumeclaim "example-vanilla-block-pvc" deleted

The PV object from the cluster also gets deleted. When attempting to retrieve the PV it will be observed that the PV is no longer found:

kubectl get pv pvc-6791fdd4-5fad-438e-a7fb-16410363e3da

Error from server (NotFound): persistentvolumes "pvc-6791fdd4-5fad-438e-a7fb-16410363e3da" not found

Although the PV is deleted, the underlying storage resource is not deleted and needs to be removed manually.

To sum up, the reclaim policy associated with the PersistentVolume is currently ignored under certain circumstances. For a Bound PV-PVC pair, the ordering of PV-PVC deletion determines whether the PV reclaim policy is honored. The reclaim policy is honored if the PVC is deleted first; however, if the PV is deleted prior to deleting the PVC, then the reclaim policy is not exercised. As a result of this behavior, the associated storage asset in the external infrastructure is not removed.

PV reclaim policy with Kubernetes v1.31

The new behavior ensures that the underlying storage object is deleted from the backend when users attempt to delete a PV manually.

How to enable new behavior?

To take advantage of the new behavior, you must have upgraded your cluster to the v1.31 release of Kubernetes and run the CSI external-provisioner version 5.0.1 or later.

How does it work?

For CSI volumes, the new behavior is achieved by adding a finalizer external-provisioner.volume.kubernetes.io/finalizer on new and existing PVs. The finalizer is only removed after the storage from the backend is deleted.

Here is an example of a PV with the finalizer; notice the new finalizer in the finalizers list:

kubectl get pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 -o yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com
  creationTimestamp: "2021-11-17T19:28:56Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-provisioner.volume.kubernetes.io/finalizer
  name: pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53
  resourceVersion: "194711"
  uid: 087f14f2-4157-4e95-8a70-8294b039d30e
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: example-vanilla-block-pvc
    namespace: default
    resourceVersion: "194677"
    uid: a7b7e3ba-f837-45ba-b243-dec7d8aaed53
  csi:
    driver: csi.vsphere.vmware.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1637110610497-8081-csi.vsphere.vmware.com
      type: vSphere CNS Block Volume
    volumeHandle: 2dacf297-803f-4ccc-afc7-3d3c3f02051e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: example-vanilla-block-sc
  volumeMode: Filesystem
status:
  phase: Bound

The finalizer prevents this PersistentVolume from being removed from the cluster. As stated previously, the finalizer is only removed from the PV object after it is successfully deleted from the storage backend. To learn more about finalizers, please refer to Using Finalizers to Control Deletion.

Similarly, the finalizer kubernetes.io/pv-controller is added to dynamically provisioned in-tree plugin volumes.
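To see why the extra finalizer closes the leak, here is a deliberately small Python model of the "delete the PV, then delete the PVC" sequence. It is a teaching sketch under stated assumptions, not how the controllers are implemented: the function delete_pv_then_pvc and the volume handle "vol-1" are hypothetical, and the model assumes the reclaim policy is Delete, that the protection finalizer is lifted once the PVC is gone, and that the provisioner always succeeds in deleting the backend volume.

```python
PV_PROTECTION = "kubernetes.io/pv-protection"
PROVISIONER_FINALIZER = "external-provisioner.volume.kubernetes.io/finalizer"


def delete_pv_then_pvc(pv_finalizers, volume_handle, backend_volumes):
    """Model `kubectl delete pv` followed by `kubectl delete pvc`.

    Returns (pv_object_removed, remaining_backend_volumes).
    """
    finalizers = set(pv_finalizers)
    backend = set(backend_volumes)

    # Step 1: deleting the PV only marks it Terminating; the object stays
    # in the API for as long as any finalizer remains on it.

    # Step 2: deleting the PVC means the PV is no longer in use, so the
    # protection finalizer is lifted (assumption of this model).
    finalizers.discard(PV_PROTECTION)

    # Step 3: if the provisioner's finalizer is present, the backend volume
    # is deleted first and only then is the finalizer removed.
    if PROVISIONER_FINALIZER in finalizers:
        backend.discard(volume_handle)
        finalizers.discard(PROVISIONER_FINALIZER)

    return len(finalizers) == 0, backend


# Older behavior: no provisioner finalizer, so the backend volume is left behind.
old = delete_pv_then_pvc([PV_PROTECTION], "vol-1", {"vol-1"})

# With the finalizer: the PV object only disappears after the backend volume does.
new = delete_pv_then_pvc([PV_PROTECTION, PROVISIONER_FINALIZER], "vol-1", {"vol-1"})

print(old)
print(new)
```

In both runs the PV object ends up removed. The difference is what remains in the backend: the first run leaves the volume behind, while the second does not.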

What about CSI migrated volumes?

The fix applies to CSI migrated volumes as well.

Some caveats

The fix does not apply to statically provisioned in-tree plugin volumes.

References

KEP-2644

Volume leak issue

How do I get involved?

The Kubernetes Slack channel and the SIG Storage communication channels are great mediums to reach out to the SIG Storage and migration working group teams.

Special thanks to the following people for the insightful reviews, thorough consideration and valuable contribution:

Fan Baofa (carlory)

Jan Šafránek (jsafrane)

Xing Yang (xing-yang)

Matthew Wong (wongma7)

Join the Kubernetes Storage Special Interest Group (SIG) if you're interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system. We’re rapidly growing and always welcome new contributors.

via Kubernetes Blog https://kubernetes.io/

August 15, 2024 at 08:00PM

·kubernetes.io·
Kubernetes 1.31: Read Only Volumes Based On OCI Artifacts (alpha)

https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/

The Kubernetes community is moving towards fulfilling more Artificial Intelligence (AI) and Machine Learning (ML) use cases in the future. While the project has been designed to fulfill microservice architectures in the past, it’s now time to listen to the end users and introduce features which have a stronger focus on AI/ML.

One of these requirements is to support Open Container Initiative (OCI) compatible images and artifacts (referred to as OCI objects) directly as a native volume source. This allows users to focus on OCI standards and enables them to store and distribute any content using OCI registries. A feature like this gives the Kubernetes project a chance to grow into use cases which go beyond running particular images.

Given that, the Kubernetes community is proud to present a new alpha feature introduced in v1.31: The Image Volume Source (KEP-4639). This feature allows users to specify an image reference as a volume in a pod while reusing it as a volume mount within containers:

…
kind: Pod
spec:
  containers:
    - …
      volumeMounts:
        - name: my-volume
          mountPath: /path/to/directory
  volumes:
    - name: my-volume
      image:
        reference: my-image:tag

The above example would result in mounting my-image:tag to /path/to/directory in the pod’s container.

Use cases

The goal of this enhancement is to stick as close as possible to the existing container image implementation within the kubelet, while introducing a new API surface to allow more extended use cases.

For example, users could share a configuration file among multiple containers in a pod without including the file in the main image, so that they can minimize security risks and the overall image size. They can also package and distribute binary artifacts using OCI images and mount them directly into Kubernetes pods, so that they can streamline their CI/CD pipeline as an example.

Data scientists, MLOps engineers, or AI developers, can mount large language model weights or machine learning model weights in a pod alongside a model-server, so that they can efficiently serve them without including them in the model-server container image. They can package these in an OCI object to take advantage of OCI distribution and ensure efficient model deployment. This allows them to separate the model specifications/content from the executables that process them.

Another use case is that security engineers can use a public image for a malware scanner and mount in a volume of private (commercial) malware signatures, so that they can load those signatures without baking their own combined image (which might not be allowed by the copyright on the public image). Those files work regardless of the OS or version of the scanner software.

But in the long term it will be up to you as an end user of this project to outline further important use cases for the new feature. SIG Node is happy to receive any feedback or suggestions for further enhancements to allow more advanced usage scenarios. Feel free to provide feedback by either using the Kubernetes Slack (#sig-node) channel or the SIG Node mailing list.

Detailed example

The Kubernetes alpha feature gate ImageVolume needs to be enabled on the API Server as well as the kubelet to make it functional. If that’s the case and the container runtime has support for the feature (like CRI-O ≥ v1.31), then an example pod.yaml like this can be created:

apiVersion: v1
kind: Pod
metadata:
  name: pod
spec:
  containers:
    - name: test
      image: registry.k8s.io/e2e-test-images/echoserver:2.3
      volumeMounts:
        - name: volume
          mountPath: /volume
  volumes:
    - name: volume
      image:
        reference: quay.io/crio/artifact:v1
        pullPolicy: IfNotPresent

The pod declares a new volume using the image.reference of quay.io/crio/artifact:v1, which refers to an OCI object containing two files. The pullPolicy behaves in the same way as for container images and allows the following values:

Always: the kubelet always attempts to pull the reference and the container creation will fail if the pull fails.

Never: the kubelet never pulls the reference and only uses a local image or artifact. The container creation will fail if the reference isn’t present.

IfNotPresent: the kubelet pulls if the reference isn’t already present on disk. The container creation will fail if the reference isn’t present and the pull fails.
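The three pullPolicy values above boil down to a small truth table. The Python function below is one way to write it down; the name container_can_start and its two boolean inputs are hypothetical and only restate the rules listed here, not kubelet internals.

```python
def container_can_start(pull_policy: str, present_locally: bool, pull_succeeds: bool) -> bool:
    """Return True if container creation can proceed under the given pull policy."""
    if pull_policy == "Always":
        # A pull is always attempted, so a failed pull fails container creation.
        return pull_succeeds
    if pull_policy == "Never":
        # No pull is attempted; only a local copy can satisfy the volume.
        return present_locally
    if pull_policy == "IfNotPresent":
        # A local copy is enough; otherwise the pull has to succeed.
        return present_locally or pull_succeeds
    raise ValueError(f"unknown pull policy: {pull_policy!r}")


# A cached artifact on a node with no registry access (pulls would fail):
for policy in ("Always", "Never", "IfNotPresent"):
    print(policy, container_can_start(policy, present_locally=True, pull_succeeds=False))
```

On a node that already has the artifact but cannot reach the registry, only Always prevents the container from starting.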

The volumeMounts field indicates that the container named test should mount the volume under the path /volume.

If you now create the pod:

kubectl apply -f pod.yaml

And exec into it:

kubectl exec -it pod -- sh

Then you’re able to investigate what has been mounted:

/ # ls /volume
dir  file
/ # cat /volume/file
2
/ # ls /volume/dir
file
/ # cat /volume/dir/file
1

You managed to consume an OCI artifact using Kubernetes!

The container runtime pulls the image (or artifact), mounts it to the container and makes it finally available for direct usage. There are a bunch of details in the implementation, which closely align to the existing image pull behavior of the kubelet. For example:

If a :latest tag as reference is provided, then the pullPolicy will default to Always, while in any other case it will default to IfNotPresent if unset.

The volume gets re-resolved if the pod gets deleted and recreated, which means that new remote content will become available on pod recreation. A failure to resolve or pull the image during pod startup will block containers from starting and may add significant latency. Failures will be retried using normal volume backoff and will be reported on the pod reason and message.

Pull secrets will be assembled in the same way as for the container image by looking up node credentials, service account image pull secrets, and pod spec image pull secrets.

The OCI object gets mounted in a single directory by merging the manifest layers in the same way as for container images.

The volume is mounted as read-only (ro) and non-executable files (noexec).

Sub-path mounts for containers are not supported (spec.containers[*].volumeMounts.subPath).

The field spec.securityContext.fsGroupChangePolicy has no effect on this volume type.

The feature will also work with the AlwaysPullImages admission plugin if enabled.
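The pullPolicy defaulting rule from the list above can be written as a one-line decision. The sketch below takes the rule literally as stated in this post (a reference ending in :latest defaults to Always, anything else to IfNotPresent). The function name effective_pull_policy is hypothetical, and references without a tag or with a digest are not given any special handling here.

```python
from typing import Optional


def effective_pull_policy(reference: str, pull_policy: Optional[str] = None) -> str:
    """Return the pull policy that applies to an image volume reference."""
    if pull_policy:
        # An explicitly set policy is used as-is.
        return pull_policy
    # Unset: default according to the tag, as described in the list above.
    return "Always" if reference.endswith(":latest") else "IfNotPresent"


print(effective_pull_policy("quay.io/crio/artifact:v1"))
print(effective_pull_policy("my-image:latest"))
print(effective_pull_policy("my-image:latest", "Never"))
```

Pinning a specific tag such as v1 therefore avoids a registry round trip on every Pod start, at the cost of not picking up new content published under that tag.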

Thank you for reading through the end of this blog post! SIG Node is proud and happy to deliver this feature as part of Kubernetes v1.31.

As writer of this blog post, I would like to emphasize my special thanks to all involved individuals out there! You all rock, let’s keep on hacking!

Further reading

Use an Image Volume With a Pod

image volume overview

via Kubernetes Blog https://kubernetes.io/

August 15, 2024 at 08:00PM

·kubernetes.io·
Evolving our self-hosted offering and license model

What you need to know about the upcoming changes to CockroachDB Enterprise arriving this…

August 15, 2024 at 10:13AM

via Instapaper

·cockroachlabs.com·
Kubernetes 1.31: VolumeAttributesClass for Volume Modification Beta

https://kubernetes.io/blog/2024/08/15/kubernetes-1-31-volume-attributes-class/

Volumes in Kubernetes have been described by two attributes: their storage class, and their capacity. The storage class is an immutable property of the volume, while the capacity can be changed dynamically with volume resize.

This complicates vertical scaling of workloads with volumes. While cloud providers and storage vendors often offer volumes which allow specifying IO quality of service (Performance) parameters like IOPS or throughput and tuning them as workloads operate, Kubernetes has no API which allows changing them.

We are pleased to announce that the VolumeAttributesClass KEP, alpha since Kubernetes 1.29, will be beta in 1.31. This provides a generic, Kubernetes-native API for modifying volume parameters like provisioned IO.

Like all new volume features in Kubernetes, this API is implemented via the container storage interface (CSI). In addition to the VolumeAttributesClass feature gate, your provisioner-specific CSI driver must support the new ModifyVolume API which is the CSI side of this feature.

See the full documentation for all details. Here we show the common workflow.

Dynamically modifying volume attributes.

A VolumeAttributesClass is a cluster-scoped resource that specifies provisioner-specific attributes. These are created by the cluster administrator in the same way as storage classes. For example, a series of gold, silver and bronze volume attribute classes can be created for volumes with greater or lesser amounts of provisioned IO.

apiVersion: storage.k8s.io/v1beta1
kind: VolumeAttributesClass
metadata:
  name: silver
driverName: your-csi-driver
parameters:
  provisioned-iops: "500"
  provisioned-throughput: "50MiB/s"
---
apiVersion: storage.k8s.io/v1beta1
kind: VolumeAttributesClass
metadata:
  name: gold
driverName: your-csi-driver
parameters:
  provisioned-iops: "10000"
  provisioned-throughput: "500MiB/s"

An attribute class is added to a PVC in much the same way as a storage class.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-pv-claim
spec:
  storageClassName: any-storage-class
  volumeAttributesClassName: silver
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 64Gi

Unlike a storage class, the volume attributes class can be changed:

kubectl patch pvc test-pv-claim -p '{"spec": {"volumeAttributesClassName": "gold"}}'

Kubernetes will work with the CSI driver to update the attributes of the volume. The status of the PVC will track the current and desired attributes class. The PV resource will also be updated with the new volume attributes class which will be set to the currently active attributes of the PV.
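The -p argument to kubectl patch is a JSON document in which volumeAttributesClassName is nested inside the spec object. When the class name comes from a script, building the body with a JSON library avoids quoting and bracing mistakes. The helper below is a minimal sketch; the function name volume_attributes_class_patch is hypothetical.

```python
import json


def volume_attributes_class_patch(class_name: str) -> str:
    """Build the JSON patch body that switches a PVC to another attributes class."""
    return json.dumps({"spec": {"volumeAttributesClassName": class_name}})


patch = volume_attributes_class_patch("gold")
print(patch)  # {"spec": {"volumeAttributesClassName": "gold"}}
```

The printed string can be passed as the value of -p, for example by capturing it into a shell variable first.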

Limitations with the beta

As a beta feature, it still lacks some functionality that is planned for GA. The largest gap is quota support; see the KEP and the discussion in sig-storage for details.

See the Kubernetes CSI driver list for up-to-date information of support for this feature in CSI drivers.

via Kubernetes Blog https://kubernetes.io/

August 14, 2024 at 08:00PM

·kubernetes.io·
Kubernetes 1.31: VolumeAttributesClass for Volume Modification Beta
Open Model Initiative
Open Model Initiative has 3 repositories available. Follow their code on GitHub.
·github.com·
Announcing Karpenter 1.0 | Amazon Web Services
Introduction In November 2021, AWS announced the launch of v0.5 of Karpenter, “a new open source Kubernetes cluster auto scaling project.” Originally conceived as a flexible, dynamic, and high-performance alternative to the Kubernetes Cluster Autoscaler, in the nearly three years since then Karpenter has evolved substantially into a fully featured, Kubernetes native node lifecycle manager. […]
·aws.amazon.com·
SNMP alerted us when we were supposed to start shutting down servers when our ancient air handlers bit the dust at Pope AFB. I tracked variances to time of day. | Uncertainties and issues in using IPMI temperature data
Uncertainties and issues in using IPMI temperature data
·utcc.utoronto.ca·
Kubernetes v1.31: Accelerating Cluster Performance with Consistent Reads from Cache

https://kubernetes.io/blog/2024/08/15/consistent-read-from-cache-beta/

Kubernetes is renowned for its robust orchestration of containerized applications, but as clusters grow, the demands on the control plane can become a bottleneck. A key challenge has been ensuring strongly consistent reads from the etcd datastore, requiring resource-intensive quorum reads.

Today, the Kubernetes community is excited to announce a major improvement: consistent reads from cache, graduating to Beta in Kubernetes v1.31.

Why consistent reads matter

Consistent reads are essential for ensuring that Kubernetes components have an accurate view of the latest cluster state. Guaranteeing consistent reads is crucial for maintaining the accuracy and reliability of Kubernetes operations, enabling components to make informed decisions based on up-to-date information. In large-scale clusters, fetching and processing this data can be a performance bottleneck, especially for requests that involve filtering results. While Kubernetes can filter data by namespace directly within etcd, any other filtering by labels or field selectors requires the entire dataset to be fetched from etcd and then filtered in-memory by the Kubernetes API server. This is particularly impactful for components like the kubelet, which only needs to list pods scheduled to its node - but previously required the API Server and etcd to process all pods in the cluster.

The breakthrough: Caching with confidence

Kubernetes has long used a watch cache to optimize read operations. The watch cache stores a snapshot of the cluster state and receives updates through etcd watches. However, until now, it couldn't serve consistent reads directly, as there was no guarantee the cache was sufficiently up-to-date.

The consistent reads from cache feature addresses this by leveraging etcd's progress notifications mechanism. These notifications inform the watch cache about how current its data is compared to etcd. When a consistent read is requested, the system first checks if the watch cache is up-to-date. If the cache is not up-to-date, the system queries etcd for progress notifications until it's confirmed that the cache is sufficiently fresh. Once ready, the read is efficiently served directly from the cache, which can significantly improve performance, particularly in cases where it would require fetching a lot of data from etcd. This enables requests that filter data to be served from the cache, with only minimal metadata needing to be read from etcd.
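The freshness check described above can be illustrated with a toy model. This is a minimal sketch, not the API server's implementation: ToyEtcd and ToyWatchCache are hypothetical classes, and the loop simply replays pending events where the real system waits on etcd progress notifications.

```python
class ToyEtcd:
    """Stand-in for etcd: a revision counter plus an ordered event log."""

    def __init__(self):
        self.revision = 0
        self.events = []  # (revision, key, value) tuples

    def put(self, key, value):
        self.revision += 1
        self.events.append((self.revision, key, value))


class ToyWatchCache:
    """Stand-in for the watch cache: a snapshot that lags behind etcd."""

    def __init__(self, etcd):
        self.etcd = etcd
        self.revision = 0
        self.data = {}

    def catch_up(self, max_events=1):
        """Apply up to max_events pending events (simulates watch delivery)."""
        pending = [e for e in self.etcd.events if e[0] > self.revision]
        for rev, key, value in pending[:max_events]:
            self.data[key] = value
            self.revision = rev

    def consistent_list(self, predicate):
        # Only metadata is read from "etcd": its latest revision.
        target = self.etcd.revision
        # Serve from the cache only once it has seen that revision.
        while self.revision < target:
            self.catch_up()
        return {k: v for k, v in self.data.items() if predicate(v)}


if __name__ == "__main__":
    etcd = ToyEtcd()
    cache = ToyWatchCache(etcd)
    etcd.put("pod-a", {"node": "n1"})
    etcd.put("pod-b", {"node": "n2"})
    # Filtering by node happens in the cache, not in etcd.
    print(cache.consistent_list(lambda pod: pod["node"] == "n1"))
```

The point of the model is that the filtered data never leaves the cache; the only information needed from the datastore is the revision to compare against.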

Important Note: To benefit from this feature, your Kubernetes cluster must be running etcd version 3.4.31+ or 3.5.13+. For older etcd versions, Kubernetes will automatically fall back to serving consistent reads directly from etcd.
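The version requirement in the note can be expressed as a small check. This is a sketch under one assumption that the post does not state: release series newer than 3.5 are treated as supported. The function name is hypothetical.

```python
def supports_consistent_reads_from_cache(etcd_version: str) -> bool:
    """Check an etcd version string such as "3.5.13" or "v3.4.31"."""
    core = etcd_version.lstrip("v").split("-")[0]
    major, minor, patch = (int(part) for part in core.split("."))
    if (major, minor) == (3, 4):
        return patch >= 31
    if (major, minor) == (3, 5):
        return patch >= 13
    # Assumption: anything newer than the 3.5 series qualifies,
    # anything older than 3.4 does not.
    return (major, minor) > (3, 5)


if __name__ == "__main__":
    for version in ("3.4.30", "3.4.31", "3.5.12", "v3.5.13"):
        print(version, supports_consistent_reads_from_cache(version))
```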

Performance gains you'll notice

This seemingly simple change has a profound impact on Kubernetes performance and scalability:

Reduced etcd Load: Kubernetes v1.31 can offload work from etcd, freeing up resources for other critical operations.

Lower Latency: Serving reads from cache is significantly faster than fetching and processing data from etcd. This translates to quicker responses for components, improving overall cluster responsiveness.

Improved Scalability: Large clusters with thousands of nodes and pods will see the most significant gains, as the reduction in etcd load allows the control plane to handle more requests without sacrificing performance.

5k Node Scalability Test Results: In recent scalability tests on 5,000 node clusters, enabling consistent reads from cache delivered impressive improvements:

30% reduction in kube-apiserver CPU usage

25% reduction in etcd CPU usage

Up to 3x reduction (from 5 seconds to 1.5 seconds) in 99th percentile pod LIST request latency

What's next?

With the graduation to beta, consistent reads from cache are enabled by default, offering a seamless performance boost to all Kubernetes users running a supported etcd version.

Our journey doesn't end here. The Kubernetes community is actively exploring pagination support in the watch cache, which will unlock even more performance optimizations in the future.

Getting started

Upgrading to Kubernetes v1.31 and ensuring you are using etcd version 3.4.31+ or 3.5.13+ is the easiest way to experience the benefits of consistent reads from cache. If you have any questions or feedback, don't hesitate to reach out to the Kubernetes community.

Let us know how consistent reads from cache transform your Kubernetes experience!

Special thanks to @ah8ad3 and @p0lyn0mial for their contributions to this feature!

via Kubernetes Blog https://kubernetes.io/

August 14, 2024 at 08:00PM

·kubernetes.io·
Next they’ll charge a fee to buy things through WebKit browsers | Apple says Patreon must switch to its billing system or risk removal from App Store | TechCrunch
Apple has threatened to remove creator platform Patreon from the App Store if creators use unsupported third-party billing options or disable transactions
·techcrunch.com·
Palo Alto Networks apologizes as sexist marketing misfires

If you attended the Black Hat conference in Vegas last week and found yourself over in Palo Alto Networks' corner of the event, you may have encountered a…

August 14, 2024 at 01:44PM

via Instapaper

·theregister.com·
Last Week in Kubernetes Development - Week Ending August 11 2024

Week Ending August 11, 2024

https://lwkd.info/2024/20240814

Developer News

It’s Release Week! Kubernetes 1.31 “Elli” is released, with many new features. In addition to the list of features in the main blog post, note that cgroups v1 is going into maintenance, several things have been removed (most notably Ceph in-tree driver), and the addition of lastTransitionTime for PVs. More 1.31 features below.

Steering Committee nominations are open.

The Kubernetes Contributor Summit is looking for artists to create designs. Registration and the CfP are still open.

Release Schedule

Next Deadline: v1.31.0 release day, August 13th

Kubernetes 1.31 was released on August 13.

Patch releases are expected later this week.

Lesser-known 1.31 Features

These features didn’t make the 1.31 release blog, but are interesting to contributors:

4355: Coordinated Leader Elections

This enhancement makes control plane leader elections compatible with upgrading one control plane component at a time, by keeping every component on the old API server until everything else is upgraded. This should make for a smoother, more reliable upgrade experience. Alpha and opt-in only for 1.31.

4368: Job API managed-by mechanism

A small part of the MultiKueue initiative of the Kueue job manager, this enhancement adds tracking for which controller “owns” a job. While potentially useful for any multi-controller environment, the change is intended to make multi-cluster job scheduling possible. Alpha in 1.31.

4176 and 4622: HPC Features

Two features make Kubernetes more useful on bigger, beefier machines. We can spread hyperthreads across physical CPUs, making better use of high-core-count machines. And we can configure topology rules for more than eight NUMA nodes, supporting very high memory systems. 4176 is Alpha and 4622 is Beta in 1.31.

KEP of the Week

KEP 4420: Retry Generate Name

This KEP implements automated retry of generateName create requests when a name conflict occurs. Despite generating over 14 million possible names per prefix with a 5-character random suffix, conflicts are frequent, with a 50% chance after 5,000 names. Currently, a conflict triggers an HTTP 409 response, leaving it to clients to retry, which many fail to do, causing production issues.
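The collision figures follow from the birthday bound, which can be checked with a few lines of Python. This is a rough sketch: the alphabet size of 27 is an assumption chosen because 27^5 matches the "over 14 million" figure, and the formula is the usual approximation rather than an exact count.

```python
import math

ALPHABET_SIZE = 27  # assumption: number of characters the suffix is drawn from
SUFFIX_LENGTH = 5
NAME_SPACE = ALPHABET_SIZE ** SUFFIX_LENGTH  # 14,348,907 possible suffixes


def collision_probability(names: int, space: int = NAME_SPACE) -> float:
    """Birthday-bound approximation: 1 - exp(-n(n-1) / (2N))."""
    return 1.0 - math.exp(-names * (names - 1) / (2.0 * space))


if __name__ == "__main__":
    for count in (100, 1000, 5000):
        print(count, round(collision_probability(count), 4))
```

Under these assumptions the probability of at least one conflict passes one half somewhat before 5,000 names, which is consistent with the figure quoted above.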

This feature became Beta in 1.31.

Subprojects and Dependency Updates

containerd v1.6.35 regenerate UUID if state is empty in introspection service

prometheus v2.54.0 remote-Write: Version 2.0 experimental, plus metadata in WAL via feature flag metadata-wal-records; also v2.53.2

via Last Week in Kubernetes Development https://lwkd.info/

August 14, 2024 at 05:00PM

·lwkd.info·
A good read for folks who are suddenly looking for a job or preparing to be | One Week Later: My Journey, Gratitude, and Tips for Standing Out in the Job Market
It's been a week since I made my initial post on August 2nd announcing my departure from AWS. If you haven't seen it yet, you can find it here: https://www.
·linkedin.com·
Kubernetes 1.31: Moving cgroup v1 Support into Maintenance Mode

https://kubernetes.io/blog/2024/08/14/kubernetes-1-31-moving-cgroup-v1-support-maintenance-mode/

As Kubernetes continues to evolve and adapt to the changing landscape of container orchestration, the community has decided to move cgroup v1 support into maintenance mode in v1.31. This shift aligns with the broader industry's move towards cgroup v2, which offers improved functionality, including better scalability and a more consistent interface. Before we dive into the consequences for Kubernetes, let's take a step back to understand what cgroups are and their significance in Linux.

Understanding cgroups

Control groups, or cgroups, are a Linux kernel feature that allows the allocation, prioritization, denial, and management of system resources (such as CPU, memory, disk I/O, and network bandwidth) among processes. This functionality is crucial for maintaining system performance and ensuring that no single process can monopolize system resources, which is especially important in multi-tenant environments.

There are two versions of cgroups: v1 and v2. While cgroup v1 provided sufficient capabilities for resource management, it had limitations that led to the development of cgroup v2. Cgroup v2 offers a more unified and consistent interface, on top of better resource control features.

Cgroups in Kubernetes

For Linux nodes, Kubernetes relies heavily on cgroups to manage and isolate the resources consumed by containers running in pods. Each container in Kubernetes is placed in its own cgroup, which allows Kubernetes to enforce resource limits, monitor usage, and ensure fair resource distribution among all containers.

How Kubernetes uses cgroups

Resource Allocation

Ensures that containers do not exceed their allocated CPU and memory limits.

Isolation

Isolates containers from each other to prevent resource contention.

Monitoring

Tracks resource usage for each container to provide insights and metrics.

Transitioning to Cgroup v2

The Linux community has been focusing on cgroup v2 for new features and improvements. Major Linux distributions and projects like systemd are transitioning towards cgroup v2. Using cgroup v2 provides several benefits over cgroup v1, such as a unified hierarchy, an improved interface, better resource control, a cgroup-aware OOM killer, and rootless support.

Given these advantages, Kubernetes is also making the move to embrace cgroup v2 more fully. However, this transition needs to be handled carefully to avoid disrupting existing workloads and to provide a smooth migration path for users.

Moving cgroup v1 support into maintenance mode

What does maintenance mode mean?

When cgroup v1 is placed into maintenance mode in Kubernetes, it means that:

Feature Freeze: No new features will be added to cgroup v1 support.

Security Fixes: Critical security fixes will still be provided.

Best-Effort Bug Fixes: Major bugs may be fixed if feasible, but some issues might remain unresolved.

Why move to maintenance mode?

The move to maintenance mode is driven by the need to stay in line with the broader ecosystem and to encourage the adoption of cgroup v2, which offers better performance, security, and usability. By transitioning cgroup v1 to maintenance mode, Kubernetes can focus on enhancing support for cgroup v2 and ensure it meets the needs of modern workloads. It's important to note that maintenance mode does not mean deprecation; cgroup v1 will continue to receive critical security fixes and major bug fixes as needed.

What this means for cluster administrators

Users currently relying on cgroup v1 are highly encouraged to plan for the transition to cgroup v2. This transition involves:

Upgrading Systems: Ensuring that the underlying operating systems and container runtimes support cgroup v2.

Testing Workloads: Verifying that workloads and applications function correctly with cgroup v2.
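As a starting point for planning, a node's cgroup version can be detected with a short script. This is a heuristic sketch, not an official check: it assumes the conventional mount point /sys/fs/cgroup and relies on the cgroup.controllers file that cgroup v2 exposes at the root of its unified hierarchy. The function name is hypothetical.

```python
import os


def cgroup_version(root: str = "/sys/fs/cgroup") -> str:
    """Return "v2", "v1", or "unknown" for the cgroup hierarchy at root."""
    # cgroup v2 places cgroup.controllers at the top of the unified hierarchy.
    if os.path.isfile(os.path.join(root, "cgroup.controllers")):
        return "v2"
    # A mounted directory without that file is treated as v1 (or hybrid).
    if os.path.isdir(root):
        return "v1"
    return "unknown"


if __name__ == "__main__":
    print(cgroup_version())
```

The root parameter exists so the check can be pointed at a different mount point or exercised against a temporary directory.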

Further reading

Linux cgroups

Cgroup v2 in Kubernetes

Kubernetes 1.25: cgroup v2 graduates to GA

via Kubernetes Blog https://kubernetes.io/

August 13, 2024 at 08:00PM

·kubernetes.io·
Kubernetes v1.31: PersistentVolume Last Phase Transition Time Moves to GA

https://kubernetes.io/blog/2024/08/14/last-phase-transition-time-ga/

Announcing the graduation to General Availability (GA) of the PersistentVolume lastTransitionTime status field, in Kubernetes v1.31!

The Kubernetes SIG Storage team is excited to announce that the "PersistentVolumeLastPhaseTransitionTime" feature, introduced as an alpha in Kubernetes v1.28, has now reached GA status and is officially part of the Kubernetes v1.31 release. This enhancement helps Kubernetes users understand when a PersistentVolume transitions between different phases, allowing for more efficient and informed resource management.

For a v1.31 cluster, you can now assume that every PersistentVolume object has a .status.lastTransitionTime field that holds a timestamp of when the volume last transitioned its phase. This change is not immediate; the new field will be populated whenever a PersistentVolume is updated and first transitions between phases (Pending, Bound, or Released) after upgrading to Kubernetes v1.31.

What changed?

The API strategy for updating PersistentVolume objects has been modified to populate the .status.lastTransitionTime field with the current timestamp whenever a PersistentVolume transitions phases. Users are allowed to set this field manually if needed, but it will be overwritten when the PersistentVolume transitions phases again.

For more details, read about Phase transition timestamp in the Kubernetes documentation. You can also read the previous blog post announcing the feature as alpha in v1.28.

To provide feedback, join our Kubernetes Storage Special-Interest-Group (SIG) or participate in discussions on our public Slack channel.

via Kubernetes Blog https://kubernetes.io/

August 13, 2024 at 08:00PM

·kubernetes.io·
Inside the "3 Billion People" National Public Data Breach
I decided to write this post because there's no concise way to explain the nuances of what's being described as one of the largest data breaches ever. Usually, it's easy to articulate a data breach; a service people provide their information to had someone snag it through an act of
·troyhunt.com·
Kubernetes v1.31: Elli

https://kubernetes.io/blog/2024/08/13/kubernetes-v1-31-release/

Editors: Matteo Bianchi, Yigit Demirbas, Abigail McCarthy, Edith Puclla, Rashan Smith

Announcing the release of Kubernetes v1.31: Elli!

Similar to previous releases, the release of Kubernetes v1.31 introduces new stable, beta, and alpha features. The consistent delivery of high-quality releases underscores the strength of our development cycle and the vibrant support from our community. This release consists of 45 enhancements. Of those enhancements, 11 have graduated to Stable, 22 are entering Beta, and 12 have graduated to Alpha.

Release theme and logo

The Kubernetes v1.31 Release Theme is "Elli".

Kubernetes v1.31's Elli is a cute and joyful dog, with a heart of gold and a nice sailor's cap, as a playful wink to the huge and diverse family of Kubernetes contributors.

Kubernetes v1.31 marks the first release after the project has successfully celebrated its first 10 years. Kubernetes has come a very long way since its inception, and it's still moving towards exciting new directions with each release. After 10 years, it is awe-inspiring to reflect on the effort, dedication, skill, wit and tiring work of the countless Kubernetes contributors who have made this a reality.

And yet, despite the herculean effort needed to run the project, there is no shortage of people who show up, time and again, with enthusiasm, smiles and a sense of pride for contributing and being part of the community. This "spirit" that we see from new and old contributors alike is the sign of a vibrant community, a "joyful" community, if we might call it that.

Kubernetes v1.31's Elli is all about celebrating this wonderful spirit! Here's to the next decade of Kubernetes!

Highlights of features graduating to Stable

This is a selection of some of the improvements that are now stable following the v1.31 release.

AppArmor support is now stable

Kubernetes support for AppArmor is now GA. Protect your containers using AppArmor by setting the appArmorProfile.type field in the container's securityContext. Note that before Kubernetes v1.30, AppArmor was controlled via annotations; starting in v1.30 it is controlled using fields. You should migrate away from annotations and start using the appArmorProfile.type field.
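To make the migration concrete, here is a minimal Python sketch that translates the older annotation values into the field-based form. The helper names are hypothetical, and the mapping assumes the annotation key prefix and values from the pre-1.30 AppArmor documentation (runtime/default, localhost/<profile>, unconfined); verify both against the AppArmor tutorial before using it on real manifests.

```python
ANNOTATION_PREFIX = "container.apparmor.security.beta.kubernetes.io/"


def profile_from_annotation(value: str) -> dict:
    """Translate one annotation value into an appArmorProfile mapping."""
    if value == "runtime/default":
        return {"type": "RuntimeDefault"}
    if value == "unconfined":
        return {"type": "Unconfined"}
    if value.startswith("localhost/"):
        return {"type": "Localhost", "localhostProfile": value[len("localhost/"):]}
    raise ValueError(f"unrecognized AppArmor annotation value: {value!r}")


def profiles_by_container(annotations: dict) -> dict:
    """Map container name -> appArmorProfile for every AppArmor annotation."""
    return {
        key[len(ANNOTATION_PREFIX):]: profile_from_annotation(value)
        for key, value in annotations.items()
        if key.startswith(ANNOTATION_PREFIX)
    }


if __name__ == "__main__":
    pod_annotations = {
        ANNOTATION_PREFIX + "app": "localhost/my-profile",
        "example.com/unrelated": "ignored",
    }
    # Each result belongs under that container's securityContext.appArmorProfile.
    print(profiles_by_container(pod_annotations))
```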

To learn more read the AppArmor tutorial. This work was done as a part of KEP #24, by SIG Node.

Improved ingress connectivity reliability for kube-proxy

Improved ingress connectivity reliability for kube-proxy is stable in v1.31. One of the common problems with load balancers in Kubernetes is the synchronization between the different components involved to avoid traffic drop. This feature implements a mechanism in kube-proxy for load balancers to do connection draining for terminating Nodes exposed by Services of type: LoadBalancer and externalTrafficPolicy: Cluster, and establishes some best practices for cloud providers and Kubernetes load balancer implementations.

To use this feature, kube-proxy needs to run as the default service proxy on the cluster and the load balancer needs to support connection draining. No specific changes are required to use this feature: it has been enabled by default in kube-proxy since v1.30 and was promoted to stable in v1.31.

For more details about this feature please visit the Virtual IPs and Service Proxies documentation page.

This work was done as part of KEP #3836 by SIG Network.

Persistent Volume last phase transition time

The Persistent Volume last phase transition time feature moved to GA in v1.31. This feature adds a PersistentVolumeStatus field which holds a timestamp of when a PersistentVolume last transitioned to a different phase. With this feature enabled, every PersistentVolume object will have a new field, .status.lastTransitionTime, that holds a timestamp of when the volume last transitioned its phase. This change is not immediate; the new field will be populated whenever a PersistentVolume is updated and first transitions between phases (Pending, Bound, or Released) after upgrading to Kubernetes v1.31. This allows you to measure the time it takes a PersistentVolume to move from Pending to Bound. This can also be useful for providing metrics and SLOs.
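A Pending-to-Bound measurement could look like the following Python sketch, which works on a PersistentVolume already loaded as a dictionary. It is illustrative only: time_to_bind is a hypothetical helper, and it assumes metadata.creationTimestamp marks the start of the Pending phase, which holds only if the volume has not been through other phases since it was created.

```python
from datetime import datetime, timedelta
from typing import Optional


def parse_k8s_time(ts: str) -> datetime:
    """Parse an RFC 3339 timestamp such as 2024-08-13T10:00:00Z."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def time_to_bind(pv: dict) -> Optional[timedelta]:
    """Time from creation to the last phase transition, for Bound volumes."""
    status = pv.get("status", {})
    if status.get("phase") != "Bound" or "lastTransitionTime" not in status:
        return None  # not bound yet, or the field is not populated
    created = parse_k8s_time(pv["metadata"]["creationTimestamp"])
    bound = parse_k8s_time(status["lastTransitionTime"])
    return bound - created


if __name__ == "__main__":
    pv = {
        "metadata": {"creationTimestamp": "2024-08-13T10:00:00Z"},
        "status": {"phase": "Bound", "lastTransitionTime": "2024-08-13T10:00:42Z"},
    }
    print(time_to_bind(pv))
```

Returning None for volumes without the field keeps the helper safe on clusters where the timestamp has not been populated yet.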

For more details about this feature please visit the PersistentVolume documentation page.

This work was done as a part of KEP #3762 by SIG Storage.

Highlights of features graduating to Beta

This is a selection of some of the improvements that are now beta following the v1.31 release.

nftables backend for kube-proxy

The nftables backend moves to beta in v1.31, behind the NFTablesProxyMode feature gate which is now enabled by default.

The nftables API is the successor to the iptables API and is designed to provide better performance and scalability than iptables. The nftables proxy mode is able to process changes to service endpoints faster and more efficiently than the iptables mode, and is also able to more efficiently process packets in the kernel (though this only becomes noticeable in clusters with tens of thousands of services).

As of Kubernetes v1.31, the nftables mode is still relatively new, and may not be compatible with all network plugins; consult the documentation for your network plugin. This proxy mode is only available on Linux nodes, and requires kernel 5.13 or later. Before migrating, note that some features, especially around NodePort services, are not implemented exactly the same in nftables mode as they are in iptables mode. Check the migration guide to see if you need to override the default configuration.

This work was done as part of KEP #3866 by SIG Network.

Changes to reclaim policy for PersistentVolumes

The Always Honor PersistentVolume Reclaim Policy feature has advanced to beta in Kubernetes v1.31. This enhancement ensures that the PersistentVolume (PV) reclaim policy is respected even after the associated PersistentVolumeClaim (PVC) is deleted, thereby preventing the leakage of volumes.

Prior to this feature, the reclaim policy linked to a PV could be disregarded under specific conditions, depending on whether the PV or PVC was deleted first. Consequently, the corresponding storage resource in the external infrastructure might not be removed, even if the reclaim policy was set to "Delete". This led to potential inconsistencies and resource leaks.

With the introduction of this feature, Kubernetes now guarantees that the "Delete" reclaim policy will be enforced, ensuring the deletion of the underlying storage object from the backend infrastructure, regardless of the deletion sequence of the PV and PVC.

This work was done as a part of KEP #2644 by SIG Storage.

Bound service account token improvements

The ServiceAccountTokenNodeBinding feature is promoted to beta in v1.31. This feature allows requesting a token bound only to a node, not to a pod, which includes node information in claims in the token and validates the existence of the node when the token is used. For more information, read the bound service account tokens documentation.

This work was done as part of KEP #4193 by SIG Auth.

Multiple Service CIDRs

Support for clusters with multiple Service CIDRs moves to beta in v1.31 (disabled by default).

There are multiple components in a Kubernetes cluster that consume IP addresses: Nodes, Pods and Services. Node and Pod IP ranges can be dynamically changed because they depend on the infrastructure or the network plugin, respectively. However, Service IP ranges are defined during cluster creation as a hardcoded flag in the kube-apiserver. IP exhaustion has been a problem for long-lived or large clusters, as admins needed to expand, shrink or even entirely replace the assigned Service CIDR range. These operations were never supported natively and were performed via complex and delicate maintenance operations, often causing downtime on their clusters. This new feature allows users and cluster admins to dynamically modify Service CIDR ranges with zero downtime.

For more details about this feature please visit the Virtual IPs and Service Proxies documentation page.

This work was done as part of KEP #1880 by SIG Network.

Traffic distribution for Services

Traffic distribution for Services moves to beta in v1.31 and is enabled by default.

After several iterations on finding the best user experience and traffic engineering capabilities for Services networking, SIG Networking implemented the trafficDistribution field in the Service specification, which serves as a guideline for the underlying implementation to consider while making routing decisions.

For more details about this feature please read the 1.30 Release Blog or visit the Service documentation page.

This work was done as part of KEP #4444 by SIG Network.

Kubernetes VolumeAttributesClass ModifyVolume

The VolumeAttributesClass API is moving to beta in v1.31. The VolumeAttributesClass provides a generic, Kubernetes-native API for dynamically modifying volume parameters like provisioned IO. This allows workloads to vertically scale their volumes on-line to balance cost and performance, if supported by their provider. This feature had been alpha since Kubernetes 1.29.

This work was done as a part of KEP #3751 and led by SIG Storage.

New features in Alpha

This is a selection of some of the improvements that are now alpha following the v1.31 release.

New DRA APIs for better accelerators and other hardware management

Kubernetes v1.31 brings an updated dynamic resource allocation (DRA) API and design. The main focus in the update is on structured parameters because they make resource information and requests transparent to Kubernetes and clients and enable implementing features like cluster autoscaling. DRA support in the kubelet was updated such that version skew between kubelet and the control plane is possible. With structured parameters, the scheduler allocates ResourceClaims while scheduling a pod. Allocati

·kubernetes.io·
Comfy Org
Creators of ComfyUI. We are a team dedicated to iterate and improve ComfyUI, support the ComfyUI ecosystem with tools like node manager, node registry, cli, automated testing, and public documentation.
·comfy.org·