
1_r/devopsish
Kubernetes 1.31: Autoconfiguration For Node Cgroup Driver (beta)
https://kubernetes.io/blog/2024/08/21/cri-cgroup-driver-lookup-now-beta/
Historically, configuring the correct cgroup driver has been a pain point for users running new Kubernetes clusters. On Linux systems, there are two different cgroup drivers: cgroupfs and systemd. In the past, both the kubelet and CRI implementation (like CRI-O or containerd) needed to be configured to use the same cgroup driver, or else the kubelet would exit with an error. This was a source of headaches for many cluster admins. However, there is light at the end of the tunnel!
Automated cgroup driver detection
In v1.28.0, the SIG Node community introduced the feature gate KubeletCgroupDriverFromCRI, which instructs the kubelet to ask the CRI implementation which cgroup driver to use. A few minor releases of Kubernetes happened whilst we waited for support to land in the major two CRI implementations (containerd and CRI-O), but as of v1.31.0, this feature is now beta!
In addition to setting the feature gate, a cluster admin needs to ensure their CRI implementation is new enough:
containerd: Support was added in v2.0.0
CRI-O: Support was added in v1.28.0
Then, they should ensure their CRI implementation is configured to the cgroup_driver they would like to use.
Future work
Eventually, support for the kubelet's cgroupDriver configuration field will be dropped, and the kubelet will fail to start if the CRI implementation isn't new enough to have support for this feature.
via Kubernetes Blog https://kubernetes.io/
August 20, 2024 at 08:00PM
Who needs GitHub Copilot when you roll your own
Hands on Code assistants have gained considerable attention as an early use case for generative AI – especially following the launch of Microsoft's GitHub…
August 20, 2024 at 10:43AM
via Instapaper
The Window-Knocking Machine Test · ines.io
AI is making futurists of us all. With the dizzying speed of new innovations, it’s clear that our lives and work are going to change. So what’s next? How will…
August 20, 2024 at 10:38AM
via Instapaper
continuedev/continue: ⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
August 20, 2024 at 09:47AM
via Instapaper
Kubernetes 1.31: Streaming Transitions from SPDY to WebSockets
https://kubernetes.io/blog/2024/08/20/websockets-transition/
In Kubernetes 1.31, by default kubectl now uses the WebSocket protocol instead of SPDY for streaming.
This post describes what these changes mean for you and why these streaming APIs matter.
Streaming APIs in Kubernetes
In Kubernetes, specific endpoints that are exposed as an HTTP or RESTful interface are upgraded to streaming connections, which require a streaming protocol. Unlike HTTP, which is a request-response protocol, a streaming protocol provides a persistent connection that's bi-directional, low-latency, and lets you interact in real-time. Streaming protocols support reading and writing data between your client and the server, in both directions, over the same connection. This type of connection is useful, for example, when you create a shell in a running container from your local workstation and run commands in the container.
Why change the streaming protocol?
Before the v1.31 release, Kubernetes used the SPDY/3.1 protocol by default when upgrading streaming connections. SPDY/3.1 has been deprecated for eight years, and it was never standardized. Many modern proxies, gateways, and load balancers no longer support the protocol. As a result, you might notice that commands like kubectl cp, kubectl attach, kubectl exec, and kubectl port-forward stop working when you try to access your cluster through a proxy or gateway.
As of Kubernetes v1.31, SIG API Machinery has modified the streaming protocol that a Kubernetes client (such as kubectl) uses for these commands to the more modern WebSocket streaming protocol. The WebSocket protocol is a currently supported standardized streaming protocol that guarantees compatibility and interoperability with different components and programming languages. The WebSocket protocol is more widely supported by modern proxies and gateways than SPDY.
How streaming APIs work
Kubernetes upgrades HTTP connections to streaming connections by adding specific upgrade headers to the originating HTTP request. For example, an HTTP upgrade request for running the date command on an nginx container within a cluster is similar to the following:
$ kubectl exec -v=8 nginx -- date GET https://127.0.0.1:43251/api/v1/namespaces/default/pods/nginx/exec?command=date… Request Headers: Connection: Upgrade Upgrade: websocket Sec-Websocket-Protocol: v5.channel.k8s.io User-Agent: kubectl/v1.31.0 (linux/amd64) kubernetes/6911225
If the container runtime supports the WebSocket streaming protocol and at least one of the subprotocol versions (e.g. v5.channel.k8s.io), the server responds with a successful 101 Switching Protocols status, along with the negotiated subprotocol version:
Response Status: 101 Switching Protocols in 3 milliseconds Response Headers: Upgrade: websocket Connection: Upgrade Sec-Websocket-Accept: j0/jHW9RpaUoGsUAv97EcKw8jFM= Sec-Websocket-Protocol: v5.channel.k8s.io
At this point the TCP connection used for the HTTP protocol has changed to a streaming connection. Subsequent STDIN, STDOUT, and STDERR data (as well as terminal resizing data and process exit code data) for this shell interaction is then streamed over this upgraded connection.
How to use the new WebSocket streaming protocol
If your cluster and kubectl are on version 1.29 or later, there are two control plane feature gates and two kubectl environment variables that govern the use of the WebSockets rather than SPDY. In Kubernetes 1.31, all of the following feature gates are in beta and are enabled by default:
Feature gates
TranslateStreamCloseWebsocketRequests
.../exec
.../attach
PortForwardWebsockets
.../port-forward
kubectl feature control environment variables
KUBECTL_REMOTE_COMMAND_WEBSOCKETS
kubectl exec
kubectl cp
kubectl attach
KUBECTL_PORT_FORWARD_WEBSOCKETS
kubectl port-forward
If you're connecting to an older cluster but can manage the feature gate settings, turn on both TranslateStreamCloseWebsocketRequests (added in Kubernetes v1.29) and PortForwardWebsockets (added in Kubernetes v1.30) to try this new behavior. Version 1.31 of kubectl can automatically use the new behavior, but you do need to connect to a cluster where the server-side features are explicitly enabled.
Learn more about streaming APIs
KEP 4006 - Transitioning from SPDY to WebSockets
RFC 6455 - The WebSockets Protocol
Container Runtime Interface streaming explained
via Kubernetes Blog https://kubernetes.io/
August 19, 2024 at 08:00PM
(21) Post | Feed | LinkedIn
1 notification total Repost successful. View repost…
August 19, 2024 at 11:24AM
via Instapaper
The Dark Side of Open Source: Are We All Just Selfish?
Open-source software is often seen as a free-for-all, but the reality is more complex. Many companies invest heavily in open source projects as a go-to-market strategy, paying full-time maintainers to ensure project success. This video explores the motivations behind open source, the role of big companies like Google and AWS, and the impact of license changes by companies like MongoDB and HashiCorp. Discover why no open-source project should be owned by a single company and the benefits of foundation-owned projects like Kubernetes and Linux. Learn how you can contribute to and support the open source ecosystem.
OpenSource #TechIndustry #SoftwareDevelopment #CorporateSponsorship
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ Twitter: https://twitter.com/vfarcic ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=4l_kK90khNA
Kubernetes 1.31: Pod Failure Policy for Jobs Goes GA
https://kubernetes.io/blog/2024/08/19/kubernetes-1-31-pod-failure-policy-for-jobs-goes-ga/
This post describes Pod failure policy, which graduates to stable in Kubernetes 1.31, and how to use it in your Jobs.
About Pod failure policy
When you run workloads on Kubernetes, Pods might fail for a variety of reasons. Ideally, workloads like Jobs should be able to ignore transient, retriable failures and continue running to completion.
To allow for these transient failures, Kubernetes Jobs include the backoffLimit field, which lets you specify a number of Pod failures that you're willing to tolerate during Job execution. However, if you set a large value for the backoffLimit field and rely solely on this field, you might notice unnecessary increases in operating costs as Pods restart excessively until the backoffLimit is met.
This becomes particularly problematic when running large-scale Jobs with thousands of long-running Pods across thousands of nodes.
The Pod failure policy extends the backoff limit mechanism to help you reduce costs in the following ways:
Gives you control to fail the Job as soon as a non-retriable Pod failure occurs.
Allows you to ignore retriable errors without increasing the backoffLimit field.
For example, you can use a Pod failure policy to run your workload on more affordable spot machines by ignoring Pod failures caused by graceful node shutdown.
The policy allows you to distinguish between retriable and non-retriable Pod failures based on container exit codes or Pod conditions in a failed Pod.
How it works
You specify a Pod failure policy in the Job specification, represented as a list of rules.
For each rule you define match requirements based on one of the following properties:
Container exit codes: the onExitCodes property.
Pod conditions: the onPodConditions property.
Additionally, for each rule, you specify one of the following actions to take when a Pod matches the rule:
Ignore: Do not count the failure towards the backoffLimit or backoffLimitPerIndex.
FailJob: Fail the entire Job and terminate all running Pods.
FailIndex: Fail the index corresponding to the failed Pod. This action works with the Backoff limit per index feature.
Count: Count the failure towards the backoffLimit or backoffLimitPerIndex. This is the default behavior.
When Pod failures occur in a running Job, Kubernetes matches the failed Pod status against the list of Pod failure policy rules, in the specified order, and takes the corresponding actions for the first matched rule.
Note that when specifying the Pod failure policy, you must also set the Job's Pod template with restartPolicy: Never. This prevents race conditions between the kubelet and the Job controller when counting Pod failures.
Kubernetes-initiated Pod disruptions
To allow matching Pod failure policy rules against failures caused by disruptions initiated by Kubernetes, this feature introduces the DisruptionTarget Pod condition.
Kubernetes adds this condition to any Pod, regardless of whether it's managed by a Job controller, that fails because of a retriable disruption scenario. The DisruptionTarget condition contains one of the following reasons that corresponds to these disruption scenarios:
PreemptionByKubeScheduler: Preemption by kube-scheduler to accommodate a new Pod that has a higher priority.
DeletionByTaintManager - the Pod is due to be deleted by kube-controller-manager due to a NoExecute taint that the Pod doesn't tolerate.
EvictionByEvictionAPI - the Pod is due to be deleted by an API-initiated eviction.
DeletionByPodGC - the Pod is bound to a node that no longer exists, and is due to be deleted by Pod garbage collection.
TerminationByKubelet - the Pod was terminated by graceful node shutdown, node pressure eviction or preemption for system critical pods.
In all other disruption scenarios, like eviction due to exceeding Pod container limits, Pods don't receive the DisruptionTarget condition because the disruptions were likely caused by the Pod and would reoccur on retry.
Example
The Pod failure policy snippet below demonstrates an example use:
podFailurePolicy: rules:
- action: Ignore onPodConditions:
- type: DisruptionTarget
- action: FailJob onPodConditions:
- type: ConfigIssue
- action: FailJob onExitCodes: operator: In values: [ 42 ]
In this example, the Pod failure policy does the following:
Ignores any failed Pods that have the built-in DisruptionTarget condition. These Pods don't count towards Job backoff limits.
Fails the Job if any failed Pods have the custom user-supplied ConfigIssue condition, which was added either by a custom controller or webhook.
Fails the Job if any containers exited with the exit code 42.
Counts all other Pod failures towards the default backoffLimit (or backoffLimitPerIndex if used).
Learn more
For a hands-on guide to using Pod failure policy, see Handling retriable and non-retriable pod failures with Pod failure policy
Read the documentation for Pod failure policy and Backoff limit per index
Read the documentation for Pod disruption conditions
Read the KEP for Pod failure policy
Related work
Based on the concepts introduced by Pod failure policy, the following additional work is in progress:
JobSet integration: Configurable Failure Policy API
Pod failure policy extension to add more granular failure reasons
Support for Pod failure policy via JobSet in Kubeflow Training v2
Proposal: Disrupted Pods should be removed from endpoints
Get involved
This work was sponsored by batch working group in close collaboration with the SIG Apps, and SIG Node, and SIG Scheduling communities.
If you are interested in working on new features in the space we recommend subscribing to our Slack channel and attending the regular community meetings.
Acknowledgments
I would love to thank everyone who was involved in this project over the years - it's been a journey and a joint community effort! The list below is my best-effort attempt to remember and recognize people who made an impact. Thank you!
Aldo Culquicondor for guidance and reviews throughout the process
Jordan Liggitt for KEP and API reviews
David Eads for API reviews
Maciej Szulik for KEP reviews from SIG Apps PoV
Clayton Coleman for guidance and SIG Node reviews
Sergey Kanzhelev for KEP reviews from SIG Node PoV
Dawn Chen for KEP reviews from SIG Node PoV
Daniel Smith for reviews from SIG API machinery PoV
Antoine Pelisse for reviews from SIG API machinery PoV
John Belamaric for PRR reviews
Filip Křepinský for thorough reviews from SIG Apps PoV and bug-fixing
David Porter for thorough reviews from SIG Node PoV
Jensen Lo for early requirements discussions, testing and reporting issues
Daniel Vega-Myhre for advancing JobSet integration and reporting issues
Abdullah Gharaibeh for early design discussions and guidance
Antonio Ojea for test reviews
Yuki Iwai for reviews and aligning implementation of the closely related Job features
Kevin Hannon for reviews and aligning implementation of the closely related Job features
Tim Bannister for docs reviews
Shannon Kularathna for docs reviews
Paola Cortés for docs reviews
via Kubernetes Blog https://kubernetes.io/
August 18, 2024 at 08:00PM
LEGOs labeled and bins for partial projects allocated. It’s a LEGO playroom now.
August 18, 2024 at 04:56PM
via Instagram https://instagr.am/p/C-01F5yvXDK/
.@juliemshort massively reorganized Max’s playroom a couple days ago into more of a LEGO builder space since that’s really all Max does in here.
Max asked me to come see his battlefield that he’d set up. The organizer is new to help keep minifigs sorted as the previous solution was overflowing. I went to grab a droid and had no idea where they were. Max tells me and I immediately forget.
So this is what I’m doing right now. Labeling drawers after Julie came through and made sure everything was in the right place (it wasn’t; hence the labeling). Happy Sunday! #LEGO #organization #legominifigs
August 18, 2024 at 12:43PM
via Instagram https://instagr.am/p/C-0YI-gvPmH/
CVE-2024-7646
https://github.com/kubernetes/kubernetes/issues/126744
Ingress-nginx Annotation Validation Bypass
via Kubernetes Vulnerability Announcements - CVE Feed https://kubernetes.io/docs/reference/issues-security/official-cve-feed/
August 16, 2024 at 12:10PM
Kubernetes 1.31: MatchLabelKeys in PodAffinity graduates to beta
https://kubernetes.io/blog/2024/08/16/matchlabelkeys-podaffinity/
Kubernetes 1.29 introduced new fields MatchLabelKeys and MismatchLabelKeys in PodAffinity and PodAntiAffinity.
In Kubernetes 1.31, this feature moves to beta and the corresponding feature gate (MatchLabelKeysInPodAffinity) gets enabled by default.
MatchLabelKeys - Enhanced scheduling for versatile rolling updates
During a workload's (e.g., Deployment) rolling update, a cluster may have Pods from multiple versions at the same time. However, the scheduler cannot distinguish between old and new versions based on the LabelSelector specified in PodAffinity or PodAntiAffinity. As a result, it will co-locate or disperse Pods regardless of their versions.
This can lead to sub-optimal scheduling outcome, for example:
New version Pods are co-located with old version Pods (PodAffinity), which will eventually be removed after rolling updates.
Old version Pods are distributed across all available topologies, preventing new version Pods from finding nodes due to PodAntiAffinity.
MatchLabelKeys is a set of Pod label keys and addresses this problem. The scheduler looks up the values of these keys from the new Pod's labels and combines them with LabelSelector so that PodAffinity matches Pods that have the same key-value in labels.
By using label pod-template-hash in MatchLabelKeys, you can ensure that only Pods of the same version are evaluated for PodAffinity or PodAntiAffinity.
apiVersion: apps/v1 kind: Deployment metadata: name: application-server ... affinity: podAffinity: requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector: matchExpressions:
- key: app operator: In values:
- database topologyKey: topology.kubernetes.io/zone matchLabelKeys:
- pod-template-hash
The above matchLabelKeys will be translated in Pods like:
kind: Pod metadata: name: application-server labels: pod-template-hash: xyz ... affinity: podAffinity: requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector: matchExpressions:
- key: app operator: In values:
- database
- key: pod-template-hash # Added from matchLabelKeys; Only Pods from the same replicaset will match this affinity. operator: In values:
- xyz topologyKey: topology.kubernetes.io/zone matchLabelKeys:
- pod-template-hash
MismatchLabelKeys - Service isolation
MismatchLabelKeys is a set of Pod label keys, like MatchLabelKeys, which looks up the values of these keys from the new Pod's labels, and merge them with LabelSelector as key notin (value) so that PodAffinity does not match Pods that have the same key-value in labels.
Suppose all Pods for each tenant get tenant label via a controller or a manifest management tool like Helm.
Although the value of tenant label is unknown when composing each workload's manifest, the cluster admin wants to achieve exclusive 1:1 tenant to domain placement for a tenant isolation.
MismatchLabelKeys works for this usecase; By applying the following affinity globally using a mutating webhook, the cluster admin can ensure that the Pods from the same tenant will land on the same domain exclusively, meaning Pods from other tenants won't land on the same domain.
affinity: podAffinity: # ensures the pods of this tenant land on the same node pool requiredDuringSchedulingIgnoredDuringExecution:
- matchLabelKeys:
- tenant topologyKey: node-pool podAntiAffinity: # ensures only Pods from this tenant lands on the same node pool requiredDuringSchedulingIgnoredDuringExecution:
- mismatchLabelKeys:
- tenant labelSelector: matchExpressions:
- key: tenant operator: Exists topologyKey: node-pool
The above matchLabelKeys and mismatchLabelKeys will be translated to like:
kind: Pod metadata: name: application-server labels: tenant: service-a spec: affinity: podAffinity: # ensures the pods of this tenant land on the same node pool requiredDuringSchedulingIgnoredDuringExecution:
- matchLabelKeys:
- tenant topologyKey: node-pool labelSelector: matchExpressions:
- key: tenant operator: In values:
- service-a podAntiAffinity: # ensures only Pods from this tenant lands on the same node pool requiredDuringSchedulingIgnoredDuringExecution:
- mismatchLabelKeys:
- tenant labelSelector: matchExpressions:
- key: tenant operator: Exists
- key: tenant operator: NotIn values:
- service-a topologyKey: node-pool
Getting involved
These features are managed by Kubernetes SIG Scheduling.
Please join us and share your feedback. We look forward to hearing from you!
How can I learn more?
The official document of PodAffinity
KEP-3633: Introduce MatchLabelKeys and MismatchLabelKeys to PodAffinity and PodAntiAffinity
via Kubernetes Blog https://kubernetes.io/
August 15, 2024 at 08:00PM
Kubernetes 1.31: Prevent PersistentVolume Leaks When Deleting out of Order
PersistentVolume (or PVs for short) are associated with Reclaim Policy. The reclaim policy is used to determine the actions that need to be taken by the storage backend on deletion of the PVC Bound to a PV. When the reclaim policy is Delete, the expectation is that the storage backend releases the storage resource allocated for the PV. In essence, the reclaim policy needs to be honored on PV deletion.
With the recent Kubernetes v1.31 release, a beta feature lets you configure your cluster to behave that way and honor the configured reclaim policy.
How did reclaim work in previous Kubernetes releases?
PersistentVolumeClaim (or PVC for short) is a user's request for storage. A PV and PVC are considered Bound if a newly created PV or a matching PV is found. The PVs themselves are backed by volumes allocated by the storage backend.
Normally, if the volume is to be deleted, then the expectation is to delete the PVC for a bound PV-PVC pair. However, there are no restrictions on deleting a PV before deleting a PVC.
First, I'll demonstrate the behavior for clusters running an older version of Kubernetes.
Retrieve a PVC that is bound to a PV
Retrieve an existing PVC example-vanilla-block-pvc
kubectl get pvc example-vanilla-block-pvc
The following output shows the PVC and its bound PV; the PV is shown under the VOLUME column:
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE example-vanilla-block-pvc Bound pvc-6791fdd4-5fad-438e-a7fb-16410363e3da 5Gi RWO example-vanilla-block-sc 19s
Delete PV
When I try to delete a bound PV, the kubectl session blocks and the kubectl tool does not return back control to the shell; for example:
kubectl delete pv pvc-6791fdd4-5fad-438e-a7fb-16410363e3da
persistentvolume "pvc-6791fdd4-5fad-438e-a7fb-16410363e3da" deleted ^C
Retrieving the PV
kubectl get pv pvc-6791fdd4-5fad-438e-a7fb-16410363e3da
It can be observed that the PV is in a Terminating state
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE pvc-6791fdd4-5fad-438e-a7fb-16410363e3da 5Gi RWO Delete Terminating default/example-vanilla-block-pvc example-vanilla-block-sc 2m23s
Delete PVC
kubectl delete pvc example-vanilla-block-pvc
The following output is seen if the PVC gets successfully deleted:
persistentvolumeclaim "example-vanilla-block-pvc" deleted
The PV object from the cluster also gets deleted. When attempting to retrieve the PV it will be observed that the PV is no longer found:
kubectl get pv pvc-6791fdd4-5fad-438e-a7fb-16410363e3da
Error from server (NotFound): persistentvolumes "pvc-6791fdd4-5fad-438e-a7fb-16410363e3da" not found
Although the PV is deleted, the underlying storage resource is not deleted and needs to be removed manually.
To sum up, the reclaim policy associated with the PersistentVolume is currently ignored under certain circumstances. For a Bound PV-PVC pair, the ordering of PV-PVC deletion determines whether the PV reclaim policy is honored. The reclaim policy is honored if the PVC is deleted first; however, if the PV is deleted prior to deleting the PVC, then the reclaim policy is not exercised. As a result of this behavior, the associated storage asset in the external infrastructure is not removed.
PV reclaim policy with Kubernetes v1.31
The new behavior ensures that the underlying storage object is deleted from the backend when users attempt to delete a PV manually.
How to enable new behavior?
To take advantage of the new behavior, you must have upgraded your cluster to the v1.31 release of Kubernetes and run the CSI external-provisioner version 5.0.1 or later.
How does it work?
For CSI volumes, the new behavior is achieved by adding a finalizer external-provisioner.volume.kubernetes.io/finalizer on new and existing PVs. The finalizer is only removed after the storage from the backend is deleted. `
An example of a PV with the finalizer, notice the new finalizer in the finalizers list
kubectl get pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 -o yaml
apiVersion: v1 kind: PersistentVolume metadata: annotations: pv.kubernetes.io/provisioned-by: csi.vsphere.vmware.com creationTimestamp: "2021-11-17T19:28:56Z" finalizers:
- kubernetes.io/pv-protection
- external-provisioner.volume.kubernetes.io/finalizer name: pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 resourceVersion: "194711" uid: 087f14f2-4157-4e95-8a70-8294b039d30e spec: accessModes:
- ReadWriteOnce capacity: storage: 1Gi claimRef: apiVersion: v1 kind: PersistentVolumeClaim name: example-vanilla-block-pvc namespace: default resourceVersion: "194677" uid: a7b7e3ba-f837-45ba-b243-dec7d8aaed53 csi: driver: csi.vsphere.vmware.com fsType: ext4 volumeAttributes: storage.kubernetes.io/csiProvisionerIdentity: 1637110610497-8081-csi.vsphere.vmware.com type: vSphere CNS Block Volume volumeHandle: 2dacf297-803f-4ccc-afc7-3d3c3f02051e persistentVolumeReclaimPolicy: Delete storageClassName: example-vanilla-block-sc volumeMode: Filesystem status: phase: Bound
The finalizer prevents this PersistentVolume from being removed from the cluster. As stated previously, the finalizer is only removed from the PV object after it is successfully deleted from the storage backend. To learn more about finalizers, please refer to Using Finalizers to Control Deletion.
Similarly, the finalizer kubernetes.io/pv-controller is added to dynamically provisioned in-tree plugin volumes.
What about CSI migrated volumes?
The fix applies to CSI migrated volumes as well.
Some caveats
The fix does not apply to statically provisioned in-tree plugin volumes.
References
KEP-2644
Volume leak issue
How do I get involved?
The Kubernetes Slack channel SIG Storage communication channels are great mediums to reach out to the SIG Storage and migration working group teams.
Special thanks to the following people for the insightful reviews, thorough consideration and valuable contribution:
Fan Baofa (carlory)
Jan Šafránek (jsafrane)
Xing Yang (xing-yang)
Matthew Wong (wongma7)
Join the Kubernetes Storage Special Interest Group (SIG) if you're interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system. We’re rapidly growing and always welcome new contributors.
via Kubernetes Blog https://kubernetes.io/
August 15, 2024 at 08:00PM
Kubernetes 1.31: Read Only Volumes Based On OCI Artifacts (alpha)
https://kubernetes.io/blog/2024/08/16/kubernetes-1-31-image-volume-source/
The Kubernetes community is moving towards fulfilling more Artificial Intelligence (AI) and Machine Learning (ML) use cases in the future. While the project has been designed to fulfill microservice architectures in the past, it’s now time to listen to the end users and introduce features which have a stronger focus on AI/ML.
One of these requirements is to support Open Container Initiative (OCI) compatible images and artifacts (referred as OCI objects) directly as a native volume source. This allows users to focus on OCI standards as well as enables them to store and distribute any content using OCI registries. A feature like this gives the Kubernetes project a chance to grow into use cases which go beyond running particular images.
Given that, the Kubernetes community is proud to present a new alpha feature introduced in v1.31: The Image Volume Source (KEP-4639). This feature allows users to specify an image reference as volume in a pod while reusing it as volume mount within containers:
… kind: Pod spec: containers:
- … volumeMounts:
- name: my-volume mountPath: /path/to/directory volumes:
- name: my-volume image: reference: my-image:tag
The above example would result in mounting my-image:tag to /path/to/directory in the pod’s container.
Use cases
The goal of this enhancement is to stick as close as possible to the existing container image implementation within the kubelet, while introducing a new API surface to allow more extended use cases.
For example, users could share a configuration file among multiple containers in a pod without including the file in the main image, so that they can minimize security risks and the overall image size. They can also package and distribute binary artifacts using OCI images and mount them directly into Kubernetes pods, so that they can streamline their CI/CD pipeline as an example.
Data scientists, MLOps engineers, or AI developers, can mount large language model weights or machine learning model weights in a pod alongside a model-server, so that they can efficiently serve them without including them in the model-server container image. They can package these in an OCI object to take advantage of OCI distribution and ensure efficient model deployment. This allows them to separate the model specifications/content from the executables that process them.
Another use case is that security engineers can use a public image for a malware scanner and mount in a volume of private (commercial) malware signatures, so that they can load those signatures without baking their own combined image (which might not be allowed by the copyright on the public image). Those files work regardless of the OS or version of the scanner software.
But in the long term it will be up to you as an end user of this project to outline further important use cases for the new feature. SIG Node is happy to retrieve any feedback or suggestions for further enhancements to allow more advanced usage scenarios. Feel free to provide feedback by either using the Kubernetes Slack (#sig-node) channel or the SIG Node mailinglist.
Detailed example
The Kubernetes alpha feature gate ImageVolume needs to be enabled on the API Server as well as the kubelet to make it functional. If that’s the case and the container runtime has support for the feature (like CRI-O ≥ v1.31), then an example pod.yaml like this can be created:
apiVersion: v1 kind: Pod metadata: name: pod spec: containers:
- name: test image: registry.k8s.io/e2e-test-images/echoserver:2.3 volumeMounts:
- name: volume mountPath: /volume volumes:
- name: volume image: reference: quay.io/crio/artifact:v1 pullPolicy: IfNotPresent
The pod declares a new volume using the image.reference of quay.io/crio/artifact:v1, which refers to an OCI object containing two files. The pullPolicy behaves in the same way as for container images and allows the following values:
Always: the kubelet always attempts to pull the reference and the container creation will fail if the pull fails.
Never: the kubelet never pulls the reference and only uses a local image or artifact. The container creation will fail if the reference isn’t present.
IfNotPresent: the kubelet pulls if the reference isn’t already present on disk. The container creation will fail if the reference isn’t present and the pull fails.
The volumeMounts field is indicating that the container with the name test should mount the volume under the path /volume.
If you now create the pod:
kubectl apply -f pod.yaml
And exec into it:
kubectl exec -it pod -- sh
Then you’re able to investigate what has been mounted:
/ # ls /volume dir file / # cat /volume/file 2 / # ls /volume/dir file / # cat /volume/dir/file 1
You managed to consume an OCI artifact using Kubernetes!
The container runtime pulls the image (or artifact), mounts it to the container and makes it finally available for direct usage. There are a bunch of details in the implementation, which closely align to the existing image pull behavior of the kubelet. For example:
If a :latest tag as reference is provided, then the pullPolicy will default to Always, while in any other case it will default to IfNotPresent if unset.
The volume gets re-resolved if the pod gets deleted and recreated, which means that new remote content will become available on pod recreation. A failure to resolve or pull the image during pod startup will block containers from starting and may add significant latency. Failures will be retried using normal volume backoff and will be reported on the pod reason and message.
Pull secrets will be assembled in the same way as for the container image by looking up node credentials, service account image pull secrets, and pod spec image pull secrets.
The OCI object gets mounted in a single directory by merging the manifest layers in the same way as for container images.
The volume is mounted as read-only (ro) and non-executable files (noexec).
Sub-path mounts for containers are not supported (spec.containers[*].volumeMounts.subpath).
The field spec.securityContext.fsGroupChangePolicy has no effect on this volume type.
The feature will also work with the AlwaysPullImages admission plugin if enabled.
Thank you for reading through the end of this blog post! SIG Node is proud and happy to deliver this feature as part of Kubernetes v1.31.
As writer of this blog post, I would like to emphasize my special thanks to all involved individuals out there! You all rock, let’s keep on hacking!
Further reading
Use an Image Volume With a Pod
image volume overview
via Kubernetes Blog https://kubernetes.io/
August 15, 2024 at 08:00PM
Evolving our self-hosted offering and license model
Contact us Sign in Evolving our self-hosted offering and license model What you need to know about the upcoming changes to CockroachDB Enterprise arriving this…
August 15, 2024 at 10:13AM
via Instapaper