1_r/devopsish

Kubernetes v1.33: Storage Capacity Scoring of Nodes for Dynamic Provisioning (alpha)
Kubernetes v1.33: Storage Capacity Scoring of Nodes for Dynamic Provisioning (alpha)

Kubernetes v1.33: Storage Capacity Scoring of Nodes for Dynamic Provisioning (alpha)

https://kubernetes.io/blog/2025/04/30/kubernetes-v1-33-storage-capacity-scoring-feature/

Kubernetes v1.33 introduces a new alpha feature called StorageCapacityScoring. This feature adds a scoring method to pod scheduling with topology-aware volume provisioning, making it easier to schedule pods onto nodes with either the most or the least available storage capacity.

About this feature

This feature extends the kube-scheduler's VolumeBinding plugin to perform scoring using node storage capacity information obtained from Storage Capacity. Previously, that information could only be used to filter out nodes with insufficient storage capacity, so you had to use a scheduler extender to achieve storage-capacity-based pod scheduling.

This feature is useful for provisioning node-local PVs, which have size limits based on the node's storage capacity. By using this feature, you can assign PVs to the nodes with the most available storage space, leaving the most room to expand them later.

In another use case, you might want to minimize the number of nodes to keep operating costs low in cloud environments by choosing the node with the least available storage capacity. This feature helps maximize resource utilization by filling up nodes more sequentially, starting with the most-utilized nodes that still have enough storage capacity for the requested volume size.

How to use

Enabling the feature

In the alpha phase, StorageCapacityScoring is disabled by default. To use this feature, add StorageCapacityScoring=true to the kube-scheduler command line option --feature-gates.
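
For example, on a control plane where you manage the kube-scheduler flags directly, that might look like the following (a minimal sketch; other flags are omitted, and how you pass flags depends on your distribution):

# Sketch: enable the alpha StorageCapacityScoring gate on kube-scheduler.
kube-scheduler --feature-gates=StorageCapacityScoring=true ...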

Configuration changes

You can configure node priorities based on storage utilization using the shape parameter in the VolumeBinding plugin configuration. This allows you to prioritize nodes with higher available storage capacity (default) or, conversely, nodes with lower available storage capacity. For example, to prioritize lower available storage capacity, configure KubeSchedulerConfiguration as follows:

apiVersion: kubescheduler.config.k8s.io/v1
kind: KubeSchedulerConfiguration
profiles:
...
  pluginConfig:
  - name: VolumeBinding
    args:
      ...
      shape:
      - utilization: 0
        score: 0
      - utilization: 100
        score: 10

For more details, please refer to the documentation.

Further reading

KEP-4049: Storage Capacity Scoring of Nodes for Dynamic Provisioning

Additional note: Relationship with VolumeCapacityPriority

The alpha feature gate VolumeCapacityPriority, which performs node scoring based on available storage capacity during static provisioning, will be deprecated and replaced by StorageCapacityScoring.

Please note that while VolumeCapacityPriority prioritizes nodes with lower available storage capacity by default, StorageCapacityScoring prioritizes nodes with higher available storage capacity by default.

via Kubernetes Blog https://kubernetes.io/

April 30, 2025 at 02:30PM

·kubernetes.io·
Kubernetes v1.33: Storage Capacity Scoring of Nodes for Dynamic Provisioning (alpha)
DevOps Toolkit - Ep20 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=nFWGZEI37SA
DevOps Toolkit - Ep20 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=nFWGZEI37SA

Ep20 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, Cloud, Kubernetes, Platform Engineering, containers, or anything else. We'll have special guests Scott Rosenberg and Ramiro Berrelleza to help us out.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 Codefresh GitOps Cloud: https://codefresh.io ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=nFWGZEI37SA

·youtube.com·
DevOps Toolkit - Ep20 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=nFWGZEI37SA
Last Week in Kubernetes Development - Week Ending April 27 2025
Last Week in Kubernetes Development - Week Ending April 27 2025

Week Ending April 27, 2025

https://lwkd.info/2025/20250430

Developer News

Benjamin Elder reminded contributors of the changes to the E2E Testing Framework that take effect now. Contributors must use framework.WithFeatureGate(features.YourFeature) for tests related to specific feature gates to ensure proper execution in CI jobs. Tests need to specify both feature gates and cluster configurations.

After 5 long years, SIG-Testing has finally achieved zero hard-coded test skips in pull-kubernetes-e2e-kind and related jobs. This is near parity with pull-kubernetes-e2e-gce (1056 vs 1080 tests) in approximately half the runtime (~30m vs ~1h).

Applications for Project Lightning talks, Maintainer's Track and ContribFest at KubeCon NA 2025 are open! Get your submissions in before 7th July.

Please read and comment on an ongoing discussion about AI-generated contributions to Kubernetes. Several repositories have been receiving AI-generated submissions which look acceptable until carefully reviewed. Younger developers may be more reliant on AI and may not realize that such contributions are unacceptable. Community members are discussing whether we need a more restrictive policy than the Linux Foundation’s.

Release Schedule

Next Deadline: 1.34 Release Cycle Begins – soon

We are in the between-release limbo period, so time to work on whatever you want. That irritating bug, the subproject you’ve been meaning to investigate, a birdhouse, whatever. The call for enhancements will come soon enough.

Featured PRs

131491: kubectl describe service: Add Traffic Distribution

This PR shows the Traffic Distribution field, added in Kubernetes 1.31, in kubectl describe service if the field is set. This makes the field much more accessible and useful to users.
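
The underlying Service field is spec.trafficDistribution. As a hedged sketch (the Service name, selector, and ports are illustrative assumptions), a Service that sets it so that kubectl describe service shows the new line could look like this:

# Sketch: a hypothetical Service with trafficDistribution set; after applying it,
# `kubectl describe service my-service` includes a "Traffic Distribution:" line.
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
  trafficDistribution: PreferClose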

130782: Kubeadm issue #3152 ControlPlane node setup failing with “etcdserver: can only promote a learner member”

This PR fixes a bug where kubeadm control plane node setup fails with the error "etcdserver: can only promote a learner member". It adds a check to ensure that promotion is not retried if the member is already promoted, and introduces a call to remove the learner member if the promotion fails entirely.

KEP of the Week

KEP 1769: Speed up recursive SELinux label change

This KEP speeds up volume mounts on SELinux-enforcing systems by using the -o context=XYZ mount option instead of slow recursive relabeling. It has rolled out in three phases: starting with ReadWriteOncePod volumes (v1.28), then adding metrics and an opt-out (v1.32), and finally applying to all volumes by default in 1.33.
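
For context, the fast path applies when a pod declares its SELinux label up front, so the kubelet can mount the volume with -o context=... instead of relabeling every file. A hedged sketch (the pod name, image, claim name, and MCS level are illustrative assumptions, not from the KEP):

# Sketch: a pod that declares its SELinux level so its volume can be mounted
# with -o context=... rather than recursively relabeled.
apiVersion: v1
kind: Pod
metadata:
  name: selinux-fast-mount
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: my-claim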

Other Merges

Fix for OIDC discovery document publishing when external service account token signing is enabled

hack/update-codegen.sh now automatically ensures goimports and protoc

Deprecated scheduler cache metrics removed

Recovery feature’s status in kubelet now checks for newer resize fields

Fix for the invalid SucceededCriteriaMet condition type in the Job API

Watch handler tests moved to handlers package

Fix for error handling and CSI JSON file removal interaction

Pod resize e2e utilities moved out of e2e/framework

Fix for a possible deadlock in the watch client

Long directory names with e2e pod logs shortened

endpoint-controller and workload-leader-election FlowSchemas removed from the default APF configuration

Fix for the allocatedResourceStatuses Field name mismatch in PVC status validation

scheduler-perf adds option to enable api-server initialization

Kubelet to use the node informer to get the node addresses directly

Fix for a bug in Job controller which could result in creating unnecessary Pods for a finished Job

kube-controller-manager events to support contextual logging

Fix for a bug where NodeResizeError condition was in PVC status when the CSI driver does not support node volume expansion

kubeadm refactoring to reduce code repetition using slice package

Version Updates

google/cel-go to v0.25.0

cri-tools to v1.33.0

mockery to v2.53.3

coredns to v1.12.1

Shoutouts

Ryota: Now that Kubernetes v1.33 is officially out, the Release Team Subteam Leads — rayandas (Docs), Wendy Ha (Release Signal), Dipesh (Enhancements), and Ryota (Comms) — want to send a huge shoutout to our amazing Release Lead Nina Polshakova.

via Last Week in Kubernetes Development https://lwkd.info/

April 30, 2025 at 05:00PM

·lwkd.info·
Last Week in Kubernetes Development - Week Ending April 27 2025
CLOTributor
CLOTributor
CLOTributor makes it easier to discover great opportunities to become a Cloud Native contributor.
·clotributor.dev·
CLOTributor
Kubernetes v1.33: Image Volumes graduate to beta!
Kubernetes v1.33: Image Volumes graduate to beta!

Kubernetes v1.33: Image Volumes graduate to beta!

https://kubernetes.io/blog/2025/04/29/kubernetes-v1-33-image-volume-beta/

Image Volumes were introduced as an Alpha feature with the Kubernetes v1.31 release as part of KEP-4639. In Kubernetes v1.33, this feature graduates to beta.

Please note that the feature is still disabled by default, because not all container runtimes have full support for it. CRI-O supports the initial feature since version v1.31 and will add support for Image Volumes as beta in v1.33. containerd merged support for the alpha feature which will be part of the v2.1.0 release and is working on beta support as part of PR #11578.

What's new

The major change for the beta graduation of Image Volumes is the support for subPath and subPathExpr mounts for containers via spec.containers[*].volumeMounts.[subPath,subPathExpr]. This allows end users to mount a specific subdirectory of an image volume, which is still mounted read-only (noexec). This means that non-existing subdirectories cannot be mounted by default. As with other subPath and subPathExpr values, Kubernetes ensures that no absolute path or relative path components are part of the specified sub path. Container runtimes are also required to double-check those requirements for safety reasons. If a specified subdirectory does not exist within a volume, then runtimes should fail on container creation and provide user feedback by using existing kubelet events.
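
For the subPathExpr variant, the sub path can be expanded from an environment variable defined on the container. A minimal sketch, assuming a DIR_NAME variable and the same example image used later in this post (both are illustrative here):

# Sketch: select the image-volume subdirectory via subPathExpr; the env var
# DIR_NAME must resolve to an existing subdirectory of the image volume.
apiVersion: v1
kind: Pod
metadata:
  name: image-volume-subpathexpr
spec:
  containers:
  - name: shell
    command: ["sleep", "infinity"]
    image: debian
    env:
    - name: DIR_NAME
      value: dir
    volumeMounts:
    - name: volume
      mountPath: /volume
      subPathExpr: $(DIR_NAME)
  volumes:
  - name: volume
    image:
      reference: quay.io/crio/artifact:v2
      pullPolicy: IfNotPresent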

Besides that, there are also three new kubelet metrics available for image volumes:

kubelet_image_volume_requested_total: Counts the number of requested image volumes.

kubelet_image_volume_mounted_succeed_total: Counts the number of successful image volume mounts.

kubelet_image_volume_mounted_errors_total: Counts the number of failed image volume mounts.
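
One way to check these counters is to read a node's kubelet metrics through the API server proxy (a sketch; <node-name> is a placeholder for one of your nodes):

# Sketch: fetch the kubelet metrics of a single node and filter for the
# image volume counters.
kubectl get --raw "/api/v1/nodes/<node-name>/proxy/metrics" | grep kubelet_image_volume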

To use an existing subdirectory for a specific image volume, just use it as the subPath (or subPathExpr) value of the container's volumeMounts:

apiVersion: v1
kind: Pod
metadata:
  name: image-volume
spec:
  containers:
  - name: shell
    command: ["sleep", "infinity"]
    image: debian
    volumeMounts:
    - name: volume
      mountPath: /volume
      subPath: dir
  volumes:
  - name: volume
    image:
      reference: quay.io/crio/artifact:v2
      pullPolicy: IfNotPresent

Then, create the pod on your cluster:

kubectl apply -f image-volumes-subpath.yaml

Now you can attach to the container:

kubectl attach -it image-volume bash

And check the content of the file from the dir sub path in the volume:

cat /volume/file

The output will be similar to:

1

Thank you for reading through the end of this blog post! SIG Node is proud and happy to deliver this feature graduation as part of Kubernetes v1.33.

As the writer of this blog post, I would like to extend my special thanks to all the individuals involved!

If you would like to provide feedback or suggestions feel free to reach out to SIG Node using the Kubernetes Slack (#sig-node) channel or the SIG Node mailing list.

Further reading

Use an Image Volume With a Pod

image volume overview

via Kubernetes Blog https://kubernetes.io/

April 29, 2025 at 02:30PM

·kubernetes.io·
Kubernetes v1.33: Image Volumes graduate to beta!
Rocky Linux Achieves FIPS 140-3 Compliance
Rocky Linux Achieves FIPS 140-3 Compliance
Rocky Linux has taken a major leap forward by achieving FIPS 140-3 compliance for versions 8 and 9.2. This achievement makes the already popular...
·linuxsecurity.com·
Rocky Linux Achieves FIPS 140-3 Compliance
The State of Open Source in 2025
The State of Open Source in 2025
The good news: everyone's using it. The bad news: have you seen how they're using it?
·theregister.com·
The State of Open Source in 2025
From Fragile to Faultless: Kubernetes Self-Healing In Practice with Grzegorz Głąb
From Fragile to Faultless: Kubernetes Self-Healing In Practice with Grzegorz Głąb

From Fragile to Faultless: Kubernetes Self-Healing In Practice, with Grzegorz Głąb

https://ku.bz/yg_fkP0LN

Discover how to build resilient Kubernetes environments at scale with practical automation strategies from an engineer who's tackled complex production challenges.

Grzegorz Głąb, Kubernetes Engineer at Cloud Kitchens, shares his team's journey developing a comprehensive self-healing framework. He explains how they addressed issues ranging from spot node preemptions to network packet drops caused by unbalanced IRQs, providing concrete examples of automation that prevents downtime and improves reliability.

You will learn:

How managed Kubernetes services like AKS provide benefits but require customization for specific use cases

The architecture of an effective self-healing framework using DaemonSets and deployments with Kubernetes-native components

Practical solutions for common challenges like StatefulSet pods stuck on unreachable nodes and cleaning up orphaned pods

Techniques for workload-level automation, including throttling CPU-hungry pods and automating diagnostic data collection

Sponsor

This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.

More info

Find all the links and info for this episode here: https://ku.bz/yg_fkP0LN

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

April 29, 2025 at 06:00AM

·kube.fm·
From Fragile to Faultless: Kubernetes Self-Healing In Practice with Grzegorz Głąb
Only Google Can Run Chrome, Company’s Browser Chief Tells Judge
Only Google Can Run Chrome, Company’s Browser Chief Tells Judge
Google is the only company that can offer the level of features and functionality that its popular Chrome web browser has today, given its “interdependencies” on other parts of the Alphabet Inc. unit, the head of Chrome testified.
·bloomberg.com·
Only Google Can Run Chrome, Company’s Browser Chief Tells Judge
Kubernetes v1.33: HorizontalPodAutoscaler Configurable Tolerance
Kubernetes v1.33: HorizontalPodAutoscaler Configurable Tolerance

Kubernetes v1.33: HorizontalPodAutoscaler Configurable Tolerance

https://kubernetes.io/blog/2025/04/28/kubernetes-v1-33-hpa-configurable-tolerance/

This post describes configurable tolerance for horizontal Pod autoscaling, a new alpha feature first available in Kubernetes 1.33.

What is it?

Horizontal Pod Autoscaling is a well-known Kubernetes feature that allows your workload to automatically resize by adding or removing replicas based on resource utilization.

Let's say you have a web application running in a Kubernetes cluster with 50 replicas. You configure the Horizontal Pod Autoscaler (HPA) to scale based on CPU utilization, with a target of 75% utilization. Now, imagine that the current CPU utilization across all replicas is 90%, which is higher than the desired 75%. The HPA will calculate the required number of replicas using the formula:

$$\text{desiredReplicas} = \left\lceil \text{currentReplicas} \times \frac{\text{currentMetricValue}}{\text{desiredMetricValue}} \right\rceil$$

In this example:

$$50 \times (90/75) = 60$$

So, the HPA will increase the number of replicas from 50 to 60 to reduce the load on each pod. Similarly, if the CPU utilization were to drop below 75%, the HPA would scale down the number of replicas accordingly. The Kubernetes documentation provides a detailed description of the scaling algorithm.
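
For reference, a minimal HorizontalPodAutoscaler matching this scenario might look like the following sketch (the Deployment name and replica bounds are illustrative assumptions):

# Sketch: an HPA targeting 75% average CPU utilization, as in the example above.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  minReplicas: 10
  maxReplicas: 100
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 75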

In order to avoid replicas being created or deleted whenever a small metric fluctuation occurs, Kubernetes applies a form of hysteresis: it only changes the number of replicas when the current and desired metric values differ by more than 10%. In the example above, the ratio between the current and desired metric values is 90/75, which is 20% above target. Since that exceeds the 10% tolerance, the scale-up action proceeds.

This default tolerance of 10% is cluster-wide; in older Kubernetes releases, it could not be fine-tuned. It's a suitable value for most usage, but too coarse for large deployments, where a 10% tolerance represents tens of pods. As a result, the community has long asked to be able to tune this value.

In Kubernetes v1.33, this is now possible.

How do I use it?

After enabling the HPAConfigurableTolerance feature gate in your Kubernetes v1.33 cluster, you can add your desired tolerance for your HorizontalPodAutoscaler object.
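
A hedged sketch of enabling the gate on a self-managed control plane (exactly which components need it can depend on your setup; typically the API server, which accepts the new field, and the controller manager, which runs the HPA controller):

# Sketch: enable the alpha gate; other flags omitted. Adjust to how your
# distribution configures its control plane components.
kube-apiserver --feature-gates=HPAConfigurableTolerance=true ...
kube-controller-manager --feature-gates=HPAConfigurableTolerance=true ...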

Tolerances appear under the spec.behavior.scaleDown and spec.behavior.scaleUp fields and can thus be different for scale up and scale down. A typical usage would be to specify a small tolerance on scale up (to react quickly to spikes), but higher on scale down (to avoid adding and removing replicas too quickly in response to small metric fluctuations).

For example, an HPA with a tolerance of 5% on scale-down, and no tolerance on scale-up, would look like the following:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  ...
  behavior:
    scaleDown:
      tolerance: 0.05
    scaleUp:
      tolerance: 0

I want all the details!

Get all the technical details by reading KEP-4951 and follow issue 4951 to be notified of the feature graduation.

via Kubernetes Blog https://kubernetes.io/

April 28, 2025 at 02:30PM

·kubernetes.io·
Kubernetes v1.33: HorizontalPodAutoscaler Configurable Tolerance
DevOps Toolkit - Kro vs Helm: Is It Time to Ditch Helm Charts? - https://www.youtube.com/watch?v=V4N1dHzHmXI
DevOps Toolkit - Kro vs Helm: Is It Time to Ditch Helm Charts? - https://www.youtube.com/watch?v=V4N1dHzHmXI

Kro vs Helm: Is It Time to Ditch Helm Charts?

Dive deep into the fascinating debate of Helm vs. kro in Kubernetes! Discover the major differences between these tools, how they compare, and their unique advantages. From Helm's templating engine to kro's Custom Resource Definitions, this video will guide you through deploying applications and managing resources effectively. Learn about the evolution of Kubernetes, the role of CRDs and controllers, and why custom solutions might be the future.

#Helm #Kro #Kubernetes

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/kubernetes/kro-vs-helm-is-it-time-to-ditch-helm-charts 🔗 kro: https://kro.run 🔗 UpCloud: https://upcloud.com 🎬 Is This the End of Crossplane? Compose Kubernetes Resources with kro: https://youtu.be/8zQtpcxmdhs

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Helm vs. Kro 03:14 Lower vs. Higher Level Controllers 15:59 Kubernetes Custom Resources 26:19 Create Your Own CRDs and Controllers 30:48 Wrap-Up

via YouTube https://www.youtube.com/watch?v=V4N1dHzHmXI

·youtube.com·
DevOps Toolkit - Kro vs Helm: Is It Time to Ditch Helm Charts? - https://www.youtube.com/watch?v=V4N1dHzHmXI
Claude Code Best Practices \ Anthropic
Claude Code Best Practices \ Anthropic
A blog post covering tips and tricks that have proven effective for using Claude Code across various codebases, languages, and environments.
·anthropic.com·
Claude Code Best Practices \ Anthropic
Kubernetes v1.33: User Namespaces enabled by default!
Kubernetes v1.33: User Namespaces enabled by default!

Kubernetes v1.33: User Namespaces enabled by default!

https://kubernetes.io/blog/2025/04/25/userns-enabled-by-default/

In Kubernetes v1.33, support for user namespaces is enabled by default. This means that, when the stack requirements are met, pods can opt in to use user namespaces. There is no longer any need to enable a Kubernetes feature flag to use the feature!

In this blog post we answer some common questions about user namespaces. But, before we dive into that, let's recap what user namespaces are and why they are important.

What is a user namespace?

Note: Linux user namespaces are a different concept from Kubernetes namespaces. The former is a Linux kernel feature; the latter is a Kubernetes feature.

Linux provides different namespaces to isolate processes from each other. For example, a typical Kubernetes pod runs within a network namespace to isolate the network identity and a PID namespace to isolate the processes.

One Linux namespace that was left behind is the user namespace. It isolates the UIDs and GIDs of the containers from the ones on the host. The identifiers in a container can be mapped to identifiers on the host in a way where host and container(s) never end up in overlapping UID/GIDs. Furthermore, the identifiers can be mapped to unprivileged, non-overlapping UIDs and GIDs on the host. This brings three key benefits:

Prevention of lateral movement: As the UIDs and GIDs for different containers are mapped to different UIDs and GIDs on the host, containers have a harder time attacking each other, even if they escape the container boundaries. For example, suppose container A runs with different UIDs and GIDs on the host than container B. In that case, the operations it can do on container B's files and processes are limited: it can only read/write what a file allows to others, as it will never have owner or group permissions (the UIDs/GIDs on the host are guaranteed to be different for different containers).

Increased host isolation: As the UIDs and GIDs are mapped to unprivileged users on the host, if a container escapes the container boundaries, even if it runs as root inside the container, it has no privileges on the host. This greatly protects what host files it can read/write, which process it can send signals to, etc. Furthermore, capabilities granted are only valid inside the user namespace and not on the host, limiting the impact a container escape can have.

Enablement of new use cases: User namespaces allow containers to gain certain capabilities inside their own user namespace without affecting the host. This unlocks new possibilities, such as running applications that require privileged operations without granting full root access on the host. This is particularly useful for running nested containers.

User namespace IDs allocation

If a pod running as the root user without a user namespace manages to break out, it has root privileges on the node. If some capabilities were granted to the container, the capabilities are valid on the host too. None of this is true when using user namespaces (modulo bugs, of course 🙂).

Demos

Rodrigo created demos to understand how some CVEs are mitigated when user namespaces are used. We showed them here before (see here and here), but take a look if you haven't:

Mitigation of CVE 2024-21626 with user namespaces:

Mitigation of CVE 2022-0492 with user namespaces:

Everything you wanted to know about user namespaces in Kubernetes

Here we try to answer some of the questions we have been asked about user namespaces support in Kubernetes.

  1. What are the requirements to use it?

The requirements are documented here, but we will elaborate a bit more in the following questions.

Note this is a Linux-only feature.

  2. How do I configure a pod to opt-in?

A complete step-by-step guide is available here. But the short version is you need to set the hostUsers: false field in the pod spec. For example like this:

apiVersion: v1
kind: Pod
metadata:
  name: userns
spec:
  hostUsers: false
  containers:
  - name: shell
    command: ["sleep", "infinity"]
    image: debian

Yes, it is that simple. Applications will run just fine, without any other changes needed (unless your application needs the privileges).

User namespaces allow you to run as root inside the container while having no privileges on the host. However, if your application needs privileges on the host, for example an app that needs to load a kernel module, then you can't use user namespaces.

  3. What are idmap mounts, and why do the file-systems used need to support them?

Idmap mounts are a Linux kernel feature that uses a mapping of UIDs/GIDs when accessing a mount. When combined with user namespaces, it greatly simplifies the support for volumes, as you can forget about the host UIDs/GIDs the user namespace is using.

In particular, thanks to idmap mounts we can:

Run each pod with different UIDs/GIDs on the host. This is key for the lateral movement prevention we mentioned earlier.

Share volumes with pods that don't use user namespaces.

Enable/disable user namespaces without needing to chown the pod's volumes.

Support for idmap mounts in the kernel is per file-system and different kernel releases added support for idmap mounts on different file-systems.

To find which kernel version added support for each file-system, you can check out the mount_setattr man page, or the online version of it here.

Most popular file-systems are supported; the notable absentee that isn't supported yet is NFS.

  4. Can you clarify exactly which file-systems need to support idmap mounts?

The file-systems that need to support idmap mounts are all the file-systems used by a pod in the pod.spec.volumes field.

This means: for PV/PVC volumes, the file-system used in the PV needs to support idmap mounts; for hostPath volumes, the file-system used in the hostPath needs to support idmap mounts.

What does this mean for secrets/configmaps/projected/downwardAPI volumes? For these volumes, the kubelet creates a tmpfs file-system. So, you will need a 6.3 kernel to use these volumes (note that if you use them as env variables it is fine).

And what about emptyDir volumes? Those volumes are created by the kubelet by default in /var/lib/kubelet/pods/. You can also use a custom directory for this. But what needs to support idmap mounts is the file-system used in that directory.

The kubelet creates some more files for the container, like /etc/hostname, /etc/resolv.conf, /dev/termination-log, /etc/hosts, etc. These files are also created in /var/lib/kubelet/pods/ by default, so it's important for the file-system used in that directory to support idmap mounts.

Also, some container runtimes may put some of these ephemeral volumes inside a tmpfs file-system, in which case you will need support for idmap mounts in tmpfs.

  5. Can I use a kernel older than 6.3?

Yes, but you will need to make sure you are not using a tmpfs file-system. If you avoid that, you can easily use 5.19 (if all the other file-systems you use support idmap mounts in that kernel).

It can be tricky to avoid using tmpfs, though, as we just described above. Besides having to avoid those volume types, you will also have to avoid mounting the service account token. Every pod has it mounted by default, and it uses a projected volume that, as we mentioned, uses a tmpfs file-system.
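
For reference, here is a minimal hedged sketch of a pod that opts out of the default service account token mount (this only removes the projected token volume; the other tmpfs uses described above still apply):

# Sketch: disable the default projected service account token mount, which
# would otherwise require tmpfs idmap-mount support on older kernels.
apiVersion: v1
kind: Pod
metadata:
  name: userns-old-kernel
spec:
  hostUsers: false
  automountServiceAccountToken: false
  containers:
  - name: shell
    command: ["sleep", "infinity"]
    image: debian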

You could even go lower than 5.19, all the way to 5.12. However, your container rootfs probably uses an overlayfs file-system, and idmap mount support for overlayfs was added in 5.19. We wouldn't recommend using a kernel older than 5.19, as not being able to use idmap mounts for the rootfs is a big limitation. If you absolutely need to, you can check this blog post Rodrigo wrote some years ago about tricks to use user namespaces when you can't support idmap mounts on the rootfs.

  6. If my stack supports user namespaces, do I need to configure anything else?

No, if your stack supports it and you are using Kubernetes v1.33, there is nothing you need to configure. You should be able to follow the task: Use a user namespace with a pod.

However, in case you have specific requirements, you may configure various options. You can find more information here. You can also enable a feature gate to relax the PSS rules.

  7. The demos are nice, but are there more CVEs that this mitigates?

Yes, quite a lot, actually! Besides the ones in the demos, the KEP has more CVEs you can check. That list is not exhaustive; there are many more.

  8. Can you sum up why user namespaces is important?

Think about running a process as root, maybe even an untrusted process. Do you think that is secure? What if we limit it by adding seccomp and apparmor, mask some files in /proc (so it can't crash the node, etc.) and some more tweaks?

Wouldn't it be better if we don't give it privileges in the first place, instead of trying to play whack-a-mole with all the possible ways root can escape?

This is what user namespaces does, plus some other goodies:

Run as an unprivileged user on the host without making changes to your application. Greg and Vinayak did a great talk on the pains you can face when trying to run unprivileged without user namespaces. The pains part starts in this minute.

All pods run with different UIDs/GIDs, which significantly improves protection against lateral movement. This is guaranteed with user namespaces (the kubelet chooses them for you). In the same talk, Greg and Vinayak show that to achieve the same without user namespaces, they went through a quite complex custom solution. This part starts in this minute.

The capabilities granted are only valid inside the user namespace. That means that if a pod breaks out of the container, those capabilities are not valid on the host. We can't provide that without user namespaces.

It enables new use-cases in a secure way. You can run docker in docker, unprivileged container builds, Kubernetes inside Kubernetes, etc all in a secure way. Most of the previous solutions to do this required privilege

·kubernetes.io·
Kubernetes v1.33: User Namespaces enabled by default!
Ransomware Groups Evolve Affiliate Models | Secureworks
Ransomware Groups Evolve Affiliate Models | Secureworks
Learn how Secureworks CTU researchers observed the DragonForce and Anubis ransomware operators introducing novel models to attract affiliates and increase profits.
·secureworks.com·
Ransomware Groups Evolve Affiliate Models | Secureworks