Suggested Reads

ioquake3
Play Quake 3, mods, new games, or make your own!
·ioquake3.org·
awslabs/s3-connector-for-pytorch: The Amazon S3 Connector for PyTorch delivers high throughput for PyTorch training jobs that access and store data in Amazon S3.
·github.com·
botocove
A decorator to allow running a function against all AWS accounts in an organization
·pypi.org·
Framework’s Series A-1 and Community Participation
Today we’re announcing $18M in new funding from an incredible set of investors, with a $17M Series A-1 round led by Spark Capital, with Buckley Ventures, Anzu Partners, Cooler Master, and Pathbreaker Ventures participating. It’s ultimately your belief in our mission and products that drives our...
·frame.work·
Which of these stages have you experienced before or are you currently in? Comment below👇 The term ‘burnout’ was first mentioned by… | Instagram
51K likes, 632 comments - thepresentpsychologist on February 21, 2024: "Which of these stages have you experienced before or are you currently in? Comment below👇 The term ‘burnout’ was first mentioned ...".
·instagram.com·
cookiecutter/cookiecutter: A cross-platform command-line utility that creates projects from cookiecutters (project templates), e.g. Python package projects, C projects.
·github.com·
These vendors are playing with fire. You DON’T break laws like this. | China acquired US-restricted Nvidia AI chips built in servers, tenders show
Among the buyers were the Chinese Academy of Sciences, the Shandong Artificial Intelligence Institute, the Hubei Earthquake Administration and a state-run aviation research centre.
·scmp.com·
Bluefin is now Generally Available - Bluefin and Aurora - Universal Blue
That’s a mouthful! Project Bluefin has been my passion project for going on three years now, and thanks to a bunch of awesome people we feel it’s time to move to general availability (GA) and out of beta. Our young dromaeosaur is ready! Download the ISOs here. This Fedora 39-based version (which we call gts) has been baking for 6 months. It is designed to be installed on a device and follow Fedora’s releases in perpetuity – we accomplish this by maintaining the image in GitHub as a comm...
·universal-blue.discourse.group·
Kubernetes 1.30: Validating Admission Policy Is Generally Available

https://kubernetes.io/blog/2024/04/24/validating-admission-policy-ga/

On behalf of the Kubernetes project, I am excited to announce that ValidatingAdmissionPolicy has reached general availability as part of the Kubernetes 1.30 release. If you have not yet read about this new declarative alternative to validating admission webhooks, it may be interesting to read our previous post about the feature. If you have already heard about ValidatingAdmissionPolicies and you are eager to try them out, there is no better time to do it than now.

Let's have a taste of a ValidatingAdmissionPolicy, by replacing a simple webhook.

Example admission webhook

First, let's take a look at an example of a simple webhook. Here is an excerpt from a webhook that enforces runAsNonRoot, readOnlyRootFilesystem, allowPrivilegeEscalation, and privileged to be set to the least permissive values.

func verifyDeployment(deploy *appsv1.Deployment) error {
	var errs []error
	for i, c := range deploy.Spec.Template.Spec.Containers {
		if c.Name == "" {
			return fmt.Errorf("container %d has no name", i)
		}
		if c.SecurityContext == nil {
			errs = append(errs, fmt.Errorf("container %q does not have SecurityContext", c.Name))
			continue // avoid dereferencing the nil SecurityContext below
		}
		if c.SecurityContext.RunAsNonRoot == nil || !*c.SecurityContext.RunAsNonRoot {
			errs = append(errs, fmt.Errorf("container %q must set RunAsNonRoot to true in its SecurityContext", c.Name))
		}
		if c.SecurityContext.ReadOnlyRootFilesystem == nil || !*c.SecurityContext.ReadOnlyRootFilesystem {
			errs = append(errs, fmt.Errorf("container %q must set ReadOnlyRootFilesystem to true in its SecurityContext", c.Name))
		}
		if c.SecurityContext.AllowPrivilegeEscalation != nil && *c.SecurityContext.AllowPrivilegeEscalation {
			errs = append(errs, fmt.Errorf("container %q must NOT set AllowPrivilegeEscalation to true in its SecurityContext", c.Name))
		}
		if c.SecurityContext.Privileged != nil && *c.SecurityContext.Privileged {
			errs = append(errs, fmt.Errorf("container %q must NOT set Privileged to true in its SecurityContext", c.Name))
		}
	}
	return errors.NewAggregate(errs)
}

Check out What are admission webhooks?, or see the full code of this webhook to follow along with this walkthrough.

The policy

Now let's try to recreate the validation faithfully with a ValidatingAdmissionPolicy.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "pod-security.policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.runAsNonRoot) && c.securityContext.runAsNonRoot)
    message: 'all containers must set runAsNonRoot to true'
  - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem)
    message: 'all containers must set readOnlyRootFilesystem to true'
  - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.allowPrivilegeEscalation) || !c.securityContext.allowPrivilegeEscalation)
    message: 'all containers must NOT set allowPrivilegeEscalation to true'
  - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
    message: 'all containers must NOT set privileged to true'
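
Creating it is a one-liner; the filename here is hypothetical:

kubectl create -f pod-security-policy.yaml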

Great, no complaints so far. But let's get the policy object back and take a look at its status.

kubectl get -oyaml validatingadmissionpolicies/pod-security.policy.example.com

status:
  typeChecking:
    expressionWarnings:
    - fieldRef: spec.validations[3].expression
      warning: |
        apps/v1, Kind=Deployment: ERROR: <input>:1:76: undefined field 'Privileged'
         | object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
         | ...........................................................................^
        ERROR: <input>:1:128: undefined field 'Privileged'
         | object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.Privileged) || !c.securityContext.Privileged)
         | ...............................................................................................................................^

The policy was checked against its matched type, which is apps/v1.Deployment. Looking at the fieldRef, the problem was with the expression at index 3 (indexes start at 0). The expression in question accessed an undefined Privileged field. Ah, looks like a copy-and-paste error: the field name should be lowercase.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "pod-security.policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  validations:
  - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.runAsNonRoot) && c.securityContext.runAsNonRoot)
    message: 'all containers must set runAsNonRoot to true'
  - expression: object.spec.template.spec.containers.all(c, has(c.securityContext) && has(c.securityContext.readOnlyRootFilesystem) && c.securityContext.readOnlyRootFilesystem)
    message: 'all containers must set readOnlyRootFilesystem to true'
  - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.allowPrivilegeEscalation) || !c.securityContext.allowPrivilegeEscalation)
    message: 'all containers must NOT set allowPrivilegeEscalation to true'
  - expression: object.spec.template.spec.containers.all(c, !has(c.securityContext) || !has(c.securityContext.privileged) || !c.securityContext.privileged)
    message: 'all containers must NOT set privileged to true'

Check its status again, and you should see all warnings cleared.
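
One hedged way to confirm, using standard kubectl JSONPath against the status field shown earlier:

kubectl get validatingadmissionpolicy/pod-security.policy.example.com -o jsonpath='{.status.typeChecking}'

If type checking passes, the expressionWarnings shown above should no longer appear.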

Next, let's create a namespace for our tests.

kubectl create namespace policy-test

Then I bind the policy to the namespace. At this point, I set the action to Warn so that the policy prints warnings instead of rejecting requests. This is especially useful for collecting results from all expressions during development and automated testing.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: "pod-security.policy-binding.example.com"
spec:
  policyName: "pod-security.policy.example.com"
  validationActions: ["Warn"]
  matchResources:
    namespaceSelector:
      matchLabels:
        "kubernetes.io/metadata.name": "policy-test"

Test out policy enforcement.

kubectl create -n policy-test -f- <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: nginx
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx
        name: nginx
        securityContext:
          privileged: true
          allowPrivilegeEscalation: true
EOF

Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must set runAsNonRoot to true
Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must set readOnlyRootFilesystem to true
Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must NOT set allowPrivilegeEscalation to true
Warning: Validation failed for ValidatingAdmissionPolicy 'pod-security.policy.example.com' with binding 'pod-security.policy-binding.example.com': all containers must NOT set privileged to true
Error from server: error when creating "STDIN": admission webhook "webhook.example.com" denied the request: [container "nginx" must set RunAsNonRoot to true in its SecurityContext, container "nginx" must set ReadOnlyRootFilesystem to true in its SecurityContext, container "nginx" must NOT set AllowPrivilegeEscalation to true in its SecurityContext, container "nginx" must NOT set Privileged to true in its SecurityContext]

Looks great! The policy and the webhook give equivalent results. After a few more test cases, once we are confident in the policy, we can switch the binding over to enforcement, as sketched below, and then do some cleanup of the expressions themselves.
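
The switch to enforcement is just the binding's validationActions field; a minimal sketch reusing the names above (Deny is the enforcing action):

kubectl patch validatingadmissionpolicybinding pod-security.policy-binding.example.com \
  --type merge -p '{"spec":{"validationActions":["Deny"]}}'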

As for the cleanup, two things stand out:

For every expression, we repeat access to object.spec.template.spec.containers and to each securityContext.

There is a pattern of checking the presence of a field and then accessing it, which looks a bit verbose.

Fortunately, since Kubernetes 1.28, we have new solutions for both issues. Variable Composition allows us to extract repeated sub-expressions into their own variables, and Kubernetes enables the optional library for CEL, which is excellent for working with fields that are, you guessed it, optional.

With both features in mind, let's refactor the policy a bit.

apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: "pod-security.policy.example.com"
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: ["apps"]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["deployments"]
  variables:
  - name: containers
    expression: object.spec.template.spec.containers
  - name: securityContexts
    expression: 'variables.containers.map(c, c.?securityContext)'
  validations:
  - expression: variables.securityContexts.all(c, c.?runAsNonRoot == optional.of(true))
    message: 'all containers must set runAsNonRoot to true'
  - expression: variables.securityContexts.all(c, c.?readOnlyRootFilesystem == optional.of(true))
    message: 'all containers must set readOnlyRootFilesystem to true'
  - expression: variables.securityContexts.all(c, c.?allo
·kubernetes.io·
Week Ending April 21, 2024

https://lwkd.info/2024/20240423

Developer News

Kubernetes v1.30: Uwubernetes was released! Major features include Go workspaces, Pod Scheduling Readiness, VolumeManager reconstruction after kubelet restart, Node log query, and more. Read more in the announcement blog post and the release notes.

Release Schedule

Next Deadline: 1.31 Cycle Begins, April 2024

We are in the period between releases right now. Dates for 1.31 have not been published yet.

Featured PR

123905: Field selector for Services based on ClusterIP and Type

In clusters with unusually large numbers of headless Services (i.e. Services without a cluster IP), the Kubelet can suffer memory bloat because it has to cache these Services as part of its API informer. This PR extends the Service API to allow filtering on both clusterIP and type, improving the memory usage of the Kubelet and decreasing load on the API server. While this specific optimization only helps a niche audience, it’s worth reinforcing how this technique can be applied elsewhere. When optimizing any controller, always keep an eye open for how API watch traffic could be mitigated with server-side logic or filters. Creating field selectors is easy and streamlined, and can likely be used in many more such optimizations.
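
A hedged sketch of what the filter looks like from the client side, assuming the selectors land as spec.clusterIP and spec.type (names inferred from the PR description, availability depending on the release it ships in):

# List only headless Services, i.e. those with no cluster IP:
kubectl get services --all-namespaces --field-selector spec.clusterIP=None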

KEP of the Week

KEP 3521: Pod Scheduling Readiness

This KEP proposes an API to mark Pods as ready or paused for scheduling, so that the scheduler does not waste cycles retrying Pods that are determined to be unschedulable. It adds APIs for users and controllers to control when a Pod is ready to be considered for scheduling, via the new .spec.schedulingGates field on the Pod API. The scheduler will only attempt to place a Pod on a Node once its .spec.schedulingGates list is empty. A new Enqueue extension point is also added to customize Pod queueing behaviour.

This KEP graduated to stable in the v1.30 release.
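
A minimal sketch of a gated Pod; the gate name is hypothetical, and the Pod stays Pending (reason SchedulingGated) until every gate is removed:

kubectl create -f- <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod
spec:
  schedulingGates:
  - name: example.com/wait-for-capacity   # removed later by a controller or user
  containers:
  - name: app
    image: nginx
EOF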

Other Merges


Promotions


Deprecated


Version Updates


Subprojects and Dependency Updates


via Last Week in Kubernetes Development https://lwkd.info/

April 23, 2024 at 06:00PM

·lwkd.info·
Defining the Role of Distinguished Engineers | AWS Executive Insights
Watch as Clarke Rodgers, Director of AWS Enterprise Strategy, interviews Paul about why he joined AWS and what his experience has been like so far. You’ll hear more about Paul’s notable career as an early influencer in the evolution of the Internet, how he was inducted into the Internet Hall of Fame, and what he’s doing now as a Deputy CISO and Distinguished Engineer at AWS.
·aws.amazon.com·
Kubernetes 1.30: Read-only volume mounts can be finally literally read-only

https://kubernetes.io/blog/2024/04/23/recursive-read-only-mounts/

Author: Akihiro Suda (NTT)

Read-only volume mounts have been a feature of Kubernetes since the beginning. Surprisingly, read-only mounts are not completely read-only under certain conditions on Linux. As of the v1.30 release, they can be made completely read-only, with alpha support for recursive read-only mounts.

Read-only volume mounts are not really read-only by default

Volume mounts can be deceptively complicated.

You might expect that the following manifest makes everything under /mnt in the containers read-only:

---
apiVersion: v1
kind: Pod
spec:
  volumes:
  - name: mnt
    hostPath:
      path: /mnt
  containers:
  - volumeMounts:
    - name: mnt
      mountPath: /mnt
      readOnly: true

However, any sub-mounts beneath /mnt may still be writable! For example, suppose /mnt/my-nfs-server is writable on the host. Inside the container, writes to /mnt/ will be rejected, but /mnt/my-nfs-server/ will still be writable.
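
A quick hedged illustration from a shell inside such a container, using the paths from the example:

touch /mnt/file              # rejected: Read-only file system
touch /mnt/my-nfs-server/f   # may still succeed: the sub-mount is not covered by readOnly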

New mount option: recursiveReadOnly

Kubernetes 1.30 added a new mount option recursiveReadOnly so as to make submounts recursively read-only.

The option can be enabled as follows:

---
apiVersion: v1
kind: Pod
spec:
  volumes:
  - name: mnt
    hostPath:
      path: /mnt
  containers:
  - volumeMounts:
    - name: mnt
      mountPath: /mnt
      readOnly: true
      # NEW
      # Possible values are Enabled, IfPossible, and Disabled.
      # Needs to be specified in conjunction with readOnly: true.
      recursiveReadOnly: Enabled

This is implemented by applying the MOUNT_ATTR_RDONLY attribute with the AT_RECURSIVE flag using mount_setattr(2) added in Linux kernel v5.12.

For backwards compatibility, the recursiveReadOnly field is not a replacement for readOnly, but is used in conjunction with it. To get a properly recursive read-only mount, you must set both fields.
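
Reusing the hedged test from above, the sub-mount should now refuse writes too:

touch /mnt/my-nfs-server/f   # rejected: Read-only file system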

Feature availability

To enable recursiveReadOnly mounts, the following components have to be used:

Kubernetes: v1.30 or later, with the RecursiveReadOnlyMounts feature gate enabled. As of v1.30, the gate is marked as alpha (see the flag sketch after this list).

CRI runtime:

containerd: v2.0 or later

OCI runtime:

runc: v1.1 or later

crun: v1.8.6 or later

Linux kernel: v5.12 or later
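
As a sketch of the alpha enablement, the gate goes on the standard --feature-gates flag of the API server and kubelet (the exact component placement is assumed here, not spelled out in the post):

kube-apiserver --feature-gates=RecursiveReadOnlyMounts=true ...
kubelet --feature-gates=RecursiveReadOnlyMounts=true ...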

What's next?

Kubernetes SIG Node hopes, and expects, that the feature will be promoted to beta and eventually general availability (GA) in future releases of Kubernetes, so that users no longer need to enable the feature gate manually.

The default value of recursiveReadOnly will still remain Disabled, for backwards compatibility.

How can I learn more?

Please check out the documentation for the further details of recursiveReadOnly mounts.

How to get involved?

This feature is driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you!

via Kubernetes Blog https://kubernetes.io/

April 22, 2024 at 08:00PM

·kubernetes.io·
Exploring KCL: Configuration and Data Structure Language; CUE and Pkl Replacement?

Dive into the world of K Configuration Language (KCL).

This review and walkthrough illuminates the features and advantages of using KCL to generate YAML or JSON configurations and manifests. We cover the basics of KCL's syntax, its approach to handling hierarchical data, and demonstrate how to define and manipulate configurations with clarity and precision.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Hostman 🔗 https://bit.ly/44ae0gf 🔗 Hostman offers affordable cloud services starting at just $1/month, including free bandwidth. The company’s services are hosted on globally secure, ISO-certified servers located in Tier 3 data centers. Key features include free Firewall, Private Networks, Images, Snapshots, and cost-effective backup solutions starting at $0.07/GB. Additionally, Hostman provides 24/7 rapid tech support and a 7-day trial with a $100 credit for new users. ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#KCL #Kubernetes

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Gist with the commands: https://gist.github.com/vfarcic/e6636bb851ae28d748fc8c1517bac931 🔗 KCL: https://kcl-lang.io 🎬 Is CUE The Perfect Language For Kubernetes Manifests (Helm Templates Replacement)?: https://youtu.be/m6g0aWggdUQ 🎬 Is Timoni With CUE a Helm Replacement?: https://youtu.be/bbE1BFCs548 🎬 Is Pkl the Ultimate Data Format? Unveiling the Challenger to YAML, JSON, and CUE: https://youtu.be/Nm1ioWPRRVQ

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please use https://calendar.app.google/Q9eaDUHN8ibWBaA7A to book a timeslot that suits you, and we'll go over the details. Or feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ Twitter: https://twitter.com/vfarcic ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Introduction to KCL 01:03 Hostman (sponsor) 01:42 Introduction to KCL (cont.) 05:41 KCL in Action 14:12 KCL Pros and Cons

via YouTube https://www.youtube.com/watch?v=Gn6btuH3ULw

·youtube.com·
Amazon Music launches Maestro, a new AI playlist generator—here’s your first look at the beta

Today, Amazon Music announces a new feature that uses AI technology to make it easier and way more fun to build playlists you want, when you want. Meet Maestro:…

April 22, 2024 at 09:51AM

via Instapaper

·aboutamazon.com·
MITRE attack strikes a NERVE after Ivanti to VMware pivot
"We did not detect… lateral movement into our VMware infrastructure. At the time we believed we took all the necessary actions to mitigate the vulnerability, but these actions were clearly insufficient.”
·thestack.technology·
Kubernetes 1.30: Beta Support For Pods With User Namespaces

https://kubernetes.io/blog/2024/04/22/userns-beta/

Authors: Rodrigo Campos Catelin (Microsoft), Giuseppe Scrivano (Red Hat), Sascha Grunert (Red Hat)

Linux provides different namespaces to isolate processes from each other. For example, a typical Kubernetes pod runs within a network namespace to isolate the network identity and a PID namespace to isolate the processes.

One Linux namespace that was left behind is the user namespace. This namespace allows us to isolate the user and group identifiers (UIDs and GIDs) we use inside the container from the ones on the host.

This is a powerful abstraction that allows us to run containers as "root": we are root inside the container and can do everything root can inside the pod, but our interactions with the host are limited to what a non-privileged user can do. This is great for limiting the impact of a container breakout.

A container breakout is when a process inside a container can break out onto the host using some unpatched vulnerability in the container runtime or the kernel and can access/modify files on the host or other containers. If we run our pods with user namespaces, the privileges the container has over the rest of the host are reduced, and the files outside the container it can access are limited too.

In Kubernetes v1.25, we introduced support for user namespaces only for stateless pods. Kubernetes 1.28 lifted that restriction, and now, with Kubernetes 1.30, we are moving to beta!

What is a user namespace?

Note: Linux user namespaces are a different concept from Kubernetes namespaces. The former is a Linux kernel feature; the latter is a Kubernetes feature.

User namespaces are a Linux feature that isolates the UIDs and GIDs of the containers from the ones on the host. The identifiers in the container can be mapped to identifiers on the host in a way where the host UID/GIDs used for different containers never overlap. Furthermore, the identifiers can be mapped to unprivileged, non-overlapping UIDs and GIDs on the host. This brings two key benefits:

Prevention of lateral movement: As the UIDs and GIDs for different containers are mapped to different UIDs and GIDs on the host, containers have a harder time attacking each other, even if they escape the container boundaries. For example, suppose container A runs with different UIDs and GIDs on the host than container B. In that case, the operations it can do on container B's files and processes are limited: it can only read/write what a file allows to others, as it will never have owner or group permission (the UIDs/GIDs on the host are guaranteed to be different for different containers).

Increased host isolation: As the UIDs and GIDs are mapped to unprivileged users on the host, if a container escapes the container boundaries, even if it runs as root inside the container, it has no privileges on the host. This greatly protects what host files it can read/write, which process it can send signals to, etc. Furthermore, capabilities granted are only valid inside the user namespace and not on the host, limiting the impact a container escape can have.
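
For reference, opting a Pod into a user namespace is a single field, spec.hostUsers; a minimal sketch with a placeholder name and image:

kubectl create -f- <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false   # false = run this pod in its own user namespace
  containers:
  - name: app
    image: nginx
EOF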

User namespace IDs allocation

Without using a user namespace, a container running as root in the case of a container breakout has root privileges on the node. If some capabilities were granted to the container, the capabilities are valid on the host too. None of this is true when using user namespaces (modulo bugs, of course 🙂).

Changes in 1.30

In Kubernetes 1.30, besides moving user namespaces to beta, the contributors working on this feature:

Introduced a way for the kubelet to use custom ranges for the UIDs/GIDs mapping

Added a way for Kubernetes to enforce that the runtime supports all the features needed for user namespaces. If they are not supported, Kubernetes will show a clear error when trying to create a pod with user namespaces. Before 1.30, if the container runtime didn't support user namespaces, the pod could be created without a user namespace.

Added more tests, including tests in the cri-tools repository.

You can check the documentation on user namespaces for how to configure custom ranges for the mapping.

Demo

A few months ago, CVE-2024-21626 was disclosed. This vulnerability has a CVSS score of 8.6 (HIGH). It allows an attacker to escape a container and read/write to any path on the node and other pods hosted on the same node.

Rodrigo created a demo that exploits CVE-2024-21626 and shows how the exploit, which works without user namespaces, is mitigated when user namespaces are in use.

Please note that with user namespaces, an attacker can do on the host file system what the permission bits for "others" allow. Therefore, the CVE is not completely prevented, but the impact is greatly reduced.

Node system requirements

There are requirements on the Linux kernel version and the container runtime to use this feature.

On Linux you need Linux 6.3 or greater. This is because the feature relies on a kernel feature named idmap mounts, and support for using idmap mounts with tmpfs was merged in Linux 6.3.

Suppose you are using CRI-O with crun; as always, you can expect support for Kubernetes 1.30 with CRI-O 1.30. Please note you also need crun 1.9 or greater. If you are using CRI-O with runc, this is still not supported.

Containerd support is currently targeted for containerd 2.0, and the same crun version requirements apply. If you are using containerd with runc, this is still not supported.

Please note that containerd 1.7 added experimental support for user namespaces, as implemented in Kubernetes 1.25 and 1.26. We did a redesign in Kubernetes 1.27, which requires changes in the container runtime. Those changes are not present in containerd 1.7, so it only works with user namespaces support in Kubernetes 1.25 and 1.26.

Another limitation of containerd 1.7 is that it needs to change the ownership of every file and directory inside the container image during Pod startup. This has a storage overhead and can significantly impact the container startup latency. Containerd 2.0 will probably include an implementation that will eliminate the added startup latency and storage overhead. Consider this if you plan to use containerd 1.7 with user namespaces in production.

None of these containerd 1.7 limitations apply to CRI-O.

How do I get involved?

You can reach SIG Node by several means:

Slack: #sig-node

Mailing list

Open Community Issues/PRs

You can also contact us directly:

GitHub: @rata @giuseppe @saschagrunert

Slack: @rata @giuseppe @sascha

via Kubernetes Blog https://kubernetes.io/

April 21, 2024 at 08:00PM

·kubernetes.io·