
1_r/devopsish
Kubernetes 1.31: Custom Profiling in Kubectl Debug Graduates to Beta
https://kubernetes.io/blog/2024/08/22/kubernetes-1-31-custom-profiling-kubectl-debug/
There are many ways to troubleshoot the pods and nodes in a cluster, but kubectl debug is one of the easiest and most widely used. It provides a set of static profiles, each suited to a different role. For instance, from a network administrator's point of view, debugging a node should be as easy as this:
$ kubectl debug node/mynode -it --image=busybox --profile=netadmin
On the other hand, static profiles are inherently rigid, and despite their ease of use this rigidity has consequences for some pods. Pods (and nodes) come in many kinds, each with its own requirements, and unfortunately some can't be debugged using the static profiles alone.
Take, for instance, a simple pod consisting of a container whose health relies on an environment variable:
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: customapp:latest
    env:
    - name: REQUIRED_ENV_VAR
      value: "value1"
Currently, copying the pod is the sole mechanism that supports debugging this pod in kubectl debug. Furthermore, what if the user needs to change REQUIRED_ENV_VAR to a different value for advanced troubleshooting? There is no mechanism to achieve this.
Custom Profiling
Custom profiling is a new feature in kubectl debug, available through the --custom flag, that provides extensibility. It expects a partial Container spec in either YAML or JSON format. To debug the example-container above by creating an ephemeral container, we simply have to define this YAML:
partial_container.yaml
env:
- name: REQUIRED_ENV_VAR
  value: value2
and execute:
kubectl debug example-pod -it --image=customapp --custom=partial_container.yaml
Here is another example, in JSON, that modifies multiple fields at once (it changes the port number, adds resource limits, and modifies an environment variable):
{
  "ports": [
    {
      "containerPort": 80
    }
  ],
  "resources": {
    "limits": {
      "cpu": "0.5",
      "memory": "512Mi"
    },
    "requests": {
      "cpu": "0.2",
      "memory": "256Mi"
    }
  },
  "env": [
    {
      "name": "REQUIRED_ENV_VAR",
      "value": "value2"
    }
  ]
}
Constraints
Uncontrolled extensibility hurts usability. For that reason, custom profiling is not allowed for certain fields such as command, image, lifecycle, volume devices and container name. In the future, more fields can be added to the disallowed list if required.
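To make the constraint concrete, here is a minimal Python sketch that checks a partial container spec for the disallowed fields before it is passed to --custom. It is only an illustration and not how kubectl performs the check: the function name is hypothetical, and the key spellings ("volumeDevices", "name") assume the usual Container spec field names for "volume devices" and "container name".

```python
# Fields the post lists as not customizable. The key names are an assumption
# based on the standard Container spec spelling.
DISALLOWED_FIELDS = ("command", "image", "lifecycle", "volumeDevices", "name")


def find_disallowed_fields(partial_container: dict) -> list[str]:
    """Return the top-level keys of a partial container spec that are disallowed."""
    return [field for field in DISALLOWED_FIELDS if field in partial_container]


# A profile that changes an environment variable (allowed) and the image
# (disallowed). The image value is hypothetical and only triggers the check.
profile = {
    "env": [{"name": "REQUIRED_ENV_VAR", "value": "value2"}],
    "image": "customapp:debug",
}
print(find_disallowed_fields(profile))  # ['image']
```

An empty result means the profile only touches fields that the post does not list as disallowed.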
Limitations
The kubectl debug command has three aspects: debugging with ephemeral containers, pod copying, and node debugging. The largest intersection of these aspects is the container spec within a Pod. That's why custom profiling only supports modifying fields that are defined within containers. As a consequence, modifying other fields in the Pod spec is not supported.
Acknowledgments
Special thanks to all the contributors who reviewed and commented on this feature, from the initial conception to its actual implementation (alphabetical order):
Eddie Zaneski
Maciej Szulik
Lee Verberne
via Kubernetes Blog https://kubernetes.io/
August 21, 2024 at 08:00PM
Kubernetes 1.31: Fine-grained SupplementalGroups control
https://kubernetes.io/blog/2024/08/22/fine-grained-supplementalgroups-control/
This blog discusses a new feature in Kubernetes 1.31 to improve the handling of supplementary groups in containers within Pods.
Motivation: Implicit group memberships defined in /etc/group in the container image
Although this behavior may not be popular with many Kubernetes cluster users and admins, Kubernetes, by default, merges group information from the Pod with information defined in /etc/group in the container image.
Let's look at an example. The Pod below specifies runAsUser=1000, runAsGroup=3000 and supplementalGroups=4000 in the Pod's security context.
implicit-groups.yaml
apiVersion: v1
kind: Pod
metadata:
  name: implicit-groups
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
  containers:
  - name: ctr
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false
What is the result of the id command in the ctr container?
Create the Pod:
$ kubectl apply -f https://k8s.io/blog/2024-08-22-Fine-grained-SupplementalGroups-control/implicit-groups.yaml
Verify that the Pod's Container is running:
$ kubectl get pod implicit-groups
Run the id command:
$ kubectl exec implicit-groups -- id
The output should be similar to this:
uid=1000 gid=3000 groups=3000,4000,50000
Where does group ID 50000 in the supplementary groups (the groups field) come from, even though 50000 is not defined in the Pod's manifest at all? The answer is the /etc/group file in the container image.
Checking the contents of /etc/group in the container image should show the following:
$ kubectl exec implicit-groups -- cat /etc/group
...
user-defined-in-image:x:1000:
group-defined-in-image:x:50000:user-defined-in-image
Aha! The container's primary user 1000 belongs to the group 50000 in the last entry.
Thus, the group membership defined in /etc/group in the container image for the container's primary user is implicitly merged with the information from the Pod. Please note that this was a design decision the current CRI implementations inherited from Docker, and the community never really reconsidered it until now.
What's wrong with it?
The implicitly merged group information from /etc/group in the container image may cause concerns, particularly when accessing volumes (see kubernetes/kubernetes#112879 for details), because file permissions are controlled by uid/gid in Linux. Even worse, the implicit gids from /etc/group cannot be detected or validated by any policy engine, because the manifest gives no clue about the implicit group information. This can also be a concern for Kubernetes security.
Fine-grained SupplementalGroups control in a Pod: SupplementalGroupsPolicy
To tackle the above problem, Kubernetes 1.31 introduces a new field, supplementalGroupsPolicy, in the Pod's .spec.securityContext.
This field provides a way to control how supplementary groups are calculated for the container processes in a Pod. The available policies are:
Merge: The group membership defined in /etc/group for the container's primary user will be merged. If not specified, this policy will be applied (i.e. as-is behavior for backward compatibility).
Strict: it only attaches specified group IDs in fsGroup, supplementalGroups, or runAsGroup fields as the supplementary groups of the container processes. This means no group membership defined in /etc/group for the container's primary user will be merged.
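The difference between the two policies can be sketched in a few lines of Python. This is a simplified model of the behavior described above, not the code a CRI runtime executes; the function names are hypothetical, and it assumes the primary user's name in the image (the one that UID 1000 maps to) is user-defined-in-image.

```python
def groups_from_etc_group(etc_group: str, user_name: str) -> list[int]:
    """Return the GIDs of /etc/group entries that list user_name as a member."""
    gids = []
    for line in etc_group.splitlines():
        parts = line.strip().split(":")
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        members = [m for m in parts[3].split(",") if m]
        if user_name in members:
            gids.append(int(parts[2]))
    return gids


def supplementary_groups(policy, run_as_group, supplemental_groups,
                         etc_group, user_name, fs_group=None):
    """Model the groups attached to the container process under a given policy."""
    if policy not in ("Merge", "Strict"):
        raise ValueError(f"unknown policy: {policy}")
    declared = [run_as_group, *supplemental_groups]
    if fs_group is not None:
        declared.append(fs_group)
    # Only Merge adds the memberships found in the image's /etc/group.
    implicit = groups_from_etc_group(etc_group, user_name) if policy == "Merge" else []
    return sorted(set(declared + implicit))


ETC_GROUP = """user-defined-in-image:x:1000:
group-defined-in-image:x:50000:user-defined-in-image"""

# Assumption: UID 1000 is named "user-defined-in-image" in the image.
print(supplementary_groups("Merge", 3000, [4000], ETC_GROUP, "user-defined-in-image"))
# [3000, 4000, 50000]
print(supplementary_groups("Strict", 3000, [4000], ETC_GROUP, "user-defined-in-image"))
# [3000, 4000]
```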
Let's see how the Strict policy works.
strict-supplementalgroups-policy.yaml
apiVersion: v1
kind: Pod
metadata:
  name: strict-supplementalgroups-policy
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
    supplementalGroupsPolicy: Strict
  containers:
  - name: ctr
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false
Create the Pod:
$ kubectl apply -f https://k8s.io/blog/2024-08-22-Fine-grained-SupplementalGroups-control/strict-supplementalgroups-policy.yaml
Verify that the Pod's Container is running:
$ kubectl get pod strict-supplementalgroups-policy
Check the process identity:
$ kubectl exec -it strict-supplementalgroups-policy -- id
The output should be similar to this:
uid=1000 gid=3000 groups=3000,4000
You can see that the Strict policy excludes group 50000 from groups!
Thus, ensuring supplementalGroupsPolicy: Strict (enforced by some policy mechanism) helps prevent the implicit supplementary groups in a Pod.
Note: Actually, this is not enough, because a container with sufficient privileges or capabilities can change its process identity. Please see the following section for details.
Attached process identity in Pod status
This feature also exposes the process identity attached to the first container process of the container via the .status.containerStatuses[].user.linux field. This is helpful for checking whether implicit group IDs are attached.
...
status:
  containerStatuses:
  - name: ctr
    user:
      linux:
        gid: 3000
        supplementalGroups:
        - 3000
        - 4000
        uid: 1000
...
Note: The values in the status.containerStatuses[].user.linux field are the process identity initially attached to the first container process in the container. If the container has sufficient privilege to call system calls related to process identity (e.g. setuid(2), setgid(2) or setgroups(2)), the container process can change its identity. Thus, the actual process identity will be dynamic.
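As a rough illustration of how this status field could be used, the Python sketch below compares the groups reported in a Pod's status with the groups declared in its Pod-level security context. The function name is hypothetical, the Pod is passed in as a plain dictionary (for example, parsed from kubectl get pod -o json), and container-level securityContext overrides are ignored for brevity.

```python
def implicit_group_ids(pod: dict) -> dict[str, list[int]]:
    """Map container name -> GIDs reported in status but not declared in the spec."""
    security_context = pod.get("spec", {}).get("securityContext", {})
    declared = set(security_context.get("supplementalGroups", []))
    for key in ("runAsGroup", "fsGroup"):
        if key in security_context:
            declared.add(security_context[key])

    result = {}
    for container_status in pod.get("status", {}).get("containerStatuses", []):
        linux = container_status.get("user", {}).get("linux", {})
        extra = sorted(set(linux.get("supplementalGroups", [])) - declared)
        if extra:
            result[container_status["name"]] = extra
    return result


# A Pod shaped like the implicit-groups example, with 50000 merged in.
pod = {
    "spec": {"securityContext": {"runAsUser": 1000, "runAsGroup": 3000,
                                 "supplementalGroups": [4000]}},
    "status": {"containerStatuses": [
        {"name": "ctr",
         "user": {"linux": {"uid": 1000, "gid": 3000,
                            "supplementalGroups": [3000, 4000, 50000]}}},
    ]},
}
print(implicit_group_ids(pod))  # {'ctr': [50000]}
```

Because the reported identity is only the one attached initially, an empty result does not prove that the process has kept that identity.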
Feature availability
To enable the supplementalGroupsPolicy field, the following components have to be used:
Kubernetes: v1.31 or later, with the SupplementalGroupsPolicy feature gate enabled. As of v1.31, the gate is marked as alpha.
CRI runtime:
containerd: v2.0 or later
CRI-O: v1.31 or later
You can see if the feature is supported in the Node's .status.features.supplementalGroupsPolicy field.
apiVersion: v1
kind: Node
...
status:
  features:
    supplementalGroupsPolicy: true
What's next?
Kubernetes SIG Node hope - and expect - that the feature will be promoted to beta and eventually general availability (GA) in future releases of Kubernetes, so that users no longer need to enable the feature gate manually.
Merge policy is applied when supplementalGroupsPolicy is not specified, for backwards compatibility.
How can I learn more?
Configure a Security Context for a Pod or Container for the further details of supplementalGroupsPolicy
KEP-3619: Fine-grained SupplementalGroups control
How to get involved?
This feature is driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you!
via Kubernetes Blog https://kubernetes.io/
August 21, 2024 at 08:00PM
Kubernetes v1.31: New Kubernetes CPUManager Static Policy: Distribute CPUs Across Cores
https://kubernetes.io/blog/2024/08/22/cpumanager-static-policy-distributed-cpu-across-cores/
In Kubernetes v1.31, we are excited to introduce a significant enhancement to CPU management capabilities: the distribute-cpus-across-cores option for the CPUManager static policy. This feature is currently in alpha and hidden by default, marking a strategic shift aimed at optimizing CPU utilization and improving system performance across multi-core processors.
Understanding the feature
Traditionally, Kubernetes' CPUManager tends to allocate CPUs as compactly as possible, typically packing them onto the fewest number of physical cores. However, the allocation strategy matters: CPUs on the same physical core still share some of that core's resources, such as the cache and execution units.
While the default approach minimizes inter-core communication and can be beneficial under certain scenarios, it also poses a challenge. CPUs sharing a physical core can lead to resource contention, which in turn may cause performance bottlenecks, particularly noticeable in CPU-intensive applications.
The new distribute-cpus-across-cores feature addresses this issue by modifying the allocation strategy. When enabled, this policy option instructs the CPUManager to spread out the CPUs (hardware threads) across as many physical cores as possible. This distribution is designed to minimize contention among CPUs sharing the same physical core, potentially enhancing the performance of applications by providing them dedicated core resources.
Technically, within this static policy, the free CPU list is reordered in the manner depicted in the diagram, aiming to allocate CPUs from separate physical cores.
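The contrast between packing and distributing can be illustrated with a small Python sketch. It is a toy model, not the kubelet's algorithm (which also accounts for sockets and NUMA nodes): the topology, the CPU numbering and the function names are all assumptions made for the example.

```python
from itertools import chain, zip_longest


def packed_order(cores: list[list[int]]) -> list[int]:
    """Order free CPUs core by core, so requests fill as few cores as possible."""
    return list(chain.from_iterable(cores))


def distributed_order(cores: list[list[int]]) -> list[int]:
    """Order free CPUs round-robin across cores, one hardware thread per core per round."""
    rounds = zip_longest(*cores)
    return [cpu for one_round in rounds for cpu in one_round if cpu is not None]


def allocate(order: list[int], count: int) -> list[int]:
    """Take the first `count` CPUs from an ordered free list."""
    if count > len(order):
        raise ValueError("not enough free CPUs")
    return order[:count]


# Hypothetical topology: 4 physical cores, each with 2 hardware threads.
cores = [[0, 4], [1, 5], [2, 6], [3, 7]]

print(allocate(packed_order(cores), 4))       # [0, 4, 1, 5] -> 2 physical cores
print(allocate(distributed_order(cores), 4))  # [0, 1, 2, 3] -> 4 physical cores
```

In the packed case the four CPUs share two cores, while in the distributed case each CPU has a physical core to itself.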
Enabling the feature
To enable this feature, users first need to add the --cpu-manager-policy=static kubelet flag or the cpuManagerPolicy: static field in KubeletConfiguration. Then they can either pass --cpu-manager-policy-options distribute-cpus-across-cores=true to the kubelet or add distribute-cpus-across-cores=true to the CPUManager policy options in the kubelet configuration. This setting directs the CPUManager to adopt the new distribution strategy. It is important to note that this policy option cannot currently be used in conjunction with the full-pcpus-only or distribute-cpus-across-numa options.
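The incompatibility between options can be checked ahead of time. The following Python sketch parses a comma-separated key=value option string and reports conflicts; it only encodes the restriction stated above, the function names are hypothetical, and the kubelet's own validation may differ.

```python
# Options that, per the text above, cannot be combined with
# distribute-cpus-across-cores.
CONFLICTS = {
    "distribute-cpus-across-cores": {"full-pcpus-only", "distribute-cpus-across-numa"},
}


def parse_policy_options(raw: str) -> dict[str, bool]:
    """Parse a string such as 'a=true,b=false' into {'a': True, 'b': False}."""
    options = {}
    for item in raw.split(","):
        item = item.strip()
        if not item:
            continue
        key, separator, value = item.partition("=")
        if not separator:
            raise ValueError(f"expected key=value, got {item!r}")
        options[key.strip()] = value.strip().lower() == "true"
    return options


def find_conflicts(options: dict[str, bool]) -> list[str]:
    """Return the enabled options that conflict with another enabled option."""
    enabled = {name for name, on in options.items() if on}
    found = []
    for option, incompatible in CONFLICTS.items():
        if option in enabled:
            found.extend(sorted(incompatible & enabled))
    return found


opts = parse_policy_options("distribute-cpus-across-cores=true,full-pcpus-only=true")
print(find_conflicts(opts))  # ['full-pcpus-only']
```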
Current limitations and future directions
As with any new feature, especially one in alpha, there are limitations and areas for future improvement. One significant current limitation is that distribute-cpus-across-cores cannot be combined with other policy options that might conflict in terms of CPU allocation strategies. This restriction can affect compatibility with certain workloads and deployment scenarios that rely on more specialized resource management.
Looking forward, we are committed to enhancing the compatibility and functionality of the distribute-cpus-across-cores option. Future updates will focus on resolving these compatibility issues, allowing this policy to be combined with other CPUManager policies seamlessly. Our goal is to provide a more flexible and robust CPU allocation framework that can adapt to a variety of workloads and performance demands.
Conclusion
The introduction of the distribute-cpus-across-cores policy in Kubernetes CPUManager is a step forward in our ongoing efforts to refine resource management and improve application performance. By reducing the contention on physical cores, this feature offers a more balanced approach to CPU resource allocation, particularly beneficial for environments running heterogeneous workloads. We encourage Kubernetes users to test this new feature and provide feedback, which will be invaluable in shaping its future development.
Further reading
Please check out the Control CPU Management Policies on the Node task page to learn more about the CPU Manager, and how it fits in relation to the other node-level resource managers.
Getting involved
This feature is driven by the SIG Node. If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please attend the SIG Node meeting for more details.
via Kubernetes Blog https://kubernetes.io/
August 21, 2024 at 08:00PM
Week Ending August 18, 2024
https://lwkd.info/2024/20240820
Developer News
The Steering Committee nominations are open until August 24. Currently there are four candidates running for three seats. If you are a candidate, or thinking of running, join current Steering members for a Q&A.
All Kubernetes GitHub orgs have been moved under our enterprise account. However, the Prow migration starts August 21, so subprojects should hold off on releases until it’s complete.
Release Schedule
Next Deadline: 1.32 cycle begins, September 9
Kubernetes v1.31.0 is live and the latest! The 1.32 release cycle will begin soon, with Release Team Lead Federico Muñoz.
The latest patch releases v1.28.13, v1.29.8 and v1.30.4 are available.
The Release Team Shadow applications are now live. This form will close on Friday, September 06, 2024. Selected applicants will be notified by the end of the day, Friday, September 13, 2024.
KEP of the Week
KEP 3866: Add an nftables-based kube-proxy backend
The KEP creates a new nftables backend for kube-proxy on Linux to replace the current iptables and ipvs backends. iptables, the default backend, suffers from unfixable performance issues, such as slow rule updates and degraded packet processing as the ruleset grows. While it is hoped that this backend will eventually replace both the iptables and ipvs backends and become the default kube-proxy mode on Linux, that replacement/deprecation would be handled in a separate future KEP.
This KEP is tracked for beta release in the upcoming v1.31.
Other Merges
NodeToStatus map is now a struct (should it be “NodeToStatusStruct”?), which requires changes to all PostFilter plugins
All Feature Gates will be added as featuregate.VersionedSpecs to support control plane versioning
PVC Protection Controller is faster thanks to batch processing
DisableNodeKubeProxyVersion was enabled too soon, so back to default disabled
Show image volumes for pods that have them
Regression fix: honor --version Build ID overrides
Prevent preemption pod deletion fail
Use AllocatedResources so that users can recover from node expansion failure
Allow orphan pod processors to speed up concurrent job tracking completion
Disallow extra namespaced keys in structured auth config
Kubeadm gets a validation warning for misconfigured cert periods
kube-proxy waits for all caches to be synced
PriorityClass displays preemptionPolicy
hostNetwork no longer depends on PodIPs being assigned
Node Monitor Grace Period is 50 seconds
Stop retrying the watcher if it doesn’t have permission to watch
Adding PVCs has a queueing hint
New Tests: NodeGetVolumeStats
Structured Logging Migration: CSI translation lib
Promotions
kubeadm etcd Learner Mode to GA
Deprecated
Remove Graduated Feature Gates: KMSv2
Version Updates
CoreDNS to 1.11.3
via Last Week in Kubernetes Development https://lwkd.info/
August 20, 2024 at 06:00PM
Kubernetes: rise of a global Open Source movement
In this video, we delve into the story of Kubernetes, from its roots at Google to becoming a cornerstone of the open source and cloud native movements. Discover…
August 21, 2024 at 02:10PM
via Instapaper
S3 Sponsorship Extension - More Resources to Build a More Sustainable Nix
Extended S3 Sponsorship! I’m thrilled to share some great news that hope fuels the rest of the week positively. The AWS Open Source team has once again stepped…
August 21, 2024 at 01:55PM
via Instapaper
How to Terminate Go Programs Elegantly – A Guide to Graceful Shutdowns
August 21, 2024 at 09:29AM
via Instapaper
I just want mTLS on Kubernetes
A common phrase when talking to Kubernetes users is "I just want all my traffic mTLS encrypted on Kubernetes." Occasionally, this comes with some additional…
August 21, 2024 at 09:28AM
via Instapaper
Installing Karpenter: Lessons Learned From Our Experience
But before getting started, let's explain what Karpenter is... AWS Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler. It was…
August 21, 2024 at 09:28AM
via Instapaper
Kubernetes 1.31: Autoconfiguration For Node Cgroup Driver (beta)
https://kubernetes.io/blog/2024/08/21/cri-cgroup-driver-lookup-now-beta/
Historically, configuring the correct cgroup driver has been a pain point for users running new Kubernetes clusters. On Linux systems, there are two different cgroup drivers: cgroupfs and systemd. In the past, both the kubelet and CRI implementation (like CRI-O or containerd) needed to be configured to use the same cgroup driver, or else the kubelet would exit with an error. This was a source of headaches for many cluster admins. However, there is light at the end of the tunnel!
Automated cgroup driver detection
In v1.28.0, the SIG Node community introduced the feature gate KubeletCgroupDriverFromCRI, which instructs the kubelet to ask the CRI implementation which cgroup driver to use. A few minor releases of Kubernetes happened whilst we waited for support to land in the two major CRI implementations (containerd and CRI-O), but as of v1.31.0, this feature is now beta!
In addition to setting the feature gate, a cluster admin needs to ensure their CRI implementation is new enough:
containerd: Support was added in v2.0.0
CRI-O: Support was added in v1.28.0
Then, they should ensure their CRI implementation is configured to the cgroup_driver they would like to use.
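The selection logic can be pictured with a short Python sketch. This is a simplified model rather than the kubelet's code: the function name is hypothetical, and it assumes that when the runtime does not report a driver the kubelet falls back to its own cgroupDriver configuration field.

```python
from typing import Optional

VALID_DRIVERS = ("cgroupfs", "systemd")


def effective_cgroup_driver(gate_enabled: bool,
                            runtime_driver: Optional[str],
                            configured_driver: str) -> str:
    """Pick the cgroup driver the kubelet would end up using in this model.

    runtime_driver is None when the CRI implementation is too old to report one.
    """
    if gate_enabled and runtime_driver is not None:
        chosen = runtime_driver  # the runtime's answer wins
    else:
        chosen = configured_driver  # assumed fallback to the kubelet's own setting
    if chosen not in VALID_DRIVERS:
        raise ValueError(f"unknown cgroup driver: {chosen}")
    return chosen


print(effective_cgroup_driver(True, "systemd", "cgroupfs"))   # systemd
print(effective_cgroup_driver(True, None, "cgroupfs"))        # cgroupfs
```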
Future work
Eventually, support for the kubelet's cgroupDriver configuration field will be dropped, and the kubelet will fail to start if the CRI implementation isn't new enough to have support for this feature.
via Kubernetes Blog https://kubernetes.io/
August 20, 2024 at 08:00PM
Who needs GitHub Copilot when you roll your own
Hands on Code assistants have gained considerable attention as an early use case for generative AI – especially following the launch of Microsoft's GitHub…
August 20, 2024 at 10:43AM
via Instapaper
The Window-Knocking Machine Test · ines.io
AI is making futurists of us all. With the dizzying speed of new innovations, it’s clear that our lives and work are going to change. So what’s next? How will…
August 20, 2024 at 10:38AM
via Instapaper
continuedev/continue: ⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
August 20, 2024 at 09:47AM
via Instapaper
Kubernetes 1.31: Streaming Transitions from SPDY to WebSockets
https://kubernetes.io/blog/2024/08/20/websockets-transition/
In Kubernetes 1.31, by default kubectl now uses the WebSocket protocol instead of SPDY for streaming.
This post describes what these changes mean for you and why these streaming APIs matter.
Streaming APIs in Kubernetes
In Kubernetes, specific endpoints that are exposed as an HTTP or RESTful interface are upgraded to streaming connections, which require a streaming protocol. Unlike HTTP, which is a request-response protocol, a streaming protocol provides a persistent connection that's bi-directional, low-latency, and lets you interact in real-time. Streaming protocols support reading and writing data between your client and the server, in both directions, over the same connection. This type of connection is useful, for example, when you create a shell in a running container from your local workstation and run commands in the container.
Why change the streaming protocol?
Before the v1.31 release, Kubernetes used the SPDY/3.1 protocol by default when upgrading streaming connections. SPDY/3.1 has been deprecated for eight years, and it was never standardized. Many modern proxies, gateways, and load balancers no longer support the protocol. As a result, you might notice that commands like kubectl cp, kubectl attach, kubectl exec, and kubectl port-forward stop working when you try to access your cluster through a proxy or gateway.
As of Kubernetes v1.31, SIG API Machinery has modified the streaming protocol that a Kubernetes client (such as kubectl) uses for these commands to the more modern WebSocket streaming protocol. The WebSocket protocol is a currently supported standardized streaming protocol that guarantees compatibility and interoperability with different components and programming languages. The WebSocket protocol is more widely supported by modern proxies and gateways than SPDY.
How streaming APIs work
Kubernetes upgrades HTTP connections to streaming connections by adding specific upgrade headers to the originating HTTP request. For example, an HTTP upgrade request for running the date command on an nginx container within a cluster is similar to the following:
$ kubectl exec -v=8 nginx -- date
GET https://127.0.0.1:43251/api/v1/namespaces/default/pods/nginx/exec?command=date…
Request Headers:
    Connection: Upgrade
    Upgrade: websocket
    Sec-Websocket-Protocol: v5.channel.k8s.io
    User-Agent: kubectl/v1.31.0 (linux/amd64) kubernetes/6911225
If the container runtime supports the WebSocket streaming protocol and at least one of the subprotocol versions (e.g. v5.channel.k8s.io), the server responds with a successful 101 Switching Protocols status, along with the negotiated subprotocol version:
Response Status: 101 Switching Protocols in 3 milliseconds
Response Headers:
    Upgrade: websocket
    Connection: Upgrade
    Sec-Websocket-Accept: j0/jHW9RpaUoGsUAv97EcKw8jFM=
    Sec-Websocket-Protocol: v5.channel.k8s.io
At this point the TCP connection used for the HTTP protocol has changed to a streaming connection. Subsequent STDIN, STDOUT, and STDERR data (as well as terminal resizing data and process exit code data) for this shell interaction is then streamed over this upgraded connection.
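Two parts of this handshake can be reproduced in a few lines of Python: the Sec-WebSocket-Accept value, which RFC 6455 defines as the base64-encoded SHA-1 of the client's Sec-WebSocket-Key concatenated with a fixed GUID, and the choice of a subprotocol that both sides support. The sketch is illustrative only. The function names are hypothetical, the sample key is the one from RFC 6455 (the key kubectl sent is not shown in the trace above), and v4.channel.k8s.io is included as an assumed older subprotocol.

```python
import base64
import hashlib
from typing import Optional

# Fixed GUID defined by RFC 6455 for computing Sec-WebSocket-Accept.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key: str) -> str:
    """Compute the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


def negotiate_subprotocol(offered: list[str], supported: set[str]) -> Optional[str]:
    """Return the first client-offered subprotocol the server supports, if any."""
    for protocol in offered:
        if protocol in supported:
            return protocol
    return None


# Sample key and expected value taken from RFC 6455.
print(websocket_accept("dGhlIHNhbXBsZSBub25jZQ=="))  # s3pPLMBiTxaQ9kYGzzhZRbK+xOo=

# The server's supported set here is an assumption for illustration.
print(negotiate_subprotocol(["v5.channel.k8s.io"],
                            {"v5.channel.k8s.io", "v4.channel.k8s.io"}))
# v5.channel.k8s.io
```

If no offered subprotocol is supported, the sketch returns None, which corresponds to the upgrade not succeeding.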
How to use the new WebSocket streaming protocol
If your cluster and kubectl are on version 1.29 or later, there are two control plane feature gates and two kubectl environment variables that govern the use of WebSockets rather than SPDY. In Kubernetes 1.31, all of the following feature gates are in beta and are enabled by default:
Feature gates
TranslateStreamCloseWebsocketRequests
  .../exec
  .../attach
PortForwardWebsockets
  .../port-forward
kubectl feature control environment variables
KUBECTL_REMOTE_COMMAND_WEBSOCKETS
  kubectl exec
  kubectl cp
  kubectl attach
KUBECTL_PORT_FORWARD_WEBSOCKETS
  kubectl port-forward
If you're connecting to an older cluster but can manage the feature gate settings, turn on both TranslateStreamCloseWebsocketRequests (added in Kubernetes v1.29) and PortForwardWebsockets (added in Kubernetes v1.30) to try this new behavior. Version 1.31 of kubectl can automatically use the new behavior, but you do need to connect to a cluster where the server-side features are explicitly enabled.
Learn more about streaming APIs
KEP 4006 - Transitioning from SPDY to WebSockets
RFC 6455 - The WebSockets Protocol
Container Runtime Interface streaming explained
via Kubernetes Blog https://kubernetes.io/
August 19, 2024 at 08:00PM
(21) Post | Feed | LinkedIn
August 19, 2024 at 11:24AM
via Instapaper