
Suggested Reads
Sourcegraph makes core repository private, co-founder complains open source means "extra work and risk" • DEVCLASS
Sourcegraph has removed the formerly open source core repository for its popular code search product from public view – with CEO and co-founder Quinn Slack…
August 23, 2024 at 09:16AM
via Instapaper
Kubernetes v1.31: kubeadm v1beta4
https://kubernetes.io/blog/2024/08/23/kubernetes-1-31-kubeadm-v1beta4/
As part of the Kubernetes v1.31 release, kubeadm is adopting a new (v1beta4) version of its configuration file format. Configuration in the previous v1beta3 format is now formally deprecated, which means it's supported but you should migrate to v1beta4 and stop using the deprecated format. Support for v1beta3 configuration will be removed after a minimum of 3 Kubernetes minor releases.
In this article, I'll walk you through the key changes, explain the kubeadm v1beta4 configuration format, and show how to migrate from v1beta3 to v1beta4.
You can read the reference for the v1beta4 configuration format: kubeadm Configuration (v1beta4).
A list of changes since v1beta3
This version improves on the v1beta3 format by fixing some minor issues and adding a few new fields.
To put it simply,
Two new configuration elements: ResetConfiguration and UpgradeConfiguration
For InitConfiguration and JoinConfiguration, dryRun mode and nodeRegistration.imagePullSerial are supported
For ClusterConfiguration, there are new fields including certificateValidityPeriod, caCertificateValidityPeriod, encryptionAlgorithm, dns.disabled and proxy.disabled.
Support extraEnvs for all control plane components
extraArgs changed from a map to structured extra arguments that support duplicates
Add a timeouts structure for init, join, upgrade and reset.
For details, you can see the official document below:
Support custom environment variables in control plane components under ClusterConfiguration. Use apiServer.extraEnvs, controllerManager.extraEnvs, scheduler.extraEnvs, etcd.local.extraEnvs.
The ResetConfiguration API type is now supported in v1beta4. Users are able to reset a node by passing a --config file to kubeadm reset.
dryRun mode is now configurable in InitConfiguration and JoinConfiguration.
Replace the existing string/string extra argument maps with structured extra arguments that support duplicates. The change applies to ClusterConfiguration - apiServer.extraArgs, controllerManager.extraArgs, scheduler.extraArgs, etcd.local.extraArgs. Also to nodeRegistrationOptions.kubeletExtraArgs.
Added ClusterConfiguration.encryptionAlgorithm that can be used to set the asymmetric encryption algorithm used for this cluster's keys and certificates. Can be one of "RSA-2048" (default), "RSA-3072", "RSA-4096" or "ECDSA-P256".
Added ClusterConfiguration.dns.disabled and ClusterConfiguration.proxy.disabled that can be used to disable the CoreDNS and kube-proxy addons during cluster initialization. Skipping the related addon phases during cluster creation will set the same fields to true.
Added the nodeRegistration.imagePullSerial field in InitConfiguration and JoinConfiguration, which can be used to control if kubeadm pulls images serially or in parallel.
The UpgradeConfiguration kubeadm API is now supported in v1beta4 when passing --config to kubeadm upgrade subcommands. For upgrade subcommands, the usage of component configuration for kubelet and kube-proxy, as well as InitConfiguration and ClusterConfiguration, is now deprecated and will be ignored when passing --config.
Added a timeouts structure to InitConfiguration, JoinConfiguration, ResetConfiguration and UpgradeConfiguration that can be used to configure various timeouts. The ClusterConfiguration.timeoutForControlPlane field is replaced by timeouts.controlPlaneComponentHealthCheck. The JoinConfiguration.discovery.timeout is replaced by timeouts.discovery.
Added the certificateValidityPeriod and caCertificateValidityPeriod fields to ClusterConfiguration. These fields can be used to control the validity period of certificates generated by kubeadm during sub-commands such as init, join, upgrade and certs. Default values continue to be 1 year for non-CA certificates and 10 years for CA certificates. Also note that only non-CA certificates are renewable by kubeadm certs renew.
These changes simplify the configuration of tools that use kubeadm and improve the extensibility of kubeadm itself.
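As a rough illustration of several of the changes listed above, here is a minimal sketch of a v1beta4 configuration file. The field names follow the list above; every value (issuer URLs, proxy address, node label, durations) is a made-up example rather than a recommendation, and the sketch is not a complete production configuration.

```yaml
# Illustrative kubeadm v1beta4 configuration; all values are example assumptions.
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
encryptionAlgorithm: RSA-3072
certificateValidityPeriod: 8760h      # 1 year, the stated default for non-CA certificates
caCertificateValidityPeriod: 87600h   # 10 years, the stated default for CA certificates
proxy:
  disabled: true                      # skip the kube-proxy addon
apiServer:
  extraArgs:                          # structured list, so the same flag may appear twice
  - name: service-account-issuer
    value: https://issuer-new.example.com
  - name: service-account-issuer
    value: https://issuer-old.example.com
  extraEnvs:                          # custom environment variables for the component
  - name: HTTPS_PROXY
    value: http://proxy.example.com:3128
---
apiVersion: kubeadm.k8s.io/v1beta4
kind: InitConfiguration
dryRun: true                          # dry-run mode is now configurable here
nodeRegistration:
  imagePullSerial: false              # pull images in parallel
  kubeletExtraArgs:                   # also a structured list in v1beta4
  - name: node-labels
    value: example.com/role=control-plane
timeouts:
  controlPlaneComponentHealthCheck: 4m0s
```

The repeated service-account-issuer entry is the kind of case the old string/string map could not express, since a map holds only one value per key.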
How to migrate v1beta3 configuration to v1beta4?
If your configuration is not using the latest version, it is recommended that you migrate using the kubeadm config migrate command.
This command reads an existing configuration file that uses the old format, and writes a new file that uses the current format.
Example
Using kubeadm v1.31, run kubeadm config migrate --old-config old-v1beta3.yaml --new-config new-v1beta4.yaml
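For context, a hypothetical old-v1beta3.yaml that could serve as input to the command above might look like the sketch below. The values are invented for illustration, and the file that kubeadm config migrate writes is intentionally not shown here, since its exact contents depend on the kubeadm version used.

```yaml
# old-v1beta3.yaml: hypothetical input file for kubeadm config migrate
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:                        # v1beta3 form: a plain string/string map
    audit-log-maxage: "30"
  timeoutForControlPlane: 4m0s      # replaced in v1beta4 by timeouts.controlPlaneComponentHealthCheck
```

After migrating, it is worth diffing the old and new files to confirm that extraArgs became a list of name/value pairs and that the timeout moved to the timeouts structure.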
How do I get involved?
Huge thanks to all the contributors who helped with the design, implementation, and review of this feature:
Lubomir I. Ivanov (neolit123)
Dave Chen (chendave)
Paco Xu (pacoxu)
Sata Qiu (sataqiu)
Baofa Fan (carlory)
Calvin Chen (calvin0327)
Ruquan Zhao (ruquanzhao)
For those interested in getting involved in future discussions on kubeadm configuration, you can reach out to the kubeadm team or SIG Cluster Lifecycle by several means:
v1beta4 related items are tracked in kubeadm issue #2890.
Slack: #kubeadm or #sig-cluster-lifecycle
Mailing list
via Kubernetes Blog https://kubernetes.io/
August 22, 2024 at 08:00PM
Kubernetes 1.31: Custom Profiling in Kubectl Debug Graduates to Beta
https://kubernetes.io/blog/2024/08/22/kubernetes-1-31-custom-profiling-kubectl-debug/
There are many ways of troubleshooting the pods and nodes in the cluster. However, kubectl debug is one of the easiest, most widely used and most prominent. It provides a set of static profiles, each of which serves a different kind of role. For instance, from the network administrator's point of view, debugging the node should be as easy as this:
$ kubectl debug node/mynode -it --image=busybox --profile=netadmin
On the other hand, static profiles are inherently rigid, which, despite their ease of use, limits what can be done with some pods. There are various kinds of pods (or nodes), each with its own specific needs, and unfortunately some can't be debugged using the static profiles alone.
Take, for instance, a simple pod consisting of a container whose health relies on an environment variable:
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: example-container
    image: customapp:latest
    env:
    - name: REQUIRED_ENV_VAR
      value: "value1"
Currently, copying the pod is the sole mechanism that supports debugging this pod in kubectl debug. Furthermore, what if the user needs to change REQUIRED_ENV_VAR to a different value for advanced troubleshooting? There is no mechanism to achieve this.
Custom Profiling
Custom profiling is new functionality, available under the --custom flag of kubectl debug, that provides extensibility. It expects a partial Container spec in either YAML or JSON format. To debug the example-container above by creating an ephemeral container, we simply have to define this YAML:
partial_container.yaml
env:
- name: REQUIRED_ENV_VAR
  value: value2
and execute:
kubectl debug example-pod -it --image=customapp --custom=partial_container.yaml
Here is another example that modifies multiple fields at once (change port number, add resource limits, modify environment variable) in JSON:
{
  "ports": [
    { "containerPort": 80 }
  ],
  "resources": {
    "limits": { "cpu": "0.5", "memory": "512Mi" },
    "requests": { "cpu": "0.2", "memory": "256Mi" }
  },
  "env": [
    { "name": "REQUIRED_ENV_VAR", "value": "value2" }
  ]
}
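Since --custom accepts either format, the JSON above could equally be written as YAML. The sketch below is a direct translation of the same fields and values; the file name in the comment is a hypothetical choice.

```yaml
# multi_field_container.yaml (hypothetical file name): YAML form of the JSON example
ports:
- containerPort: 80
resources:
  limits:
    cpu: "0.5"
    memory: 512Mi
  requests:
    cpu: "0.2"
    memory: 256Mi
env:
- name: REQUIRED_ENV_VAR
  value: value2
```

It would be passed the same way as the earlier example, by pointing --custom at the file.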
Constraints
Uncontrolled extensibility hurts usability. For that reason, custom profiling is not allowed for certain fields such as command, image, lifecycle, volume devices and container name. In the future, more fields can be added to the disallowed list if required.
Limitations
The kubectl debug command has 3 aspects: debugging with ephemeral containers, pod copying, and node debugging. The largest intersection of these aspects is the container spec within a Pod. That's why custom profiling only supports modifying fields that are defined within containers. As a consequence, modifying other fields in the Pod spec is not supported.
Acknowledgments
Special thanks to all the contributors who reviewed and commented on this feature, from the initial conception to its actual implementation (alphabetical order):
Eddie Zaneski
Maciej Szulik
Lee Verberne
via Kubernetes Blog https://kubernetes.io/
August 21, 2024 at 08:00PM
Kubernetes 1.31: Fine-grained SupplementalGroups control
https://kubernetes.io/blog/2024/08/22/fine-grained-supplementalgroups-control/
This blog discusses a new feature in Kubernetes 1.31 to improve the handling of supplementary groups in containers within Pods.
Motivation: Implicit group memberships defined in /etc/group in the container image
Although this behavior may not be popular with many Kubernetes cluster users/admins, Kubernetes, by default, merges group information from the Pod with information defined in /etc/group in the container image.
Let's look at an example. The Pod below specifies runAsUser=1000, runAsGroup=3000 and supplementalGroups=4000 in the Pod's security context.
implicit-groups.yaml
apiVersion: v1
kind: Pod
metadata:
  name: implicit-groups
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
  containers:
  - name: ctr
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false
What is the result of the id command in the ctr container?
Create the Pod:
$ kubectl apply -f https://k8s.io/blog/2024-08-22-Fine-grained-SupplementalGroups-control/implicit-groups.yaml
Verify that the Pod's Container is running:
$ kubectl get pod implicit-groups
Check the id command
$ kubectl exec implicit-groups -- id
The output should be similar to this:
uid=1000 gid=3000 groups=3000,4000,50000
Where does group ID 50000 in the supplementary groups (the groups field) come from, even though 50000 is not defined in the Pod's manifest at all? The answer is the /etc/group file in the container image.
Checking the contents of /etc/group in the container image should show the following:
$ kubectl exec implicit-groups -- cat /etc/group
...
user-defined-in-image:x:1000:
group-defined-in-image:x:50000:user-defined-in-image
Aha! The container's primary user 1000 belongs to the group 50000 in the last entry.
Thus, the group membership defined in /etc/group in the container image for the container's primary user is implicitly merged to the information from the Pod. Please note that this was a design decision the current CRI implementations inherited from Docker, and the community never really reconsidered it until now.
What's wrong with it?
The implicitly merged group information from /etc/group in the container image may cause concerns, particularly when accessing volumes (see kubernetes/kubernetes#112879 for details), because file permissions are controlled by uid/gid in Linux. Even worse, the implicit gids from /etc/group cannot be detected or validated by any policy engine, because the manifest gives no hint of the implicit group information. This can also be a concern for Kubernetes security.
Fine-grained SupplementalGroups control in a Pod: SupplementalGroupsPolicy
To tackle the above problem, Kubernetes 1.31 introduces a new field, supplementalGroupsPolicy, in the Pod's .spec.securityContext.
This field provides a way to control how supplementary groups are calculated for the container processes in a Pod. The available policies are:
Merge: The group membership defined in /etc/group for the container's primary user will be merged. If not specified, this policy will be applied (i.e. as-is behavior for backward compatibility).
Strict: Only the group IDs specified in the fsGroup, supplementalGroups, or runAsGroup fields are attached as the supplementary groups of the container processes. This means no group membership defined in /etc/group for the container's primary user will be merged.
Let's see how the Strict policy works.
strict-supplementalgroups-policy.yaml
apiVersion: v1
kind: Pod
metadata:
  name: strict-supplementalgroups-policy
spec:
  securityContext:
    runAsUser: 1000
    runAsGroup: 3000
    supplementalGroups: [4000]
    supplementalGroupsPolicy: Strict
  containers:
  - name: ctr
    image: registry.k8s.io/e2e-test-images/agnhost:2.45
    command: [ "sh", "-c", "sleep 1h" ]
    securityContext:
      allowPrivilegeEscalation: false
Create the Pod:
$ kubectl apply -f https://k8s.io/blog/2024-08-22-Fine-grained-SupplementalGroups-control/strict-supplementalgroups-policy.yaml
Verify that the Pod's Container is running:
$ kubectl get pod strict-supplementalgroups-policy
Check the process identity:
$ kubectl exec -it strict-supplementalgroups-policy -- id
The output should be similar to this:
uid=1000 gid=3000 groups=3000,4000
You can see that the Strict policy excludes group 50000 from groups!
Thus, ensuring supplementalGroupsPolicy: Strict (enforced by some policy mechanism) helps prevent the implicit supplementary groups in a Pod.
Note: Actually, this is not enough, because a container with sufficient privileges or capabilities can change its process identity. Please see the following section for details.
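One possible "policy mechanism" for the enforcement mentioned above is a ValidatingAdmissionPolicy. The following is only a sketch: the object names are hypothetical, the rule checks the Pod-level security context only, and, as the note above says, admission-time checks do not stop a sufficiently privileged container from changing its identity later.

```yaml
# Sketch: reject Pods that do not set supplementalGroupsPolicy: Strict.
# Object names are hypothetical; adapt matching rules to your cluster.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicy
metadata:
  name: require-strict-supplemental-groups
spec:
  failurePolicy: Fail
  matchConstraints:
    resourceRules:
    - apiGroups: [""]
      apiVersions: ["v1"]
      operations: ["CREATE", "UPDATE"]
      resources: ["pods"]
  validations:
  - expression: >-
      has(object.spec.securityContext) &&
      has(object.spec.securityContext.supplementalGroupsPolicy) &&
      object.spec.securityContext.supplementalGroupsPolicy == 'Strict'
    message: "Pods must set spec.securityContext.supplementalGroupsPolicy to Strict."
---
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingAdmissionPolicyBinding
metadata:
  name: require-strict-supplemental-groups-binding
spec:
  policyName: require-strict-supplemental-groups
  validationActions: [Deny]
```

Other policy engines could express the same rule; the point is only that the Strict setting, unlike the implicit groups from /etc/group, is visible in the manifest and can therefore be checked.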
Attached process identity in Pod status
This feature also exposes the process identity attached to the first container process of the container via the .status.containerStatuses[].user.linux field. This is helpful for checking whether implicit group IDs are attached.
...
status:
  containerStatuses:
  - name: ctr
    user:
      linux:
        gid: 3000
        supplementalGroups:
        - 3000
        - 4000
        uid: 1000
...
Note: The values in the status.containerStatuses[].user.linux field are the process identity initially attached to the first container process in the container. If the container has sufficient privilege to call system calls related to process identity (e.g. setuid(2), setgid(2) or setgroups(2), etc.), the container process can change its identity. Thus, the actual process identity will be dynamic.
Feature availability
To enable the supplementalGroupsPolicy field, the following components have to be used:
Kubernetes: v1.31 or later, with the SupplementalGroupsPolicy feature gate enabled. As of v1.31, the gate is marked as alpha.
CRI runtime:
containerd: v2.0 or later
CRI-O: v1.31 or later
You can see if the feature is supported in the Node's .status.features.supplementalGroupsPolicy field.
apiVersion: v1
kind: Node
...
status:
  features:
    supplementalGroupsPolicy: true
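For a kubeadm-managed cluster, enabling the alpha gate might look like the sketch below. It assumes that the gate needs to be switched on for both the API server and the kubelet, and that the v1beta4 kubeadm configuration format is in use; check the feature's documentation for the authoritative list of components.

```yaml
# Sketch: enable the SupplementalGroupsPolicy feature gate via kubeadm (assumed setup).
apiVersion: kubeadm.k8s.io/v1beta4
kind: ClusterConfiguration
apiServer:
  extraArgs:
  - name: feature-gates
    value: SupplementalGroupsPolicy=true
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  SupplementalGroupsPolicy: true
```

Even with the gate enabled, the Node only reports supplementalGroupsPolicy: true when the CRI runtime is also new enough, per the version list above.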
What's next?
Kubernetes SIG Node hopes, and expects, that the feature will be promoted to beta and eventually general availability (GA) in future releases of Kubernetes, so that users no longer need to enable the feature gate manually.
Merge policy is applied when supplementalGroupsPolicy is not specified, for backwards compatibility.
How can I learn more?
Configure a Security Context for a Pod or Container for the further details of supplementalGroupsPolicy
KEP-3619: Fine-grained SupplementalGroups control
How to get involved?
This feature is driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you!
via Kubernetes Blog https://kubernetes.io/
August 21, 2024 at 08:00PM
Kubernetes v1.31: New Kubernetes CPUManager Static Policy: Distribute CPUs Across Cores
https://kubernetes.io/blog/2024/08/22/cpumanager-static-policy-distributed-cpu-across-cores/
In Kubernetes v1.31, we are excited to introduce a significant enhancement to CPU management capabilities: the distribute-cpus-across-cores option for the CPUManager static policy. This feature is currently in alpha and hidden by default, marking a strategic shift aimed at optimizing CPU utilization and improving system performance across multi-core processors.
Understanding the feature
Traditionally, Kubernetes' CPUManager tends to allocate CPUs as compactly as possible, typically packing them onto the fewest number of physical cores. However, allocation strategy matters: CPUs on the same physical core still share some of that core's resources, such as the cache and execution units.
While the default approach minimizes inter-core communication and can be beneficial under certain scenarios, it also poses a challenge. CPUs sharing a physical core can lead to resource contention, which in turn may cause performance bottlenecks, particularly noticeable in CPU-intensive applications.
The new distribute-cpus-across-cores feature addresses this issue by modifying the allocation strategy. When enabled, this policy option instructs the CPUManager to spread out the CPUs (hardware threads) across as many physical cores as possible. This distribution is designed to minimize contention among CPUs sharing the same physical core, potentially enhancing the performance of applications by providing them dedicated core resources.
Technically, within this static policy, the free CPU list is reordered in the manner depicted in the diagram, aiming to allocate CPUs from separate physical cores.
Enabling the feature
To enable this feature, first add the --cpu-manager-policy=static kubelet flag or set the cpuManagerPolicy: static field in KubeletConfiguration. Then add --cpu-manager-policy-options distribute-cpus-across-cores=true to the kubelet flags, or add distribute-cpus-across-cores=true to the CPUManager policy options in the kubelet configuration. This setting directs the CPUManager to adopt the new distribution strategy. It is important to note that this policy option cannot currently be used in conjunction with the full-pcpus-only or distribute-cpus-across-numa options.
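Expressed as a kubelet configuration file, the steps above might look like the following sketch. It assumes that, as with other alpha CPUManager policy options, the CPUManagerPolicyAlphaOptions feature gate has to be enabled, and that some CPU is reserved for the system (the static policy requires a non-zero reservation); the reserved CPU ID is an arbitrary example.

```yaml
# Sketch: KubeletConfiguration enabling distribute-cpus-across-cores (values are assumptions).
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  CPUManagerPolicyAlphaOptions: true     # assumed prerequisite for alpha policy options
cpuManagerPolicy: static
cpuManagerPolicyOptions:
  distribute-cpus-across-cores: "true"   # option values are strings
reservedSystemCPUs: "0"                  # example reservation; pick CPUs suited to your nodes
```

Changing the CPU manager policy on an existing node generally involves draining the node and restarting the kubelet, so it is best tried on a test node first.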
Current limitations and future directions
As with any new feature, especially one in alpha, there are limitations and areas for future improvement. One significant current limitation is that distribute-cpus-across-cores cannot be combined with other policy options that might conflict in terms of CPU allocation strategies. This restriction can affect compatibility with certain workloads and deployment scenarios that rely on more specialized resource management.
Looking forward, we are committed to enhancing the compatibility and functionality of the distribute-cpus-across-cores option. Future updates will focus on resolving these compatibility issues, allowing this policy to be combined with other CPUManager policies seamlessly. Our goal is to provide a more flexible and robust CPU allocation framework that can adapt to a variety of workloads and performance demands.
Conclusion
The introduction of the distribute-cpus-across-cores policy in Kubernetes CPUManager is a step forward in our ongoing efforts to refine resource management and improve application performance. By reducing the contention on physical cores, this feature offers a more balanced approach to CPU resource allocation, particularly beneficial for environments running heterogeneous workloads. We encourage Kubernetes users to test this new feature and provide feedback, which will be invaluable in shaping its future development.
Further reading
Please check out the Control CPU Management Policies on the Node task page to learn more about the CPU Manager, and how it fits in relation to the other node-level resource managers.
Getting involved
This feature is driven by the SIG Node. If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please attend the SIG Node meeting for more details.
via Kubernetes Blog https://kubernetes.io/
August 21, 2024 at 08:00PM
Week Ending August 18, 2024
https://lwkd.info/2024/20240820
Developer News
The Steering Committee nominations are open until August 24. Currently there are four candidates running for three seats. If you are a candidate, or thinking of running, join current Steering members for a Q&A.
All Kubernetes GitHub orgs have been moved under our enterprise account. However, the Prow migration starts August 21, so subprojects should hold off on releases until it’s complete.
Release Schedule
Next Deadline: 1.32 cycle begins, September 9
Kubernetes v1.31.0 is live and the latest! The 1.32 release cycle will begin soon, with Release Team Lead Federico Muñoz.
The latest patch releases v1.28.13, v1.29.8 and v1.30.4 are available.
The Release Team Shadow applications are now live. This form will close on Friday, September 06, 2024. Selected applicants will be notified by the end of the day, Friday, September 13, 2024.
KEP of the Week
KEP 3866: Add an nftables-based kube-proxy backend
The KEP creates a new nftables backend for kube-proxy on Linux to replace the current iptables and ipvs backends. iptables, the default backend, suffers from unfixable performance issues, such as slow rule updates and degraded packet processing as the ruleset grows. While it is hoped that this backend will eventually replace both the iptables and ipvs backends and become the default kube-proxy mode on Linux, that replacement/deprecation would be handled in a separate future KEP.
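For readers who want to try the backend, selecting it is a matter of setting the kube-proxy mode. The fragment below is a minimal sketch; on releases where the feature is still alpha, the NFTablesProxyMode feature gate would also need to be enabled, and the nodes are assumed to run a kernel and nft tooling recent enough for kube-proxy's requirements.

```yaml
# Sketch: select the nftables backend in the kube-proxy configuration.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: nftables
```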
This KEP is tracked for beta release in the upcoming v1.31.
Other Merges
NodeToStatus map is now a struct (should it be “NodeToStatusStruct”?), which requires changes to all PostFilter plugins
All Feature Gates will be added as featuregate.VersionedSpecs to support control plane versioning
PVC Protection Controller is faster thanks to batch processing
DisableNodeKubeProxyVersion was enabled too soon, so back to default disabled
Show image volumes for pods that have them
Regression fix: honor --version Build ID overrides
Prevent preemption pod deletion fail
Use AllocatedResources so that users can recover from node expansion failure
Allow orphan pod processors to speed up concurrent job tracking completion
Disallow extra namespaced keys in structured auth config
Kubeadm gets a validation warning for misconfigured cert periods
kube-proxy waits for all caches to be synced
PriorityClass displays preemptionPolicy
hostNetwork no longer depends on PodIPs being assigned
Node Monitor Grace Period is 50 seconds
Stop retrying the watcher if it doesn’t have permission to watch
Adding PVCs has a queueing hint
New Tests: NodeGetVolumeStats
Structured Logging Migration: CSI translation lib
Promotions
kubeadm etcd Learner Mode to GA
Deprecated
Remove Graduated Feature Gates: KMSv2
Version Updates
CoreDNS to 1.11.3
via Last Week in Kubernetes Development https://lwkd.info/
August 20, 2024 at 06:00PM
Kubernetes: rise of a global Open Source movement
In this video, we delve into the story of Kubernetes, from its roots at Google to becoming a cornerstone of the open source and cloud native movements. Discover…
August 21, 2024 at 02:10PM
via Instapaper
S3 Sponsorship Extension - More Resources to Build a More Sustainable Nix
Extended S3 Sponsorship! I’m thrilled to share some great news that I hope fuels the rest of the week positively. The AWS Open Source team has once again stepped…
August 21, 2024 at 01:55PM
via Instapaper
Bankers Have Lost So Much Money Thanks to Elon’s Terrible Twitter Deal
Everyone knew Elon Musk was overpaying for Twitter when he bought the social media platform back in 2022. That’s precisely why the billionaire tried to back out of the deal before being forced to finalize the purchase after a court order.
via Pocket https://gizmodo.com/bankers-have-lost-so-much-money-thanks-to-elons-terrible-twitter-deal-2000489136
August 21, 2024 at 10:41AM
This year’s summer COVID wave is big; FDA may green-light COVID shots early
With the country experiencing a relatively large summer wave of COVID-19, the Food and Drug Administration is considering signing off on this year's strain-matched COVID-19 vaccines as soon as this week, according to a report by CNN that cited unnamed officials familiar with the matter.
August 21, 2024 at 09:40AM