Suggested Reads

Blog: Kubernetes v1.26: Alpha support for cross-namespace storage data sources
Author: Takafumi Takahashi (Hitachi Vantara)

Kubernetes v1.26, released last month, introduced an alpha feature that lets you specify a data source for a PersistentVolumeClaim, even when the source data belongs to a different namespace. With the new feature enabled, you specify a namespace in the dataSourceRef field of a new PersistentVolumeClaim. Once Kubernetes checks that access is OK, the new PersistentVolume can populate its data from the storage source specified in that other namespace.

Before Kubernetes v1.26, provided your cluster had the AnyVolumeDataSource feature enabled, you could already provision new volumes from a data source in the same namespace. However, that only worked for data sources in the same namespace, so users couldn't provision a PersistentVolume in one namespace from a data source in another namespace. To solve this problem, Kubernetes v1.26 added a new alpha namespace field to the dataSourceRef field in the PersistentVolumeClaim API.

How it works

Once the csi-provisioner finds that a data source is specified with a dataSourceRef that has a non-empty namespace name, it checks all ReferenceGrants within the namespace specified by the .spec.dataSourceRef.namespace field of the PersistentVolumeClaim, to see if access to the data source is allowed. If any ReferenceGrant allows access, the csi-provisioner provisions a volume from the data source.

Trying it out

The following things are required to use cross namespace volume provisioning:

- Enable the AnyVolumeDataSource and CrossNamespaceVolumeDataSource feature gates for the kube-apiserver and kube-controller-manager
- Install a CRD for the specific VolumeSnapshot controller
- Install the CSI Provisioner controller and enable the CrossNamespaceVolumeDataSource feature gate
- Install the CSI driver
- Install a CRD for ReferenceGrants

Putting it all together

To see how this works, you can install the sample and try it out. The sample creates a PVC in the dev namespace from a VolumeSnapshot in the prod namespace. That is a simple example; for real world use, you might want to use a more complex approach.

Assumptions for this example:

- Your Kubernetes cluster was deployed with the AnyVolumeDataSource and CrossNamespaceVolumeDataSource feature gates enabled
- There are two namespaces, dev and prod
- A CSI driver is deployed
- There is an existing VolumeSnapshot named new-snapshot-demo in the prod namespace
- The ReferenceGrant CRD (from the Gateway API project) is already deployed

Grant ReferenceGrants read permission to the CSI Provisioner

Access to ReferenceGrants is only needed when the CSI driver has the CrossNamespaceVolumeDataSource controller capability. For this example, the external-provisioner needs get, list, and watch permissions for referencegrants (API group gateway.networking.k8s.io).

  - apiGroups: ["gateway.networking.k8s.io"]
    resources: ["referencegrants"]
    verbs: ["get", "list", "watch"]

Enable the CrossNamespaceVolumeDataSource feature gate for the CSI Provisioner

Add --feature-gates=CrossNamespaceVolumeDataSource=true to the csi-provisioner command line. For example, use this manifest snippet to redefine the container:

  - args:
      - -v=5
      - --csi-address=/csi/csi.sock
      - --feature-gates=Topology=true
      - --feature-gates=CrossNamespaceVolumeDataSource=true
    image: csi-provisioner:latest
    imagePullPolicy: IfNotPresent
    name: csi-provisioner

Create a ReferenceGrant

Here's a manifest for an example ReferenceGrant.

apiVersion: gateway.networking.k8s.io/v1beta1
kind: ReferenceGrant
metadata:
  name: allow-prod-pvc
  namespace: prod
spec:
  from:
    - group: ""
      kind: PersistentVolumeClaim
      namespace: dev
  to:
    - group: snapshot.storage.k8s.io
      kind: VolumeSnapshot
      name: new-snapshot-demo

Create a PersistentVolumeClaim by using the cross namespace data source

Kubernetes creates a PersistentVolumeClaim in dev and the CSI driver populates the PersistentVolume used in dev from snapshots in prod.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
  namespace: dev
spec:
  storageClassName: example
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  dataSourceRef:
    apiGroup: snapshot.storage.k8s.io
    kind: VolumeSnapshot
    name: new-snapshot-demo
    namespace: prod
  volumeMode: Filesystem

How can I learn more?

The enhancement proposal, Provision volumes from cross-namespace snapshots, includes lots of detail about the history and technical implementation of this feature. Please get involved by joining the Kubernetes Storage Special Interest Group (SIG) to help us enhance this feature. There are a lot of good ideas already and we'd be thrilled to have more!

Acknowledgments

It takes a wonderful group to make wonderful software. Special thanks to the following people for the insightful reviews, thorough consideration and valuable contributions to the CrossNamespaceVolumeDataSource feature:

- Michelle Au (msau42)
- Xing Yang (xing-yang)
- Masaki Kimura (mkimuram)
- Tim Hockin (thockin)
- Ben Swartzlander (bswartz)
- Rob Scott (robscott)
- John Griffith (j-griffith)
- Michael Henriksen (mhenriks)
- Mustafa Elbehery (Elbehery)

It's been a joy to work with y'all on this.
·kubernetes.io·
Fediverse Observer
Fediverse Servers Status. Find a Fediverse server to sign up for, or find one close to you!
·fediverse.observer·
Release v1.18.0 · go-gitea/gitea
Changelog
SECURITY
- Remove ReverseProxy authentication from the API (#22219) (#22251)
- Support Go Vulnerability Management (#21139)
- Forbid HTML string tooltips (#20935)
BREAKING
- Rework mailer se...
·github.com·
Whatever happened to SHA-256 support in Git?
The news has been proclaimed loudly and often: the SHA-1 hash algorithm is terminally broken and should not be used in any situation where security matters. Among other things, this news gave some impetus to the longstanding effort to support a more robust hash algorithm in the Git source-code management system. As time has passed, though, that work seems to have slowed to a stop, leaving some users wondering when, if ever, Git will support a hash algorithm other than SHA-1.
·lwn.net·
Vanilla OS
Vanilla OS is an Immutable Linux-based distribution which aims to provide a vanilla GNOME experience.
·vanillaos.org·
36 Things I Learned in 2022
Inspired by Tom Whitwell's annual list (here is 2022's), I kept a list of interesting things I learned this year. There are suppos
·kottke.org·
Blog: Kubernetes v1.26: Advancements in Kubernetes Traffic Engineering
Authors: Andrew Sy Kim (Google)

Kubernetes v1.26 includes significant advancements in network traffic engineering with the graduation of two features (Service internal traffic policy support, and EndpointSlice terminating conditions) to GA, and a third feature (Proxy terminating endpoints) to beta. The combination of these enhancements aims to address shortcomings in traffic engineering that people face today, and unlock new capabilities for the future.

Traffic Loss from Load Balancers During Rolling Updates

Prior to Kubernetes v1.26, clusters could experience loss of traffic from Service load balancers during rolling updates when setting the externalTrafficPolicy field to Local. There are a lot of moving parts at play here, so a quick overview of how Kubernetes manages load balancers might help!

In Kubernetes, you can create a Service with type: LoadBalancer to expose an application externally with a load balancer. The load balancer implementation varies between clusters and platforms, but the Service provides a generic abstraction representing the load balancer that is consistent across all Kubernetes installations.

apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app.kubernetes.io/name: my-app
  ports:
    - protocol: TCP
      port: 80
      targetPort: 9376
  type: LoadBalancer

Under the hood, Kubernetes allocates a NodePort for the Service, which is then used by kube-proxy to provide a network data path from the NodePort to the Pod. A controller will then add all available Nodes in the cluster to the load balancer's backend pool, using the designated NodePort for the Service as the backend target port.

Figure 1: Overview of Service load balancers

Oftentimes it is beneficial to set externalTrafficPolicy: Local for Services, to avoid extra hops between Nodes that are not running healthy Pods backing that Service. When using externalTrafficPolicy: Local, an additional NodePort is allocated for health checking purposes, such that Nodes that do not contain healthy Pods are excluded from the backend pool for a load balancer.

Figure 2: Load balancer traffic to a healthy Node, when externalTrafficPolicy is Local

One such scenario where traffic can be lost is when a Node loses all Pods for a Service, but the external load balancer has not probed the health check NodePort yet. The likelihood of this situation is largely dependent on the health checking interval configured on the load balancer. The larger the interval, the more likely this will happen, since the load balancer will continue to send traffic to a node even after kube-proxy has removed forwarding rules for that Service. This also occurs when Pods start terminating during rolling updates. Since Kubernetes does not consider terminating Pods as "Ready", traffic can be lost when there are only terminating Pods on any given Node during a rolling update.

Figure 3: Load balancer traffic to terminating endpoints, when externalTrafficPolicy is Local

Starting in Kubernetes v1.26, kube-proxy enables the ProxyTerminatingEndpoints feature by default, which adds automatic failover and routing to terminating endpoints in scenarios where the traffic would otherwise be dropped. More specifically, when there is a rolling update and a Node only contains terminating Pods, kube-proxy will route traffic to the terminating Pods based on their readiness. In addition, kube-proxy will actively fail the health check NodePort if there are only terminating Pods available. By doing so, kube-proxy alerts the external load balancer that new connections should not be sent to that Node, but it will gracefully handle requests for existing connections.

Figure 4: Load balancer traffic to terminating endpoints with ProxyTerminatingEndpoints enabled, when externalTrafficPolicy is Local

EndpointSlice Conditions

In order to support this new capability in kube-proxy, the EndpointSlice API introduced new conditions for endpoints: serving and terminating.

Figure 5: Overview of EndpointSlice conditions

The serving condition is semantically identical to ready, except that it can be true or false while a Pod is terminating, unlike ready, which will always be false for terminating Pods for compatibility reasons. The terminating condition is true for Pods undergoing termination (non-empty deletionTimestamp), false otherwise.

The addition of these two conditions enables consumers of this API to understand Pod states that were previously not possible. For example, we can now track "ready" and "not ready" Pods that are also terminating (a sketch EndpointSlice illustrating this appears at the end of this entry).

Figure 6: EndpointSlice conditions with a terminating Pod

Consumers of the EndpointSlice API, such as kube-proxy and ingress controllers, can now use these conditions to coordinate connection draining events, by continuing to forward traffic for existing connections but rerouting new connections to other non-terminating endpoints.

Optimizing Internal Node-Local Traffic

Similar to how Services can set externalTrafficPolicy: Local to avoid extra hops for externally sourced traffic, Kubernetes now supports internalTrafficPolicy: Local, to enable the same optimization for traffic originating within the cluster, specifically for traffic using the Service Cluster IP as the destination address. This feature graduated to Beta in Kubernetes v1.24 and is graduating to GA in v1.26.

Services default the internalTrafficPolicy field to Cluster, where traffic is randomly distributed to all endpoints.

Figure 7: Service routing when internalTrafficPolicy is Cluster

When internalTrafficPolicy is set to Local, kube-proxy will forward internal traffic for a Service only if there is an available endpoint that is local to the same Node.

Figure 8: Service routing when internalTrafficPolicy is Local

Caution: When using internalTrafficPolicy: Local, traffic will be dropped by kube-proxy when no local endpoints are available.

Getting Involved

If you're interested in future discussions on Kubernetes traffic engineering, you can get involved in SIG Network through the following ways:

- Slack: #sig-network
- Mailing list
- Open Community Issues/PRs
- Biweekly meetings
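The new conditions are visible directly on EndpointSlice objects. Below is a minimal sketch of what an EndpointSlice for the my-service example above might look like while its only backing Pod on a Node is terminating; the slice name, endpoint address, and node name are illustrative, not taken from the post.

apiVersion: discovery.k8s.io/v1
kind: EndpointSlice
metadata:
  name: my-service-abc12                 # illustrative; slice names get a generated suffix
  labels:
    kubernetes.io/service-name: my-service
addressType: IPv4
ports:
  - protocol: TCP
    port: 9376
endpoints:
  - addresses: ["10.1.2.3"]              # illustrative Pod IP
    nodeName: node-1                     # illustrative Node
    conditions:
      ready: false        # always false for a terminating Pod, for compatibility
      serving: true       # the Pod still passes its readiness probe
      terminating: true   # the Pod has a non-empty deletionTimestamp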
·kubernetes.io·
John Minnihan and the genesis of hosted source control
John Minnihan walks through the creation of Freepository, the first hosted source control service, how it paved the way for lowering the barrier to participation in open source, and evolution of version control systems.
·opensourcestories.org·
Yin Wu on Twitter
There's a lot of advice on how to scale a startup... But there's little on how to shut down a company when things aren't working. Here are the tactics for dissolving a startup, and what to tell investors to get them to fund your next bet 👇 — Yin Wu (@yinyinwu) December 27, 2022
·twitter.com·
Blog: Kubernetes 1.26: Job Tracking, to Support Massively Parallel Batch Workloads, Is Generally Available
Authors: Aldo Culquicondor (Google)

The Kubernetes 1.26 release includes a stable implementation of the Job controller that can reliably track a large number of Jobs with high levels of parallelism. SIG Apps and WG Batch have worked on this foundational improvement since Kubernetes 1.22. After multiple iterations and scale verifications, this is now the default implementation of the Job controller.

Paired with the Indexed completion mode, the Job controller can handle massively parallel batch Jobs, supporting up to 100k concurrent Pods. The new implementation also made possible the development of Pod failure policy, which is in beta in the 1.26 release.

How do I use this feature?

To use Job tracking with finalizers, upgrade to Kubernetes 1.25 or newer and create new Jobs. You can also use this feature in v1.23 and v1.24, if you have the ability to enable the JobTrackingWithFinalizers feature gate. If your cluster runs Kubernetes 1.26, Job tracking with finalizers is a stable feature. For v1.25, it's behind that feature gate, and your cluster administrators may have explicitly disabled it - for example, if you have a policy of not using beta features.

Jobs created before the upgrade will still be tracked using the legacy behavior. This is to avoid retroactively adding finalizers to running Pods, which might introduce race conditions.

For maximum performance on large Jobs, the Kubernetes project recommends using the Indexed completion mode. In this mode, the control plane is able to track Job progress with fewer API calls.

If you are a developer of operator(s) for batch, HPC, AI, ML or related workloads, we encourage you to use the Job API to delegate accurate progress tracking to Kubernetes. If there is something missing in the Job API that forces you to manage plain Pods, the Working Group Batch welcomes your feedback and contributions.
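As an illustration of that recommendation, here is a minimal sketch of an Indexed Job; the name, image, command, and the completion and parallelism counts are illustrative, not taken from the post.

apiVersion: batch/v1
kind: Job
metadata:
  name: parallel-work               # illustrative name
spec:
  completionMode: Indexed           # track progress per index, with fewer API calls
  completions: 10000                # one successful completion per index, 0..9999
  parallelism: 500                  # how many Pods run at once
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: worker
          image: busybox:1.36
          # Indexed Jobs expose the Pod's index through the JOB_COMPLETION_INDEX env var
          command: ["sh", "-c", "echo processing shard $JOB_COMPLETION_INDEX"]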
Deprecation notices

During the development of the feature, the control plane added the annotation batch.kubernetes.io/job-tracking to the Jobs that were created when the feature was enabled. This allowed a safe transition for older Jobs, but it was never meant to stay.

In the 1.26 release, we deprecated the annotation batch.kubernetes.io/job-tracking and the control plane will stop adding it in Kubernetes 1.27. Along with that change, we will remove the legacy Job tracking implementation. As a result, the Job controller will track all Jobs using finalizers and it will ignore Pods that don't have the aforementioned finalizer.

Before you upgrade your cluster to 1.27, we recommend that you verify that there are no running Jobs that don't have the annotation, or that you wait for those jobs to complete. Otherwise, you might observe the control plane recreating some Pods. We expect that this shouldn't affect any users, as the feature is enabled by default since Kubernetes 1.25, giving enough buffer for old jobs to complete.

What problem does the new implementation solve?

Generally, Kubernetes workload controllers, such as ReplicaSet or StatefulSet, rely on the existence of Pods or other objects in the API to determine the status of the workload and whether replacements are needed. For example, if a Pod that belonged to a ReplicaSet terminates or ceases to exist, the ReplicaSet controller needs to create a replacement Pod to satisfy the desired number of replicas (.spec.replicas).

Since its inception, the Job controller also relied on the existence of Pods in the API to track Job status. A Job has completion and failure handling policies, requiring the end state of a finished Pod to determine whether to create a replacement Pod or mark the Job as completed or failed. As a result, the Job controller depended on Pods, even terminated ones, to remain in the API in order to keep track of the status.

This dependency made the tracking of Job status unreliable, because Pods can be deleted from the API for a number of reasons, including:

- The garbage collector removing orphan Pods when a Node goes down.
- The garbage collector removing terminated Pods when they reach a threshold.
- The Kubernetes scheduler preempting a Pod to accommodate higher priority Pods.
- The taint manager evicting a Pod that doesn't tolerate a NoExecute taint.
- External controllers, not included as part of Kubernetes, or humans deleting Pods.

The new implementation

When a controller needs to take an action on objects before they are removed, it should add a finalizer to the objects that it manages. A finalizer prevents the objects from being deleted from the API until the finalizers are removed. Once the controller is done with the cleanup and accounting for the deleted object, it can remove the finalizer from the object and the control plane removes the object from the API.

This is what the new Job controller is doing: adding a finalizer during Pod creation, and removing the finalizer after the Pod has terminated and has been accounted for in the Job status. However, it wasn't that simple.

The main challenge is that there are at least two objects involved: the Pod and the Job. While the finalizer lives in the Pod object, the accounting lives in the Job object. There is no mechanism to atomically remove the finalizer in the Pod and update the counters in the Job status. Additionally, there could be more than one terminated Pod at a given time.

To solve this problem, we implemented a three-stage approach, each stage translating to an API call:

1. For each terminated Pod, add the unique ID (UID) of the Pod into short-lived lists stored in the .status of the owning Job (.status.uncountedTerminatedPods).
2. Remove the finalizer from the Pod(s).
3. Atomically do the following operations:
   - remove UIDs from the short-lived lists
   - increment the overall succeeded and failed counters in the status of the Job.

Additional complications come from the fact that the Job controller might receive the results of the API changes in steps 1 and 2 out of order. We solved this by adding an in-memory cache for removed finalizers.

Still, we faced some issues during the beta stage, leaving some Pods stuck with finalizers in some conditions (#108645, #109485, and #111646). As a result, we decided to switch that feature gate to be disabled by default for the 1.23 and 1.24 releases.

Once resolved, we re-enabled the feature for the 1.25 release. Since then, we have received reports from our customers running tens of thousands of Pods at a time in their clusters through the Job API. Seeing this success, we decided to graduate the feature to stable in 1.26, as part of our long term commitment to make the Job API the best way to run large batch Jobs in a Kubernetes cluster.

To learn more about the feature, you can read the KEP.

Acknowledgments

As with any Kubernetes feature, multiple people contributed to getting this done, from testing and filing bugs to reviewing code. On behalf of SIG Apps, I would like to especially thank Jordan Liggitt (Google) for helping me debug and brainstorm solutions for more than one race condition, and Maciej Szulik (Red Hat) for his thorough reviews.
·kubernetes.io·
@mrbobbytables@hachyderm.io on Twitter
So...uhh...not to sound old and crotchety, but if you're new to containers and the cloud native ecosystem, I can't stress enough that having a fundamental understanding of Linux is a *requirement*. If you jump right into Kubernetes you're just going to have a bad time. — @mrbobbytables@hachyderm.io (@MrBobbyTables) December 27, 2022
·twitter.com·
Blog: Kubernetes v1.26: CPUManager goes GA
Author: Francesco Romani (Red Hat)

The CPU Manager is a part of the kubelet, the Kubernetes node agent, which enables the user to allocate exclusive CPUs to containers. Since Kubernetes v1.10, where it graduated to Beta, the CPU Manager has proved itself reliable and fulfilled its role of allocating exclusive CPUs to containers, so adoption has steadily grown, making it a staple component of performance-critical and low-latency setups. Over time, most changes were about bugfixes or internal refactoring, with the following noteworthy user-visible changes:

- Support explicit reservation of CPUs: it was already possible to request that a given number of CPUs be reserved for system resources, including the kubelet itself, and not be used for exclusive CPU allocation. Now it is possible to also explicitly select which CPUs to reserve, instead of letting the kubelet pick them up automatically.
- Report the exclusively allocated CPUs to containers, much like is already done for devices, using the kubelet-local PodResources API.
- Optimize the usage of system resources, eliminating unnecessary sysfs changes.

The CPU Manager reached the point at which it "just works", so in Kubernetes v1.26 it has graduated to generally available (GA).

Customization options for CPU Manager

The CPU Manager supports two operation modes, configured using its policies. With the none policy, the CPU Manager allocates CPUs to containers without any specific constraint except the (optional) quota set in the Pod spec. With the static policy, provided that the Pod is in the Guaranteed QoS class and every container in that Pod requests an integer amount of vCPU cores, the CPU Manager allocates CPUs exclusively. Exclusive assignment means that other containers (whether from the same Pod, or from a different Pod) do not get scheduled onto that CPU.

This simple operational model served the user base pretty well, but as the CPU Manager matured more and more, users started to look at more elaborate use cases and how to better support them. Rather than add more policies, the community realized that pretty much all the novel use cases are some variation of the behavior enabled by the static CPU Manager policy. Hence, it was decided to add options to tune the behavior of the static policy. The options have a varying degree of maturity, like any other Kubernetes feature, and in order to be accepted, each new option provides a backward compatible behavior when disabled, and documents how it interacts with the other options, should they interact at all. This enabled the Kubernetes project to graduate the CPU Manager core component and core CPU allocation algorithms to GA, while also enabling a new age of experimentation in this area.

In Kubernetes v1.26, the CPU Manager supports three different policy options:

- full-pcpus-only: restrict the CPU Manager core allocation algorithm to full physical cores only, reducing noisy neighbor issues from hardware technologies that allow sharing cores.
- distribute-cpus-across-numa: drive the CPU Manager to evenly distribute CPUs across NUMA nodes, for cases where more than one NUMA node is required to satisfy the allocation.
- align-by-socket: change how the CPU Manager allocates CPUs to a container: consider CPUs to be aligned at the socket boundary, instead of the NUMA node boundary.
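The policy and its options are set in the kubelet configuration. The snippet below is a minimal sketch, assuming hardware where the chosen options and the reserved CPU IDs make sense; depending on their maturity level, individual policy options may also require the corresponding feature gates to be enabled.

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cpuManagerPolicy: static              # exclusive CPU assignment for Guaranteed Pods with integer CPU requests
reservedSystemCPUs: "0,1"             # illustrative: explicitly reserve CPUs 0 and 1 for system daemons and the kubelet
cpuManagerPolicyOptions:
  full-pcpus-only: "true"             # only hand out whole physical cores
  distribute-cpus-across-numa: "true" # spread allocations evenly across NUMA nodes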
Further development

After graduating the main CPU Manager feature, each existing policy option will follow its own graduation process, independent of the CPU Manager and of the other options. There is room for new options to be added, but there's also a growing demand for even more flexibility than what the CPU Manager, and its policy options, currently grant. Conversations are in progress in the community about splitting the CPU Manager and the other resource managers currently part of the kubelet executable into pluggable, independent kubelet plugins. If you are interested in this effort, please join the conversation on the SIG Node communication channels (Slack, mailing list, weekly meeting).

Further reading

Please check out the Control CPU Management Policies on the Node task page to learn more about the CPU Manager, and how it fits in relation to the other node-level resource managers.

Getting involved

This feature is driven by the SIG Node community. Please join us to connect with the community and share your ideas and feedback around the above feature and beyond. We look forward to hearing from you!
·kubernetes.io·
Blog: Kubernetes 1.26: Pod Scheduling Readiness
Author: Wei Huang (Apple), Abdullah Gharaibeh (Google)

Kubernetes 1.26 introduced a new Pod feature: scheduling gates. In Kubernetes, scheduling gates are keys that tell the scheduler when a Pod is ready to be considered for scheduling.

What problem does it solve?

When a Pod is created, the scheduler will continuously attempt to find a node that fits it. This infinite loop continues until the scheduler either finds a node for the Pod, or the Pod gets deleted. Pods that remain unschedulable for long periods of time (e.g., ones that are blocked on some external event) waste scheduling cycles. A scheduling cycle may take ≅20ms or more depending on the complexity of the Pod's scheduling constraints. Therefore, at scale, those wasted cycles significantly impact the scheduler's performance.

[Diagram: without gates, 'unready' Pods are popped from the scheduler queue, fail the scheduling cycle, and are re-queued, wasting cycles on repeatedly rescheduling them.]

Scheduling gates help address this problem. They allow declaring that newly created Pods are not ready for scheduling. When scheduling gates are present on a Pod, the scheduler ignores the Pod and therefore saves unnecessary scheduling attempts. Those Pods will also be ignored by Cluster Autoscaler if you have it installed in the cluster. Clearing the gates is the responsibility of external controllers with knowledge of when the Pod should be considered for scheduling (e.g., a quota manager).

[Diagram: with gates, a PreEnqueue check decides whether a Pod is popped out of the scheduler queue at all, acting as a knob to gate the Pod's scheduling.]

How does it work?

Scheduling gates in general work very similarly to finalizers. Pods with a non-empty spec.schedulingGates field show the status SchedulingGated and are blocked from scheduling. Note that more than one gate can be added, but they all should be added upon Pod creation (e.g., you can add them as part of the spec or via a mutating webhook).

NAME       READY   STATUS            RESTARTS   AGE
test-pod   0/1     SchedulingGated   0          10s

To clear the gates, you update the Pod by removing all of the items from the Pod's schedulingGates field. The gates do not need to be removed all at once, but only when all the gates are removed will the scheduler start to consider the Pod for scheduling.
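For illustration, here is a minimal sketch of a Pod created with a scheduling gate; the gate name, container name, and image are illustrative, and an external controller would later remove the gate by updating the Pod.

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  schedulingGates:
    - name: example.com/quota-check    # illustrative gate; cleared later by an external controller
  containers:
    - name: pause
      image: registry.k8s.io/pause:3.9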
Under the hood, scheduling gates are implemented as a PreEnqueue scheduler plugin, a new scheduler framework extension point that is invoked at the beginning of each scheduling cycle.

Use Cases

An important use case this feature enables is dynamic quota management. Kubernetes supports ResourceQuota, however the API server enforces quota at the time you attempt Pod creation. For example, if a new Pod exceeds the CPU quota, it gets rejected. The API server doesn't queue the Pod; therefore, whoever created the Pod needs to continuously attempt to recreate it. This either means a delay between resources becoming available and the Pod actually running, or it means load on the API server and scheduler due to constant attempts.

Scheduling gates allow an external quota manager to address the above limitation of ResourceQuota. Specifically, the manager could add an example.com/quota-check scheduling gate to all Pods created in the cluster (using a mutating webhook). The manager would then remove the gate when there is quota to start the Pod.

What's next?

To use this feature, the PodSchedulingReadiness feature gate must be enabled in the API server and scheduler. You're more than welcome to test it out and tell us (SIG Scheduling) what you think!

Additional resources

- Pod Scheduling Readiness in the Kubernetes documentation
- Kubernetes Enhancement Proposal
·kubernetes.io·
CoreFreq Gives Peek At CPU Performance Info On Linux
The CPU is the part of the computer that makes everything else tick. While GPUs have increasingly become a key part of overall system performance, we still find ourselves wanting to know how our CP…
·hackaday.com·