How to Terminate Go Programs Elegantly – A Guide to Graceful Shutdowns
August 21, 2024 at 09:29AM
via Instapaper
I just want mTLS on Kubernetes
A common phrase when talking to Kubernetes users is "I just want all my traffic mTLS encrypted on Kubernetes." Occasionally, this comes with some additional…
August 21, 2024 at 09:28AM
via Instapaper
Installing Karpenter: Lessons Learned From Our Experience
But before getting started, let's explain what Karpenter is... AWS Karpenter is an open-source, flexible, high-performance Kubernetes cluster autoscaler. It was…
August 21, 2024 at 09:28AM
via Instapaper
Kubernetes 1.31: Autoconfiguration For Node Cgroup Driver (beta)
https://kubernetes.io/blog/2024/08/21/cri-cgroup-driver-lookup-now-beta/
Historically, configuring the correct cgroup driver has been a pain point for users running new Kubernetes clusters. On Linux systems, there are two different cgroup drivers: cgroupfs and systemd. In the past, both the kubelet and CRI implementation (like CRI-O or containerd) needed to be configured to use the same cgroup driver, or else the kubelet would exit with an error. This was a source of headaches for many cluster admins. However, there is light at the end of the tunnel!
Automated cgroup driver detection
In v1.28.0, the SIG Node community introduced the feature gate KubeletCgroupDriverFromCRI, which instructs the kubelet to ask the CRI implementation which cgroup driver to use. A few minor releases of Kubernetes happened whilst we waited for support to land in the two major CRI implementations (containerd and CRI-O), but as of v1.31.0, this feature is now beta!
In addition to setting the feature gate, a cluster admin needs to ensure their CRI implementation is new enough:
containerd: Support was added in v2.0.0
CRI-O: Support was added in v1.28.0
Then, they should ensure their CRI implementation is configured to use the cgroup driver they want.
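As a rough sketch of what that looks like (assuming containerd's 1.x-style CRI config keys and a default CRI-O install; containerd 2.0 renames the CRI plugin section, so check the documentation for your version), switching both runtimes to the systemd cgroup driver is a small config change:
# /etc/containerd/config.toml (containerd 1.x-style keys)
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
  SystemdCgroup = true
# /etc/crio/crio.conf (CRI-O)
[crio.runtime]
cgroup_manager = "systemd"
Restart the runtime (and then the kubelet) after editing the config so the kubelet picks up the reported driver.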
Future work
Eventually, support for the kubelet's cgroupDriver configuration field will be dropped, and the kubelet will fail to start if the CRI implementation isn't new enough to have support for this feature.
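For reference, a kubelet configured today might carry both the feature gate and the explicit driver setting; a minimal sketch using the KubeletConfiguration API (values illustrative):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletCgroupDriverFromCRI: true
cgroupDriver: systemd  # explicit fallback; the CRI-reported driver takes precedence when available, and this field is slated for removal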
via Kubernetes Blog https://kubernetes.io/
August 20, 2024 at 08:00PM
Who needs GitHub Copilot when you roll your own
Hands on: Code assistants have gained considerable attention as an early use case for generative AI – especially following the launch of Microsoft's GitHub…
August 20, 2024 at 10:43AM
via Instapaper
The Window-Knocking Machine Test · ines.io
AI is making futurists of us all. With the dizzying speed of new innovations, it’s clear that our lives and work are going to change. So what’s next? How will…
August 20, 2024 at 10:38AM
via Instapaper
continuedev/continue: ⏩ Continue is the leading open-source AI code assistant. You can connect any models and any context to build custom autocomplete and chat experiences inside VS Code and JetBrains
August 20, 2024 at 09:47AM
via Instapaper
Kubernetes 1.31: Streaming Transitions from SPDY to WebSockets
https://kubernetes.io/blog/2024/08/20/websockets-transition/
In Kubernetes 1.31, by default kubectl now uses the WebSocket protocol instead of SPDY for streaming.
This post describes what these changes mean for you and why these streaming APIs matter.
Streaming APIs in Kubernetes
In Kubernetes, specific endpoints that are exposed as an HTTP or RESTful interface are upgraded to streaming connections, which require a streaming protocol. Unlike HTTP, which is a request-response protocol, a streaming protocol provides a persistent connection that's bi-directional, low-latency, and lets you interact in real-time. Streaming protocols support reading and writing data between your client and the server, in both directions, over the same connection. This type of connection is useful, for example, when you create a shell in a running container from your local workstation and run commands in the container.
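For instance, a command like the following (pod name illustrative) opens an interactive shell whose keystrokes and output travel over exactly this kind of upgraded, bi-directional connection:
$ kubectl exec -it my-pod -- /bin/sh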
Why change the streaming protocol?
Before the v1.31 release, Kubernetes used the SPDY/3.1 protocol by default when upgrading streaming connections. SPDY/3.1 has been deprecated for eight years, and it was never standardized. Many modern proxies, gateways, and load balancers no longer support the protocol. As a result, you might notice that commands like kubectl cp, kubectl attach, kubectl exec, and kubectl port-forward stop working when you try to access your cluster through a proxy or gateway.
As of Kubernetes v1.31, SIG API Machinery has modified the streaming protocol that a Kubernetes client (such as kubectl) uses for these commands to the more modern WebSocket streaming protocol. The WebSocket protocol is a currently supported standardized streaming protocol that guarantees compatibility and interoperability with different components and programming languages. The WebSocket protocol is more widely supported by modern proxies and gateways than SPDY.
How streaming APIs work
Kubernetes upgrades HTTP connections to streaming connections by adding specific upgrade headers to the originating HTTP request. For example, an HTTP upgrade request for running the date command on an nginx container within a cluster is similar to the following:
$ kubectl exec -v=8 nginx -- date
GET https://127.0.0.1:43251/api/v1/namespaces/default/pods/nginx/exec?command=date…
Request Headers:
    Connection: Upgrade
    Upgrade: websocket
    Sec-Websocket-Protocol: v5.channel.k8s.io
    User-Agent: kubectl/v1.31.0 (linux/amd64) kubernetes/6911225
If the container runtime supports the WebSocket streaming protocol and at least one of the subprotocol versions (e.g. v5.channel.k8s.io), the server responds with a successful 101 Switching Protocols status, along with the negotiated subprotocol version:
Response Status: 101 Switching Protocols in 3 milliseconds
Response Headers:
    Upgrade: websocket
    Connection: Upgrade
    Sec-Websocket-Accept: j0/jHW9RpaUoGsUAv97EcKw8jFM=
    Sec-Websocket-Protocol: v5.channel.k8s.io
At this point the TCP connection used for the HTTP protocol has changed to a streaming connection. Subsequent STDIN, STDOUT, and STDERR data (as well as terminal resizing data and process exit code data) for this shell interaction is then streamed over this upgraded connection.
How to use the new WebSocket streaming protocol
If your cluster and kubectl are on version 1.29 or later, there are two control plane feature gates and two kubectl environment variables that govern the use of WebSockets rather than SPDY. In Kubernetes 1.31, all of the following feature gates are in beta and are enabled by default:
Feature gates
TranslateStreamCloseWebsocketRequests: governs the .../exec and .../attach endpoints
PortForwardWebsockets: governs the .../port-forward endpoint
kubectl feature control environment variables
KUBECTL_REMOTE_COMMAND_WEBSOCKETS: governs kubectl exec, kubectl cp, and kubectl attach
KUBECTL_PORT_FORWARD_WEBSOCKETS: governs kubectl port-forward
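Because the kubectl controls are plain environment variables, you should be able to fall back to the previous SPDY behavior for a single invocation by setting one of them to false (pod name and ports illustrative):
$ KUBECTL_PORT_FORWARD_WEBSOCKETS=false kubectl port-forward pod/my-pod 8080:80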
If you're connecting to an older cluster but can manage the feature gate settings, turn on both TranslateStreamCloseWebsocketRequests (added in Kubernetes v1.29) and PortForwardWebsockets (added in Kubernetes v1.30) to try this new behavior. Version 1.31 of kubectl can automatically use the new behavior, but you do need to connect to a cluster where the server-side features are explicitly enabled.
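On the control plane side, feature gates are passed with the standard --feature-gates flag; a sketch of explicitly enabling both on the kube-apiserver (how you set the flag depends on how your control plane is deployed):
kube-apiserver --feature-gates=TranslateStreamCloseWebsocketRequests=true,PortForwardWebsockets=true ...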
Learn more about streaming APIs
KEP 4006 - Transitioning from SPDY to WebSockets
RFC 6455 - The WebSockets Protocol
Container Runtime Interface streaming explained
via Kubernetes Blog https://kubernetes.io/
August 19, 2024 at 08:00PM
via Pocket https://www.reuters.com/world/uk/black-britons-uk-riots-leave-lasting-scars-2024-08-19/
August 19, 2024 at 09:37AM
The Dark Side of Open Source: Are We All Just Selfish?
Open-source software is often seen as a free-for-all, but the reality is more complex. Many companies invest heavily in open source projects as a go-to-market strategy, paying full-time maintainers to ensure project success. This video explores the motivations behind open source, the role of big companies like Google and AWS, and the impact of license changes by companies like MongoDB and HashiCorp. Discover why no open-source project should be owned by a single company and the benefits of foundation-owned projects like Kubernetes and Linux. Learn how you can contribute to and support the open source ecosystem.
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ Twitter: https://twitter.com/vfarcic ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=4l_kK90khNA
Kubernetes 1.31: Pod Failure Policy for Jobs Goes GA
https://kubernetes.io/blog/2024/08/19/kubernetes-1-31-pod-failure-policy-for-jobs-goes-ga/
This post describes Pod failure policy, which graduates to stable in Kubernetes 1.31, and how to use it in your Jobs.
About Pod failure policy
When you run workloads on Kubernetes, Pods might fail for a variety of reasons. Ideally, workloads like Jobs should be able to ignore transient, retriable failures and continue running to completion.
To allow for these transient failures, Kubernetes Jobs include the backoffLimit field, which lets you specify a number of Pod failures that you're willing to tolerate during Job execution. However, if you set a large value for the backoffLimit field and rely solely on this field, you might notice unnecessary increases in operating costs as Pods restart excessively until the backoffLimit is met.
This becomes particularly problematic when running large-scale Jobs with thousands of long-running Pods across thousands of nodes.
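For context, that tolerance is a single field in the Job spec; a minimal sketch (the value shown is the Kubernetes default):
spec:
  backoffLimit: 6  # tolerate up to 6 Pod failures before the Job is marked failed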
The Pod failure policy extends the backoff limit mechanism to help you reduce costs in the following ways:
Gives you control to fail the Job as soon as a non-retriable Pod failure occurs.
Allows you to ignore retriable errors without increasing the backoffLimit field.
For example, you can use a Pod failure policy to run your workload on more affordable spot machines by ignoring Pod failures caused by graceful node shutdown.
The policy allows you to distinguish between retriable and non-retriable Pod failures based on container exit codes or Pod conditions in a failed Pod.
How it works
You specify a Pod failure policy in the Job specification, represented as a list of rules.
For each rule you define match requirements based on one of the following properties:
Container exit codes: the onExitCodes property.
Pod conditions: the onPodConditions property.
Additionally, for each rule, you specify one of the following actions to take when a Pod matches the rule:
Ignore: Do not count the failure towards the backoffLimit or backoffLimitPerIndex.
FailJob: Fail the entire Job and terminate all running Pods.
FailIndex: Fail the index corresponding to the failed Pod. This action works with the Backoff limit per index feature.
Count: Count the failure towards the backoffLimit or backoffLimitPerIndex. This is the default behavior.
When Pod failures occur in a running Job, Kubernetes matches the failed Pod status against the list of Pod failure policy rules, in the specified order, and takes the corresponding actions for the first matched rule.
Note that when specifying the Pod failure policy, you must also set the Job's Pod template with restartPolicy: Never. This prevents race conditions between the kubelet and the Job controller when counting Pod failures.
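As a minimal sketch of where these pieces sit in a Job manifest (names and image are illustrative, and the single rule is just an example):
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job
spec:
  backoffLimit: 6
  podFailurePolicy:
    rules:
    - action: Ignore
      onPodConditions:
      - type: DisruptionTarget
  template:
    spec:
      restartPolicy: Never   # required when podFailurePolicy is set
      containers:
      - name: main
        image: registry.example.com/batch-worker:latest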
Kubernetes-initiated Pod disruptions
To allow matching Pod failure policy rules against failures caused by disruptions initiated by Kubernetes, this feature introduces the DisruptionTarget Pod condition.
Kubernetes adds this condition to any Pod that fails because of a retriable disruption scenario, regardless of whether a Job controller manages that Pod. The DisruptionTarget condition contains one of the following reasons, each corresponding to one of these disruption scenarios:
PreemptionByKubeScheduler: the Pod was preempted by kube-scheduler to accommodate a new Pod with a higher priority.
DeletionByTaintManager: the Pod is due to be deleted by kube-controller-manager because of a NoExecute taint that the Pod doesn't tolerate.
EvictionByEvictionAPI: the Pod is due to be deleted by an API-initiated eviction.
DeletionByPodGC: the Pod is bound to a node that no longer exists and is due to be deleted by Pod garbage collection.
TerminationByKubelet: the Pod was terminated by graceful node shutdown, node-pressure eviction, or preemption for system-critical pods.
In all other disruption scenarios, like eviction due to exceeding Pod container limits, Pods don't receive the DisruptionTarget condition because the disruptions were likely caused by the Pod and would reoccur on retry.
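To check whether a failed Pod carries the condition, you can inspect its status directly (pod name illustrative):
$ kubectl get pod my-failed-pod -o jsonpath='{.status.conditions[?(@.type=="DisruptionTarget")]}'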
Example
The Pod failure policy snippet below demonstrates an example use:
podFailurePolicy:
  rules:
  - action: Ignore
    onPodConditions:
    - type: DisruptionTarget
  - action: FailJob
    onPodConditions:
    - type: ConfigIssue
  - action: FailJob
    onExitCodes:
      operator: In
      values: [42]
In this example, the Pod failure policy does the following:
Ignores any failed Pods that have the built-in DisruptionTarget condition. These Pods don't count towards Job backoff limits.
Fails the Job if any failed Pods have the custom user-supplied ConfigIssue condition, which was added either by a custom controller or webhook.
Fails the Job if any containers exited with the exit code 42.
Counts all other Pod failures towards the default backoffLimit (or backoffLimitPerIndex if used).
Learn more
For a hands-on guide to using Pod failure policy, see Handling retriable and non-retriable pod failures with Pod failure policy
Read the documentation for Pod failure policy and Backoff limit per index
Read the documentation for Pod disruption conditions
Read the KEP for Pod failure policy
Related work
Based on the concepts introduced by Pod failure policy, the following additional work is in progress:
JobSet integration: Configurable Failure Policy API
Pod failure policy extension to add more granular failure reasons
Support for Pod failure policy via JobSet in Kubeflow Training v2
Proposal: Disrupted Pods should be removed from endpoints
Get involved
This work was sponsored by the batch working group in close collaboration with the SIG Apps, SIG Node, and SIG Scheduling communities.
If you are interested in working on new features in this space, we recommend subscribing to our Slack channel and attending the regular community meetings.
Acknowledgments
I would like to thank everyone who was involved in this project over the years - it's been a journey and a joint community effort! The list below is my best-effort attempt to remember and recognize the people who made an impact. Thank you!
Aldo Culquicondor for guidance and reviews throughout the process
Jordan Liggitt for KEP and API reviews
David Eads for API reviews
Maciej Szulik for KEP reviews from SIG Apps PoV
Clayton Coleman for guidance and SIG Node reviews
Sergey Kanzhelev for KEP reviews from SIG Node PoV
Dawn Chen for KEP reviews from SIG Node PoV
Daniel Smith for reviews from SIG API machinery PoV
Antoine Pelisse for reviews from SIG API machinery PoV
John Belamaric for PRR reviews
Filip Křepinský for thorough reviews from SIG Apps PoV and bug-fixing
David Porter for thorough reviews from SIG Node PoV
Jensen Lo for early requirements discussions, testing and reporting issues
Daniel Vega-Myhre for advancing JobSet integration and reporting issues
Abdullah Gharaibeh for early design discussions and guidance
Antonio Ojea for test reviews
Yuki Iwai for reviews and aligning implementation of the closely related Job features
Kevin Hannon for reviews and aligning implementation of the closely related Job features
Tim Bannister for docs reviews
Shannon Kularathna for docs reviews
Paola Cortés for docs reviews
via Kubernetes Blog https://kubernetes.io/
August 18, 2024 at 08:00PM
LEGOs labeled and bins for partial projects allocated. It’s a LEGO playroom now.
August 18, 2024 at 04:56PM
via Instagram https://instagr.am/p/C-01F5yvXDK/
.@juliemshort massively reorganized Max’s playroom a couple days ago into more of a LEGO builder space since that’s really all Max does in here.
Max asked me to come see his battlefield that he’d set up. The organizer is new to help keep minifigs sorted as the previous solution was overflowing. I went to grab a droid and had no idea where they were. Max tells me and I immediately forget.
So this is what I’m doing right now. Labeling drawers after Julie came through and made sure everything was in the right place (it wasn’t; hence the labeling). Happy Sunday! #LEGO #organization #legominifigs
August 18, 2024 at 12:43PM
via Instagram https://instagr.am/p/C-0YI-gvPmH/