
Impossible Puzzles Quantum Computers Dream of Solving

Imagine you’re the mayor of a bustling city. Every day you face decisions that seem impossible: how to schedule buses so no one waits too long, how to route traffic without creating new jams, how to balance electricity demand when everyone cranks up their air conditioning at once.

1_r/devopsish

·linkedin.com·Sep 3, 2025


Kubernetes v1.34: Introducing CPU Manager Static Policy Option for Uncore Cache Alignment

https://kubernetes.io/blog/2025/09/02/kubernetes-v1-34-prefer-align-by-uncore-cache-cpumanager-static-policy-optimization/

A new CPU Manager Static Policy Option called prefer-align-cpus-by-uncorecache was introduced in Kubernetes v1.32 as an alpha feature, and has graduated to beta in Kubernetes v1.34. This CPU Manager Policy Option is designed to optimize performance for specific workloads running on processors with a split uncore cache architecture. In this article, I'll explain what that means and why it's useful.

Understanding the feature

What is uncore cache?

Until relatively recently, nearly all mainstream computer processors had a monolithic last-level cache that was shared across every core in a multi-core CPU package. This monolithic cache is also referred to as uncore cache (because it is not linked to a specific core), or as Level 3 cache. In addition to the Level 3 cache, there are other caches, commonly called Level 1 and Level 2 cache, that are associated with a specific CPU core.

To reduce access latency between the CPU cores and their cache, recent AMD64 and ARM architecture based processors have introduced a split uncore cache architecture, where the last-level cache is divided into multiple physical caches that are aligned to specific CPU groupings within the physical package. The shorter distances within the CPU package help to reduce latency.

Kubernetes is able to place workloads in a way that accounts for the cache topology within the CPU package(s).

Cache-aware workload placement

The matrix below shows the CPU-to-CPU latency measured in nanoseconds (lower is better) when passing a packet between CPUs via the cache coherence protocol, on a processor that uses split uncore cache. In this example, the processor package consists of 2 uncore caches. Each uncore cache serves 8 CPU cores.

Blue entries in the matrix represent latency between CPUs sharing the same uncore cache, while grey entries indicate latency between CPUs corresponding to different uncore caches. Latency between CPUs that correspond to different caches is higher than the latency between CPUs that belong to the same cache.

With prefer-align-cpus-by-uncorecache enabled, the static CPU Manager attempts to allocate CPU resources for a container such that all CPUs assigned to the container share the same uncore cache. This policy operates on a best-effort basis, aiming to minimize the distribution of a container's CPU resources across uncore caches, based on the container's requirements, and accounting for allocatable resources on the node.

By running a workload, where it can, on a set of CPUs that use the smallest feasible number of uncore caches, applications benefit from reduced cache latency (as seen in the matrix above), and from reduced contention against other workloads, which can result in overall higher throughput. The benefit only shows up if your nodes use a split uncore cache topology for their processors.

The diagram below illustrates uncore cache alignment when the feature is enabled.

By default, Kubernetes does not account for uncore cache topology; containers are assigned CPU resources using a packed methodology. As a result, Container 1 and Container 2 can experience a noisy neighbor impact due to cache access contention on Uncore Cache 0. Additionally, Container 2 will have CPUs distributed across both caches which can introduce a cross-cache latency.

With prefer-align-cpus-by-uncorecache enabled, each container is isolated on an individual cache. This resolves the cache contention between the containers and minimizes the cache latency for the CPUs being utilized.
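To make the best-effort placement described above more concrete, here is a minimal Python sketch. It is not the kubelet's actual algorithm. The function name allocate_cpus and the dictionary that maps an uncore cache ID to its free CPU IDs are assumptions made for illustration only. The sketch first looks for a single cache that can hold the whole request, and only spills across caches when no single cache is large enough.

```python
from typing import Dict, List


def allocate_cpus(free_by_cache: Dict[int, List[int]], requested: int) -> List[int]:
    """Pick `requested` CPUs while touching as few uncore caches as possible.

    Hypothetical helper: `free_by_cache` maps an uncore cache ID to the CPU IDs
    that are still free on that cache.
    """
    total_free = sum(len(cpus) for cpus in free_by_cache.values())
    if requested <= 0 or requested > total_free:
        raise ValueError("request cannot be satisfied")

    # Prefer one cache that can hold the whole request. Among those, take the
    # cache with the fewest free CPUs so that larger caches stay intact.
    fitting = [c for c, cpus in free_by_cache.items() if len(cpus) >= requested]
    if fitting:
        best = min(fitting, key=lambda c: (len(free_by_cache[c]), c))
        return sorted(free_by_cache[best])[:requested]

    # No single cache fits: take CPUs from the caches with the most free CPUs
    # first, which keeps the number of caches involved small.
    chosen: List[int] = []
    for cache in sorted(free_by_cache, key=lambda c: (-len(free_by_cache[c]), c)):
        need = requested - len(chosen)
        if need == 0:
            break
        chosen.extend(sorted(free_by_cache[cache])[:need])
    return chosen
```

With two caches of 8 CPUs each and CPU 0 reserved, a 4-CPU container lands entirely on cache 0. A following 6-CPU container no longer fits there and is placed entirely on cache 1 instead of straddling both.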

Use cases

Common use cases include telco applications like vRAN, Mobile Packet Core, and Firewalls. Note that the optimization provided by prefer-align-cpus-by-uncorecache depends on the workload. For example, applications that are memory bandwidth bound may not benefit from uncore cache alignment, because using more uncore caches can increase the memory bandwidth available to them.

Enabling the feature

To enable this feature, set the CPU Manager Policy to static and enable the CPU Manager Policy Options with prefer-align-cpus-by-uncorecache.

For Kubernetes 1.34, the feature is in the beta stage and requires the CPUManagerPolicyBetaOptions feature gate to also be enabled.

Append the following to the kubelet configuration file:

kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
featureGates:
  ...
  CPUManagerPolicyBetaOptions: true
cpuManagerPolicy: "static"
cpuManagerPolicyOptions:
  prefer-align-cpus-by-uncorecache: "true"
reservedSystemCPUs: "0"
...

If you're making this change to an existing node, remove the cpu_manager_state file and then restart kubelet.
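As a quick sanity check before restarting the kubelet, the three settings named above can be verified with a small script. The following Python sketch assumes the kubelet configuration has already been loaded into a plain dictionary. The helper name uncore_alignment_problems and the EXAMPLE dictionary are hypothetical and are not part of any Kubernetes tooling. The sketch only reports what the file says. A feature gate that is left unset falls back to the kubelet's own default, which this sketch does not know about.

```python
from typing import Any, Dict, List

# Hypothetical in-memory copy of the kubelet configuration shown above.
EXAMPLE: Dict[str, Any] = {
    "kind": "KubeletConfiguration",
    "apiVersion": "kubelet.config.k8s.io/v1beta1",
    "featureGates": {"CPUManagerPolicyBetaOptions": True},
    "cpuManagerPolicy": "static",
    "cpuManagerPolicyOptions": {"prefer-align-cpus-by-uncorecache": "true"},
    "reservedSystemCPUs": "0",
}


def uncore_alignment_problems(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of settings in `cfg` that differ from the snippet above."""
    problems: List[str] = []
    if cfg.get("cpuManagerPolicy") != "static":
        problems.append('cpuManagerPolicy is not "static"')
    options = cfg.get("cpuManagerPolicyOptions") or {}
    if options.get("prefer-align-cpus-by-uncorecache") != "true":
        problems.append('prefer-align-cpus-by-uncorecache is not "true"')
    gates = cfg.get("featureGates") or {}
    if gates.get("CPUManagerPolicyBetaOptions") is not True:
        # Reported as "not set to true in this file"; the kubelet default may differ.
        problems.append("CPUManagerPolicyBetaOptions is not set to true in this file")
    return problems
```

An empty list means the file matches the configuration from this article.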

prefer-align-cpus-by-uncorecache can be enabled on nodes with a monolithic uncore cache processor. The feature will mimic a best-effort socket alignment effect and will pack CPU resources on the socket similar to the default static CPU Manager policy.

#AIImplementation #VectorDatabases #RAG

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/stop-blaming-ai-vector-dbs-+-rag-=-game-changer 🔗 Qdrant: https://qdrant.tech

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Vector Databases for AI Agents
02:03 Outskill (sponsor)
03:34 Why AI Hallucinates About Your Code
11:15 Vector Databases for AI Context
21:47 RAG: How AI Gets Your Context
30:50 Fix Your AI Implementation Now

via YouTube https://www.youtube.com/watch?v=zqpJr1qZhTg

1_r/devopsish

·youtube.com·Sep 1, 2025

AI & DevOps Toolkit - Stop Blaming AI: Vector DBs + RAG = Game Changer - https://www.youtube.com/watch?v=zqpJr1qZhTg

How to Improve CUDA Kernel Performance with Shared Memory Register Spilling | NVIDIA Technical Blog

When a CUDA kernel requires more hardware registers than are available, the compiler is forced to move the excess variables into local memory, a process known as register spilling.

1_r/devopsish

·developer.nvidia.com·Aug 30, 2025


Connecticut Man's Case Believed to Be First Murder-Suicide Associated With AI Psychosis

Several suicides have been blamed on AI. This appears to be the first homicide.

1_r/devopsish

·gizmodo.com·Aug 30, 2025


Intel’s “Clearwater Forest” Xeon 7 E-Core CPU Will Be A Beast

With AMD having attained more than 40 percent revenue share and more than 27 percent shipment share in the X86 server CPU market in the first half of

1_r/devopsish

·nextplatform.com·Aug 30, 2025


OVHcloud legal eagle on Microsoft's sovereignty admission

Interview: French provider seizes on Redmond's admission that US law could override local protections

1_r/devopsish

·theregister.com·Aug 30, 2025


Kubernetes v1.34: Finer-Grained Control Over Container Restarts

https://kubernetes.io/blog/2025/08/29/kubernetes-v1-34-per-container-restart-policy/

With the release of Kubernetes 1.34, a new alpha feature is introduced that gives you more granular control over container restarts within a Pod. This feature, named Container Restart Policy and Rules, allows you to specify a restart policy for each container individually, overriding the Pod's global restart policy. In addition, it also allows you to conditionally restart individual containers based on their exit codes. This feature is available behind the alpha feature gate ContainerRestartRules.

This has been a long-requested feature. Let's dive into how it works and how you can use it.

The problem with a single restart policy

Before this feature, the restartPolicy was set at the Pod level. This meant that all containers in a Pod shared the same restart policy (Always, OnFailure, or Never). While this works for many use cases, it can be limiting in others.

For example, consider a Pod with a main application container and an init container that performs some initial setup. You might want the main container to always restart on failure, but the init container should only run once and never restart. With a single Pod-level restart policy, this wasn't possible.

Introducing per-container restart policies

With the new ContainerRestartRules feature gate, you can now specify a restartPolicy for each container in your Pod's spec. You can also define restartPolicyRules to control restarts based on exit codes. This gives you the fine-grained control you need to handle complex scenarios.

Use cases

Let's look at some real-life use cases where per-container restart policies can be beneficial.

In-place restarts for training jobs

In ML research, it's common to orchestrate a large number of long-running AI/ML training workloads. In these scenarios, workload failures are unavoidable. When a workload fails with a retriable exit code, you want the container to restart quickly without rescheduling the entire Pod, which consumes a significant amount of time and resources. Restarting the failed container "in-place" is critical for better utilization of compute resources. The container should only restart "in-place" if it failed due to a retriable error; otherwise, the container and Pod should terminate and possibly be rescheduled.

This can now be achieved with container-level restartPolicyRules. The workload can exit with different codes to represent retriable and non-retriable errors. With restartPolicyRules, the workload can be restarted in-place quickly, but only when the error is retriable.

Try-once init containers

Init containers are often used to perform initialization work for the main container, such as setting up environments and credentials. Sometimes, you want the main container to always be restarted, but you don't want to retry initialization if it fails.

With a container-level restartPolicy, this is now possible. The init container can be executed only once, and its failure is treated as a Pod failure. If the initialization succeeds, the main container can always be restarted.

Pods with multiple containers

For Pods that run multiple containers, you might have different restart requirements for each container. Some containers might have a clear definition of success and should only be restarted on failure. Others might need to be always restarted.

This is now possible with a container-level restartPolicy, allowing individual containers to have different restart policies.
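The override behavior described above can be summarized in a few lines of Python. This is only a sketch of the semantics as this post describes them for regular containers, not kubelet code. The function names effective_policy and restarts_after_exit are made up for illustration.

```python
from typing import Optional


def effective_policy(pod_policy: str, container_policy: Optional[str]) -> str:
    """A container-level restartPolicy, when set, overrides the Pod-level one."""
    return container_policy if container_policy is not None else pod_policy


def restarts_after_exit(policy: str, exit_code: int) -> bool:
    """Decide whether a container that exited with `exit_code` is restarted."""
    if policy == "Always":
        return True
    if policy == "OnFailure":
        return exit_code != 0  # a zero exit code counts as success
    if policy == "Never":
        return False
    raise ValueError(f"unknown restart policy: {policy}")
```

For example, in a Pod whose policy is Always, a container that sets OnFailure and exits with code 0 is left alone, while a sibling container without its own policy is restarted.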

How to use it

To use this new feature, you need to enable the ContainerRestartRules feature gate on your Kubernetes cluster control-plane and worker nodes running Kubernetes 1.34+. Once enabled, you can specify the restartPolicy and restartPolicyRules fields in your container definitions.

Here are some examples:

Example 1: Restarting on specific exit codes

In this example, the container should restart if and only if it fails with a retriable error, represented by exit code 42.

To achieve this, the container has restartPolicy: Never, and a restart policy rule that tells Kubernetes to restart the container in-place if it exits with code 42.

apiVersion: v1
kind: Pod
metadata:
  name: restart-on-exit-codes
  annotations:
    kubernetes.io/description: "This Pod restarts the container only when it exits with code 42."
spec:
  restartPolicy: Never
  containers:
    - name: restart-on-exit-codes
      image: docker.io/library/busybox:1.28
      command: ['sh', '-c', 'sleep 60 && exit 0']
      restartPolicy: Never   # Container restart policy must be specified if rules are specified
      restartPolicyRules:    # Only restart the container if it exits with code 42
        - action: Restart
          exitCodes:
            operator: In
            values: [42]
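The following Python sketch shows one way to reason about how the rule in Example 1 interacts with the container's restartPolicy. It is an illustration, not the kubelet implementation. It assumes that rules are checked in order, that the first matching rule decides the outcome, and that the container-level policy applies when no rule matches. The NotIn operator is included as an assumption, since only In appears in the example. The names ExitCodeRule and should_restart are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExitCodeRule:
    action: str        # "Restart" is the only action used in the example above
    operator: str      # "In" (from the example) or "NotIn" (assumed)
    values: List[int]


def rule_matches(rule: ExitCodeRule, exit_code: int) -> bool:
    if rule.operator == "In":
        return exit_code in rule.values
    if rule.operator == "NotIn":
        return exit_code not in rule.values
    raise ValueError(f"unknown operator: {rule.operator}")


def should_restart(container_policy: str, rules: List[ExitCodeRule], exit_code: int) -> bool:
    # Assumption: the first matching rule wins.
    for rule in rules:
        if rule_matches(rule, exit_code):
            return rule.action == "Restart"
    # No rule matched: fall back to the container-level restartPolicy.
    if container_policy == "Always":
        return True
    if container_policy == "OnFailure":
        return exit_code != 0
    return False
```

With the rule from Example 1 and restartPolicy: Never, exit code 42 leads to an in-place restart, while any other exit code, including 0, does not.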

Example 2: A try-once init container

In this example, a Pod should always be restarted once the initialization succeeds. However, the initialization should only be tried once.

To achieve this, the Pod has an Always restart policy. The init-once init container will only try once. If it fails, the Pod will fail. This allows the Pod to fail if the initialization failed, but also keep running once the initialization succeeds.

apiVersion: v1
kind: Pod
metadata:
  name: fail-pod-if-init-fails
  annotations:
    kubernetes.io/description: "This Pod has an init container that runs only once. After initialization succeeds, the main container will always be restarted."
spec:
  restartPolicy: Always
  initContainers:
    - name: init-once        # This init container will only try once. If it fails, the Pod will fail.
      image: docker.io/library/busybox:1.28
      command: ['sh', '-c', 'echo "Failing initialization" && sleep 10 && exit 1']
      restartPolicy: Never
  containers:
    - name: main-container   # This container will always be restarted once initialization succeeds.
      image: docker.io/library/busybox:1.28
      command: ['sh', '-c', 'sleep 1800 && exit 0']

Example 3: Containers with different restart policies

In this example, there are two containers with different restart requirements. One should always be restarted, while the other should only be restarted on failure.

This is achieved by using a different container-level restartPolicy on each of the two containers.

apiVersion: v1
kind: Pod
metadata:
  name: on-failure-pod
  annotations:
    kubernetes.io/description: "This Pod has two containers with different restart policies."
spec:
  containers:
    - name: restart-on-failure
      image: docker.io/library/busybox:1.28
      command: ['sh', '-c', 'echo "Not restarting after success" && sleep 10 && exit 0']
      restartPolicy: OnFailure
    - name: restart-always
      image: docker.io/library/busybox:1.28
      command: ['sh', '-c', 'echo "Always restarting" && sleep 1800 && exit 0']
      restartPolicy: Always

Learn more

Read the documentation for container restart policy.

Read the KEP for the Container Restart Rules

Roadmap

More actions and signals to restart Pods and containers are coming! Notably, there are plans to add support for restarting the entire Pod. Planning and discussions on these features are in progress. Feel free to share feedback or requests with the SIG Node community!

Your feedback is welcome!

This is an alpha feature, and the Kubernetes project would love to hear your feedback. Please try it out. This feature is driven by the SIG Node. If you are interested in helping develop this feature, sharing feedback, or participating in any other ongoing SIG Node projects, please reach out to the SIG Node community!

You can reach SIG Node by several means:

Slack: #sig-node

Mailing list

Open Community Issues/PRs

via Kubernetes Blog https://kubernetes.io/

August 29, 2025 at 02:30PM

1_r/devopsish

·kubernetes.io·Aug 30, 2025


rqlite/rqlite: The lightweight, user-friendly, distributed relational database built on SQLite.

The lightweight, user-friendly, distributed relational database built on SQLite. - rqlite/rqlite

1_r/devopsish

·github.com·Aug 29, 2025


Practical guide for avoiding burnout and living a happier life

Jono Bacon shares some quite ridiculous life choices from his early years that illustrate important ways of keeping healthy in mind, body, and spirit.

1_r/devopsish

·opensource.com·Aug 29, 2025


Linux Foundation Opens the Door to DocumentDB

Amazon Web Services and Microsoft will both work on the open source, document-oriented database system, per the announcement at Open Source Summit Europe.

1_r/devopsish

·thenewstack.io·Aug 29, 2025


Kubernetes v1.34: User preferences (kuberc) are available for testing in kubectl 1.34

https://kubernetes.io/blog/2025/08/28/kubernetes-v1-34-kubectl-kuberc-beta/

Have you ever wished you could enable interactive delete, by default, in kubectl? Or maybe, you'd like to have custom aliases defined, but not necessarily generate hundreds of them manually? Look no further. SIG-CLI has been working hard to add user preferences to kubectl, and we are happy to announce that this functionality is reaching beta as part of the Kubernetes v1.34 release.

How it works

A full description of this functionality is available in our official documentation, but this blog post will answer both of the questions from the beginning of this article.

Before we dive into details, let's quickly cover what the user preferences file looks like and where to place it. By default, kubectl will look for a kuberc file in your default kubeconfig directory, which is $HOME/.kube. Alternatively, you can specify this location using the --kuberc option or the KUBERC environment variable.

Just like every Kubernetes manifest, a kuberc file starts with an apiVersion and kind:

apiVersion: kubectl.config.k8s.io/v1beta1
kind: Preference
# the user preferences will follow here

Defaults

Let's start by setting default values for kubectl command options. Our goal is to always use interactive delete, which means we want the --interactive option for kubectl delete to always be set to true. This can be achieved with the following addition to our kuberc file:

defaults:
  - command: delete
    options:
      - name: interactive
        default: "true"

In the above example, I'm introducing the defaults section, which allows users to define default values for kubectl options. In this case, we're setting the interactive option for kubectl delete to be true by default. This default can be overridden if a user explicitly provides a different value such as kubectl delete --interactive=false, in which case the explicit option takes precedence.

Another default that SIG-CLI highly encourages is using Server-Side Apply. To do so, you can add the following snippet to your preferences:

# continuing defaults section
  - command: apply
    options:
      - name: server-side
        default: "true"
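The precedence rule for defaults (an explicit option always wins over a kuberc default) can be sketched in Python as follows. This is an illustration of the behavior described above, not kubectl's implementation. The DEFAULTS list mirrors the two snippets from this section, and resolve_options is a hypothetical helper.

```python
from typing import Dict, List

# Mirrors the defaults section built up above.
DEFAULTS: List[dict] = [
    {"command": "delete", "options": [{"name": "interactive", "default": "true"}]},
    {"command": "apply", "options": [{"name": "server-side", "default": "true"}]},
]


def resolve_options(command: str, explicit: Dict[str, str],
                    defaults: List[dict] = DEFAULTS) -> Dict[str, str]:
    """Combine kuberc defaults for `command` with options given on the command line."""
    resolved: Dict[str, str] = {}
    for entry in defaults:
        if entry["command"] == command:
            for option in entry["options"]:
                resolved[option["name"]] = option["default"]
    # Options the user typed explicitly take precedence over the defaults.
    resolved.update(explicit)
    return resolved
```

Calling resolve_options("delete", {}) yields the interactive default, while passing {"interactive": "false"} keeps the explicit value.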

Aliases

The ability to define aliases allows us to save precious seconds when typing commands. I bet that you most likely have one defined for kubectl, because typing seven letters is definitely longer than just pressing k.

For this reason, the ability to define aliases was a must-have when we decided to implement user preferences, alongside defaulting. To define an alias for any of the built-in commands, expand your kuberc file with the following addition:

aliases:
  - name: gns
    command: get
    prependArgs:
      - namespace
    options:
      - name: output
        default: json

There's a lot going on above, so let me break this down. First, we're introducing a new section: aliases. Here, we're defining a new alias gns, which is mapped to the get command. Next, we're defining arguments (the namespace resource) that will be inserted right after the command name. Additionally, we're setting the --output=json option for this alias. The structure of the options block is identical to the one in the defaults section.

You probably noticed that we've introduced a mechanism for prepending arguments, and you might wonder if there is a complementary setting for appending them (in other words, adding to the end of the command, after user-provided arguments). This can be achieved through the appendArgs block, which is presented below:

# continuing aliases section
  - name: runx
    command: run
    options:
      - name: image
        default: busybox
      - name: namespace
        default: test-ns
    appendArgs:
      - --
      - custom-arg

Here, we're introducing another alias: runx, which invokes the kubectl run command, passing the --image and --namespace options with predefined values, and also appending -- and custom-arg at the end of the invocation.
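To see how the pieces of an alias fit together, here is a small Python sketch that renders an equivalent argument list for the two aliases defined above. It is not how kubectl implements aliases internally, and expand_alias is a hypothetical helper. It assumes that option defaults are placed before the user's arguments and are skipped when the user already supplied the same flag.

```python
from typing import Dict, List

# Mirrors the aliases section built up above.
ALIASES: List[Dict] = [
    {"name": "gns", "command": "get", "prependArgs": ["namespace"],
     "options": [{"name": "output", "default": "json"}]},
    {"name": "runx", "command": "run",
     "options": [{"name": "image", "default": "busybox"},
                 {"name": "namespace", "default": "test-ns"}],
     "appendArgs": ["--", "custom-arg"]},
]


def expand_alias(argv: List[str], aliases: List[Dict] = ALIASES) -> List[str]:
    """Expand the arguments after `kubectl` if the first one names an alias."""
    if not argv:
        return []
    alias = next((a for a in aliases if a["name"] == argv[0]), None)
    if alias is None:
        return list(argv)  # not an alias, leave the command untouched

    user_args = list(argv[1:])
    flags: List[str] = []
    for option in alias.get("options", []):
        flag = "--" + option["name"]
        supplied = any(a == flag or a.startswith(flag + "=") for a in user_args)
        if not supplied:  # an explicit flag takes precedence over the default
            flags.append(f"{flag}={option['default']}")

    return [alias["command"], *alias.get("prependArgs", []),
            *flags, *user_args, *alias.get("appendArgs", [])]
```

Under these assumptions, kubectl runx test-pod behaves like kubectl run --image=busybox --namespace=test-ns test-pod -- custom-arg.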

Debugging

We hope that kubectl user preferences will open up new possibilities for our users. Whenever you're in doubt, feel free to run kubectl with increased verbosity. At -v=5, you should get all the possible debugging information from this feature, which will be crucial when reporting issues.

To learn more, I encourage you to read through our official documentation and the actual proposal.

Get involved

The kubectl user preferences feature has reached beta, and we are very interested in your feedback. We'd love to hear what you like about it and what problems you'd like to see it solve. Feel free to join the SIG-CLI Slack channel, or open an issue against the kubectl repository. You can also join us at our community meetings, which happen every other Wednesday, and share your stories with us.

via Kubernetes Blog https://kubernetes.io/

August 28, 2025 at 02:30PM

1_r/devopsish

·kubernetes.io·Aug 29, 2025


KYAML · Issue #5295 · kubernetes/enhancements

Enhancement Description One-line enhancement description (can be used as a release note): Add KYAML output for kubectl Kubernetes Enhancement Proposal: https://github.com/kubernetes/enhancements/bl...

1_r/devopsish #Kubernetes #KYAML

·kep.k8s.io·Aug 27, 2025


Last Week in Kubernetes Development - Week Ending August 24 2025

Week Ending August 24, 2025

https://lwkd.info/2025/20250827

Developer News

Kubernetes 1.34 is released! This version, named “Of Wind & Will”, includes DRA GA, KYAML spec, structured authentication config, better watch cache initialization, and much more.

Yuki Iwai is nominated as a new Working Group Batch lead, joining Marcin and Kevin, as Swati and Maciej step down. Raise any concerns before September 4, 2025.

Tim Hockin is stepping down as SIG Network co-chair and nominating Bowei Du as his replacement. He will remain a SIG Network Tech Lead. Lazy consensus on August 29, 2025.

Steering Committee Election

The Steering Committee election has started. This first stage is candidate nominations, to register potential new steering members. Have you considered working on the Steering Committee?

It is also time to verify if you are an eligible voter. If you are not, and should be, file a ballot exception.

Release Schedule

Next Deadline: Release Day 27th August

Kubernetes v1.34 is released.

A regression in kube-proxy v1.34.* that prevented startup on single-stack IPv4 or IPv6 hosts was identified and fixed ahead of release cut. A huge thank you to all contributors, reviewers, and release team members whose efforts made this release possible!

The next scheduled patch releases are on September 9, 2025 (cherry pick deadline: September 5, 2025). As a reminder, Kubernetes 1.31 will enter maintenance mode on August 28, 2025, with End of Life (EOL) planned for October 28, 2025.

Featured PRs

133604: Fix storage counting all objects instead of objects for resource

This PR fixes a regression where apiserver_storage_objects was overcounted by counting all etcd objects (using /registry) instead of just the target resource (e.g., pods). It now counts only that resource’s objects, giving accurate per-resource metrics and avoiding extra work when the watch cache is disabled.

KEP of the Week

KEP 24: Add AppArmor Support

This KEP introduces support for AppArmor within a cluster. AppArmor can enable users to run a more secure deployment, and/or provide better auditing and monitoring of their systems. The AppArmor support provides users an alternative to SELinux, and provides an interface for users that are already maintaining a set of AppArmor profiles. This KEP is proposing a minimal path to GA, per the no perma-Beta requirement.

This KEP was released as Stable in 1.34

Other Merges

Count storage types accurately when filtering per type

Prevent data race around claimsToAllocate

Subprojects and Dependency Updates

cluster-api v1.11.0 adds support for Kubernetes v1.33 (management and workload clusters), introduces the v1beta2 API, and includes new providers (Scaleway, cdk8s)

kubespray v2.28.1 fixes etcd and kubeadm issues while improving Cilium, Hubble, and Calico networking stability

Shoutouts

Christian Schlotter (@chrischdi): Thanks to Fabrizio Pandini (@fabrizio.pandini) and Stefan Büringer (@sbueringer) for the huge amount of work they did for the latest cluster api :cluster-api: v1.11.0 release to set the stage for the v1beta2 api version, which benefits all users to have a more clear and consistent API as well as a better feedback loop!

via Last Week in Kubernetes Development https://lwkd.info/

August 27, 2025 at 05:50PM

1_r/devopsish

·lwkd.info·Aug 27, 2025


A hacker used AI to automate an 'unprecedented' cybercrime spree, Anthropic says

The company behind the Claude chatbot said it caught a hacker using its chatbot to identify, hack and extort at least 17 companies.

1_r/devopsish

·nbcnews.com·Aug 27, 2025


Kubernetes v1.34: Of Wind & Will (O' WaW)

https://kubernetes.io/blog/2025/08/27/kubernetes-v1-34-release/

Editors: Agustina Barbetta, Alejandro Josue Leon Bellido, Graziano Casto, Melony Qin, Dipesh Rawat

Similar to previous releases, the release of Kubernetes v1.34 introduces new stable, beta, and alpha features. The consistent delivery of high-quality releases underscores the strength of our development cycle and the vibrant support from our community.

This release consists of 58 enhancements. Of those enhancements, 23 have graduated to Stable, 22 have entered Beta, and 13 have entered Alpha.

There are also some deprecations and removals in this release; make sure to read about those.

Release theme and logo

A release powered by the wind around us — and the will within us.

Every release cycle, we inherit winds that we don't really control — the state of our tooling, documentation, and the historical quirks of our project. Sometimes these winds fill our sails, sometimes they push us sideways or die down.

What keeps Kubernetes moving isn't the perfect winds, but the will of our sailors who adjust the sails, man the helm, chart the courses and keep the ship steady. The release happens not because conditions are always ideal, but because of the people who build it, the people who release it, and the bears ^, cats, dogs, wizards, and curious minds who keep Kubernetes sailing strong — no matter which way the wind blows.

This release, Of Wind & Will (O' WaW), honors the winds that have shaped us, and the will that propels us forward.

^ Oh, and you wonder why bears? Keep wondering!

Spotlight on key updates

Kubernetes v1.34 is packed with new features and improvements. Here are a few select updates the Release Team would like to highlight!

Stable: The core of DRA is GA

Dynamic Resource Allocation (DRA) enables more powerful ways to select, allocate, share, and configure GPUs, TPUs, NICs and other devices.

Since the v1.30 release, DRA has been based around claiming devices using structured parameters that are opaque to the core of Kubernetes. This enhancement took inspiration from dynamic provisioning for storage volumes. DRA with structured parameters relies on a set of supporting API kinds under resource.k8s.io (ResourceClaim, DeviceClass, ResourceClaimTemplate, and ResourceSlice), while extending the .spec for Pods with a new resourceClaims field.

The resource.k8s.io/v1 APIs have graduated to stable and are now available by default.

This work was done as part of KEP #4381 led by WG Device Management.

Beta: Projected ServiceAccount tokens for kubelet image credential providers

The kubelet credential providers, used for pulling private container images, traditionally relied on long-lived Secrets stored on the node or in the cluster. This approach increased security risks and management overhead, as these credentials were not tied to the specific workload and did not rotate automatically.

To solve this, the kubelet can now request short-lived, audience-bound ServiceAccount tokens for authenticating to container registries. This allows image pulls to be authorized based on the Pod's own identity rather than a node-level credential.

The primary benefit is a significant security improvement. It eliminates the need for long-lived Secrets for image pulls, reducing the attack surface and simplifying credential management for both administrators and developers.

This work was done as part of KEP #4412 led by SIG Auth and SIG Node.

Alpha: Support for KYAML, a Kubernetes dialect of YAML

KYAML aims to be a safer and less ambiguous YAML subset, and was designed specifically for Kubernetes. Whatever version of Kubernetes you use, starting from Kubernetes v1.34 you are able to use KYAML as a new output format for kubectl.

KYAML addresses specific challenges with both YAML and JSON. YAML's significant whitespace requires careful attention to indentation and nesting, while its optional string-quoting can lead to unexpected type coercion (for example: "The Norway Bug"). Meanwhile, JSON lacks comment support and has strict requirements for trailing commas and quoted keys.

You can write KYAML and pass it as an input to any version of kubectl, because all KYAML files are also valid YAML. With kubectl v1.34, you are also able to request KYAML output (as in kubectl get -o kyaml …) by setting the environment variable KUBECTL_KYAML=true. If you prefer, you can still request the output in JSON or YAML format.

This work was done as part of KEP #5295 led by SIG CLI.
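The type coercion problem mentioned above ("The Norway Bug") is easy to reproduce with a toy resolver. The Python sketch below is not a YAML parser. It only mimics how a YAML 1.1 style resolver treats a handful of unquoted words as booleans, and it uses a subset of those words. The function name resolve_scalar is made up for this illustration.

```python
from typing import Union

# A subset of the unquoted words that YAML 1.1 resolves to booleans.
_TRUE_WORDS = {"yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
_FALSE_WORDS = {"no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_scalar(token: str) -> Union[bool, str]:
    """Resolve one scalar token the way a YAML 1.1 style resolver might."""
    # A double-quoted token is always a string; the quotes are stripped.
    if len(token) >= 2 and token[0] == '"' and token[-1] == '"':
        return token[1:-1]
    # An unquoted token is checked against the boolean words first.
    if token in _TRUE_WORDS:
        return True
    if token in _FALSE_WORDS:
        return False
    return token
```

An unquoted country code NO (Norway) comes back as the boolean False, whereas SE stays a string. Quoting the value as "NO" sidesteps the coercion, which is the kind of ambiguity a stricter dialect is meant to remove.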

Features graduating to Stable

This is a selection of some of the improvements that are now stable following the v1.34 release.

Delayed creation of Job’s replacement Pods

By default, Job controllers create replacement Pods immediately when a Pod starts terminating, causing both Pods to run simultaneously. This can cause resource contention in constrained clusters, where the replacement Pod may struggle to find available nodes until the original Pod fully terminates. The situation can also trigger unwanted cluster autoscaler scale-ups. Additionally, some machine learning frameworks like TensorFlow and JAX require only one Pod per index to run at a time, making simultaneous Pod execution problematic. This feature introduces .spec.podReplacementPolicy in Jobs. You may choose to create replacement Pods only when the Pod is fully terminated (has .status.phase: Failed). To do this, set .spec.podReplacementPolicy: Failed.

Introduced as alpha in v1.28, this feature has graduated to stable in v1.34.

This work was done as part of KEP #3939 led by SIG Apps.
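The difference between the two replacement behaviors described above can be written down as a small decision function. This Python sketch is illustrative only and is not the Job controller's code. The value TerminatingOrFailed for the default behavior is not spelled out in the text above and is used here as an assumption, as is the helper name should_create_replacement.

```python
def should_create_replacement(policy: str, phase: str, terminating: bool) -> bool:
    """Decide whether a Job may create a replacement for a given Pod.

    `phase` stands for the Pod's .status.phase; `terminating` is True once the
    Pod has started terminating.
    """
    if policy == "Failed":
        # Wait until the Pod is fully terminated (phase Failed).
        return phase == "Failed"
    if policy == "TerminatingOrFailed":
        # Assumed name for the default: replace as soon as termination starts.
        return terminating or phase == "Failed"
    raise ValueError(f"unknown podReplacementPolicy: {policy}")
```

With podReplacementPolicy: Failed, a Pod that is still Running but terminating does not get a replacement yet, so two Pods for the same index never run at the same time.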

Recovery from volume expansion failure

This feature allows users to cancel volume expansions that are unsupported by the underlying storage provider, and retry volume expansion with smaller values that may succeed.

Introduced as alpha in v1.23, this feature has graduated to stable in v1.34.

This work was done as part of KEP #1790 led by SIG Storage.

VolumeAttributesClass for volume modification

VolumeAttributesClass has graduated to stable in v1.34. VolumeAttributesClass is a generic, Kubernetes-native API for modifying volume parameters like provisioned IO. It allows workloads to vertically scale their volumes online to balance cost and performance, if supported by their provider.

Like all new volume features in Kubernetes, this API is implemented via the container storage interface (CSI). Your provisioner-specific CSI driver must support the new ModifyVolume API, which is the CSI side of this feature.
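To make the API shape concrete, here is a hedged Python sketch that builds a VolumeAttributesClass and a PersistentVolumeClaim that references it, then prints both as a JSON List for kubectl. The driver name, class names, and the iops parameter are hypothetical; the parameter keys a real driver accepts are defined by that driver, not by Kubernetes.

```python
import json


def build_manifests(driver: str, class_name: str, parameters: dict,
                    pvc_name: str, storage_class: str, size: str) -> dict:
    """Return a v1 List holding a VolumeAttributesClass and a PVC using it."""
    vac = {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "VolumeAttributesClass",
        "metadata": {"name": class_name},
        "driverName": driver,
        # Parameter keys and values are driver-specific strings.
        "parameters": {k: str(v) for k, v in parameters.items()},
    }
    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": pvc_name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            # Changing this field later asks the driver to modify the volume.
            "volumeAttributesClassName": class_name,
            "resources": {"requests": {"storage": size}},
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [vac, pvc]}


if __name__ == "__main__":
    # All names and the "iops" key below are assumptions for illustration.
    manifests = build_manifests("csi.example.com", "fast-io", {"iops": 4000},
                                "data", "standard", "10Gi")
    print(json.dumps(manifests, indent=2))
```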

This work was done as part of KEP #3751 led by SIG Storage.

Structured authentication configuration

Kubernetes v1.29 introduced a configuration file format to manage API server client authentication, moving away from the previous reliance on a large set of command-line options. The AuthenticationConfiguration kind allows administrators to support multiple JWT authenticators, CEL expression validation, and dynamic reloading. This change significantly improves the manageability and auditability of the cluster's authentication settings, and it has graduated to stable in v1.34.
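The sketch below shows roughly what such a file might contain, built as a Python dictionary and printed as JSON (which is also valid YAML). The issuer URL, audience, username prefix, and CEL rule are hypothetical, and the v1 apiVersion is assumed from the stable graduation; check the API server documentation for your version before relying on any field.

```python
import json


def build_authn_config(issuer_url: str, audience: str) -> dict:
    """Return an AuthenticationConfiguration with one JWT authenticator."""
    return {
        # Assumption: the v1 API version that accompanies stable graduation.
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "AuthenticationConfiguration",
        "jwt": [
            {
                "issuer": {"url": issuer_url, "audiences": [audience]},
                # CEL expression evaluated against the token's claims.
                "claimValidationRules": [
                    {
                        "expression": "claims.email_verified == true",
                        "message": "email must be verified",
                    }
                ],
                # Map the "sub" claim to the username, with a prefix.
                "claimMappings": {
                    "username": {"claim": "sub", "prefix": "oidc:"}
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical issuer and audience, for illustration only.
    config = build_authn_config("https://issuer.example.com", "my-cluster")
    print(json.dumps(config, indent=2))
```

The resulting file is the kind of input the API server reads through its --authentication-config option. Additional entries in the jwt list would configure additional authenticators.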

This work was done as part of KEP #3331 led by SIG Auth.

Finer-grained authorization based on selectors

Kubernetes authorizers, including webhook authorizers and the built-in node authorizer, can now make authorization decisions based on field and label selectors in incoming requests. When you send list, watch or deletecollection requests with selectors, the authorization layer can now evaluate access with that additional context.

For example, you can write an authorization policy that only allows listing Pods bound to a specific .spec.nodeName. The client (perhaps the kubelet on a particular node) must specify the field selector that the policy requires, otherwise the request is forbidden. This change makes it feasible to set up least privilege rules, provided that the client knows how to conform to the restrictions you set. Kubernetes v1.34 now supports more granular control in environments like per-node isolation or custom multi-tenant setups.
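The per-node example can be modeled in a few lines of Python. This is a toy model of the decision logic only, not the Kubernetes authorizer interface; the Request type, the user-to-node mapping, and the function name are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Request:
    """Simplified view of the request attributes an authorizer might see."""
    user: str
    verb: str
    resource: str
    # Parsed equality requirements, e.g. {"spec.nodeName": "node-a"}.
    field_selector: Dict[str, str] = field(default_factory=dict)


def allow_pod_list_for_node(req: Request, user_nodes: Dict[str, str]) -> bool:
    """Allow list/watch of Pods only when scoped to the caller's own node."""
    if req.resource != "pods" or req.verb not in {"list", "watch"}:
        return False
    node = user_nodes.get(req.user)
    if node is None:
        return False
    # A request without the required selector is forbidden.
    return req.field_selector.get("spec.nodeName") == node


if __name__ == "__main__":
    nodes = {"system:node:node-a": "node-a"}
    scoped = Request("system:node:node-a", "list", "pods",
                     {"spec.nodeName": "node-a"})
    unscoped = Request("system:node:node-a", "list", "pods")
    print(allow_pod_list_for_node(scoped, nodes))
    print(allow_pod_list_for_node(unscoped, nodes))
```

The unscoped request is denied even though the user is known, which is the behavior the paragraph above describes: the client has to send the selector the policy expects.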

This work was done as part of KEP #4601 led by SIG Auth.

Restrict anonymous requests with fine-grained controls

Instead of fully enabling or disabling anonymous access, you can now configure a strict list of endpoints where unauthenticated requests are allowed. This provides a safer alternative for clusters that rely on anonymous access to health or bootstrap endpoints like /healthz, /readyz, or /livez.

With this feature, accidental RBAC misconfigurations that grant broad access to anonymous users can be avoided without requiring changes to external probes or bootstrapping tools.
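As a rough illustration, the allowlist might be expressed in the API server's authentication configuration file along these lines. The sketch builds the structure in Python and prints JSON; the v1 apiVersion is an assumption, and the exact field layout should be verified against the documentation for your release.

```python
import json
from typing import Iterable


def build_anonymous_config(paths: Iterable[str]) -> dict:
    """Return an AuthenticationConfiguration limiting anonymous access."""
    return {
        # Assumption: the stable v1 configuration API.
        "apiVersion": "apiserver.config.k8s.io/v1",
        "kind": "AuthenticationConfiguration",
        "anonymous": {
            "enabled": True,
            # Anonymous requests are accepted only on these paths.
            "conditions": [{"path": p} for p in paths],
        },
    }


if __name__ == "__main__":
    config = build_anonymous_config(["/healthz", "/readyz", "/livez"])
    print(json.dumps(config, indent=2))
```

Requests to any other path would then need to authenticate, regardless of what RBAC grants to anonymous users.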

This work was done as part of KEP #4633 led by SIG Auth.

More efficient requeueing through plugin-specific callbacks

The kube-scheduler can now make more accurate decisions about when to retry scheduling Pods that were previously unschedulable. Each scheduling plugin can now register callback functions that tell the scheduler whether an incoming cluster event is likely to make a rejected Pod schedulable again.

This reduces unnecessary retries and improves overall scheduling throughput, especially in clusters using dynamic resource allocation. The feature also lets certain plugins skip the usual backoff delay when it is safe to do so, making scheduling faster in specific cases.
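The callback idea can be modeled with a small Python toy. This is not the kube-scheduler's Go plugin API; the class, method, and event names are hypothetical and exist only to show how per-plugin callbacks filter out events that cannot help a rejected Pod.

```python
from enum import Enum
from typing import Callable, Dict, Iterable


class Hint(Enum):
    QUEUE = "queue"  # the event may make the Pod schedulable
    SKIP = "skip"    # the event cannot help; do not retry


# A callback receives the rejected Pod and the cluster event.
HintFn = Callable[[dict, dict], Hint]


class RequeueModel:
    """Toy model: plugins register callbacks per event type."""

    def __init__(self) -> None:
        self._hints: Dict[str, Dict[str, HintFn]] = {}

    def register(self, plugin: str, event_type: str, fn: HintFn) -> None:
        self._hints.setdefault(plugin, {})[event_type] = fn

    def should_requeue(self, pod: dict, rejected_by: Iterable[str],
                       event: dict) -> bool:
        # Retry only if a plugin that rejected the Pod says the event helps.
        for plugin in rejected_by:
            fn = self._hints.get(plugin, {}).get(event["type"])
            if fn is not None and fn(pod, event) is Hint.QUEUE:
                return True
        return False


def cpu_fit_hint(pod: dict, event: dict) -> Hint:
    """Requeue only if the new node has enough free CPU for the Pod."""
    return Hint.QUEUE if event["free_cpu"] >= pod["cpu"] else Hint.SKIP


if __name__ == "__main__":
    model = RequeueModel()
    model.register("fit", "NodeAdd", cpu_fit_hint)
    pod = {"name": "web", "cpu": 2}
    print(model.should_requeue(pod, ["fit"], {"type": "NodeAdd", "free_cpu": 4}))
    print(model.should_requeue(pod, ["fit"], {"type": "NodeAdd", "free_cpu": 1}))
```

In this model, a node that is too small never triggers a retry, which is the kind of wasted scheduling attempt the feature is meant to avoid.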

This work was done as part of KEP #4247 led by SIG Scheduling.

Ordered Namespace deletion

Semi-random resource deletion order can create security gaps or unintended behavior, such as Pods persisting after their associated NetworkPolicies are deleted.

This improvement introduces a structured deletion process for Kubernetes namespaces to ensure secure and deterministic resource removal. The deletion sequence respects logical and security dependencies, so that Pods are removed before other resources.

This feature was introduced in Kubernetes v1.33 and graduated to stable in

1_r/devopsish

·kubernetes.io·Aug 27, 2025

Kubernetes v1.34: Of Wind & Will (O' WaW)

DigitalOcean MCP Server is now available | DigitalOcean

DigitalOcean is thrilled to offer its support for MCP Server, a new way to leverage AI.

1_r/devopsish

·digitalocean.com·Aug 27, 2025

Shocked I tell you, shocked /s | Dish gives up on becoming the fourth major wireless carrier

Boost Mobile will primarily use AT&T for connectivity.

1_r/devopsish

·theverge.com·Aug 27, 2025

AI & DevOps Toolkit - Ep33 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=7_-PoHIWVl4

Ep33 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=7_-PoHIWVl4

1_r/devopsish

·youtube.com·Aug 26, 2025

OSPO Jobs | TODO Group

Current openings for Open Source Program Office (OSPO) professionals - program managers, compliance, policy, governance & more

1_r/devopsish

·todogroup.org·Aug 26, 2025

The air is hissing out of the overinflated AI balloon

Opinion: Are tech giants getting nervous? They should be

1_r/devopsish

·theregister.com·Aug 26, 2025

AWS, Cloudflare, Google, helped Feds identify DDOS suspect

Infosec in brief: Comet AI browser fooled; Microsoft sets sail for quantum safety; Sailor sent down for espionage

1_r/devopsish

·theregister.com·Aug 26, 2025

Inside the A.I. Talent Wars

Researchers in the technology have been landing quarter-billion dollar salaries.

1_r/devopsish

·nytimes.com·Aug 26, 2025

Arch Linux Project Responding to Week-Long DDoS Attack

The Arch Linux Project has been targeted in a DDoS attack that disrupted its website, repository, and forums.

1_r/devopsish

·securityweek.com·Aug 26, 2025

What are SLOs, SLAs, and SLIs? A complete guide to service reliability metrics | Blog | incident.io

SRE is necessary to build sustainable software systems. In this article, we explain the fundamentals of SRE, including SLO, SLI, and SLA, and how they function.

1_r/devopsish

·incident.io·Aug 26, 2025

Teaching Kubernetes to Scale with a MacBook Screen Lock with Brian Donelan

https://ku.bz/sFd8TL1cS

Brian Donelan, VP Cloud Platform Engineering at JPMorgan Chase, shares his ingenious side project that automatically scales Kubernetes workloads based on whether his MacBook is open or closed.

By connecting macOS screen lock events to CloudWatch, KEDA, and Karpenter, he built a system that achieves 80% cost savings by scaling pods and nodes to zero when he's away from his laptop.

You will learn:

How KEDA differs from traditional Kubernetes HPA - including its scale-to-zero capabilities, event-driven scaling, and extensive ecosystem of 60+ built-in scalers

The technical architecture connecting macOS notifications through CloudWatch to trigger Kubernetes autoscaling using Swift, AWS SDKs, and custom metrics

Cost optimization strategies including how to calculate actual savings, account for API costs, and identify leading indicators of compute demand

Creative approaches to autoscaling signals beyond CPU and memory, including examples from financial services and e-commerce that could revolutionize workload management

Sponsor

This episode is brought to you by Testkube—the ultimate Continuous Testing Platform for Cloud Native applications. Scale fast, test continuously, and ship confidently. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/sFd8TL1cS

via KubeFM https://kube.fm

August 26, 2025 at 06:00AM

1_r/devopsish

·kube.fm·Aug 26, 2025

Reimagining Cluster Management for the AI Era | Jonathon Anderson, CIQ - TFiR

CIQ’s Warewulf Pro modernizes HPC cluster management with a web UI, pre-built images, and AI-ready provisioning, building on a 20-year open-source legacy.

1_r/devopsish

·tfir.io·Aug 25, 2025

AI & DevOps Toolkit - Stop Wasting Time: Turn AI Prompts Into Production Code - https://www.youtube.com/watch?v=XwWCFINXIoU

Stop Wasting Time: Turn AI Prompts Into Production Code

Spent three hours writing the perfect AI prompt only to watch it fail spectacularly? You're not alone. The problem isn't bad AI – it's that most developers treat prompts like throwaway commands instead of production code. This video reveals why context is everything in AI development, walking you through the evolution of a real prompt from 5 words to 500, and showing how proper prompt engineering can transform your team's productivity.

But here's the kicker: even perfect prompts are useless if your team can't share them effectively. I'll demonstrate how to turn your carefully crafted prompts into a shared asset using the Model Context Protocol (MCP), creating a system that evolves with your team and deploys like any other code. By the end, you'll understand why prompt management – not smarter models – is the real future of AI development, and you'll have the tools to build that future for your organization.

#AIPrompts #MCPProtocol #DevOpsAI

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/stop-wasting-time-turn-ai-prompts-into-production-code 🔗 Model Context Protocol: https://modelcontextprotocol.io

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Introduction to AI Context and MCPs
01:23 AI Context Management Explained
05:44 Prompt Engineering Best Practices
09:40 Sharing AI Prompts Across Teams
13:25 MCP for Prompt Distribution
16:03 Prompt Management Key Takeaways

via YouTube https://www.youtube.com/watch?v=XwWCFINXIoU

1_r/devopsish

·youtube.com·Aug 25, 2025
