Kubernetes v1.35 Sneak Peek

https://kubernetes.io/blog/2025/11/26/kubernetes-v1-35-sneak-peek/

As the release of Kubernetes v1.35 approaches, the Kubernetes project continues to evolve. Features may be deprecated, removed, or replaced to improve the project's overall health. This blog post outlines planned changes for the v1.35 release that the release team believes you should be aware of to ensure the continued smooth operation of your Kubernetes cluster(s), and to keep you up to date with the latest developments. The information below is based on the current status of the v1.35 release and is subject to change before the final release date.

Deprecations and removals for Kubernetes v1.35

cgroup v1 support

On Linux nodes, container runtimes typically rely on cgroups (short for "control groups"). Support for using cgroup v2 has been stable in Kubernetes since v1.25, providing an alternative to the original v1 cgroup support. While cgroup v1 provided the initial resource control mechanism, it suffered from well-known inconsistencies and limitations. Adding support for cgroup v2 allowed use of a unified control group hierarchy, improved resource isolation, and served as the foundation for modern features, making legacy cgroup v1 support ready for removal. The removal of cgroup v1 support will only impact cluster administrators running nodes on older Linux distributions that do not support cgroup v2; on those nodes, the kubelet will fail to start. Administrators must migrate their nodes to systems with cgroup v2 enabled. More details on compatibility requirements will be available in a blog post soon after the v1.35 release.

To learn more, read about cgroup v2; you can also track the switchover work via KEP-5573: Remove cgroup v1 support.

Deprecation of ipvs mode in kube-proxy

Many releases ago, the Kubernetes project implemented an ipvs mode in kube-proxy. It was adopted as a way to provide high-performance service load balancing, with better performance than the existing iptables mode. However, maintaining feature parity between ipvs and other kube-proxy modes became difficult, due to technical complexity and diverging requirements. This created significant technical debt and made the ipvs backend impractical to support alongside newer networking capabilities.

The Kubernetes project intends to deprecate kube-proxy ipvs mode in the v1.35 release, to streamline the kube-proxy codebase. For Linux nodes, the recommended kube-proxy mode is already nftables.
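
As a rough illustration of the recommendation above, a kube-proxy configuration file can select the nftables backend through its mode field. This is a minimal sketch that assumes kube-proxy is started with a configuration file; how that file is delivered (for example through a ConfigMap) depends on how the cluster was installed.

```yaml
# Minimal kube-proxy configuration sketch selecting the nftables backend.
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# Other accepted values include "iptables" and "ipvs" (the mode being deprecated).
mode: nftables
```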

You can find more in KEP-5495: Deprecate ipvs mode in kube-proxy

Kubernetes is deprecating containerd v1.X support

While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases of containerd, as a consequence of automated cgroup driver detection, the Kubernetes SIG Node community has formally agreed upon a final support timeline for containerd v1.X. Kubernetes v1.35 is the last release to offer this support (aligned with containerd 1.7 EOL).

This is a final warning: if you are using containerd 1.X, you must switch to 2.0 or later before upgrading Kubernetes to the next version. You can monitor the kubelet_cri_losing_support metric to determine whether any nodes in your cluster are using a containerd version that will soon be unsupported.

You can find more in the official blog post or in KEP-4033: Discover cgroup driver from CRI

Featured enhancements of Kubernetes v1.35

The following enhancements are some of those likely to be included in the v1.35 release. This is not a commitment, and the release content is subject to change.

Node declared features

When scheduling Pods, Kubernetes uses node labels, taints, and tolerations to match workload requirements with node capabilities. However, managing feature compatibility becomes challenging during cluster upgrades due to version skew between the control plane and nodes. This can lead to Pods being scheduled on nodes that lack required features, resulting in runtime failures.

The node declared features framework will introduce a standard mechanism for nodes to declare their supported Kubernetes features. With the new alpha feature enabled, a Node reports the features it can support, publishing this information to the control plane through a new .status.declaredFeatures field. Then, the kube-scheduler, admission controllers and third-party components can use these declarations. For example, you can enforce scheduling and API validation constraints, ensuring that Pods run only on compatible nodes.

This approach reduces manual node labeling, improves scheduling accuracy, and prevents incompatible pod placements proactively. It also integrates with the Cluster Autoscaler for informed scale-up decisions. Feature declarations are temporary and tied to Kubernetes feature gates, enabling safe rollout and cleanup.

Targeting alpha in v1.35, node declared features aims to solve version skew scheduling issues by making node capabilities explicit, enhancing reliability and cluster stability in heterogeneous version environments.

To learn more about this before the official documentation is published, you can read KEP-5328.

In-place update of Pod resources

Kubernetes is graduating in-place updates for Pod resources to General Availability (GA). This feature allows users to adjust cpu and memory resources without restarting Pods or containers. Previously, such modifications required recreating Pods, which could disrupt workloads, particularly stateful or batch applications. Previous Kubernetes releases already allowed you to change resource settings (requests and limits) for existing Pods; v1.35 brings this capability to GA. In-place updates allow for smoother vertical scaling, improve efficiency, and can also simplify solution development.

The Container Runtime Interface (CRI) has also been improved, extending the UpdateContainerResources API for Windows and future runtimes while allowing ContainerStatus to report real-time resource configurations. Together, these changes make scaling in Kubernetes faster, more flexible, and disruption-free. The feature was introduced as alpha in v1.27, graduated to beta in v1.33, and is targeting graduation to stable in v1.35.
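
To make the idea concrete, here is a hedged sketch of a Pod that opts into restart-free resizing. The Pod name, container name, and resource values are illustrative assumptions; resizePolicy is the per-container field that states whether changing a given resource requires a container restart.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: resize-demo              # hypothetical name
spec:
  containers:
  - name: app                    # hypothetical container name
    image: registry.k8s.io/pause:3.10
    # Declare that cpu and memory changes should be applied without a restart.
    resizePolicy:
    - resourceName: cpu
      restartPolicy: NotRequired
    - resourceName: memory
      restartPolicy: NotRequired
    resources:
      requests:
        cpu: 500m
        memory: 128Mi
      limits:
        cpu: 500m
        memory: 128Mi
```

A later resize would then change the requests and limits values on the running Pod (through the Pod's resize subresource) rather than deleting and recreating it.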

You can find more in KEP-1287: In-place Update of Pod Resources

Pod certificates

When running microservices, Pods often require a strong cryptographic identity to authenticate with each other using mutual TLS (mTLS). While Kubernetes provides Service Account tokens, these are designed for authenticating to the API server, not for general-purpose workload identity.

Before this enhancement, operators had to rely on complex, external projects like SPIFFE/SPIRE or cert-manager to provision and rotate certificates for their workloads. But what if you could issue a unique, short-lived certificate to your Pods natively and automatically? KEP-4317 is designed to enable such native workload identity. It opens up various possibilities for securing pod-to-pod communication by allowing the kubelet to request and mount certificates for a Pod via a projected volume.

This provides a built-in mechanism for workload identity, complete with automated certificate rotation, significantly simplifying the setup of service meshes and other zero-trust network policies. This feature was introduced as alpha in v1.34 and is targeting beta in v1.35.
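
The projected-volume approach described above might look roughly like the following. This is a sketch only: the field names follow the alpha API as published for v1.34 and may differ in later releases, and the signer name, image, and paths are hypothetical placeholders. A signer implementation that actually issues the certificates is assumed to exist in the cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: mtls-client                         # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app:1.0         # hypothetical image
    volumeMounts:
    - name: pod-cert
      mountPath: /var/run/pod-cert          # hypothetical mount path
      readOnly: true
  volumes:
  - name: pod-cert
    projected:
      sources:
      # The kubelet requests a certificate for this Pod and keeps it rotated.
      - podCertificate:
          signerName: example.com/workload-signer   # hypothetical signer
          keyType: ECDSAP256
          credentialBundlePath: credentialbundle.pem
```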

You can find more in KEP-4317: Pod Certificates

Numeric values for taints

Kubernetes is enhancing taints and tolerations by adding numeric comparison operators, such as Gt (Greater Than) and Lt (Less Than).

Previously, tolerations supported only exact (Equal) or existence (Exists) matches, which were not suitable for numeric properties such as reliability SLAs.

With this change, a Pod can use a toleration to "opt-in" to nodes that meet a specific numeric threshold. For example, a Pod can require a Node with an SLA taint value greater than 950 (operator: Gt, value: "950").

This approach is more powerful than Node Affinity because it supports the NoExecute effect, allowing Pods to be automatically evicted if a node's numeric value drops below the tolerated threshold.
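
The SLA example above could be written along these lines. The taint key example.com/sla and the object names are hypothetical, and the sketch assumes the numeric operators are enabled in the cluster; only the operator (Gt) and value ("950") come from the text.

```yaml
# Node side: the node advertises its SLA as a numeric taint value.
apiVersion: v1
kind: Node
metadata:
  name: node-a                    # hypothetical node
spec:
  taints:
  - key: example.com/sla          # hypothetical key
    value: "990"
    effect: NoExecute
---
# Pod side: tolerate the taint only while the node's value is greater than 950.
apiVersion: v1
kind: Pod
metadata:
  name: sla-sensitive             # hypothetical name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
  tolerations:
  - key: example.com/sla
    operator: Gt
    value: "950"
    effect: NoExecute
```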

You can find more in KEP-5471: Enable SLA-based Scheduling

User namespaces

When running Pods, you can use securityContext to drop privileges, but containers inside the Pod often still run as root (UID 0). This poses a significant risk, because that container UID 0 maps directly to the host's root user.

Before this enhancement, a container breakout vulnerability could grant an attacker full root access to the node. But what if you could dynamically remap the container's root user to a safe, unprivileged user on the host? KEP-127 specifically allows such native support for Linux User Namespaces. It opens up various possibilities for pod security by isolating container and host user/group IDs. This allows a process to have root privileges (UID 0) within its namespace, while running as a non-privileged, high-numbered UID on the host.

Released as alpha in v1.25 and beta in v1.30, this feature continues to progress through beta maturity, paving the way for truly "rootless" containers that drastically reduce the attack surface for a whole class of security vulnerabilities.
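
For illustration, a Pod opts into a user namespace by setting hostUsers to false. This sketch assumes the node's operating system and container runtime support user namespaces; the Pod name is hypothetical.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo               # hypothetical name
spec:
  # false = run the Pod in its own user namespace, so UID 0 inside the
  # container maps to an unprivileged UID on the host.
  hostUsers: false
  containers:
  - name: app
    image: registry.k8s.io/pause:3.10
```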

You can find more in KEP-127: User Namespaces

Support for mounting OCI images as volumes

When provisioning a Pod, you often need to bundle data, binaries, or configuration files for your containers. Before this enhancement, people often included that kind of data directly in the main container image, or required a custom init container to download and unpack files into an emptyDir. You can still take either of those approaches, of course.

But what if you could populate a volume directly from a data-only artifact in an OCI registry, just like pulling a container image? Kubernetes v1.31 added support for the image volume type, allowing Pods to pull and unpack OCI container image artifacts into a volume declaratively.
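
A hedged sketch of the image volume type follows. The registry, artifact reference, and mount path are hypothetical, and the example assumes the feature is enabled in the cluster and supported by the container runtime.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo                  # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example/app:1.0        # hypothetical image
    volumeMounts:
    - name: data
      mountPath: /data                     # hypothetical mount path
      readOnly: true
  volumes:
  - name: data
    # The artifact is pulled like a container image and exposed as a volume.
    image:
      reference: registry.example/data-artifact:v1   # hypothetical artifact
      pullPolicy: IfNotPresent
```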

This allows for seamless distribution of data, binaries, or ML mode

1_r/devopsish

·kubernetes.io·Nov 26, 2025

DevOps & AI Toolkit - Ep40 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=nomAGBszjQo

Ep40 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=nomAGBszjQo

1_r/devopsish

·youtube.com·Nov 25, 2025

DevOps & AI Toolkit - Ep40 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=0TtOJbMOVbs

Ep40 - Ask Me Anything About Anything with Scott Rosenberg 📱

via YouTube https://www.youtube.com/watch?v=0TtOJbMOVbs

1_r/devopsish

·youtube.com·Nov 25, 2025

More Kubernetes Than I Bargained For with Amos Wenger

https://ku.bz/6Ll_7slr9

Amos Wenger walks through his production incident where adding a home computer as a Kubernetes node caused TLS certificate renewals to fail. The discussion covers debugging techniques using tools like netshoot and K9s, and explores the unexpected interactions between Kubernetes overlay networks and consumer routers.

You will learn:

How Kubernetes networking assumptions break when mixing cloud VMs with nodes behind consumer routers, and why cert-manager challenges fail in NAT environments

The differences between CNI plugins like Flannel and Calico, particularly how they handle IPv6 translation

Debugging techniques for network issues using tools like netshoot, K9s, and iproute2

Best practices for mixed infrastructure including proper node labeling, taints, and scheduling controls

Sponsor

This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.

More info

Find all the links and info for this episode here: https://ku.bz/6Ll_7slr9

via KubeFM https://kube.fm

November 25, 2025 at 05:00AM

1_r/devopsish

·kube.fm·Nov 25, 2025

Kubernetes Configuration Good Practices

https://kubernetes.io/blog/2025/11/25/configuration-good-practices/

Configuration is one of those things in Kubernetes that seems small until it's not. Configuration is at the heart of every Kubernetes workload. A missing quote, a wrong API version or a misplaced YAML indent can ruin your entire deploy.

This blog brings together tried-and-tested configuration best practices: the small habits that make your Kubernetes setup clean, consistent and easier to manage. Whether you are just starting out or already deploying apps daily, these are the little things that keep your cluster stable and your future self sane.

This blog is inspired by the original Configuration Best Practices page, which has evolved through contributions from many members of the Kubernetes community.

General configuration practices

Use the latest stable API version

Kubernetes evolves fast. Older APIs eventually get deprecated and stop working. So, whenever you are defining resources, make sure you are using the latest stable API version. You can always check with:

kubectl api-resources

This simple step saves you from future compatibility issues.

Store configuration in version control

Never apply manifest files directly from your desktop. Always keep them in a version control system like Git; it's your safety net. If something breaks, you can instantly roll back to a previous commit, compare changes or recreate your cluster setup without panic.

Write configs in YAML not JSON

Write your configuration files using YAML rather than JSON. Both work technically, but YAML is just easier for humans. It's cleaner to read, less noisy, and widely used in the community.

YAML has some sneaky gotchas with boolean values: use only true or false. Don't write yes, no, on or off. They might work in one version of YAML but break in another. To be safe, quote anything that looks like a Boolean but is meant as a string (for example "yes").
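
As a small illustration of the quoting advice, consider a ConfigMap whose values are meant to be strings. The object name and keys below are hypothetical.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags             # hypothetical name
data:
  # ConfigMap data values must be strings, so quote them explicitly.
  NEW_UI_ENABLED: "true"
  SEND_EMAILS: "no"               # quoted: stays the string "no" in every parser
  # SEND_EMAILS: no               # unquoted: a YAML 1.1 parser may read this as boolean false
```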

Keep configuration simple and minimal

Avoid setting default values that are already handled by Kubernetes. Minimal manifests are easier to debug, cleaner to review and less likely to break things later.

Group related objects together

If your Deployment, Service and ConfigMap all belong to one app, put them in a single manifest file.

It's easier to track changes and apply them as a unit. See the Guestbook all-in-one.yaml file for an example of this syntax.

You can even apply entire directories with:

kubectl apply -f configs/

One command and, boom, everything in that folder gets deployed.

Add helpful annotations

Manifest files are not just for machines; they are for humans too. Use annotations to describe why something exists or what it does. A quick one-liner can save hours when debugging later and also allows better collaboration.

The most helpful annotation to set is kubernetes.io/description. It's like using a comment, except that it gets copied into the API so that everyone else can see it even after you deploy.
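
For instance, a description annotation might be attached like this. The object name, description text, and data are made up for illustration.

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: payment-retry-settings    # hypothetical name
  annotations:
    # Free-form, human-readable explanation that stays visible in the API.
    kubernetes.io/description: "Retry limits for the payment worker. Raise MAX_RETRIES only after checking the provider's rate limits."
data:
  MAX_RETRIES: "5"
```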

Managing Workloads: Pods, Deployments, and Jobs

A common early mistake in Kubernetes is creating Pods directly. Pods work, but they don't reschedule themselves if something goes wrong.

Naked Pods (Pods not managed by a controller, such as a Deployment or a StatefulSet) are fine for testing, but in real setups, they are risky.

Why? Because if the node hosting that Pod dies, the Pod dies with it and Kubernetes won't bring it back automatically.

Use Deployments for apps that should always be running

A Deployment, which both creates a ReplicaSet to ensure that the desired number of Pods is always available, and specifies a strategy to replace Pods (such as RollingUpdate), is almost always preferable to creating Pods directly. You can roll out a new version, and if something breaks, roll back instantly.
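
A minimal Deployment with an explicit RollingUpdate strategy could look like the sketch below. The name, label value, replica count, and image tag are illustrative assumptions.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                       # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1           # at most one Pod down during a rollout
      maxSurge: 1                 # at most one extra Pod above the replica count
  template:
    metadata:
      labels:
        app.kubernetes.io/name: web   # must match the selector above
    spec:
      containers:
      - name: web
        image: nginx:1.27
        ports:
        - containerPort: 80
```

If a rollout misbehaves, kubectl rollout undo deployment/web returns to the previous revision.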

Use Jobs for tasks that should finish

A Job is perfect when you need something to run once and then stop, like a database migration or a batch processing task. It will retry if the Pod fails and report success when it's done.
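
A one-off migration Job might be sketched as follows. The Job name and image are hypothetical; backoffLimit caps how many times failed Pods are retried.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate                # hypothetical name
spec:
  backoffLimit: 4                 # give up after four retries
  template:
    spec:
      restartPolicy: Never        # Job Pods must use Never or OnFailure
      containers:
      - name: migrate
        image: registry.example/db-migrate:1.0   # hypothetical image
```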

Service Configuration and Networking

Services are how your workloads talk to each other inside (and sometimes outside) your cluster. Without them, your pods exist but can't reach anyone. Let's make sure that doesn't happen.

Create Services before workloads that use them

When Kubernetes starts a Pod, it automatically injects environment variables for existing Services. So, if a Pod depends on a Service, create a Service before its corresponding backend workloads (Deployments or StatefulSets), and before any workloads that need to access it.

For example, if a Service named foo exists, all containers will get the following variables in their initial environment:

FOO_SERVICE_HOST=<the host the Service runs on>
FOO_SERVICE_PORT=<the port the Service runs on>

DNS-based discovery doesn't have this problem, but it's a good habit to follow anyway.

Use DNS for Service discovery

If your cluster has the DNS add-on (most do), every Service automatically gets a DNS entry. That means you can access it by name instead of IP:

curl http://my-service.default.svc.cluster.local

It's one of those features that makes Kubernetes networking feel magical.

Avoid hostPort and hostNetwork unless absolutely necessary

You'll sometimes see these options in manifests:

hostPort: 8080
hostNetwork: true

But here's the thing: they tie your Pods to specific nodes, making them harder to schedule and scale, because each <hostIP, hostPort, protocol> combination must be unique. If you don't specify the hostIP and protocol explicitly, Kubernetes will use 0.0.0.0 as the default hostIP and TCP as the default protocol. Unless you're debugging or building something like a network plugin, avoid them.

If you just need local access for testing, try kubectl port-forward:

kubectl port-forward deployment/web 8080:80

See Use Port Forwarding to access applications in a cluster to learn more. Or if you really need external access, use a type: NodePort Service. That's the safer, Kubernetes-native way.
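
As a rough sketch of the NodePort alternative, the Service below exposes Pods labelled app.kubernetes.io/name: web on a port of every node. The Service name and selector are assumptions made for illustration.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web                       # hypothetical name
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: web   # hypothetical Pod label
  ports:
  - port: 80                      # port of the Service inside the cluster
    targetPort: 80                # port the container listens on
    # nodePort: 30080             # optional; if omitted, Kubernetes picks one
                                  # from the node port range (30000-32767 by default)
```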

Use headless Services for internal discovery

Sometimes, you don't want Kubernetes to load balance traffic. You want to talk directly to each Pod. That's where headless Services come in.

You create one by setting clusterIP: None. Instead of a single IP, DNS gives you a list of all Pod IPs, perfect for apps that manage connections themselves.
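
A headless Service is an ordinary Service manifest with clusterIP set to None. In this sketch the name, selector, and port are hypothetical.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: db                        # hypothetical name
spec:
  clusterIP: None                 # headless: no virtual IP, no load balancing
  selector:
    app.kubernetes.io/name: db    # hypothetical Pod label
  ports:
  - port: 5432
```

A DNS lookup for this Service name then returns the IPs of the matching Pods directly.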

Working with labels effectively

Labels are key/value pairs that are attached to objects such as Pods. Labels help you organize, query and group your resources. They don't do anything by themselves, but they make everything else from Services to Deployments work together smoothly.

Use semantic labels

Good labels help you understand what's what, even months later. Define and use labels that identify semantic attributes of your application or Deployment. For example:

labels:
  app.kubernetes.io/name: myapp
  app.kubernetes.io/component: web
  tier: frontend
  phase: test

app.kubernetes.io/name: what the app is

tier: which layer it belongs to (frontend/backend)

phase: which stage it's in (test/prod)

You can then use these labels to make powerful selectors. For example:

kubectl get pods -l tier=frontend

This will list all frontend Pods across your cluster, no matter which Deployment they came from. Basically you are not manually listing Pod names; you are just describing what you want. See the guestbook app for examples of this approach.

Use common Kubernetes labels

Kubernetes actually recommends a set of common labels. It's a standardized way to name things across your different workloads or projects. Following this convention makes your manifests cleaner, and it means that tools such as Headlamp, dashboard, or third-party monitoring systems can all automatically understand what's running.
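
The recommended labels share the app.kubernetes.io/ prefix. The fragment below shows them on an object's metadata; all values are made-up examples.

```yaml
metadata:
  labels:
    app.kubernetes.io/name: myapp            # what the application is
    app.kubernetes.io/instance: myapp-prod   # which installation of it
    app.kubernetes.io/version: "1.4.2"       # quoted so it stays a string
    app.kubernetes.io/component: web         # role within the architecture
    app.kubernetes.io/part-of: storefront    # larger application it belongs to
    app.kubernetes.io/managed-by: Helm       # tool managing the object
```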

Manipulate labels for debugging

Since controllers (like ReplicaSets or Deployments) use labels to manage Pods, you can remove a label to “detach” a Pod temporarily.

Example:

kubectl label pod mypod app-

The app- part removes the label key app. Once that happens, the controller won’t manage that Pod anymore. It’s like isolating it for inspection, a “quarantine mode” for debugging. To interactively remove or add labels, use kubectl label.

You can then check logs, exec into it and once done, delete it manually. That’s a super underrated trick every Kubernetes engineer should know.

Handy kubectl tips

These small tips make life much easier when you are working with multiple manifest files or clusters.

Apply entire directories

Instead of applying one file at a time, apply the whole folder:

# Using server-side apply is also a good practice
kubectl apply -f configs/ --server-side

This command looks for .yaml, .yml and .json files in that folder and applies them all together. It's faster, cleaner and helps keep things grouped by app.

Use label selectors to get or delete resources

You don't always need to type out resource names one by one. Instead, use selectors to act on entire groups at once:

kubectl get pods -l app=myapp
kubectl delete pod -l phase=test

It's especially useful in CI/CD pipelines, where you want to clean up test resources dynamically.

Quickly create Deployments and Services

For quick experiments, you don't always need to write a manifest. You can spin up a Deployment right from the CLI:

kubectl create deployment webapp --image=nginx

Then expose it as a Service:

kubectl expose deployment webapp --port=80

This is great when you just want to test something before writing full manifests. Also, see Use a Service to Access an Application in a cluster for an example.

Conclusion

Cleaner configuration leads to calmer cluster administrators. If you stick to a few simple habits: keep configuration simple and minimal, version-control everything, use consistent labels, and avoid relying on naked Pods, you'll save yourself hours of debugging down the road.

The best part? Clean configurations stay readable. Even after months, you or anyone on yo

1_r/devopsish

·kubernetes.io·Nov 25, 2025

KubeCon Wrap Up by Chris Short, Head of Open Source at CIQ

Last week, CIQ participated in KubeCon + CloudNativeCon North America 2025 in Atlanta, Georgia. CIQ's very own Chris Short hosted a talk during the event, participated in the CNCF Maintainers Summit…

1_r/devopsish

·ciq.com·Nov 24, 2025

DevOps & AI Toolkit - Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second - https://www.youtube.com/watch?v=AUoqr5r1pBY

Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second

Gemini 3 is undeniably fast and impressive on benchmarks, but after a full week of real-world software engineering work, the reality is more complicated. While everyone's been hyping its capabilities based on day-one reviews and marketing materials, this video digs into what actually matters: how Gemini 3 performs with coding agents on real projects, not just one-shot Tetris games or simple websites. The speed is remarkable at 128 tokens per second, but it comes with serious trade-offs that affect daily pair programming work.

The core issues are frustrating: Gemini 3 is nearly impossible to redirect once it commits to a plan, suffers from an 88% hallucination rate (nearly double Sonnet 4.5's 48%), and confidently claims tasks are complete when they're not. It ignores context from earlier in conversations, struggles with complex multi-step instructions, and dismisses suggestions like a grumpy coder who thinks they know best. While it excels at one-shot code generation, it falls short as a collaborative partner for serious software development. Gemini 3 is genuinely one of the best models available (probably second place behind Sonnet 4.5) but it's not the massive leap forward that the hype suggests, and the gap between Claude Code and Gemini CLI remains significant.

#Gemini3 #AIcoding #SoftwareEngineering

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/gemini-3-is-fast-but-gaslights-you-at-128-tokens-second 🔗 Gemini 3: https://deepmind.google/models/gemini

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Gemini 3 with Gemini CLI
00:25 Gemini 3 Real-World Testing
02:54 Gemini 3's Biggest Problems
10:10 Is Gemini 3 Worth It?

via YouTube https://www.youtube.com/watch?v=AUoqr5r1pBY

1_r/devopsish

·youtube.com·Nov 24, 2025

How we built a 130,000-node GKE cluster | Google Cloud Blog

Learn about the architectural innovations we used to build a 130,000-node Kubernetes cluster, and the trends driving demand for these environments.

1_r/devopsish

·cloud.google.com·Nov 23, 2025

Is Kubernetes Ready for AI? Google's New Agent Tech | TSG Ep. 967

https://chrisshort.net/video/techstrong-gang-ep967/

Alan Shimel, Mike Vizard, and Chris Short discuss the state of Kubernetes following the KubeCon + CloudNativeCon North America 2025 conference.

via Chris Short https://chrisshort.net/

November 14, 2025

1_r/devopsish

·chrisshort.net·Nov 21, 2025

Overcoming BEOL Patterning Challenges At The 3nm Node

Using virtual fabrication to assess edge placement error and pattern 18nm metal pitch successfully.

1_r/devopsish

·semiengineering.com·Nov 21, 2025

Last Week in Kubernetes Development - Week Ending November 16 2025

Week Ending November 16, 2025

https://lwkd.info/2025/20251120

Developer News

Kubernetes SIG Network and the Security Response Committee have announced the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026.

Release Schedule

Next Deadline: Feature blogs ready for review, November 24th

We are in Code Freeze. Release lead Drew Hagen shared the state of the release.

The Feature blog is a great way to highlight and share information about your enhancement with the community. Feature blogs are especially encouraged for high visibility changes as well as deprecations and removals. The official deadline has passed, but opt-ins are still welcome. If you are interested in writing a blog for your enhancement, please create a placeholder PR and contact your lead ASAP.

Kubernetes v1.35.0-beta.0 and patch releases v1.32.10, v1.31.14, v1.33.6 and v1.34.2 are now live!

KEP of the Week

KEP-5067: Pod Generation

This KEP introduces proper use of metadata.generation and a new status.observedGeneration field to show which PodSpec version the kubelet has actually processed. This helps eliminate uncertainty when multiple updates occur, making Pod status tracking consistent with other Kubernetes resources.

This KEP is tracked for stable in v1.35
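
To picture what the two fields convey, here is an abridged, hypothetical view of a Pod shortly after its spec was updated. The name and numbers are invented; only the field names metadata.generation and status.observedGeneration come from the KEP summary above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-7d4b9                 # hypothetical name
  generation: 3                   # the Pod spec has been written three times
status:
  observedGeneration: 2           # the kubelet has processed the spec up to generation 2,
                                  # so the latest update is not yet reflected in status
```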

Other Merges

Implement opportunistic batching to speed up pod scheduling

Allow constraining impersonation for specific resources

NominatedNodeName has integration tests

DRA device health check timeouts are configurable

Distinguish between nil and not present in validation ratcheting

You can mutate job directives even if they’re suspended

Volume Group Snapshots are now v1beta2 API

Overhaul Device Taint Eviction in DRA

ScheduleAsyncAPICalls has been re-enabled by default after debugging

Device class selection is deterministic

StatefulSets won’t trigger a rollout when upgrading to 1.34

Don’t schedule pods that need storage to a node with no CSI

kuberc gets view and set commands

v1alpha1 structured response for /flagz

Pod statuses stay the same after kubelet restart

Let’s schedule the whole darned gang through the new workload API

DRA: prioritized list scoring and Extended Resource Metrics and extended resource quota

Operators get more tolerations

Mutate persistent volume node affinity

Auto-restart of all containers in a pod when one of them exits

Promotions

KubeletEnsureSecretPulledImages is Beta

Image Volume Source to Beta

PodTopologyLabelsAdmission to Beta

NominatedNodeNameForExpectation and ClearingNominatedNodeNameAfterBinding to Beta

SupplementalGroupsPolicy to GA

JobManagedBy to GA

InPlacePodVerticalScaling tests to Conformance

KubeletCrashLoopBackOffMax to Beta

Pod Certificates to Beta

EnvFiles to Beta

WatchListClient to Beta

Deprecations

Drop networking v1beta1 Ingress from kubectl

AggregatedDiscoveryRemoveBetaType gate removed

Version Updates

Go to v1.25.4

CoreDNS to 1.13.1

Subprojects and Dependency Updates

prometheus v3.8.0-rc.0 stabilizes native histograms (now an optional stable feature via scrape_native_histogram), tightens validation for custom-bounds histograms, adds detailed target relabeling views in the UI, improves OTLP target_info de-duplication, expands alerting and promtool support (including Remote-Write 2.0 for promtool push metrics), and delivers multiple PromQL and UI performance fixes for large rule/alert pages.

cloud-provider-aws v1.31.9 bumps the AWS Go SDK to 1.24.7 for CVE coverage, completes migration to AWS SDK v2 for EC2, ELB and ELBV2, adds support for a new AWS partition in the credential provider, and includes defensive fixes for potential nil pointer dereferences alongside the usual 1.31 release line version bump.

cloud-provider-aws v1.30.10 mirrors the 1.31.9 line with backported updates to AWS SDK Go v2 (EC2 and load balancers), a Go SDK 1.24.7 security bump, support for the new AWS partition in credential provider logic, improved nil-pointer safety, and includes contributions from a new external maintainer.

cloud-provider-aws v1.29.10 provides a straightforward version bump for the 1.29 branch, while cloud-provider-aws v1.29.9 backports key changes including EC2/load balancer migration to AWS SDK Go v2, the Go SDK 1.24.7 CVE update, and new-partition support in the credential provider to keep older clusters aligned with current AWS environments.

cluster-api v1.12.0-beta.1 continues the v1.12 beta with chained-upgrade Runtime SDK improvements, blocking AfterClusterUpgrade hooks for safer rollouts, new features such as taint propagation in Machine APIs, MachineDeployment in-place update support, clusterctl describe condition filters, and a broad set of bugfixes and dependency bumps (including etcd v3.6.6 and Kubernetes v0.34.2 libraries).

cluster-api-provider-vsphere v1.15.0-beta.1 refreshes CAPV against CAPI v1.12.0-beta.1, upgrades Go to 1.24.10 and core Kubernetes/etcd libraries, and focuses on test and tooling improvements such as enhanced e2e network debugging, junit output from e2e runs, and refined CI configuration ahead of the 1.15 release.

kubebuilder v4.10.1 is a fast follow-up bugfix release that retracts the problematic v4.10.0 Go module, fixes nested JSON tag omitempty handling in generated APIs, stabilizes metrics e2e tests with webhooks, and tightens Go module validation to prevent future module install issues while keeping scaffold auto-update guidance intact.

kubebuilder v4.10.0 (now retracted as a Go module) introduced the new helm/v2-alpha plugin to replace helm/v1-alpha, improved multi-arch support and Go/tooling versions (golangci-lint, controller-runtime, cert-manager), added external plugin enhancements (PluginChain, ProjectConfig access), support for custom webhook paths, and a series of CLI and scaffolding fixes including better handling of directories with spaces.

cluster-api-provider-vsphere v1.15.0-beta.0 introduces the next beta version of CAPV for testing upcoming Cluster API v1.15 functionality on vSphere. This release is intended only for testing and feedback.

vsphere-csi-driver v3.6.0 adds compatibility with Kubernetes v1.34 and brings improvements such as shared session support on vCenter login and enhanced task monitoring. Updated manifests for this release are available under the versioned manifests/vanilla directory.

kustomize kyaml v0.21.0 updates structured data replacement capabilities, upgrades Go to 1.24.6, refreshes dependencies following security alerts, and includes minor YAML handling fixes.

kustomize v5.8.0 enhances YAML/JSON replacement features, fixes namespace propagation for Helm integrations, and adds improvements such as regex support for replacements, new patch argument types, validation fixes, improved error messages, and performance optimizations.

kustomize cmd/config v0.21.0 aligns with kyaml updates, adopts Go 1.24.6, and brings dependency updates based on recent security advisories.

kustomize api v0.21.0 includes structured-data replacement enhancements, regex selector support, patch argument additions, namespace propagation fixes, validation improvements, Go 1.24.6 updates, and dependency refreshes.

etcd v3.6.6 provides a new patch update for the v3.6 series with all changes documented in the linked changelog. Installation steps and supported platform updates are also included.

etcd v3.5.25 delivers maintenance updates for the v3.5 series along with relevant upgrade guidance and support documentation.

etcd v3.4.39 introduces the newest patches for the v3.4 branch with installation instructions and detailed platform support notes.

cri-o v1.34.2 improves GRPC debug log formatting and ships updated, signed release bundles and SPDX SBOMs for all supported architectures.

cri-o v1.33.6 publishes refreshed signed artifacts and SPDX documents for the 1.33 line, with no dependency changes recorded.

cri-o v1.32.10 updates the 1.32 branch with new signed release artifacts and SBOM files, without dependency modifications.

nerdctl v2.2.0 fixes a namestore path issue, adds mount-manager support, introduces checkpoint lifecycle commands, and enhances image conversion through a new estargz helper flag. The full bundle includes updated containerd, runc, BuildKit, and Stargz Snapshotter.

Shoutouts

Danilo Gemoli: Shoutout to @Petr Muller, who is trying to gather new contributors in #prow. He arranged a meeting in which we were able to bring to the table several interesting ideas on how to lower the entry barriers for newcomers.

via Last Week in Kubernetes Development https://lwkd.info/

November 20, 2025 at 07:59AM

1_r/devopsish

·lwkd.info·Nov 21, 2025

Last Week in Kubernetes Development - Week Ending November 16 2025

Testing shows Apple N1 Wi-Fi chip improves on older Broadcom chips in every way

Apple’s in-house Wi-Fi chip doesn’t set records, but it’s a reliable performer.

1_r/devopsish

·arstechnica.com·Nov 20, 2025

Skyway: Cloud cost management for the 9-figure club

Introducing Skyway: contract management for enterprise cloud spend. Built by the team overseeing tens-of-billions in enterprise cloud spend.

1_r/devopsish

·duckbillhq.com·Nov 20, 2025

Cloudflare outage on November 18, 2025

Cloudflare suffered a service outage on November 18, 2025. The outage was triggered by a bug in generation logic for a Bot Management feature file causing many Cloudflare services to be affected.

1_r/devopsish

·blog.cloudflare.com·Nov 20, 2025

Kubernetes Cluster Goes Mobile In Pet Carrier

There’s been a bit of a virtualization revolution going on for the last decade or so, where tools like Docker and LXC have made it possible to quickly deploy server applications without worry…

1_r/devopsish

·hackaday.com·Nov 20, 2025

Infinite scale: The architecture behind the Azure AI superfactory - The Official Microsoft Blog

Today, we are unveiling the next Fairwater site of Azure AI datacenters in Atlanta, Georgia. This purpose-built datacenter is connected to our first Fairwater site in Wisconsin, prior generations of AI supercomputers and the broader Azure global datacenter footprint to create the world’s first planet-scale AI superfactory. By packing computing power more densely than ever...

1_r/devopsish

·blogs.microsoft.com·Nov 20, 2025

Pouring packages with Homebrew

The Homebrew project is an open-source package-management system that comes with a repository o [...]

1_r/devopsish

·lwn.net·Nov 19, 2025

Responsibly Sunsetting OSS Projects: A Guide for OSPOs | Fast Wonder

1_r/devopsish

·fastwonderblog.com·Nov 19, 2025

Cloud-native computing is poised to explode, thanks to AI inference work

CNCF leaders predict hundreds of billions of dollars more of AI work for cloud-native computing in the next 18 months.

1_r/devopsish

·zdnet.com·Nov 19, 2025

One bad click sent AWS bill into the stratosphere

Who, Me?: Yes, he knows the 40x increase could have been avoided with some pretty simple automation

1_r/devopsish

·theregister.com·Nov 19, 2025

Larry Summers resigns from OpenAI board after release of emails with Epstein

Details about Summers' communications with sex offender Jeffrey Epstein were made public last week.

1_r/devopsish

·cnbc.com·Nov 19, 2025

How Kubernetes Became the New Linux

Learn why AWS is building core features instead of competing products in this episode of The New Stack Makers.

1_r/devopsish

·thenewstack.io·Nov 19, 2025

Cloudflare Outage Disrupts X, ChatGPT and Other Parts of the Internet

Services from Cloudflare, a software company, underpin thousands of websites, including X, Spotify and OpenAI. The company said a crash in a software system was to blame.

1_r/devopsish

·nytimes.com·Nov 19, 2025

fullcalendar/fullcalendar: Full-sized drag & drop event calendar in JavaScript

Full-sized drag & drop event calendar in JavaScript - fullcalendar/fullcalendar

1_r/devopsish

·github.com·Nov 18, 2025

Agent Sandbox provides a secure and isolated execution layer to safely deploy autonomous AI agents on Kubernetes that generate and run untrusted code at scale.

Agent Sandbox is a cloud native controller for sandboxes

Agent Sandbox provides a secure and isolated execution layer to safely deploy autonomous AI agents on Kubernetes that generate and run untrusted code at scale.

1_r/devopsish

·agent-sandbox.sigs.k8s.io·Nov 18, 2025

DevOps & AI Toolkit - Ep39 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=tafANChjv3g

Ep39 - Ask Me Anything About Anything with Scott Rosenberg

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=tafANChjv3g

1_r/devopsish

·youtube.com·Nov 18, 2025

The Karpenter Effect: Redefining Kubernetes Operations with Tanat Lokejaroenlarb

The Karpenter Effect: Redefining Kubernetes Operations, with Tanat Lokejaroenlarb

https://ku.bz/T6hDSWYhb

Tanat Lokejaroenlarb shares the complete journey of replacing EKS Managed Node Groups and Cluster Autoscaler with AWS Karpenter. He explains how this migration transformed their Kubernetes operations, from eliminating brittle upgrade processes to achieving significant cost savings of €30,000 per month through automated instance selection and AMD adoption.

You will learn:

How to decouple control plane and data plane upgrades using Karpenter's asynchronous node rollout capabilities

Cost optimization strategies including flexible instance selection, automated AMD migration, and the trade-offs between cheapest-first selection versus performance considerations

Scaling and performance tuning techniques such as implementing over-provisioning with low-priority placeholder pods

Policy automation and operational practices, such as using Kyverno to simplify the user experience and implementing proper Pod Disruption Budgets
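The over-provisioning technique mentioned in the list above can be sketched as follows. This is a minimal illustration, not the configuration used in the episode: the `overprovisioning` name, the `kube-system` namespace, the priority value of -10, and the resource requests are all assumptions. The idea is that pause pods with a priority below the default of 0 reserve spare capacity, and the scheduler preempts them as soon as a real workload needs the room.

```python
import json


def overprovisioning_manifests(replicas: int = 2, cpu: str = "1", memory: str = "2Gi") -> dict:
    """Build a v1 List holding a low-priority PriorityClass and placeholder pods."""
    name = "overprovisioning"  # hypothetical name, chosen for illustration
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": name},
        "value": -10,  # assumption: anything below the default priority of 0
        "globalDefault": False,
        "description": "Placeholder pods that real workloads may preempt.",
    }
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "namespace": "kube-system"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "priorityClassName": name,
                    "terminationGracePeriodSeconds": 0,
                    "containers": [{
                        # The pause container does nothing; only its requests matter.
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }],
                },
            },
        },
    }
    return {"apiVersion": "v1", "kind": "List", "items": [priority_class, deployment]}


# Emit JSON, which kubectl also accepts as manifest input.
print(json.dumps(overprovisioning_manifests(replicas=3), indent=2))
```

The amount of headroom is the replica count multiplied by the per-pod requests, so sizing those two values is the main tuning decision.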

Sponsor

This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization

More info

Find all the links and info for this episode here: https://ku.bz/T6hDSWYhb

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

November 18, 2025 at 05:00AM

1_r/devopsish

·kube.fm·Nov 18, 2025

DevOps & AI Toolkit - AI vs Manual: Kubernetes Troubleshooting Showdown 2025 - https://www.youtube.com/watch?v=UbPyEelCh-I

AI vs Manual: Kubernetes Troubleshooting Showdown 2025

Tired of waking up at 3 AM to troubleshoot Kubernetes issues? This video shows you how to automate the entire incident response process using AI-powered remediation. We walk through the traditional manual troubleshooting workflow—detecting issues through kubectl events, analyzing pods and their controllers, identifying root causes, and validating fixes—then demonstrate how AI agents can handle all four phases automatically. Using the open-source DevOps AI Toolkit with the Model Context Protocol (MCP) and a custom Kubernetes controller, you'll see how AI can detect failing pods, analyze the root cause (like a missing PersistentVolumeClaim), suggest remediation, and validate that the fix worked, all while you stay in bed.

The video breaks down the complete architecture, showing how a Kubernetes controller monitors events defined in RemediationPolicy resources, triggers the MCP server for analysis, and either automatically applies fixes or sends Slack notifications for manual approval based on confidence thresholds and risk levels. You'll learn how the MCP agent loops with an LLM using read-only tools to gather data and analyze issues, while keeping write operations isolated and requiring explicit approval. Whether you want fully automated remediation for low-risk issues or human-in-the-loop approval for everything, this approach gives you intelligent troubleshooting that scales beyond what you can predict and prepare for manually.
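The routing between automatic fixes and manual approval described above can be sketched roughly as below. This is not the DevOps AI Toolkit's actual code or schema: the `Policy` fields, the `Risk` levels, the default threshold of 0.8, and the `decide` function are hypothetical names used only to illustrate gating on a confidence threshold and a risk level.

```python
from dataclasses import dataclass
from enum import Enum


class Risk(Enum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3


@dataclass(frozen=True)
class Policy:
    """Hypothetical stand-in for the settings a remediation policy might carry."""
    mode: str = "automatic"            # "automatic" or "manual"
    confidence_threshold: float = 0.8  # minimum confidence for auto-apply
    max_risk: Risk = Risk.LOW          # highest risk level allowed for auto-apply


def decide(policy: Policy, confidence: float, risk: Risk) -> str:
    """Return "apply" to remediate automatically, or "notify" to ask a human."""
    if policy.mode == "manual":
        return "notify"  # human-in-the-loop for everything
    confident_enough = confidence >= policy.confidence_threshold
    safe_enough = risk.value <= policy.max_risk.value
    return "apply" if confident_enough and safe_enough else "notify"
```

With the defaults, `decide(Policy(), 0.9, Risk.LOW)` returns `"apply"`, while a high-risk or low-confidence analysis falls back to `"notify"`. Both conditions must hold before anything is written to the cluster, which matches the description of keeping write operations behind explicit approval.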

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: JFrog Fly 🔗 https://jfrog.com/fly_viktor ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#Kubernetes #AIAutomation #DevOps

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-vs-manual-kubernetes-troubleshooting-showdown-2025 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Kubernetes Analysis and Remediation with AI
01:15 JFrog Fly (sponsor)
02:46 Kubernetes Troubleshooting Manual Process
11:37 AI-Powered Kubernetes Remediation
14:38 MCP Architecture and Controller Design
20:49 Key Takeaways and Next Steps

via YouTube https://www.youtube.com/watch?v=UbPyEelCh-I

1_r/devopsish

·youtube.com·Nov 17, 2025

About 35 years too late | Weapons makers have 'conned' US military into buying expensive equipment, Army Secretary says

Large defense companies have "conned" the U.S. military into buying expensive equipment when cheaper commercial options would have been available, U.S. Army Secretary Dan Driscoll said.

1_r/devopsish

·reuters.com·Nov 14, 2025

Ingress NGINX Retirement: What You Need to Know

https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/

To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.

We recommend migrating to one of the many alternatives. Consider migrating to Gateway API, the modern replacement for Ingress. If you must continue using Ingress, many alternative Ingress controllers are listed in the Kubernetes documentation. Continue reading for further information about the history and current state of Ingress NGINX, as well as next steps.

About Ingress NGINX

Ingress is the original user-friendly way to direct network traffic to workloads running on Kubernetes. (Gateway API is a newer way to achieve many of the same goals.) In order for an Ingress to work in your cluster, there must be an Ingress controller running. There are many Ingress controller choices available, which serve the needs of different users and use cases. Some are cloud-provider specific, while others have more general applicability.

Ingress NGINX was an Ingress controller, developed early in the history of the Kubernetes project as an example implementation of the API. It became very popular due to its tremendous flexibility, breadth of features, and independence from any particular cloud or infrastructure provider. Since those days, many other Ingress controllers have been created within the Kubernetes project by community groups, and by cloud native vendors. Ingress NGINX has continued to be one of the most popular, deployed as part of many hosted Kubernetes platforms and within innumerable independent users’ clusters.

History and Challenges

The breadth and flexibility of Ingress NGINX has caused maintenance challenges. Changing expectations about cloud native software have also added complications. What were once considered helpful options have sometimes come to be considered serious security flaws, such as the ability to add arbitrary NGINX configuration directives via the "snippets" annotations. Yesterday’s flexibility has become today’s insurmountable technical debt.

Despite the project’s popularity among users, Ingress NGINX has always struggled with insufficient or barely-sufficient maintainership. For years, the project has had only one or two people doing development work, on their own time, after work hours and on weekends. Last year, the Ingress NGINX maintainers announced their plans to wind down Ingress NGINX and develop a replacement controller together with the Gateway API community. Unfortunately, even that announcement failed to generate additional interest in helping maintain Ingress NGINX or develop InGate to replace it. (InGate development never progressed far enough to create a mature replacement; it will also be retired.)

Current State and Next Steps

Currently, Ingress NGINX is receiving best-effort maintenance. SIG Network and the Security Response Committee have exhausted our efforts to find additional support to make Ingress NGINX sustainable. To prioritize user safety, we must retire the project.

In March 2026, Ingress NGINX maintenance will be halted, and the project will be retired. After that time, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. The GitHub repositories will be made read-only and left available for reference.

Existing deployments of Ingress NGINX will not be broken. Existing project artifacts such as Helm charts and container images will remain available.

In most cases, you can check whether you use Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.
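For scripted checks across several clusters, the same query can be wrapped in a few lines of Python. This is a sketch that assumes `kubectl` is on the PATH and pointed at the right context. The helper names `query_cluster` and `ingress_nginx_pods` are made up for illustration; only the label selector comes from the post.

```python
import json
import subprocess

# Label selector quoted in the announcement.
SELECTOR = "app.kubernetes.io/name=ingress-nginx"


def query_cluster() -> dict:
    """Run the kubectl query from the post and return the parsed JSON PodList."""
    completed = subprocess.run(
        ["kubectl", "get", "pods", "--all-namespaces",
         "--selector", SELECTOR, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(completed.stdout)


def ingress_nginx_pods(pod_list: dict) -> list[str]:
    """Return sorted "namespace/name" strings for every pod in the list."""
    return sorted(
        f"{item['metadata']['namespace']}/{item['metadata']['name']}"
        for item in pod_list.get("items", [])
    )
```

Calling `ingress_nginx_pods(query_cluster())` returns an empty list when no matching pods exist. As the post notes, the selector only works "in most cases", so an empty result is not proof that a cluster is unaffected.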

We would like to thank the Ingress NGINX maintainers for their work in creating and maintaining this project; their dedication remains impressive. This Ingress controller has powered billions of requests in datacenters and homelabs all around the world. In a lot of ways, Kubernetes wouldn’t be where it is without Ingress NGINX, and we are grateful for so many years of incredible effort.

SIG Network and the Security Response Committee recommend that all Ingress NGINX users begin migration to Gateway API or another Ingress controller immediately. Many options are listed in the Kubernetes documentation: Gateway API, Ingress. Additional options may be available from vendors you work with.
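To give a feel for what a migration to Gateway API involves, the sketch below maps the routing rules of a simple `networking.k8s.io/v1` Ingress onto `HTTPRoute` objects. It is an illustration under strong assumptions, not a migration tool: the function name `ingress_to_httproutes` is hypothetical, a Gateway named by `gateway_name` is assumed to already exist in the same namespace, and TLS settings, default backends, named service ports, and all Ingress NGINX annotations are ignored.

```python
# Ingress pathType values mapped to HTTPRoute path match types.
# Assumption: "ImplementationSpecific" is treated as a prefix match.
PATH_TYPES = {"Prefix": "PathPrefix", "Exact": "Exact"}


def ingress_to_httproutes(ingress: dict, gateway_name: str) -> list[dict]:
    """Convert each rule of a simple Ingress into one HTTPRoute manifest."""
    meta = ingress["metadata"]
    routes = []
    for index, rule in enumerate(ingress["spec"].get("rules", [])):
        http_rules = []
        for path in rule.get("http", {}).get("paths", []):
            service = path["backend"]["service"]
            http_rules.append({
                "matches": [{"path": {
                    "type": PATH_TYPES.get(path.get("pathType"), "PathPrefix"),
                    "value": path.get("path", "/"),
                }}],
                # Raises KeyError for named ports; backendRefs take numeric ports.
                "backendRefs": [{"name": service["name"],
                                 "port": service["port"]["number"]}],
            })
        route = {
            "apiVersion": "gateway.networking.k8s.io/v1",
            "kind": "HTTPRoute",
            "metadata": {"name": f"{meta['name']}-{index}",
                         "namespace": meta.get("namespace", "default")},
            "spec": {"parentRefs": [{"name": gateway_name}], "rules": http_rules},
        }
        if rule.get("host"):
            route["spec"]["hostnames"] = [rule["host"]]
        routes.append(route)
    return routes
```

Anything that depends on controller-specific annotations, such as the "snippets" mentioned earlier, has no direct equivalent here and needs to be reviewed by hand.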

via Kubernetes Blog https://kubernetes.io/

November 11, 2025 at 01:30PM

1_r/devopsish

·kubernetes.io·Nov 12, 2025
