Kubernetes v1.35 Sneak Peek
https://kubernetes.io/blog/2025/11/26/kubernetes-v1-35-sneak-peek/
As the release of Kubernetes v1.35 approaches, the Kubernetes project continues to evolve. Features may be deprecated, removed, or replaced to improve the project's overall health. This blog post outlines planned changes for the v1.35 release that the release team believes you should be aware of to ensure the continued smooth operation of your Kubernetes cluster(s), and to keep you up to date with the latest developments. The information below is based on the current status of the v1.35 release and is subject to change before the final release date.
Deprecations and removals for Kubernetes v1.35
cgroup v1 support
On Linux nodes, container runtimes typically rely on cgroups (short for "control groups"). Support for using cgroup v2 has been stable in Kubernetes since v1.25, providing an alternative to the original v1 cgroup support. While cgroup v1 provided the initial resource control mechanism, it suffered from well-known inconsistencies and limitations. Adding support for cgroup v2 allowed use of a unified control group hierarchy, improved resource isolation, and served as the foundation for modern features, making legacy cgroup v1 support ready for removal. The removal of cgroup v1 support will only impact cluster administrators running nodes on older Linux distributions that do not support cgroup v2; on those nodes, the kubelet will fail to start. Administrators must migrate their nodes to systems with cgroup v2 enabled. More details on compatibility requirements will be available in a blog post soon after the v1.35 release.
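If you are not sure which cgroup version a node is using, a quick check (run on the node itself) is to look at the filesystem type mounted at /sys/fs/cgroup:
stat -fc %T /sys/fs/cgroup/
cgroup2fs means the node is already on cgroup v2; tmpfs indicates the legacy cgroup v1 hierarchy.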
To learn more, read about cgroup v2;
you can also track the switchover work via KEP-5573: Remove cgroup v1 support.
Deprecation of ipvs mode in kube-proxy
Many releases ago, the Kubernetes project implemented an ipvs mode in kube-proxy. It was adopted as a way to provide high-performance service load balancing, with better performance than the existing iptables mode. However, maintaining feature parity between ipvs and other kube-proxy modes became difficult, due to technical complexity and diverging requirements. This created significant technical debt and made the ipvs backend impractical to support alongside newer networking capabilities.
The Kubernetes project intends to deprecate kube-proxy ipvs mode in the v1.35 release, to streamline the kube-proxy codebase. For Linux nodes, the recommended kube-proxy mode is already nftables.
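If you are still running ipvs mode, switching is mostly a matter of changing the mode field in your kube-proxy configuration. A minimal sketch of the relevant part of a KubeProxyConfiguration (your real configuration will contain more fields):
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
# "nftables" is the recommended backend for Linux nodes; "iptables" remains the default
mode: "nftables"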
You can find more in KEP-5495: Deprecate ipvs mode in kube-proxy
Kubernetes is deprecating containerd v1.y support
While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases of containerd, as a consequence of automated cgroup driver detection, the Kubernetes SIG Node community has formally agreed upon a final support timeline for containerd v1.X. Kubernetes v1.35 is the last release to offer this support (aligned with containerd 1.7 EOL).
This is a final warning that if you are using containerd 1.X, you must switch to 2.0 or later before upgrading Kubernetes to the next version. You are able to monitor the kubelet_cri_losing_support metric to determine if any nodes in your cluster are using a containerd version that will soon be unsupported.
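In addition to the metric, a quick way to see which runtime and version each node currently reports:
kubectl get nodes -o wide
The CONTAINER-RUNTIME column shows values such as containerd://1.7.x or containerd://2.x.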
You can find more in the official blog post or in KEP-4033: Discover cgroup driver from CRI
Featured enhancements of Kubernetes v1.35
The following enhancements are some of those likely to be included in the v1.35 release. This is not a commitment, and the release content is subject to change.
Node declared features
When scheduling Pods, Kubernetes uses node labels, taints, and tolerations to match workload requirements with node capabilities. However, managing feature compatibility becomes challenging during cluster upgrades due to version skew between the control plane and nodes. This can lead to Pods being scheduled on nodes that lack required features, resulting in runtime failures.
The node declared features framework will introduce a standard mechanism for nodes to declare their supported Kubernetes features. With the new alpha feature enabled, a Node reports the features it can support, publishing this information to the control plane through a new .status.declaredFeatures field. Then, the kube-scheduler, admission controllers and third-party components can use these declarations. For example, you can enforce scheduling and API validation constraints, ensuring that Pods run only on compatible nodes.
This approach reduces manual node labeling, improves scheduling accuracy, and prevents incompatible pod placements proactively. It also integrates with the Cluster Autoscaler for informed scale-up decisions. Feature declarations are temporary and tied to Kubernetes feature gates, enabling safe rollout and cleanup.
Targeting alpha in v1.35, node declared features aims to solve version skew scheduling issues by making node capabilities explicit, enhancing reliability and cluster stability in heterogeneous version environments.
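As an illustration only (the feature is targeting alpha, so the exact schema may still change), a Node with the feature gate enabled might publish something like the following; the feature names shown here are hypothetical placeholders:
status:
  declaredFeatures:
  - SomeKubeletFeature      # hypothetical; real entries map to Kubernetes feature gates
  - AnotherNodeCapability   # hypothetical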
To learn more about this before the official documentation is published, you can read KEP-5328.
In-place update of Pod resources
Kubernetes is graduating in-place updates for Pod resources to General Availability (GA). This feature allows users to adjust CPU and memory resources without restarting Pods or containers. Before this feature existed, such modifications required recreating Pods, which could disrupt workloads, particularly for stateful or batch applications. As a beta feature, recent Kubernetes releases already let you change resource settings (requests and limits) on existing Pods. This allows for smoother vertical scaling, improves efficiency, and can also simplify solution development.
The Container Runtime Interface (CRI) has also been improved, extending the UpdateContainerResources API for Windows and future runtimes while allowing ContainerStatus to report real-time resource configurations. Together, these changes make scaling in Kubernetes faster, more flexible, and disruption-free. The feature was introduced as alpha in v1.27, graduated to beta in v1.33, and is targeting graduation to stable in v1.35.
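With the feature enabled, recent kubectl versions let you resize a running Pod through the resize subresource. A sketch, assuming a Pod named my-app with a container named app:
kubectl patch pod my-app --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'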
You can find more in KEP-1287: In-place Update of Pod Resources
Pod certificates
When running microservices, Pods often require a strong cryptographic identity to authenticate with each other using mutual TLS (mTLS). While Kubernetes provides Service Account tokens, these are designed for authenticating to the API server, not for general-purpose workload identity.
Before this enhancement, operators had to rely on complex, external projects like SPIFFE/SPIRE or cert-manager to provision and rotate certificates for their workloads. But what if you could issue a unique, short-lived certificate to your Pods natively and automatically? KEP-4317 is designed to enable such native workload identity. It opens up various possibilities for securing pod-to-pod communication by allowing the kubelet to request and mount certificates for a Pod via a projected volume.
This provides a built-in mechanism for workload identity, complete with automated certificate rotation, significantly simplifying the setup of service meshes and other zero-trust network policies. This feature was introduced as alpha in v1.34 and is targeting beta in v1.35.
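As a rough sketch of what this could look like in a Pod spec (the field names follow the KEP; treat the exact layout and the signer name as assumptions and check the documentation for your release):
volumes:
- name: workload-identity
  projected:
    sources:
    - podCertificate:
        signerName: example.com/my-signer       # example signer, provided by your issuing controller
        keyType: ED25519
        credentialBundlePath: credentials.pem   # key plus certificate chain, rotated by the kubelet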
You can find more in KEP-4317: Pod Certificates
Numeric values for taints
Kubernetes is enhancing taints and tolerations by adding numeric comparison operators, such as Gt (Greater Than) and Lt (Less Than).
Previously, tolerations supported only exact (Equal) or existence (Exists) matches, which were not suitable for numeric properties such as reliability SLAs.
With this change, a Pod can use a toleration to "opt-in" to nodes that meet a specific numeric threshold. For example, a Pod can require a Node with an SLA taint value greater than 950 (operator: Gt, value: "950").
This approach is more powerful than Node Affinity because it supports the NoExecute effect, allowing Pods to be automatically evicted if a node's numeric value drops below the tolerated threshold.
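Building on the example above, the matching toleration in a Pod spec could look like this (the taint key is illustrative; the operator and value come straight from the example):
tolerations:
- key: "example.com/sla"   # illustrative key set by the cluster operator
  operator: "Gt"
  value: "950"
  effect: "NoExecute"      # with NoExecute, the Pod is evicted if the node no longer satisfies the comparison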
You can find more in KEP-5471: Enable SLA-based Scheduling
User namespaces
When running Pods, you can use securityContext to drop privileges, but containers inside the pod often still run as root (UID 0). This simplicity poses a significant challenge, as that container UID 0 maps directly to the host's root user.
Before this enhancement, a container breakout vulnerability could grant an attacker full root access to the node. But what if you could dynamically remap the container's root user to a safe, unprivileged user on the host? KEP-127 specifically allows such native support for Linux User Namespaces. It opens up various possibilities for pod security by isolating container and host user/group IDs. This allows a process to have root privileges (UID 0) within its namespace, while running as a non-privileged, high-numbered UID on the host.
Released as alpha in v1.25 and beta in v1.30, this feature continues to progress through beta maturity, paving the way for truly "rootless" containers that drastically reduce the attack surface for a whole class of security vulnerabilities.
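Opting a Pod into a user namespace is a one-line change in the spec, assuming your nodes, runtime, and kernel support it:
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo
spec:
  hostUsers: false   # run in a user namespace; UID 0 inside the Pod maps to an unprivileged host UID
  containers:
  - name: app
    image: nginx     # example image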
You can find more in KEP-127: User Namespaces
Support for mounting OCI images as volumes
When provisioning a Pod, you often need to bundle data, binaries, or configuration files for your containers. Before this enhancement, people often included that kind of data directly into the main container image, or required a custom init container to download and unpack files into an emptyDir. You can still take either of those approaches, of course.
But what if you could populate a volume directly from a data-only artifact in an OCI registry, just like pulling a container image? Kubernetes v1.31 added support for the image volume type, allowing Pods to pull and unpack OCI container image artifacts into a volume declaratively.
This allows for seamless distribution of data, binaries, or ML models to your containers without rebuilding the application image.
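A minimal sketch of the image volume type (the artifact reference is an example):
apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo
spec:
  containers:
  - name: app
    image: nginx                                    # example application image
    volumeMounts:
    - name: models
      mountPath: /data
      readOnly: true                                # image volumes are mounted read-only
  volumes:
  - name: models
    image:
      reference: quay.io/example/ml-models:latest   # example OCI artifact reference
      pullPolicy: IfNotPresent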
Ep40 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=nomAGBszjQo
Ep40 - Ask Me Anything About Anything with Scott Rosenberg 📱
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=0TtOJbMOVbs
More Kubernetes Than I Bargained For, with Amos Wenger
Amos Wenger walks through his production incident where adding a home computer as a Kubernetes node caused TLS certificate renewals to fail. The discussion covers debugging techniques using tools like netshoot and K9s, and explores the unexpected interactions between Kubernetes overlay networks and consumer routers.
You will learn:
How Kubernetes networking assumptions break when mixing cloud VMs with nodes behind consumer routers, and why cert-manager challenges fail in NAT environments
The differences between CNI plugins like Flannel and Calico, particularly how they handle IPv6 translation
Debugging techniques for network issues using tools like netshoot, K9s, and iproute2
Best practices for mixed infrastructure including proper node labeling, taints, and scheduling controls
Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/6Ll_7slr9
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
November 25, 2025 at 05:00AM
Kubernetes Configuration Good Practices
https://kubernetes.io/blog/2025/11/25/configuration-good-practices/
Configuration is one of those things in Kubernetes that seems small until it's not. Configuration is at the heart of every Kubernetes workload. A missing quote, a wrong API version or a misplaced YAML indent can ruin your entire deploy.
This blog brings together tried-and-tested configuration best practices. The small habits that make your Kubernetes setup clean, consistent and easier to manage. Whether you are just starting out or already deploying apps daily, these are the little things that keep your cluster stable and your future self sane.
This blog is inspired by the original Configuration Best Practices page, which has evolved through contributions from many members of the Kubernetes community.
General configuration practices
Use the latest stable API version
Kubernetes evolves fast. Older APIs eventually get deprecated and stop working. So, whenever you are defining resources, make sure you are using the latest stable API version. You can always check with
kubectl api-resources
This simple step saves you from future compatibility issues.
Store configuration in version control
Never apply manifest files directly from your desktop. Always keep them in a version control system like Git; it's your safety net. If something breaks, you can instantly roll back to a previous commit, compare changes or recreate your cluster setup without panic.
Write configs in YAML not JSON
Write your configuration files using YAML rather than JSON. Both work technically, but YAML is just easier for humans. It's cleaner to read, less noisy, and widely used in the community.
YAML has some sneaky gotchas with boolean values: Use only true or false. Don't write yes, no, on or off. They might work in one version of YAML but break in another. To be safe, quote anything that looks like a Boolean (for example "yes").
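For example, in a ConfigMap it's safer to quote anything a YAML 1.1 parser might turn into a boolean:
apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags
data:
  enableCache: "true"   # quoted, so it is always the string "true"
  legacyMode: "no"      # quoted, so it stays the string "no" instead of being parsed as false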
Keep configuration simple and minimal
Avoid setting default values that are already handled by Kubernetes. Minimal manifests are easier to debug, cleaner to review and less likely to break things later.
Group related objects together
If your Deployment, Service and ConfigMap all belong to one app, put them in a single manifest file.
It's easier to track changes and apply them as a unit. See the Guestbook all-in-one.yaml file for an example of this syntax.
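A grouped manifest is just multiple resources in one file, separated by ---; for example (names and values are illustrative):
# myapp.yaml -- everything the app needs in one file
apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app.kubernetes.io/name: myapp
  ports:
  - port: 80
    targetPort: 8080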
You can even apply entire directories with:
kubectl apply -f configs/
One command and boom everything in that folder gets deployed.
Add helpful annotations
Manifest files are not just for machines, they are for humans too. Use annotations to describe why something exists or what it does. A quick one-liner can save hours when debugging later and also allows better collaboration.
The most helpful annotation to set is kubernetes.io/description. It's like using a comment, except that it gets copied into the API so that everyone else can see it, even after you deploy.
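For example (the name and description text are illustrative):
metadata:
  name: payments-worker
  annotations:
    kubernetes.io/description: "Consumes the payments queue; owned by the billing team."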
Managing Workloads: Pods, Deployments, and Jobs
A common early mistake in Kubernetes is creating Pods directly. Pods work, but they don't reschedule themselves if something goes wrong.
Naked Pods (Pods not managed by a controller, such as a Deployment or a StatefulSet) are fine for testing, but in real setups, they are risky.
Why? Because if the node hosting that Pod dies, the Pod dies with it and Kubernetes won't bring it back automatically.
Use Deployments for apps that should always be running
A Deployment, which both creates a ReplicaSet to ensure that the desired number of Pods is always available and specifies a strategy to replace Pods (such as RollingUpdate), is almost always preferable to creating Pods directly. You can roll out a new version and, if something breaks, roll back instantly.
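A minimal Deployment sketch (names and image are illustrative):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: web
  strategy:
    type: RollingUpdate        # replace Pods gradually when you update the template
  template:
    metadata:
      labels:
        app.kubernetes.io/name: web
    spec:
      containers:
      - name: web
        image: nginx:1.27      # example image
        ports:
        - containerPort: 80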
Use Jobs for tasks that should finish
A Job is perfect when you need something to run once and then stop, like a database migration or a batch-processing task. It will retry if the Pod fails and report success when it's done.
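A minimal Job sketch (the migration image and command are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  backoffLimit: 3                          # retry a failed Pod up to 3 times
  template:
    spec:
      restartPolicy: Never                 # Jobs require Never or OnFailure
      containers:
      - name: migrate
        image: example.com/db-migrator:1.0 # placeholder image
        command: ["./migrate", "--up"]     # placeholder command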
Service Configuration and Networking
Services are how your workloads talk to each other inside (and sometimes outside) your cluster. Without them, your pods exist but can't reach anyone. Let's make sure that doesn't happen.
Create Services before workloads that use them
When Kubernetes starts a Pod, it automatically injects environment variables for existing Services. So, if a Pod depends on a Service, create a Service before its corresponding backend workloads (Deployments or StatefulSets), and before any workloads that need to access it.
For example, if a Service named foo exists, all containers will get the following variables in their initial environment:
FOO_SERVICE_HOST=<the host the Service runs on>
FOO_SERVICE_PORT=<the port the Service runs on>
DNS based discovery doesn't have this problem, but it's a good habit to follow anyway.
Use DNS for Service discovery
If your cluster has the DNS add-on (most do), every Service automatically gets a DNS entry. That means you can access it by name instead of IP:
curl http://my-service.default.svc.cluster.local
It's one of those features that makes Kubernetes networking feel magical.
Avoid hostPort and hostNetwork unless absolutely necessary
You'll sometimes see these options in manifests:
hostPort: 8080
hostNetwork: true
But here's the thing: they tie your Pods to specific nodes, making them harder to schedule and scale, because each <hostIP, hostPort, protocol> combination must be unique. If you don't specify the hostIP and protocol explicitly, Kubernetes will use 0.0.0.0 as the default hostIP and TCP as the default protocol. Unless you're debugging or building something like a network plugin, avoid them.
If you just need local access for testing, try kubectl port-forward:
kubectl port-forward deployment/web 8080:80
See Use Port Forwarding to access applications in a cluster to learn more. Or if you really need external access, use a type: NodePort Service. That's the safer, Kubernetes-native way.
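For comparison, a NodePort Service sketch (port numbers are examples):
apiVersion: v1
kind: Service
metadata:
  name: web
spec:
  type: NodePort
  selector:
    app.kubernetes.io/name: web
  ports:
  - port: 80          # port inside the cluster
    targetPort: 8080  # container port
    nodePort: 30080   # optional; must fall in the node port range (30000-32767 by default)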
Use headless Services for internal discovery
Sometimes, you don't want Kubernetes to load balance traffic. You want to talk directly to each Pod. That's where headless Services come in.
You create one by setting clusterIP: None. Instead of a single IP, DNS gives you a list of all the Pod IPs, perfect for apps that manage connections themselves.
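A headless Service sketch (the app name and port are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: db
spec:
  clusterIP: None    # headless: DNS returns the individual Pod IPs instead of a single virtual IP
  selector:
    app.kubernetes.io/name: db
  ports:
  - port: 5432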
Working with labels effectively
Labels are key/value pairs that are attached to objects such as Pods. Labels help you organize, query and group your resources. They don't do anything by themselves, but they make everything else from Services to Deployments work together smoothly.
Use semantic labels
Good labels help you understand what's what, even months later. Define and use labels that identify semantic attributes of your application or Deployment. For example:
labels:
  app.kubernetes.io/name: myapp
  app.kubernetes.io/component: web
  tier: frontend
  phase: test
app.kubernetes.io/name: what the app is
tier: which layer it belongs to (frontend/backend)
phase: which stage it's in (test/prod)
You can then use these labels to make powerful selectors. For example:
kubectl get pods -l tier=frontend
This will list all frontend Pods across your cluster, no matter which Deployment they came from. Basically you are not manually listing Pod names; you are just describing what you want. See the guestbook app for examples of this approach.
Use common Kubernetes labels
Kubernetes actually recommends a set of common labels. It's a standardized way to name things across your different workloads or projects. Following this convention makes your manifests cleaner, and it means that tools such as Headlamp, dashboard, or third-party monitoring systems can all automatically understand what's running.
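The recommended set looks like this (values are illustrative):
metadata:
  labels:
    app.kubernetes.io/name: myapp
    app.kubernetes.io/instance: myapp-prod
    app.kubernetes.io/version: "1.4.2"
    app.kubernetes.io/component: web
    app.kubernetes.io/part-of: shop
    app.kubernetes.io/managed-by: kustomize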
Manipulate labels for debugging
Since controllers (like ReplicaSets or Deployments) use labels to manage Pods, you can remove a label to “detach” a Pod temporarily.
Example:
kubectl label pod mypod app-
The app- part removes the label key app. Once that happens, the controller won’t manage that Pod anymore. It’s like isolating it for inspection, a “quarantine mode” for debugging. To interactively remove or add labels, use kubectl label.
You can then check logs, exec into it and once done, delete it manually. That’s a super underrated trick every Kubernetes engineer should know.
Handy kubectl tips
These small tips make life much easier when you are working with multiple manifest files or clusters.
Apply entire directories
Instead of applying one file at a time, apply the whole folder:
kubectl apply -f configs/
Using server-side apply is also a good practice:
kubectl apply -f configs/ --server-side
This command looks for .yaml, .yml and .json files in that folder and applies them all together. It's faster, cleaner and helps keep things grouped by app.
Use label selectors to get or delete resources
You don't always need to type out resource names one by one. Instead, use selectors to act on entire groups at once:
kubectl get pods -l app=myapp
kubectl delete pod -l phase=test
It's especially useful in CI/CD pipelines, where you want to clean up test resources dynamically.
Quickly create Deployments and Services
For quick experiments, you don't always need to write a manifest. You can spin up a Deployment right from the CLI:
kubectl create deployment webapp --image=nginx
Then expose it as a Service:
kubectl expose deployment webapp --port=80
This is great when you just want to test something before writing full manifests. Also, see Use a Service to Access an Application in a cluster for an example.
Conclusion
Cleaner configuration leads to calmer cluster administrators. Stick to a few simple habits: keep configuration simple and minimal, version-control everything, use consistent labels, and avoid relying on naked Pods. You'll save yourself hours of debugging down the road.
The best part? Clean configurations stay readable. Even after months, you or anyone on your team can open a manifest and quickly understand what it does.
Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second
Gemini 3 is undeniably fast and impressive on benchmarks, but after a full week of real-world software engineering work, the reality is more complicated. While everyone's been hyping its capabilities based on day-one reviews and marketing materials, this video digs into what actually matters: how Gemini 3 performs with coding agents on real projects, not just one-shot Tetris games or simple websites. The speed is remarkable at 128 tokens per second, but it comes with serious trade-offs that affect daily pair programming work.
The core issues are frustrating: Gemini 3 is nearly impossible to redirect once it commits to a plan, suffers from an 88% hallucination rate (nearly double Sonnet 4.5's 48%), and confidently claims tasks are complete when they're not. It ignores context from earlier in conversations, struggles with complex multi-step instructions, and dismisses suggestions like a grumpy coder who thinks they know best. While it excels at one-shot code generation, it falls short as a collaborative partner for serious software development. Gemini 3 is genuinely one of the best models available (probably second place behind Sonnet 4.5) but it's not the massive leap forward that the hype suggests, and the gap between Claude Code and Gemini CLI remains significant.
#Gemini3 #AIcoding #SoftwareEngineering
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/gemini-3-is-fast-but-gaslights-you-at-128-tokens-second 🔗 Gemini 3: https://deepmind.google/models/gemini
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Gemini 3 with Gemini CLI 00:25 Gemini 3 Real-World Testing 02:54 Gemini 3's Biggest Problems 10:10 Is Gemini 3 Worth It?
via YouTube https://www.youtube.com/watch?v=AUoqr5r1pBY
Is Kubernetes Ready for AI? Google’s New Agent Tech | TSG Ep. 967
https://chrisshort.net/video/techstrong-gang-ep967/
Alan Shimel, Mike Vizard, and Chris Short discuss the state of Kubernetes following the KubeCon + CloudNativeCon North America 2025 conference.
via Chris Short https://chrisshort.net/
November 14, 2025
Week Ending November 16, 2025
https://lwkd.info/2025/20251120
Developer News
Kubernetes SIG Network and the Security Response Committee have announced the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026.
Release Schedule
Next Deadline: Feature blogs ready for review, November 24th
We are in Code Freeze. Release lead Drew Hagen shared the state of the release.
The Feature blog is a great way to highlight and share information about your enhancement with the community. Feature blogs are especially encouraged for high visibility changes as well as deprecations and removals. The official deadline has passed, but opt-ins are still welcome. If you are interested in writing a blog for your enhancement, please create a placeholder PR and contact your lead ASAP.
Kubernetes v1.35.0-beta.0 and patch releases v1.32.10, v1.31.14, v1.33.6 and v1.34.2 are now live!
KEP of the Week
KEP-5067: Pod Generation
This KEP introduces proper use of metadata.generation and a new status.observedGeneration field to show which PodSpec version the kubelet has actually processed. This helps eliminate uncertainty when multiple updates occur, making Pod status tracking consistent with other Kubernetes resources.
This KEP is tracked for stable in v1.35
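Once your cluster has it, comparing the two fields is a one-liner (the Pod name is a placeholder):
kubectl get pod mypod -o jsonpath='{.metadata.generation} {.status.observedGeneration}{"\n"}'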
Other Merges
Implement opportunistic batching to speed up pod scheduling
Allow constraining impersonation for specific resources
NominatedNodeName has integration tests
DRA device health check timeouts are configurable
Distinguish between nil and not present in validation ratcheting
You can mutate job directives even if they’re suspended
Volume Group Snapshots are now v1beta2 API
Overhaul Device Taint Eviction in DRA
ScheduleAsyncAPICalls has been re-enabled by default after debugging
Device class selection is deterministic
StatefulSets won’t trigger a rollout when upgrading to 1.34
Don’t schedule pods that need storage to a node with no CSI
kuberc gets view and set commands
v1alpha1 structured response for /flagz
Pod statuses stay the same after kubelet restart
Let’s schedule the whole darned gang through the new workload API
DRA: prioritized list scoring and Extended Resource Metrics and extended resource quota
Operators get more tolerations
Mutate persistent volume node affinity
Auto-restart of all containers in a pod when one of them exits
Promotions
KubeletEnsureSecretPulledImages is Beta
Image Volume Source to Beta
PodTopologyLabelsAdmission to Beta
NominatedNodeNameForExpectation and ClearingNominatedNodeNameAfterBinding to Beta
SupplementalGroupsPolicy to GA
JobManagedBy to GA
InPlacePodVerticalScaling tests to Conformance
KubeletCrashLoopBackOffMax to Beta
Pod Certificates to Beta
EnvFiles to Beta
WatchListClient to Beta
Deprecations
Drop networking v1beta1 Ingress from kubectl
AggregatedDiscoveryRemoveBetaType gate removed
Version Updates
go to v1.25.4
CoreDNS to 1.13.1
Subprojects and Dependency Updates
prometheus v3.8.0-rc.0 stabilizes native histograms (now an optional stable feature via scrape_native_histogram), tightens validation for custom-bounds histograms, adds detailed target relabeling views in the UI, improves OTLP target_info de-duplication, expands alerting and promtool support (including Remote-Write 2.0 for promtool push metrics), and delivers multiple PromQL and UI performance fixes for large rule/alert pages.
cloud-provider-aws v1.31.9 bumps the AWS Go SDK to 1.24.7 for CVE coverage, completes migration to AWS SDK v2 for EC2, ELB and ELBV2, adds support for a new AWS partition in the credential provider, and includes defensive fixes for potential nil pointer dereferences alongside the usual 1.31 release line version bump.
cloud-provider-aws v1.30.10 mirrors the 1.31.9 line with backported updates to AWS SDK Go v2 (EC2 and load balancers), a Go SDK 1.24.7 security bump, support for the new AWS partition in credential provider logic, improved nil-pointer safety, and includes contributions from a new external maintainer.
cloud-provider-aws v1.29.10 provides a straightforward version bump for the 1.29 branch, while cloud-provider-aws v1.29.9 backports key changes including EC2/load balancer migration to AWS SDK Go v2, the Go SDK 1.24.7 CVE update, and new-partition support in the credential provider to keep older clusters aligned with current AWS environments.
cluster-api v1.12.0-beta.1 continues the v1.12 beta with chained-upgrade Runtime SDK improvements, blocking AfterClusterUpgrade hooks for safer rollouts, new features such as taint propagation in Machine APIs, MachineDeployment in-place update support, clusterctl describe condition filters, and a broad set of bugfixes and dependency bumps (including etcd v3.6.6 and Kubernetes v0.34.2 libraries).
cluster-api-provider-vsphere v1.15.0-beta.1 refreshes CAPV against CAPI v1.12.0-beta.1, upgrades Go to 1.24.10 and core Kubernetes/etcd libraries, and focuses on test and tooling improvements such as enhanced e2e network debugging, junit output from e2e runs, and refined CI configuration ahead of the 1.15 release.
kubebuilder v4.10.1 is a fast follow-up bugfix release that retracts the problematic v4.10.0 Go module, fixes nested JSON tag omitempty handling in generated APIs, stabilizes metrics e2e tests with webhooks, and tightens Go module validation to prevent future module install issues while keeping scaffold auto-update guidance intact.
kubebuilder v4.10.0 (now retracted as a Go module) introduced the new helm/v2-alpha plugin to replace helm/v1-alpha, improved multi-arch support and Go/tooling versions (golangci-lint, controller-runtime, cert-manager), added external plugin enhancements (PluginChain, ProjectConfig access), support for custom webhook paths, and a series of CLI and scaffolding fixes including better handling of directories with spaces.
cluster-api-provider-vsphere v1.15.0-beta.0 introduces the next beta version of CAPV for testing upcoming Cluster API v1.15 functionality on vSphere. This release is intended only for testing and feedback.
vsphere-csi-driver v3.6.0 adds compatibility with Kubernetes v1.34 and brings improvements such as shared session support on vCenter login and enhanced task monitoring. Updated manifests for this release are available under the versioned manifests/vanilla directory.
kustomize kyaml v0.21.0 updates structured data replacement capabilities, upgrades Go to 1.24.6, refreshes dependencies following security alerts, and includes minor YAML handling fixes.
kustomize v5.8.0 enhances YAML/JSON replacement features, fixes namespace propagation for Helm integrations, and adds improvements such as regex support for replacements, new patch argument types, validation fixes, improved error messages, and performance optimizations.
kustomize cmd/config v0.21.0 aligns with kyaml updates, adopts Go 1.24.6, and brings dependency updates based on recent security advisories.
kustomize api v0.21.0 includes structured-data replacement enhancements, regex selector support, patch argument additions, namespace propagation fixes, validation improvements, Go 1.24.6 updates, and dependency refreshes.
etcd v3.6.6 provides a new patch update for the v3.6 series with all changes documented in the linked changelog. Installation steps and supported platform updates are also included.
etcd v3.5.25 delivers maintenance updates for the v3.5 series along with relevant upgrade guidance and support documentation.
etcd v3.4.39 introduces the newest patches for the v3.4 branch with installation instructions and detailed platform support notes.
cri-o v1.34.2 improves GRPC debug log formatting and ships updated, signed release bundles and SPDX SBOMs for all supported architectures.
cri-o v1.33.6 publishes refreshed signed artifacts and SPDX documents for the 1.33 line, with no dependency changes recorded.
cri-o v1.32.10 updates the 1.32 branch with new signed release artifacts and SBOM files, without dependency modifications.
nerdctl v2.2.0 fixes a namestore path issue, adds mount-manager support, introduces checkpoint lifecycle commands, and enhances image conversion through a new estargz helper flag. The full bundle includes updated containerd, runc, BuildKit, and Stargz Snapshotter.
Shoutouts
Danilo Gemoli: Shoutout to @Petr Muller who is trying to gather new contributors in #prow. He arranged a meeting in which we had the possibility to bring on the table several interesting idea on how to ease the entry barriers for newcomers
via Last Week in Kubernetes Development https://lwkd.info/
November 20, 2025 at 07:59AM
Ep39 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=tafANChjv3g
The Karpenter Effect: Redefining Kubernetes Operations, with Tanat Lokejaroenlarb
Tanat Lokejaroenlarb shares the complete journey of replacing EKS Managed Node Groups and Cluster Autoscaler with AWS Karpenter. He explains how this migration transformed their Kubernetes operations, from eliminating brittle upgrade processes to achieving significant cost savings of €30,000 per month through automated instance selection and AMD adoption.
You will learn:
How to decouple control plane and data plane upgrades using Karpenter's asynchronous node rollout capabilities
Cost optimization strategies including flexible instance selection, automated AMD migration, and the trade-offs between cheapest-first selection versus performance considerations
Scaling and performance tuning techniques such as implementing over-provisioning with low-priority placeholder pods
Policy automation and operational practices using Kyverno for user experience simplification, implementing proper Pod Disruption Budgets
Sponsor
This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization
More info
Find all the links and info for this episode here: https://ku.bz/T6hDSWYhb
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
November 18, 2025 at 05:00AM
AI vs Manual: Kubernetes Troubleshooting Showdown 2025
Tired of waking up at 3 AM to troubleshoot Kubernetes issues? This video shows you how to automate the entire incident response process using AI-powered remediation. We walk through the traditional manual troubleshooting workflow—detecting issues through kubectl events, analyzing pods and their controllers, identifying root causes, and validating fixes—then demonstrate how AI agents can handle all four phases automatically. Using the open-source DevOps AI Toolkit with the Model Context Protocol (MCP) and a custom Kubernetes controller, you'll see how AI can detect failing pods, analyze the root cause (like a missing PersistentVolumeClaim), suggest remediation, and validate that the fix worked, all while you stay in bed.
The video breaks down the complete architecture, showing how a Kubernetes controller monitors events defined in RemediationPolicy resources, triggers the MCP server for analysis, and either automatically applies fixes or sends Slack notifications for manual approval based on confidence thresholds and risk levels. You'll learn how the MCP agent loops with an LLM using read-only tools to gather data and analyze issues, while keeping write operations isolated and requiring explicit approval. Whether you want fully automated remediation for low-risk issues or human-in-the-loop approval for everything, this approach gives you intelligent troubleshooting that scales beyond what you can predict and prepare for manually.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: JFrog Fly 🔗 https://jfrog.com/fly_viktor ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#Kubernetes #AIAutomation #DevOps
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-vs-manual-kubernetes-troubleshooting-showdown-2025 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Kubernetes Analysis and Remediation with AI 01:15 JFrog Fly (sponsor) 02:46 Kubernetes Troubleshooting Manual Process 11:37 AI-Powered Kubernetes Remediation 14:38 MCP Architecture and Controller Design 20:49 Key Takeaways and Next Steps
via YouTube https://www.youtube.com/watch?v=UbPyEelCh-I