
Week Ending May 18, 2025
https://lwkd.info/2025/20250521
Developer News
James Sturtevant and Amim Knabben are stepping down from their roles as technical leads in SIG Windows, and Yuanliang Zhang is nominated as the new lead.
Wenjia Zhang has stepped down as the co-chair of Kubernetes SIG etcd. Siyuan Zhang is nominated to take over Wenjia’s role as the co-chair.
SIG Contributor Experience has updated the help-wanted guidelines to remove the “low barrier to entry” requirement. This sharpens the distinction between “good first issue” and “help wanted” and better aligns with other open source projects. Help-wanted issues still require a clearly defined task, a “goldilocks” priority (neither too high nor too low), and up-to-date information.
Release Schedule
Next Deadline: v1.34 cycle starts May 19
The v1.34 release cycle officially started this week, with a planned release date of August 27.
Patch releases v1.33.1, v1.32.5, v1.31.9, and v1.30.13 are available. These are mostly bugfix releases, with a Go update.
Featured PRs
131299: DRA: prevent admin access claims from getting duplicate devices
This PR fixes a bug where ResourceClaims with adminAccess could be allocated the same device multiple times within a single claim. The DRA allocator now checks that each device is used only once per claim, preventing invalid CDI specs and ensuring correct behavior for device sharing with Dynamic Resource Allocation.
131345: scheduler: return UnschedulableAndUnresolvable when node capacity is insufficient
This PR updates the NodeResourcesFit plugin to return UnschedulableAndUnresolvable when a pod’s resource requests exceed a node’s allocatable capacity, even if the node is empty. This avoids unnecessary preemption attempts against nodes that can never satisfy the request, improves scheduling efficiency in large clusters, and provides clearer signals for unschedulable pods.
KEP of the Week
KEP 4247: Per-plugin callback functions for efficient requeueing in the scheduling queue
This KEP introduced the QueueingHint functionality to the Kubernetes scheduler, enabling plugins to provide more precise suggestions about when to requeue Pods. By filtering out low-impact events, such as Node updates that are irrelevant to NodeAffinity, the scheduler reduced redundant retries and improved scheduling throughput. The KEP also allowed plugins like the DRA plugin to skip backoff in specific cases, enhancing performance for Pods requiring dynamic resource allocation by avoiding unnecessary delays while waiting for device driver updates.
This KEP is tracked for beta in v1.34.
Other Merges
e2e tests for kuberc added
Scheduler improved the backoff calculation to O(1)
Response body closed after HTTP calls in watch test
Error message improved when a pod with user namespaces is created and the runtime doesn’t support user namespaces
DRA: Reject NodePrepareResources if the cached claim UID doesn’t match resource claim
Added suggestChangeEmulationVersion to clarify how to test a locked feature with an emulation version
kubelet removed the deprecated --cloud-config flag
Non-scheduling-related errors no longer lengthen the Pod scheduling backoff time
kube-log-runner adds log rotation
Scheduler introduced pInfo.GatingPlugin to filter out events more generally
Subprojects and Dependency Updates
etcd released v3.6.0, bringing bugfixes and features such as robust downgrade support, full migration to the v3store backend, Kubernetes-style feature gates, major memory optimizations, and new health check endpoints for improved cluster monitoring.
Shoutouts
Josh Berkus (@jberkus): A big TY to Benjamin Wang (@Benjamin Wang) and Wenjia Zhang (@wenjiaswe) for getting Etcd 3.6 out the door, and to Tim Bannister (@LMKTFY), Ryota Sawada (@Ryota), Mario Fahlandt (@Mario Fahlandt) and Kaslin Fields (@kaslin) for helping promote it!
via Last Week in Kubernetes Development https://lwkd.info/
May 21, 2025 at 04:00PM
Managing 100s of Kubernetes Clusters using Cluster API, with Zain Malik
Discover how to manage Kubernetes at scale with declarative infrastructure and automation principles.
Zain Malik shares his experience managing multi-tenant Kubernetes clusters with up to 30,000 pods across clusters capped at 950 nodes. He explains how his team transitioned from Terraform to Cluster API for declarative cluster lifecycle management, contributing upstream to improve AKS support while implementing GitOps workflows.
You will learn:
How to address challenges in large-scale Kubernetes operations, including node pool management inconsistencies and lengthy provisioning times
Why Cluster API provides a powerful foundation for multi-cloud cluster management, and how to extend it with custom operators for production-specific needs
How implementing GitOps principles eliminates manual intervention in critical operations like cluster upgrades
Strategies for handling production incidents and bugs when adopting emerging technologies like Cluster API
Sponsor
This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/5PLksqVlk
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
May 20, 2025 at 06:00AM
Ep22 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, Cloud, Kubernetes, Platform Engineering, containers, or anything else. We'll have special guests Scott Rosenberg and Ramiro Berrelleza to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=7brdKxUiB9s
Outdated AI Responses? Context7 Solves LLMs' Biggest Flaw
Discover the power of AI-enhanced coding with Context7! This video explores how to overcome outdated LLM information using Context7, an MCP server that provides up-to-date documentation. See how Context7 integrates with AI agents, improving their ability to provide current, reliable information for over 11,000 projects. Boost your development workflow and stay ahead with cutting-edge tools and techniques.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Korbit AI 🔗 https://korbit.ai ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#AIAgents #Context7 #AIDocs
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/outdated-ai-responses?-context7-solves-llms-biggest-flaw 🔗 Context7: https://context7.com
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 The Problem with Models (LLMs) 01:07 Korbit AI (sponsor) 02:13 The Problem with Models (LLMs) (cont.) 02:23 Agents Using LLM Alone 04:19 Agents with Context7 MCP 07:04 What Is Context7?
via YouTube https://www.youtube.com/watch?v=DeZ-gw_aop0
Kubernetes v1.33: In-Place Pod Resize Graduated to Beta
https://kubernetes.io/blog/2025/05/16/kubernetes-v1-33-in-place-pod-resize-beta/
On behalf of the Kubernetes project, I am excited to announce that the in-place Pod resize feature (also known as In-Place Pod Vertical Scaling), first introduced as alpha in Kubernetes v1.27, has graduated to Beta and will be enabled by default in the Kubernetes v1.33 release! This marks a significant milestone in making resource management for Kubernetes workloads more flexible and less disruptive.
What is in-place Pod resize?
Traditionally, changing the CPU or memory resources allocated to a container required restarting the Pod. While acceptable for many stateless applications, this could be disruptive for stateful services, batch jobs, or any workloads sensitive to restarts.
In-place Pod resizing allows you to change the CPU and memory requests and limits assigned to containers within a running Pod, often without requiring a container restart.
Here's the core idea:
The spec.containers[*].resources field in a Pod specification now represents the desired resources and is mutable for CPU and memory.
The status.containerStatuses[*].resources field reflects the actual resources currently configured on a running container.
You can trigger a resize by updating the desired resources in the Pod spec via the new resize subresource.
You can try it out on a v1.33 Kubernetes cluster by using kubectl to edit a Pod (requires kubectl v1.32+):
kubectl edit pod <pod-name> --subresource resize
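If you prefer an explicit request over an interactive edit, the same resize can be expressed with kubectl patch against the resize subresource. A minimal sketch, assuming a container named app and a CPU resize to 800m (the pod name, container name, and values are placeholders):

kubectl patch pod <pod-name> --subresource resize --patch \
  '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"800m"},"limits":{"cpu":"800m"}}}]}}'

Afterwards, the desired and actual values can be compared by reading the spec and status fields described above:

kubectl get pod <pod-name> -o jsonpath='{.spec.containers[0].resources}'
kubectl get pod <pod-name> -o jsonpath='{.status.containerStatuses[0].resources}'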
For detailed usage instructions and examples, please refer to the official Kubernetes documentation: Resize CPU and Memory Resources assigned to Containers.
Why does in-place Pod resize matter?
Kubernetes still excels at scaling workloads horizontally (adding or removing replicas), but in-place Pod resizing unlocks several key benefits for vertical scaling:
Reduced Disruption: Stateful applications, long-running batch jobs, and sensitive workloads can have their resources adjusted without suffering the downtime or state loss associated with a Pod restart.
Improved Resource Utilization: Scale down over-provisioned Pods without disruption, freeing up resources in the cluster. Conversely, provide more resources to Pods under heavy load without needing a restart.
Faster Scaling: Address transient resource needs more quickly. For example, Java applications often need more CPU during startup than during steady-state operation. Start with higher CPU and resize down later.
What's changed between Alpha and Beta?
Since the alpha release in v1.27, significant work has gone into maturing the feature, improving its stability, and refining the user experience based on feedback and further development. Here are the key changes:
Notable user-facing changes
resize Subresource: Modifying Pod resources must now be done via the Pod's resize subresource (kubectl patch pod <name> --subresource resize ...). kubectl versions v1.32+ support this argument.
Resize Status via Conditions: The old status.resize field is deprecated. The status of a resize operation is now exposed via two Pod conditions (see the example after this list):
PodResizePending: Indicates the Kubelet cannot grant the resize immediately (e.g., reason: Deferred if temporarily unable, reason: Infeasible if impossible on the node).
PodResizeInProgress: Indicates the resize is accepted and being applied. Errors encountered during this phase are now reported in this condition's message with reason: Error.
Sidecar Support: Resizing sidecar containers in-place is now supported.
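You can read these conditions directly off the Pod to see where a resize stands. A minimal sketch using kubectl's JSONPath support (the pod name is a placeholder):

kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="PodResizePending")]}'
kubectl get pod <pod-name> -o jsonpath='{.status.conditions[?(@.type=="PodResizeInProgress")]}'

An empty result means the condition is not currently set; otherwise, its reason and message fields indicate whether the resize is deferred, infeasible, or in progress.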
Stability and reliability enhancements
Refined Allocated Resources Management: The allocation management logic within the Kubelet was significantly reworked, making it more consistent and robust. The changes eliminated whole classes of bugs and greatly improved the reliability of in-place Pod resize.
Improved Checkpointing & State Tracking: A more robust system for tracking "allocated" and "actuated" resources was implemented, using new checkpoint files (allocated_pods_state, actuated_pods_state) to reliably manage resize state across Kubelet restarts and handle edge cases where runtime-reported resources differ from requested ones. Several bugs related to checkpointing and state restoration were fixed. Checkpointing efficiency was also improved.
Faster Resize Detection: Enhancements to the Kubelet's Pod Lifecycle Event Generator (PLEG) allow the Kubelet to respond to and complete resizes much more quickly.
Enhanced CRI Integration: A new UpdatePodSandboxResources CRI call was added to better inform runtimes and plugins (like NRI) about Pod-level resource changes.
Numerous Bug Fixes: Addressed issues related to systemd cgroup drivers, handling of containers without limits, CPU minimum share calculations, container restart backoffs, error propagation, test stability, and more.
What's next?
Graduating to Beta means the feature is ready for broader adoption, but development doesn't stop here! Here's what the community is focusing on next:
Stability and Productionization: Continued focus on hardening the feature, improving performance, and ensuring it is robust for production environments.
Addressing Limitations: Working towards relaxing some of the current limitations noted in the documentation, such as allowing memory limit decreases.
VerticalPodAutoscaler (VPA) Integration: Work to enable VPA to leverage in-place Pod resize is already underway. A new InPlaceOrRecreate update mode will allow it to attempt non-disruptive resizes first, or fall back to recreation if needed. This will allow users to benefit from VPA's recommendations with significantly less disruption (see the sketch after this list).
User Feedback: Gathering feedback from users adopting the beta feature is crucial for prioritizing further enhancements and addressing any uncovered issues or bugs.
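For illustration, here is roughly what opting into that mode might look like once it ships. This is a sketch, not a finalized API: the InPlaceOrRecreate mode name comes from the ongoing VPA work described above, the rest follows the existing autoscaling.k8s.io/v1 VPA schema, and the Deployment name is a placeholder:

kubectl apply -f - <<EOF
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "InPlaceOrRecreate"
EOF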
Getting started and providing feedback
With the InPlacePodVerticalScaling feature gate enabled by default in v1.33, you can start experimenting with in-place Pod resizing right away!
Refer to the documentation for detailed guides and examples.
As this feature moves through Beta, your feedback is invaluable. Please report any issues or share your experiences via the standard Kubernetes communication channels (GitHub issues, mailing lists, Slack). You can also review the KEP-1287: In-place Update of Pod Resources for the full in-depth design details.
We look forward to seeing how the community leverages in-place Pod resize to build more efficient and resilient applications on Kubernetes!
via Kubernetes Blog https://kubernetes.io/
May 16, 2025 at 02:30PM