Last Week in Kubernetes Development - Week Ending October 19 2025

Week Ending October 19, 2025

https://lwkd.info/2025/20251022

Developer News

SIG-Etcd has found another potential upgrade failure that prevents some users from upgrading to etcd 3.6. The blog gives instructions on how to avoid it, mainly by updating to 3.5.24 first.

Release Schedule

Next Deadline: Docs Deadline for placeholder PRs, October 23

The deadline for opening your placeholder docs PRs is coming up soon. If you have a KEP tracked for v1.35, make sure that you have a placeholder PR in k/website for your docs before the deadline.

The v1.35 Enhancements Freeze has been in effect since October 17th. Out of the 101 KEPs opted in for the release, 75 made the cut for Enhancements Freeze.

Steering Committee Election

The Steering Committee Election voting ends later this week on Friday, 24th October, AoE. You can check your eligibility to vote in the voting app. Don’t forget to cast your votes if you haven’t already!

The deadline to file an exception request is 22nd October, AoE. Submit an exception request soon if you think you’re eligible!

KEP of the Week

KEP-4742: Expose Node Topology Labels via Downward API

This KEP introduces a built-in Kubernetes admission plugin that automatically copies node topology labels (like zone, region, or rack) onto Pods. It allows Pods to access this topology data through the Downward API without using privileged init containers or custom scripts. The change simplifies topology-aware workloads such as distributed AI/ML training, CNI optimizations, and sharded databases, making topology awareness a secure and native part of Kubernetes.

This KEP is tracked for beta in v1.35.
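As a rough illustration of what consuming those labels could look like, the sketch below builds a Pod manifest that reads topology labels from the Pod's own metadata through Downward API environment variables. It assumes the admission plugin has copied topology.kubernetes.io/zone and topology.kubernetes.io/region onto the Pod; the helper name, Pod name, image, and environment variable names are hypothetical.

```python
import json


def pod_with_topology_env(name, image):
    """Build a Pod manifest (hypothetical helper) whose container reads
    topology labels from the Pod's own metadata via the Downward API."""

    def label_env(env_name, label_key):
        # fieldRef on metadata.labels['<key>'] is the Downward API syntax
        # for exposing a single Pod label as an environment variable.
        return {
            "name": env_name,
            "valueFrom": {
                "fieldRef": {"fieldPath": f"metadata.labels['{label_key}']"}
            },
        }

    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [
                {
                    "name": name,
                    "image": image,
                    "env": [
                        # Assumption: these labels were copied from the node.
                        label_env("NODE_ZONE", "topology.kubernetes.io/zone"),
                        label_env("NODE_REGION", "topology.kubernetes.io/region"),
                    ],
                }
            ]
        },
    }


pod = pod_with_topology_env("trainer", "example.com/trainer:1.0")
print(json.dumps(pod, indent=2))
```

The printed JSON can be saved to a file and applied like any other manifest, since kubectl accepts JSON as well as YAML.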

Other Merges

Declarative validation tags have a StabilityLevel

Test external VolumeGroupSnapshots in 1.35

AllocationConfigSource is validated

APF properly counts legacy watches

Declarative Validation rollout: DeviceClassName, update, ResourceClaim, maxItems, DRA fields, DeviceAllocationMode

Simplify kube-cross builds

Promotions

ExecProbeTimeout to GA

max-allowable-numa-nodes to GA

Deprecated

storage.k8s.io/v1alpha1 is no longer served

Version Updates

Golang update: 1.24.9 in 1.31 through 1.34, 1.25.3 in 1.35

etcd to v3.5.23, just in time to replace it with 3.5.24

Shoutouts

Rayan Das – A big shout-out to the v1.35 Enhancements shadows (@dchan @jmickey @aibarbetta @Subhasmita @Faeka Ansari) for their hard work leading up to Enhancements Freeze yesterday.

via Last Week in Kubernetes Development https://lwkd.info/

October 22, 2025 at 07:55PM

·lwkd.info·Oct 23, 2025

ChatGPT just came out with its own web browser. Use it with caution.

OpenAI’s Atlas promises AI-powered convenience. The price? Letting ChatGPT track and store “memories” of what you do online.

·washingtonpost.com·Oct 22, 2025

The Majority AI View - Anil Dash

A blog about making culture. Since 1999.

·anildash.com·Oct 22, 2025

What if hard work felt easier?

Rethinking productivity, motivation, and the path of least resistance

·jeanhsu.substack.com·Oct 22, 2025

A Cascade of Failures: A Breakdown of the Massive AWS Outage

The problem started with misconfigured DNS, but soon infected EC2 launches as well, bringing hiccups to many of the largest internet services.

·thenewstack.io·Oct 22, 2025

Should You Go All-In on Vite? A Risk vs. Reward Analysis

If you're thinking about going all-in on Vite, we look at the pros of its speed and ecosystem vs. the risks of lock-in and rising competition from Turbopack.

·thenewstack.io·Oct 22, 2025

Why Modern IPv6 Failed This Massive Kubernetes Networking Test

Deutsche Telekom pushes the limits of Kubernetes, containers and networks in its satellite network simulation.

·thenewstack.io·Oct 22, 2025

Amazon Plans to Replace More Than Half a Million Jobs With Robots

Internal documents show the company that changed how people shop has a far-reaching plan to automate 75 percent of its operations.

·nytimes.com·Oct 22, 2025

DevOps & AI Toolkit - Ep37 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=--z0NqQN3J8

Ep37 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.

Sponsor: Octopus, Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support

via YouTube https://www.youtube.com/watch?v=--z0NqQN3J8

·youtube.com·Oct 21, 2025

How to Contribute to Open Source

Want to contribute to open source? A guide to making open source contributions, for first-timers and veterans.

·opensource.guide·Oct 21, 2025

[Monday] is when the Amazon brain drain finally sent AWS down the spout

column: When your best engineers log off for good, don’t be surprised when the cloud forgets how DNS works

·theregister.com·Oct 21, 2025

The Double-Edged Sword of AI-Assisted Kubernetes Operations with Mai Nishitani

The Double-Edged Sword of AI-Assisted Kubernetes Operations, with Mai Nishitani

https://ku.bz/3hWvQjXxp

Mai Nishitani, Director of Enterprise Architecture at NTT Data and AWS Community Builder, demonstrates how Model Context Protocol (MCP) enables Claude to directly interact with Kubernetes clusters through natural language commands.

You will learn:

How MCP servers work and why they're significant for standardizing AI integration with DevOps tools, moving beyond custom integrations to a universal protocol

The practical capabilities and critical limitations of AI in Kubernetes operations

Why fundamental troubleshooting skills matter more than ever as AI abstractions can fail in unexpected ways, especially during crisis scenarios and complex system failures

How DevOps roles are evolving from manual administration toward strategic architecture and orchestration

Sponsor

This episode is brought to you by Testkube—where teams run millions of performance tests in real Kubernetes infrastructure. From air-gapped environments to massive scale deployments, orchestrate every testing tool in one platform. Check it out at testkube.io

More info

Find all the links and info for this episode here: https://ku.bz/3hWvQjXxp

via KubeFM https://kube.fm

October 21, 2025 at 06:00AM

·kube.fm·Oct 21, 2025

DevOps & AI Toolkit - MCP Server Deployment Guide: From Local To Production - https://www.youtube.com/watch?v=MHf-M8qOogY

MCP Server Deployment Guide: From Local To Production

Discover the four main ways to deploy MCP servers, from simple local execution to enterprise-ready Kubernetes clusters. This comprehensive guide explores the trade-offs between NPX local deployment, Docker containerization, Kubernetes production setups, and cloud platform alternatives like Fly.io and Cloudflare Workers.

You'll see practical demonstrations of each approach using a real MCP server, learning about security implications, scalability challenges, and team collaboration benefits. The video covers why local NPX execution creates security risks and dependency nightmares, how Docker provides better isolation but remains single-user, and why Kubernetes offers the best solution for shared organizational infrastructure. We also examine the ToolHive operator's limitations and explore various cloud deployment options with their respective vendor lock-in considerations. Whether you're developing MCP servers or deploying them for your team, this guide will help you choose the right deployment strategy for your specific needs.

Sponsor: Browserbase, https://browserbase.com

Additional info:
Transcript and commands: https://devopstoolkit.live/ai/mcp-server-deployment-guide-from-local-to-production
Model Context Protocol: https://modelcontextprotocol.io

Timecodes:
00:00 Model Context Protocol (MCP) Deployment
01:40 Browserbase (sponsor)
02:50 MCP Local NPX Deployment
06:32 MCP Docker Container Deployment
09:23 MCP Kubernetes Production Deployment
14:09 MCP ToolHive Kubernetes Operator
19:15 Alternative MCP Deployment Options
22:46 Choosing the Right MCP Deployment

via YouTube https://www.youtube.com/watch?v=MHf-M8qOogY

·youtube.com·Oct 20, 2025

7 Common Kubernetes Pitfalls (and How I Learned to Avoid Them)

https://kubernetes.io/blog/2025/10/20/seven-kubernetes-pitfalls-and-how-to-avoid/

It’s no secret that Kubernetes can be both powerful and frustrating at times. When I first started dabbling with container orchestration, I made more than my fair share of mistakes, enough to compile a whole list of pitfalls. In this post, I want to walk through seven big gotchas I’ve encountered (or seen others run into) and share some tips on how to avoid them. Whether you’re just kicking the tires on Kubernetes or already managing production clusters, I hope these insights help you steer clear of a little extra stress.

Skipping resource requests and limits

The pitfall: Not specifying CPU and memory requirements in Pod specifications. This typically happens because Kubernetes does not require these fields, and workloads can often start and run without them—making the omission easy to overlook in early configurations or during rapid deployment cycles.

Context: In Kubernetes, resource requests and limits are critical for efficient cluster management. Resource requests ensure that the scheduler reserves the appropriate amount of CPU and memory for each pod, guaranteeing that it has the necessary resources to operate. Resource limits cap the amount of CPU and memory a pod can use, preventing any single pod from consuming excessive resources and potentially starving other pods. When resource requests and limits are not set:

Resource Starvation: Pods may get insufficient resources, leading to degraded performance or failures. This is because Kubernetes schedules pods based on these requests. Without them, the scheduler might place too many pods on a single node, leading to resource contention and performance bottlenecks.

Resource Hoarding: Conversely, without limits, a pod might consume more than its fair share of resources, impacting the performance and stability of other pods on the same node. This can lead to issues such as other pods getting evicted or killed by the Out-Of-Memory (OOM) killer due to lack of available memory.

How to avoid it:

Start with modest requests (for example 100m CPU, 128Mi memory) and see how your app behaves.

Monitor real-world usage and refine your values; the HorizontalPodAutoscaler can help automate scaling based on metrics.

Keep an eye on kubectl top pods or your logging/monitoring tool to confirm you’re not over- or under-provisioning.

My reality check: Early on, I never thought about memory limits. Things seemed fine on my local cluster. Then, in a larger environment, Pods got OOMKilled left and right. Lesson learned.

For detailed instructions on configuring resource requests and limits for your containers, refer to Assign Memory Resources to Containers and Pods (part of the official Kubernetes documentation).
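To make the omission easier to spot before deployment, here is a minimal lint sketch that lists containers in a Pod spec with no CPU or memory requests or limits. The function name and sample spec are hypothetical; the 100m CPU and 128Mi memory requests come from the suggestion above, while the 256Mi memory limit is only an illustrative assumption.

```python
def missing_resources(pod_spec):
    """Return one message per missing requests/limits entry (hypothetical lint)."""
    problems = []
    for container in pod_spec.get("containers", []):
        resources = container.get("resources", {})
        for section in ("requests", "limits"):
            for key in ("cpu", "memory"):
                if key not in resources.get(section, {}):
                    problems.append(f"{container['name']}: no {section}.{key}")
    return problems


# Sample Pod spec as a plain dict, mirroring the manifest structure.
spec = {
    "containers": [
        {
            "name": "web",
            "image": "example.com/web:1.0",
            "resources": {
                "requests": {"cpu": "100m", "memory": "128Mi"},
                "limits": {"memory": "256Mi"},  # assumed value for illustration
            },
        },
        # No resources block at all: every entry is reported.
        {"name": "sidecar", "image": "example.com/sidecar:1.0"},
    ]
}

for problem in missing_resources(spec):
    print(problem)
```

Some teams deliberately leave CPU limits unset, so a real check would likely make the required fields configurable rather than flag all four.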

Underestimating liveness and readiness probes

The pitfall: Deploying containers without explicitly defining how Kubernetes should check their health or readiness. This tends to happen because Kubernetes will consider a container “running” as long as the process inside hasn’t exited. Without additional signals, Kubernetes assumes the workload is functioning—even if the application inside is unresponsive, initializing, or stuck.

Context:

Liveness, readiness, and startup probes are mechanisms Kubernetes uses to monitor container health and availability.

Liveness probes determine if the application is still alive. If a liveness check fails, the container is restarted.

Readiness probes control whether a container is ready to serve traffic. Until the readiness probe passes, the Pod is kept out of Service endpoints.

Startup probes help distinguish between long startup times and actual failures.

How to avoid it:

Add a simple HTTP livenessProbe to check a health endpoint (for example /healthz) so Kubernetes can restart a hung container.

Use a readinessProbe to ensure traffic doesn’t reach your app until it’s warmed up.

Keep probes simple. Overly complex checks can create false alarms and unnecessary restarts.

My reality check: I once forgot a readiness probe for a web service that took a while to load. Users hit it prematurely, got weird timeouts, and I spent hours scratching my head. A 3-line readiness probe would have saved the day.

For comprehensive instructions on configuring liveness, readiness, and startup probes for containers, please refer to Configure Liveness, Readiness and Startup Probes in the official Kubernetes documentation.
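The probe advice above might translate into a container spec like the following sketch. The paths, port, and timing values are assumptions chosen for illustration (only /healthz is mentioned in the text), and http_probe is a hypothetical helper rather than part of any Kubernetes client library.

```python
import json


def http_probe(path, port, period_seconds=10, failure_threshold=3):
    """Build an HTTP GET probe definition (hypothetical helper)."""
    return {
        "httpGet": {"path": path, "port": port},
        "periodSeconds": period_seconds,
        "failureThreshold": failure_threshold,
    }


container = {
    "name": "web",
    "image": "example.com/web:1.0",
    "ports": [{"containerPort": 8080}],
    # Startup probe: allows up to 30 * 5 = 150 seconds for a slow start
    # before the liveness probe takes over.
    "startupProbe": http_probe("/healthz", 8080, period_seconds=5, failure_threshold=30),
    # Liveness probe: the container is restarted after repeated failures.
    "livenessProbe": http_probe("/healthz", 8080),
    # Readiness probe: traffic is withheld until this passes (assumed /ready path).
    "readinessProbe": http_probe("/ready", 8080, period_seconds=5),
}

print(json.dumps(container, indent=2))
```

Keeping all three probes as plain HTTP checks follows the "keep probes simple" advice; anything heavier, such as a database round trip, is better left out of the liveness check.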

“We’ll just look at container logs” (famous last words)

The pitfall: Relying solely on container logs retrieved via kubectl logs. This often happens because the command is quick and convenient, and in many setups, logs appear accessible during development or early troubleshooting. However, kubectl logs only retrieves logs from currently running or recently terminated containers, and those logs are stored on the node’s local disk. As soon as the container is deleted, evicted, or the node is restarted, the log files may be rotated out or permanently lost.

How to avoid it:

Centralize logs using CNCF tools like Fluentd or Fluent Bit to aggregate output from all Pods.

Adopt OpenTelemetry for a unified view of logs, metrics, and (if needed) traces. This lets you spot correlations between infrastructure events and app-level behavior.

Pair logs with Prometheus metrics to track cluster-level data alongside application logs. If you need distributed tracing, consider CNCF projects like Jaeger.

My reality check: The first time I lost Pod logs to a quick restart, I realized how flimsy “kubectl logs” can be on its own. Since then, I’ve set up a proper pipeline for every cluster to avoid missing vital clues.

Treating dev and prod exactly the same

The pitfall: Deploying the same Kubernetes manifests with identical settings across development, staging, and production environments. This often occurs when teams aim for consistency and reuse, but overlook that environment-specific factors—such as traffic patterns, resource availability, scaling needs, or access control—can differ significantly. Without customization, configurations optimized for one environment may cause instability, poor performance, or security gaps in another.

How to avoid it:

Use environment overlays or kustomize to maintain a shared base while customizing resource requests, replicas, or config for each environment.

Extract environment-specific configuration into ConfigMaps and / or Secrets. You can use a specialized tool such as Sealed Secrets to manage confidential data.

Plan for scale in production. Your dev cluster can probably get away with minimal CPU/memory, but prod might need significantly more.

My reality check: One time, I scaled up replicaCount from 2 to 10 in a tiny dev environment just to “test.” I promptly ran out of resources and spent half a day cleaning up the aftermath. Oops.
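The shared-base-plus-overlay idea can be sketched as a recursive dictionary merge. This is only a conceptual illustration and not how kustomize itself applies patches (it has its own merge semantics, particularly for lists); the function name, settings keys, and values are all hypothetical.

```python
import copy


def apply_overlay(base, overlay):
    """Return a new dict in which overlay values win and nested dicts merge."""
    merged = copy.deepcopy(base)
    for key, value in overlay.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = apply_overlay(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Shared base settings, kept deliberately small.
base = {
    "replicas": 2,
    "resources": {"requests": {"cpu": "100m", "memory": "128Mi"}},
}

# Per-environment overrides (assumed values for illustration).
overlays = {
    "dev": {"replicas": 1},
    "prod": {
        "replicas": 10,
        "resources": {"requests": {"cpu": "500m", "memory": "512Mi"}},
    },
}

for env_name, overlay in overlays.items():
    print(env_name, apply_overlay(base, overlay))
```

Because the base is copied rather than mutated, each environment is derived independently, which is the property that keeps a dev experiment from leaking into prod settings.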

Leaving old stuff floating around

The pitfall: Leaving unused or outdated resources—such as Deployments, Services, ConfigMaps, or PersistentVolumeClaims—running in the cluster. This often happens because Kubernetes does not automatically remove resources unless explicitly instructed, and there is no built-in mechanism to track ownership or expiration. Over time, these forgotten objects can accumulate, consuming cluster resources, increasing cloud costs, and creating operational confusion, especially when stale Services or LoadBalancers continue to route traffic.

How to avoid it:

Label everything with a purpose or owner label. That way, you can easily query resources you no longer need.

Regularly audit your cluster: run kubectl get all -n <namespace> to see what’s actually running, and confirm it’s all legit.

Adopt Kubernetes’ Garbage Collection: K8s docs show how to remove dependent objects automatically.

Leverage policy automation: Tools like Kyverno can automatically delete or block stale resources after a certain period, or enforce lifecycle policies so you don’t have to remember every single cleanup step.

My reality check: After a hackathon, I forgot to tear down a “test-svc” pinned to an external load balancer. Three weeks later, I realized I’d been paying for that load balancer the entire time. Facepalm.
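A periodic audit along these lines could be sketched as follows. The input shape mirrors the items returned by kubectl get with JSON output, but the function name, the "owner" label key, the 30-day threshold, and the sample objects are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def stale_or_unowned(resources, now, max_age_days=30, owner_label="owner"):
    """Flag resources that lack an owner label or exceed an age threshold."""
    flagged = []
    for resource in resources:
        meta = resource["metadata"]
        # creationTimestamp is an RFC 3339 UTC timestamp such as 2025-09-25T09:00:00Z.
        created = datetime.strptime(
            meta["creationTimestamp"], "%Y-%m-%dT%H:%M:%SZ"
        ).replace(tzinfo=timezone.utc)
        reasons = []
        if owner_label not in meta.get("labels", {}):
            reasons.append("no owner label")
        if now - created > timedelta(days=max_age_days):
            reasons.append(f"older than {max_age_days} days")
        if reasons:
            flagged.append((meta["name"], reasons))
    return flagged


# Sample objects standing in for real cluster output.
resources = [
    {"metadata": {"name": "test-svc", "creationTimestamp": "2025-09-25T09:00:00Z"}},
    {"metadata": {"name": "old-config", "creationTimestamp": "2025-08-01T09:00:00Z",
                  "labels": {"owner": "team-a"}}},
    {"metadata": {"name": "web", "creationTimestamp": "2025-10-15T09:00:00Z",
                  "labels": {"owner": "team-b"}}},
]

now = datetime(2025, 10, 20, tzinfo=timezone.utc)
for name, reasons in stale_or_unowned(resources, now):
    print(name, "->", ", ".join(reasons))
```

Age alone does not make a resource stale, so the output is best treated as a review list rather than a deletion list.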

Diving too deep into networking too soon

The pitfall: Introducing advanced networking solutions—such as service meshes, custom CNI plugins, or multi-cluster communication—before fully understanding Kubernetes' native networking primitives. This commonly occurs when teams implement features like traffic routing, observability, or mTLS using external tools without first mastering how core Kubernetes networking works: including Pod-to-Pod communication, ClusterIP Services, DNS resolution, and basic ingress traffic handling. As a result, network-related issues become harder to troubleshoot, especially when overlays introduce additional abstractions and failure points.

How to avoid it:

Start small: a Deployment, a Service, and a basic ingress controller such as one based on NGINX (e.g., Ingress-NGINX).

Make sure you understand how traffic flows within the cluster, how service discovery works, and how DNS is configured.

Only move to a full-blown mesh or advanced CNI features when you actually need them; complex networking adds overhead.

My reality check: I tried Istio on a small internal app once, then spent more time debugging Istio itself than the actual app. Eventually, I stepped back, removed Istio, and everything worked fine.

Going too light on security and RBAC

The pitfall: Deploying workloads with insecure configurations, such as running containers as the root user, using the latest image tag, disabling security contexts, or assigning overly broad RBAC roles like cluster-admin. These practices persist because Kubernetes does not enforce strict security defaults out of the box, and the platform is designed to be flexible rather than opinionated. Without explicit securi

·kubernetes.io·Oct 20, 2025

CHAOSS Calendar - CHAOSS

·chaoss.community·Oct 20, 2025

The FTC Is Disappearing Blog Posts About AI Published During Lina Khan’s Tenure

The Federal Trade Commission removed several blog posts in recent months about open source and potential risks to consumers from the rapid spread of commercial AI tools.

·wired.com·Oct 20, 2025

The Making of Flux: The Future a KubeFM Original Series

The Making of Flux: The Future, a KubeFM Original Series

https://ku.bz/tVqKwNYQH

In this closing episode, Bryan Ross (Field CTO at GitLab), Jane Yan (Principal Program Manager at Microsoft), Sean O’Meara (CTO at Mirantis) and William Rizzo (Strategy Lead, CTO Office at Mirantis) discuss how GitOps evolves in practice.

How enterprises are embedding Flux into developer platforms and managed cloud services.

Why bridging CI/CD and infrastructure remains a core challenge—and how GitOps addresses it.

What leading platform teams (GitLab, Microsoft, Mirantis) see as the next frontier for GitOps.

Sponsor

Join the Flux maintainers and community at FluxCon, November 11th in Atlanta—register here

More info

Find all the links and info for this episode here: https://ku.bz/tVqKwNYQH

via KubeFM https://kube.fm

October 20, 2025 at 06:00AM

·kube.fm·Oct 20, 2025

Major AWS outage across US-East region sows chaos online

Amazon reports DNS issues hitting DynamoDB, leaving services from Roblox to McDonald's struggling

·theregister.com·Oct 20, 2025

Amazon cloud computing outage disrupts Snapchat, Robinhood and many other online services

Amazon said its cloud computing service was recovering from a major outage that disrupted online activity around the world on Monday.

·wxyz.com·Oct 20, 2025

Spotlight on Policy Working Group

https://kubernetes.io/blog/2025/10/18/wg-policy-spotlight-2025/

(Note: The Policy Working Group has completed its mission and is no longer active. This article reflects its work, accomplishments, and insights into how a working group operates.)

In the complex world of Kubernetes, policies play a crucial role in managing and securing clusters. But have you ever wondered how these policies are developed, implemented, and standardized across the Kubernetes ecosystem? To answer that, let's take a look back at the work of the Policy Working Group.

The Policy Working Group was dedicated to a critical mission: providing an overall architecture that encompasses both current policy-related implementations and future policy proposals in Kubernetes. Their goal was both ambitious and essential: to develop a universal policy architecture that benefits developers and end-users alike.

Through collaborative methods, this working group strove to bring clarity and consistency to the often complex world of Kubernetes policies. By focusing on both existing implementations and future proposals, they ensured that the policy landscape in Kubernetes remains coherent and accessible as the technology evolves.

This blog post dives deeper into the work of the Policy Working Group, guided by insights from its former co-chairs:

Jim Bugwadia

Poonam Lamba

Andy Suderman

Interviewed by Arujjwal Negi.

These co-chairs explained what the Policy Working Group was all about.

Introduction

Hello, thank you for the time! Let’s start with some introductions. Could you tell us a bit about yourself, your role, and how you got involved in Kubernetes?

Jim Bugwadia: My name is Jim Bugwadia, and I am a co-founder and the CEO at Nirmata which provides solutions that automate security and compliance for cloud-native workloads. At Nirmata, we have been working with Kubernetes since it started in 2014. We initially built a Kubernetes policy engine in our commercial platform and later donated it to CNCF as the Kyverno project. I joined the CNCF Kubernetes Policy Working Group to help build and standardize various aspects of policy management for Kubernetes and later became a co-chair.

Andy Suderman: My name is Andy Suderman and I am the CTO of Fairwinds, a managed Kubernetes-as-a-Service provider. I began working with Kubernetes in 2016 building a web conferencing platform. I am an author and/or maintainer of several Kubernetes-related open-source projects such as Goldilocks, Pluto, and Polaris. Polaris is a JSON-schema-based policy engine, which started Fairwinds' journey into the policy space and my involvement in the Policy Working Group.

Poonam Lamba: My name is Poonam Lamba, and I currently work as a Product Manager for Google Kubernetes Engine (GKE) at Google. My journey with Kubernetes began back in 2017 when I was building an SRE platform for a large enterprise, using a private cloud built on Kubernetes. Intrigued by its potential to revolutionize the way we deployed and managed applications at the time, I dove headfirst into learning everything I could about it. Since then, I've had the opportunity to build the policy and compliance products for GKE. I lead and contribute to GKE CIS benchmarks. I am also involved with the Gatekeeper project, and I contributed to the Policy WG for over 2 years and served as a co-chair for the group.

Responses to the following questions represent an amalgamation of insights from the former co-chairs.

About Working Groups

One thing even I am not aware of is the difference between a working group and a SIG. Can you help us understand what a working group is and how it is different from a SIG?

Unlike SIGs, working groups are temporary and focused on tackling specific, cross-cutting issues or projects that may involve multiple SIGs. Their lifespan is defined, and they disband once they've achieved their objective. Generally, working groups don't own code or have long-term responsibility for managing a particular area of the Kubernetes project.

(To know more about SIGs, visit the list of Special Interest Groups)

You mentioned that working groups involve multiple SIGs. Which SIGs was the Policy WG closely involved with, and how did you coordinate with them?

The group collaborated closely with Kubernetes SIG Auth throughout our existence, and more recently, the group also worked with SIG Security since its formation. Our collaboration occurred in a few ways. We provided periodic updates during the SIG meetings to keep them informed of our progress and activities. Additionally, we used other community forums to maintain open lines of communication and ensured our work aligned with the broader Kubernetes ecosystem. This collaborative approach helped the group stay coordinated with related efforts across the Kubernetes community.

Policy WG

Why was the Policy Working Group created?

To enable a broad set of use cases, we recognize that Kubernetes is powered by a highly declarative, fine-grained, and extensible configuration management system. We've observed that a Kubernetes configuration manifest may have different portions that are important to various stakeholders. For example, some parts may be crucial for developers, while others might be of particular interest to security teams or address operational concerns. Given this complexity, we believe that policies governing the usage of these intricate configurations are essential for success with Kubernetes.

Our Policy Working Group was created specifically to research the standardization of policy definitions and related artifacts. We saw a need to bring consistency and clarity to how policies are defined and implemented across the Kubernetes ecosystem, given the diverse requirements and stakeholders involved in Kubernetes deployments.

Can you give me an idea of the work you did in the group?

We worked on several Kubernetes policy-related projects. Our initiatives included:

We worked on a Kubernetes Enhancement Proposal (KEP) for the Kubernetes Policy Reports API. This aims to standardize how policy reports are generated and consumed within the Kubernetes ecosystem.

We conducted a CNCF survey to better understand policy usage in the Kubernetes space. This helped gauge the practices and needs across the community at the time.

We wrote a paper that will guide users in achieving PCI-DSS compliance for containers. This is intended to help organizations meet important security standards in their Kubernetes environments.

We also worked on a paper highlighting how shifting security down can benefit organizations. This focuses on the advantages of implementing security measures earlier in the development and deployment process.

Can you tell us what were the main objectives of the Policy Working Group and some of your key accomplishments?

The charter of the Policy WG was to help standardize policy management for Kubernetes and educate the community on best practices.

To accomplish this we updated the Kubernetes documentation (Policies | Kubernetes), produced several whitepapers (Kubernetes Policy Management, Kubernetes GRC), and created the Policy Reports API (API reference) which standardizes reporting across various tools. Several popular tools such as Falco, Trivy, Kyverno, kube-bench, and others support the Policy Report API. A major milestone for the Policy WG was promoting the Policy Reports API to a SIG-level API or finding it a stable home.

Beyond that, as ValidatingAdmissionPolicy and MutatingAdmissionPolicy approached GA in Kubernetes, a key goal of the WG was to guide and educate the community on the tradeoffs and appropriate usage patterns for these built-in API objects and other CNCF policy management solutions like OPA/Gatekeeper and Kyverno.

Challenges

What were some of the major challenges that the Policy Working Group worked on?

During our work in the Policy Working Group, we encountered several challenges:

One of the main issues we faced was finding time to consistently contribute. Given that many of us have other professional commitments, it can be difficult to dedicate regular time to the working group's initiatives.

Another challenge we experienced was related to our consensus-driven model. While this approach ensures that all voices are heard, it can sometimes lead to slower decision-making processes. We valued thorough discussion and agreement, but this can occasionally delay progress on our projects.

We've also encountered occasional differences of opinion among group members. These situations require careful navigation to ensure that we maintain a collaborative and productive environment while addressing diverse viewpoints.

Lastly, we've noticed that newcomers to the group may find it difficult to contribute effectively without consistent attendance at our meetings. The complex nature of our work often requires ongoing context, which can be challenging for those who aren't able to participate regularly.

Can you tell me more about those challenges? How did you discover each one? What has the impact been? What were some strategies you used to address them?

There are no easy answers, but having more contributors and maintainers greatly helps! Overall the CNCF community is great to work with and is very welcoming to beginners. So, if folks out there are hesitating to get involved, I highly encourage them to attend a WG or SIG meeting and just listen in.

It often takes a few meetings to fully understand the discussions, so don't feel discouraged if you don't grasp everything right away. We made a point to emphasize this and encouraged new members to review documentation as a starting point for getting involved.

Additionally, differences of opinion were valued and encouraged within the Policy-WG. We adhered to the CNCF core values and resolved disagreements by maintaining respect for one another. We also strove to timebox our decisions and assign clear responsibilities to keep things movin

·kubernetes.io·Oct 18, 2025

Spotlight on Policy Working Group

Blog: Spotlight on Policy Working Group

https://www.kubernetes.dev/blog/2025/10/18/wg-policy-spotlight-2025/

(Note: The Policy Working Group has completed its mission and is no longer active. This article reflects its work, accomplishments, and insights into how a working group operates.)

In the complex world of Kubernetes, policies play a crucial role in managing and securing clusters. But have you ever wondered how these policies are developed, implemented, and standardized across the Kubernetes ecosystem? To answer that, let’s take a look back at the work of the Policy Working Group.

This blog post dives deeper into the work of the Policy Working Group, guided by insights from its former co-chairs:

Jim Bugwadia

Poonam Lamba

Andy Suderman

Interviewed by Arujjwal Negi.

These co-chairs explained what the Policy Working Group was all about.

Introduction

Hello, thank you for the time! Let’s start with some introductions. Could you tell us a bit about yourself, your role, and how you got involved in Kubernetes?

Andy Suderman: My name is Andy Suderman and I am the CTO of Fairwinds, a managed Kubernetes-as-a-Service provider. I began working with Kubernetes in 2016 building a web conferencing platform. I am an author and/or maintainer of several Kubernetes-related open-source projects such as Goldilocks, Pluto, and Polaris. Polaris is a JSON-schema-based policy engine, which started Fairwinds’ journey into the policy space and my involvement in the Policy Working Group.

Poonam Lamba: My name is Poonam Lamba, and I currently work as a Product Manager for Google Kubernetes Engine (GKE) at Google. My journey with Kubernetes began back in 2017 when I was building an SRE platform for a large enterprise, using a private cloud built on Kubernetes. Intrigued by its potential to revolutionize the way we deployed and managed applications at the time, I dove headfirst into learning everything I could about it. Since then, I’ve had the opportunity to build the policy and compliance products for GKE. I lead and contribute to GKE CIS benchmarks. I am also involved with the Gatekeeper project, and I contributed to the Policy-WG for over 2 years and served as a co-chair for the group.

Responses to the following questions represent an amalgamation of insights from the former co-chairs.

About Working Groups

One thing even I am not aware of is the difference between a working group and a SIG. Can you help us understand what a working group is and how it is different from a SIG?

Unlike SIGs, working groups are temporary and focused on tackling specific, cross-cutting issues or projects that may involve multiple SIGs. Their lifespan is defined, and they disband once they’ve achieved their objective. Generally, working groups don’t own code or have long-term responsibility for managing a particular area of the Kubernetes project.

(To know more about SIGs, visit the list of Special Interest Groups)

You mentioned that Working Groups involve multiple SIGs. Which SIGs was the Policy WG closely involved with, and how did you coordinate with them?

Policy WG

Why was the Policy Working Group created?

To enable a broad set of use cases, we recognize that Kubernetes is powered by a highly declarative, fine-grained, and extensible configuration management system. We’ve observed that a Kubernetes configuration manifest may have different portions that are important to various stakeholders. For example, some parts may be crucial for developers, while others might be of particular interest to security teams or address operational concerns. Given this complexity, we believe that policies governing the usage of these intricate configurations are essential for success with Kubernetes.

Can you give me an idea of the work you did in the group?

We worked on several Kubernetes policy-related projects. Our initiatives included:

We worked on a Kubernetes Enhancement Proposal (KEP) for the Kubernetes Policy Reports API. This aims to standardize how policy reports are generated and consumed within the Kubernetes ecosystem.

We conducted a CNCF survey to better understand policy usage in the Kubernetes space. This helped gauge the practices and needs across the community at the time.

We wrote a paper that will guide users in achieving PCI-DSS compliance for containers. This is intended to help organizations meet important security standards in their Kubernetes environments.

Can you tell us what the main objectives of the Policy Working Group were, and some of your key accomplishments?

The charter of the Policy WG was to help standardize policy management for Kubernetes and educate the community on best practices.

Challenges

What were some of the major challenges that the Policy Working Group worked on?

During our work in the Policy Working Group, we encountered several challenges:

We’ve also encountered occasional differences of opinion among group members. These situations require careful navigation to ensure that we maintain a collaborative and productive environment while addressing diverse viewpoints.

Lastly, we’ve noticed that newcomers to the group may find it difficult to contribute effectively without consistent attendance at our meetings. The complex nature of our work often requires ongoing context, which can be challenging for those who aren’t able to participate regularly.

Can you tell me more about those challenges? How did you discover each one? What has the impact been? What were some strategies you used to address them?

It often takes a few meetings to fully understand the discussions, so don’t feel discouraged if you don’t grasp everything right away. We made a point to emphasize this and encouraged new members to review documentation as a starting point for getting involved.

·kubernetes.dev·Oct 18, 2025

Blog: Spotlight on Policy Working Group

MunGell/awesome-for-beginners

A list of awesome beginners-friendly projects

·github.com·Oct 17, 2025

MunGell/awesome-for-beginners

CISA warns of ‘significant’ threat to federal networks after nation-state hackers stole F5 source code, undisclosed bug info

The emergency directive orders all agencies to apply the latest updates for all at-risk F5 virtual and physical devices and downloaded software by October 22.

·therecord.media·Oct 17, 2025

CISA warns of ‘significant’ threat to federal networks after nation-state hackers stole F5 source code, undisclosed bug info

Last Week in Kubernetes Development - Week Ending October 12 2025

Week Ending October 12, 2025

https://lwkd.info/2025/20251015

Developer News

The ballots for the Steering Committee Elections are due on October 24th. If you haven’t already, submit your Steering votes. If you have contributed to Kubernetes in the last year but haven’t met the eligibility requirements, you will need to submit an exception request to vote in the steering election, the deadline for which is October 22nd.

The CFP for Maintainer Summit: KubeCon + CloudNativeCon Europe 2026 is open. Please send in your submissions before 14th December 2025.

SIG-Testing is continuing to improve alpha/beta feature coverage, including moving kind-beta-features to release blocking and several other beta jobs to release-informing.

Release Schedule

Next Deadline: Docs Deadline for placeholder PRs, October 23

We are in PRR freeze. Enhancements Freeze will begin this week (16th October). If you are going to miss the deadline, please file an Exception.

Patch releases have been delayed until 22nd October.

Featured PRs

134433 : kubeadm print errors during control-plane-wait retries

This PR improves troubleshooting during control plane startup by ensuring that errors encountered while waiting for control plane components are printed during each retry at log verbosity level 5. Previously, these errors were not shown, which made it harder to identify issues when components failed to become ready. With this change, administrators can now see the actual errors without additional steps, making failure causes more visible and debugging faster.
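The retry behavior described above can be sketched roughly as follows. This is not kubeadm's code (kubeadm is written in Go); `wait_for_component`, `make_flaky_check`, the `verbosity` parameter, and the simulated error message are all hypothetical names chosen for illustration. The only detail taken from the PR summary is that errors are printed on each retry at verbosity level 5.

```python
import time
from typing import Callable


def wait_for_component(
    check: Callable[[], None],
    retries: int = 5,
    delay_seconds: float = 0.0,
    verbosity: int = 0,
    log: Callable[[str], None] = print,
) -> bool:
    """Retry `check` until it stops raising, or until retries run out.

    At verbosity 5 or higher, the error from every failed attempt is
    logged instead of being silently discarded between retries.
    """
    for attempt in range(1, retries + 1):
        try:
            check()
            return True
        except Exception as err:
            if verbosity >= 5:
                log(f"attempt {attempt}/{retries} failed: {err}")
            if attempt < retries:
                time.sleep(delay_seconds)
    return False


def make_flaky_check(failures_before_success: int) -> Callable[[], None]:
    """Build a hypothetical health check that fails a fixed number of times."""
    state = {"calls": 0}

    def check() -> None:
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("component not ready (simulated)")

    return check


if __name__ == "__main__":
    # Two simulated failures are logged, then the third attempt succeeds.
    wait_for_component(make_flaky_check(2), verbosity=5)
```

With `verbosity=0` the same call succeeds without printing anything, which mirrors the earlier situation where the underlying errors stayed hidden.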

KEP of the Week

KEP-4622: New TopologyManager Policy which configure the value of maxAllowableNUMANodes

This KEP introduces a new TopologyManager policy option called max-allowable-numa-nodes, allowing users to configure the maximum number of NUMA nodes supported by the TopologyManager. Previously, this value was hardcoded to 8 as a temporary measure to prevent state explosion. By making it configurable, the KEP enables better support for high-end CPUs with more than 8 NUMA nodes, without changing existing TopologyManager policies or addressing broader resource management aspects.

This KEP is tracked as stable in v1.35.
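To get a feel for the "state explosion" the hardcoded limit guarded against, here is a rough sketch. It assumes, purely for illustration, that every non-empty subset of NUMA nodes is a candidate affinity mask; the real TopologyManager logic may differ. The function names and the `check_limit` helper are hypothetical and are not part of the kubelet.

```python
from itertools import combinations
from typing import List, Tuple

DEFAULT_MAX_NUMA_NODES = 8  # the previously hardcoded limit mentioned above


def candidate_masks(numa_nodes: int) -> List[Tuple[int, ...]]:
    """Enumerate every non-empty subset of NUMA node IDs (assumed model)."""
    ids = range(numa_nodes)
    return [
        subset
        for size in range(1, numa_nodes + 1)
        for subset in combinations(ids, size)
    ]


def mask_count(numa_nodes: int) -> int:
    """Closed form for the number of non-empty subsets: 2^n - 1."""
    return 2 ** numa_nodes - 1


def check_limit(numa_nodes: int, max_allowable: int = DEFAULT_MAX_NUMA_NODES) -> None:
    """Reject a machine whose NUMA node count exceeds the configured maximum."""
    if numa_nodes > max_allowable:
        raise ValueError(
            f"{numa_nodes} NUMA nodes exceeds the allowed maximum of {max_allowable}"
        )


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        print(f"{n} NUMA nodes -> {mask_count(n)} candidate masks")
```

Under this model the count doubles with every added NUMA node, which is why raising the limit is left as an explicit, opt-in choice instead of a new default.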

Other Merges

Enforce valid label-key format in device tolerations

Add declarative validation and path normalization for ResourceClaim fields

Remove runtime gogo protobuf dependencies from Kubernetes API types

Fix IPv6 allocator for /64 CIDRs

Add -n shorthand flag for kubectl config set-context

Add k8s.update flag to enable validation rules just for updates

Prevent panic when creating an invalid CronJob schedule

Stop calling --chunk-size beta, it’s been around since 2017

Make sure that the eviction controller knows about NoExecute device tolerations

APIApprovalController can run with contextual logging

kubeadm: show control plane retry errors

ResourceClaim: ensure that fields don’t exceed list limits, that shareID is validated, and that it supports the immutable tag and long name format

Add test for endpoint/endpointslice headless label propagation

Maybe don’t let folks create ResourceQuotas with request > limit

kubectl gets -n shorthand for --namespace

Set FeatureGates simultaneously during tests to avoid dependency problems

DeviceRequests exactly and firstAvailable shortcut some logic

Refactor away most of the dependencies on the unmaintained gogo protobuf library

Allocate within IPv6 subnets correctly

resource.k8s.io v1 API is now the default

Prometheus client can handle deprecated/missing metrics

APIserver will abort startup due to invalid CA configuration

Subprojects and Dependency Updates

headlamp v0.36.0 adds EndpointSlice support, label-based search, and clipboard copy for resource names.

cloud-provider-openstack v1.34.1 updates test dependencies and fixes build-script issues across OCCM and CSI plugins. Multiple Helm charts were also updated.

csi-driver-nfs v4.12.1 updates CSI release tools and documentation for NFS volumes.

csi-driver-smb v1.19.1 updates CSI release tooling and improves maintenance scripts.

kubespray v2.29.0 adds new configuration options, supports Kubernetes v1.33.1 and Debian 13 Trixie, and upgrades major components.

prometheus v3.7.0 adds experimental anchored and smoothed rate functions, introduces NHCB, improves rule evaluation and TSDB logging, and deprecates several remote-write metrics.

via Last Week in Kubernetes Development https://lwkd.info/

October 15, 2025 at 06:00PM

·lwkd.info·Oct 17, 2025

Last Week in Kubernetes Development - Week Ending October 12 2025

US start-up Anthropic unveils cheaper model to widen AI’s appeal

Haiku 4.5 offers coding abilities similar to Sonnet 4 but at about one-third the cost, the company says.

·scmp.com·Oct 16, 2025

US start-up Anthropic unveils cheaper model to widen AI’s appeal

Amazon is planning a new wave of layoffs, sources say | Fortune

Amazon’s 10,000-plus-employee HR division, which includes recruiters, is expected to be among the hardest-hit in the job cuts.

·fortune.com·Oct 15, 2025

Amazon is planning a new wave of layoffs, sources say | Fortune

AWS Deprecates Two Dozen Services (Most of Which You've Never Heard Of)

AWS has done its quarterly housecleaning / "Googling" of its services, and deprecated what appears at first glance to be a startlingly long list. However, going through them put my mind at ease, and I'm hoping this post can do the same for you.

·lastweekinaws.com·Oct 15, 2025

AWS Deprecates Two Dozen Services (Most of Which You've Never Heard Of)

The Data Engineer's guide to optimizing Kubernetes with Niels Claeys

The Data Engineer's guide to optimizing Kubernetes, with Niels Claeys

https://ku.bz/hGRfkzDJW

Niels Claeys shares how his team at DataMinded built Conveyor, a data platform processing up to 1.5 million core hours monthly. He explains the specific optimizations they discovered through production experience, from scheduler changes that immediately reduce costs by 10-15% to achieving 97% spot instance usage without reliability issues.

You will learn:

Why the default Kubernetes scheduler wastes money on batch workloads and how switching from "least allocated" to "most allocated" scheduling enables faster scale-down and better resource utilization

How to achieve 97% spot instance adoption through strategic instance type diversification, region selection, and Spark-specific techniques

Node pool design principles that balance Kubernetes overhead with workload efficiency

Platform-specific gotchas like AWS cross-AZ data transfer costs that can spike bills unexpectedly
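The "least allocated" versus "most allocated" point in the list above can be illustrated with a simplified scoring sketch. It assumes a single resource (CPU) and a 0 to 100 score per node; the real kube-scheduler combines several resources and plugins, so treat the `Node` class and the function names here as hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    name: str
    cpu_capacity: float   # total cores on the node
    cpu_requested: float  # cores already requested by running pods


def least_allocated_score(node: Node, pod_cpu: float) -> float:
    """Higher score for nodes that would have more free CPU after placement."""
    requested = node.cpu_requested + pod_cpu
    return (node.cpu_capacity - requested) * 100 / node.cpu_capacity


def most_allocated_score(node: Node, pod_cpu: float) -> float:
    """Higher score for nodes that would be fuller after placement."""
    requested = node.cpu_requested + pod_cpu
    return requested * 100 / node.cpu_capacity


def pick_node(nodes: List[Node], pod_cpu: float, strategy: str) -> Node:
    """Pick the best-scoring node among those with room for the pod."""
    feasible = [n for n in nodes if n.cpu_requested + pod_cpu <= n.cpu_capacity]
    if not feasible:
        raise RuntimeError("no node has enough free CPU for this pod")
    score = most_allocated_score if strategy == "most" else least_allocated_score
    return max(feasible, key=lambda n: score(n, pod_cpu))


if __name__ == "__main__":
    nodes = [Node("busy", 16, 12), Node("idle", 16, 2)]
    print(pick_node(nodes, 2, "least").name)  # spreads: picks "idle"
    print(pick_node(nodes, 2, "most").name)   # packs: picks "busy"
```

Packing new pods onto the already-busy node leaves the other node closer to empty, so an autoscaler can remove it sooner once its remaining pods finish. That is the scale-down benefit the episode summary describes for batch workloads.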

More info

Find all the links and info for this episode here: https://ku.bz/hGRfkzDJW

via KubeFM https://kube.fm

October 14, 2025 at 02:00AM

·kube.fm·Oct 14, 2025

The Data Engineer's guide to optimizing Kubernetes with Niels Claeys

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

We’re on a journey to advance and democratize artificial intelligence through open source and open science.

·huggingface.co·Oct 13, 2025

Reachy Mini - The Open-Source Robot for Today's and Tomorrow's AI Builders

How to use your network to get a job

Common advice for job seekers is to "use your network" but what does that mean, exactly? Let's break...

·dev.to·Oct 13, 2025

How to use your network to get a job