1_r/devopsish

More Kubernetes Than I Bargained For with Amos Wenger

More Kubernetes Than I Bargained For, with Amos Wenger

https://ku.bz/6Ll_7slr9

Amos Wenger walks through his production incident where adding a home computer as a Kubernetes node caused TLS certificate renewals to fail. The discussion covers debugging techniques using tools like netshoot and K9s, and explores the unexpected interactions between Kubernetes overlay networks and consumer routers.

You will learn:

How Kubernetes networking assumptions break when mixing cloud VMs with nodes behind consumer routers, and why cert-manager challenges fail in NAT environments

The differences between CNI plugins like Flannel and Calico, particularly how they handle IPv6 translation

Debugging techniques for network issues using tools like netshoot, K9s, and iproute2

Best practices for mixed infrastructure including proper node labeling, taints, and scheduling controls

Sponsor

This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.

More info

Find all the links and info for this episode here: https://ku.bz/6Ll_7slr9

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

November 25, 2025 at 05:00AM

·kube.fm·
More Kubernetes Than I Bargained For with Amos Wenger
Kubernetes Configuration Good Practices

Kubernetes Configuration Good Practices

https://kubernetes.io/blog/2025/11/25/configuration-good-practices/

Configuration is one of those things in Kubernetes that seems small until it's not. Configuration is at the heart of every Kubernetes workload. A missing quote, a wrong API version or a misplaced YAML indent can ruin your entire deploy.

This blog brings together tried-and-tested configuration best practices. The small habits that make your Kubernetes setup clean, consistent and easier to manage. Whether you are just starting out or already deploying apps daily, these are the little things that keep your cluster stable and your future self sane.

This blog is inspired by the original Configuration Best Practices page, which has evolved through contributions from many members of the Kubernetes community.

General configuration practices

Use the latest stable API version

Kubernetes evolves fast. Older APIs eventually get deprecated and stop working. So, whenever you are defining resources, make sure you are using the latest stable API version. You can always check with

kubectl api-resources

This simple step saves you from future compatibility issues.
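For example, a quick way to see which API version your cluster currently serves for a resource (column layout may vary slightly between kubectl versions):

kubectl api-resources | grep -i deployments

The APIVERSION column (apps/v1 for Deployments on current clusters) is the value that belongs in the apiVersion field of your manifest.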

Store configuration in version control

Never apply manifest files directly from your desktop. Always keep them in a version control system like Git; it's your safety net. If something breaks, you can instantly roll back to a previous commit, compare changes, or recreate your cluster setup without panic.

Write configs in YAML not JSON

Write your configuration files using YAML rather than JSON. Both work technically, but YAML is just easier for humans: it's cleaner to read, less noisy, and widely used in the community.

YAML has some sneaky gotchas with boolean values: use only true or false. Don't write yes, no, on, or off; they might work in one version of YAML but break in another. To be safe, quote anything that looks like a boolean (for example "yes").
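As an illustration, here is a minimal ConfigMap sketch (the names and values are made up) showing the quoting habit in practice:

apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags          # hypothetical name
data:
  enable-cache: "true"         # quoted, so YAML keeps it a string
  legacy-mode: "no"            # unquoted no/yes/on/off may be parsed as booleans by some YAML parsers

ConfigMap data values must be strings, so leaving these unquoted can cause the apply to fail with a type error, depending on the parser.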

Keep configuration simple and minimal

Avoid setting default values that are already handled by Kubernetes. Minimal manifests are easier to debug, cleaner to review and less likely to break things later.

Group related objects together

If your Deployment, Service and ConfigMap all belong to one app, put them in a single manifest file.

It's easier to track changes and apply them as a unit. See the Guestbook all-in-one.yaml file for an example of this syntax.
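As a rough sketch of this layout (names and values are placeholders), one file can hold a ConfigMap and the Service for the same app, separated by ---:

apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app.kubernetes.io/name: myapp
  ports:
  - port: 80
    targetPort: 8080

The Deployment for the same app can be appended after another --- separator, keeping everything the app needs reviewable in one place.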

You can even apply entire directories with:

kubectl apply -f configs/

One command and boom, everything in that folder gets deployed.

Add helpful annotations

Manifest files are not just for machines; they are for humans too. Use annotations to describe why something exists or what it does. A quick one-liner can save hours of debugging later and makes collaboration easier.

The most helpful annotation to set is kubernetes.io/description. It's like using a comment, except that it gets copied into the API, so everyone else can see it even after you deploy.
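A minimal sketch of how that looks on an object (the name and description text are placeholders):

metadata:
  name: checkout-api
  annotations:
    kubernetes.io/description: "Handles checkout requests; owned by the payments team"

Anyone running kubectl describe on the object later sees that context without having to dig through Git history.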

Managing Workloads: Pods, Deployments, and Jobs

A common early mistake in Kubernetes is creating Pods directly. Pods work, but they don't reschedule themselves if something goes wrong.

Naked Pods (Pods not managed by a controller, such as a Deployment or a StatefulSet) are fine for testing, but in real setups they are risky.

Why? Because if the node hosting that Pod dies, the Pod dies with it and Kubernetes won't bring it back automatically.

Use Deployments for apps that should always be running

A Deployment, which creates a ReplicaSet to ensure that the desired number of Pods is always available and specifies a strategy for replacing Pods (such as RollingUpdate), is almost always preferable to creating Pods directly. You can roll out a new version, and if something breaks, roll back instantly.
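A minimal sketch of such a Deployment, with the replacement strategy spelled out explicitly (the names and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # at most one Pod down during a rollout
      maxSurge: 1          # at most one extra Pod created during a rollout
  template:
    metadata:
      labels:
        app.kubernetes.io/name: web
    spec:
      containers:
      - name: web
        image: nginx:1.27

If a rollout goes wrong, kubectl rollout undo deployment/web takes you back to the previous revision.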

Use Jobs for tasks that should finish

A Job is perfect when you need something to run once and then stop, like a database migration or a batch-processing task. It will retry if the Pod fails and report success when it's done.
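A rough sketch of a one-off Job (the image and command are hypothetical):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  backoffLimit: 3          # retry a failed Pod up to 3 times
  template:
    spec:
      restartPolicy: Never # Jobs require Never or OnFailure
      containers:
      - name: migrate
        image: myapp/migrations:1.2.0
        command: ["./migrate", "--up"]

Once the Pod exits successfully, the Job is marked Complete and nothing gets restarted.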

Service Configuration and Networking

Services are how your workloads talk to each other inside (and sometimes outside) your cluster. Without them, your Pods exist but have no stable way to find or reach each other. Let's make sure that doesn't happen.

Create Services before workloads that use them

When Kubernetes starts a Pod, it automatically injects environment variables for existing Services. So, if a Pod depends on a Service, create a Service before its corresponding backend workloads (Deployments or StatefulSets), and before any workloads that need to access it.

For example, if a Service named foo exists, all containers will get the following variables in their initial environment:

FOO_SERVICE_HOST=<the host the Service runs on>
FOO_SERVICE_PORT=<the port the Service runs on>

DNS-based discovery doesn't have this problem, but it's a good habit to follow anyway.

Use DNS for Service discovery

If your cluster has the DNS add-on (most do), every Service automatically gets a DNS entry. That means you can access it by name instead of IP:

curl http://my-service.default.svc.cluster.local

It's one of those features that makes Kubernetes networking feel magical.

Avoid hostPort and hostNetwork unless absolutely necessary

You'll sometimes see these options in manifests:

hostPort: 8080
hostNetwork: true

But here's the thing: they tie your Pods to specific nodes, making them harder to schedule and scale, because each <hostIP, hostPort, protocol> combination must be unique. If you don't specify the hostIP and protocol explicitly, Kubernetes will use 0.0.0.0 as the default hostIP and TCP as the default protocol. Unless you're debugging or building something like a network plugin, avoid them.

If you just need local access for testing, try kubectl port-forward:

kubectl port-forward deployment/web 8080:80

See Use Port Forwarding to access applications in a cluster to learn more. Or if you really need external access, use a type: NodePort Service. That's the safer, Kubernetes-native way.

Use headless Services for internal discovery

Sometimes, you don't want Kubernetes to load balance traffic. You want to talk directly to each Pod. That's where headless Services come in.

You create one by setting clusterIP: None. Instead of a single IP, DNS gives you the list of individual Pod IPs, which is perfect for apps that manage connections themselves.
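A minimal sketch of a headless Service (names and port are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: myapp-headless
spec:
  clusterIP: None
  selector:
    app.kubernetes.io/name: myapp
  ports:
  - port: 5432

A DNS lookup for myapp-headless.<namespace>.svc.cluster.local then returns the individual Pod IPs instead of a single virtual IP.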

Working with labels effectively

Labels are key/value pairs that are attached to objects such as Pods. Labels help you organize, query, and group your resources. They don't do anything by themselves, but they make everything else, from Services to Deployments, work together smoothly.

Use semantic labels

Good labels help you understand what's what, even months later. Define and use labels that identify semantic attributes of your application or Deployment. For example:

labels:
  app.kubernetes.io/name: myapp
  app.kubernetes.io/component: web
  tier: frontend
  phase: test

app.kubernetes.io/name: what the app is

tier: which layer it belongs to (frontend/backend)

phase: which stage it's in (test/prod)

You can then use these labels to make powerful selectors. For example:

kubectl get pods -l tier=frontend

This will list all frontend Pods across your cluster, no matter which Deployment they came from. Basically you are not manually listing Pod names; you are just describing what you want. See the guestbook app for examples of this approach.

Use common Kubernetes labels

Kubernetes actually recommends a set of common labels. It's a standardized way to name things across your different workloads or projects. Following this convention makes your manifests cleaner, and it means that tools such as Headlamp, dashboard, or third-party monitoring systems can all automatically understand what's running.

Manipulate labels for debugging

Since controllers (like ReplicaSets or Deployments) use labels to manage Pods, you can remove a label to “detach” a Pod temporarily.

Example:

kubectl label pod mypod app-

The app- part removes the label key app. Once that happens, the controller won’t manage that Pod anymore. It’s like isolating it for inspection, a “quarantine mode” for debugging. To interactively remove or add labels, use kubectl label.

You can then check logs, exec into it, and once you're done, delete it manually. Meanwhile, the controller notices that a Pod matching its selector is missing and creates a replacement, so your app keeps running while you debug. That's a super underrated trick every Kubernetes engineer should know.

Handy kubectl tips

These small tips make life much easier when you are working with multiple manifest files or clusters.

Apply entire directories

Instead of applying one file at a time, apply the whole folder. Using server-side apply is also a good practice:

kubectl apply -f configs/ --server-side

This command looks for .yaml, .yml and .json files in that folder and applies them all together. It's faster, cleaner and helps keep things grouped by app.

Use label selectors to get or delete resources

You don't always need to type out resource names one by one. Instead, use selectors to act on entire groups at once:

kubectl get pods -l app=myapp
kubectl delete pod -l phase=test

It's especially useful in CI/CD pipelines, where you want to clean up test resources dynamically.

Quickly create Deployments and Services

For quick experiments, you don't always need to write a manifest. You can spin up a Deployment right from the CLI:

kubectl create deployment webapp --image=nginx

Then expose it as a Service:

kubectl expose deployment webapp --port=80

This is great when you just want to test something before writing full manifests. Also, see Use a Service to Access an Application in a cluster for an example.

Conclusion

Cleaner configuration leads to calmer cluster administrators. If you stick to a few simple habits (keep configuration simple and minimal, version-control everything, use consistent labels, and avoid relying on naked Pods), you'll save yourself hours of debugging down the road.

The best part? Clean configurations stay readable. Even after months, you or anyone on your team can come back to them and quickly understand what's going on.

·kubernetes.io·
Kubernetes Configuration Good Practices
KubeCon Wrap Up by Chris Short, Head of Open Source at CIQ
Last week, CIQ participated in KubeCon + CloudNativeCon North America 2025 in Atlanta, Georgia. CIQ's very own Chris Short hosted a talk during the event, participated in the CNCF Maintainers Summit…
·ciq.com·
KubeCon Wrap Up by Chris Short, Head of Open Source at CIQ
DevOps & AI Toolkit - Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second - https://www.youtube.com/watch?v=AUoqr5r1pBY

Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second

Gemini 3 is undeniably fast and impressive on benchmarks, but after a full week of real-world software engineering work, the reality is more complicated. While everyone's been hyping its capabilities based on day-one reviews and marketing materials, this video digs into what actually matters: how Gemini 3 performs with coding agents on real projects, not just one-shot Tetris games or simple websites. The speed is remarkable at 128 tokens per second, but it comes with serious trade-offs that affect daily pair programming work.

The core issues are frustrating: Gemini 3 is nearly impossible to redirect once it commits to a plan, suffers from an 88% hallucination rate (nearly double Sonnet 4.5's 48%), and confidently claims tasks are complete when they're not. It ignores context from earlier in conversations, struggles with complex multi-step instructions, and dismisses suggestions like a grumpy coder who thinks they know best. While it excels at one-shot code generation, it falls short as a collaborative partner for serious software development. Gemini 3 is genuinely one of the best models available (probably second place behind Sonnet 4.5) but it's not the massive leap forward that the hype suggests, and the gap between Claude Code and Gemini CLI remains significant.

#Gemini3 #AIcoding #SoftwareEngineering

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/gemini-3-is-fast-but-gaslights-you-at-128-tokens-second 🔗 Gemini 3: https://deepmind.google/models/gemini

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Gemini 3 with Gemini CLI 00:25 Gemini 3 Real-World Testing 02:54 Gemini 3's Biggest Problems 10:10 Is Gemini 3 Worth It?

via YouTube https://www.youtube.com/watch?v=AUoqr5r1pBY

·youtube.com·
DevOps & AI Toolkit - Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second - https://www.youtube.com/watch?v=AUoqr5r1pBY
How we built a 130,000-node GKE cluster | Google Cloud Blog
Learn about the architectural innovations we used to build a 130,000-node Kubernetes cluster, and the trends driving demand for these environments.
·cloud.google.com·
How we built a 130,000-node GKE cluster | Google Cloud Blog
Last Week in Kubernetes Development - Week Ending November 16 2025

Week Ending November 16, 2025

https://lwkd.info/2025/20251120

Developer News

Kubernetes SIG Network and the Security Response Committee have announced the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026.

Release Schedule

Next Deadline: Feature blogs ready for review, November 24th

We are in Code Freeze. Release lead Drew Hagen shared the state of the release.

The Feature blog is a great way to highlight and share information about your enhancement with the community. Feature blogs are especially encouraged for high-visibility changes as well as deprecations and removals. The official deadline has passed, but opt-ins are still welcome. If you are interested in writing a blog for your enhancement, please create a placeholder PR and contact your lead ASAP.

Kubernetes v1.35.0-beta.0 and patch releases v1.32.10, v1.31.14, v1.33.6, and v1.34.2 are now live!

KEP of the Week

KEP-5067: Pod Generation

This KEP introduces proper use of metadata.generation and a new status.observedGeneration field to show which PodSpec version the kubelet has actually processed. This helps eliminate uncertainty when multiple updates occur, making Pod status tracking consistent with other Kubernetes resources.

This KEP is tracked for stable in v1.35

Other Merges

Implement opportunistic batching to speed up pod scheduling

Allow constraining impersonation for specific resources

NominatedNodeName has integration tests

DRA device health check timeouts are configurable

Distinguish between nil and not present in validation ratcheting

You can mutate job directives even if they’re suspended

Volume Group Snapshots are now v1beta2 API

Overhaul Device Taint Eviction in DRA

ScheduleAsyncAPICalls has been re-enabled by default after debugging

Device class selection is deterministic

StatefulSets won’t trigger a rollout when upgrading to 1.34

Don’t schedule pods that need storage to a node with no CSI

kuberc gets view and set commands

v1alpha1 structured response for /flagz

Pod statuses stay the same after kubelet restart

Let’s schedule the whole darned gang through the new workload API

DRA: prioritized list scoring and Extended Resource Metrics and extended resource quota

Operators get more tolerations

Mutate persistent volume node affinity

Auto-restart of all containers in a pod when one of them exits

Promotions

KubeletEnsureSecretPulledImages is Beta

Image Volume Source to Beta

PodTopologyLabelsAdmission to Beta

NominatedNodeNameForExpectation and ClearingNominatedNodeNameAfterBinding to Beta

SupplementalGroupsPolicy to GA

JobManagedBy to GA

InPlacePodVerticalScaling tests to Conformance

KubeletCrashLoopBackOffMax to Beta

Pod Certificates to Beta

EnvFiles to Beta

WatchListClient to Beta

Deprecations

Drop networking v1beta1 Ingress from kubectl

AggregatedDiscoveryRemoveBetaType gate removed

Version Updates

go to v1.25.4

CoreDNS to 1.13.1

Subprojects and Dependency Updates

prometheus v3.8.0-rc.0 stabilizes native histograms (now an optional stable feature via scrape_native_histogram), tightens validation for custom-bounds histograms, adds detailed target relabeling views in the UI, improves OTLP target_info de-duplication, expands alerting and promtool support (including Remote-Write 2.0 for promtool push metrics), and delivers multiple PromQL and UI performance fixes for large rule/alert pages.

cloud-provider-aws v1.31.9 bumps the AWS Go SDK to 1.24.7 for CVE coverage, completes migration to AWS SDK v2 for EC2, ELB and ELBV2, adds support for a new AWS partition in the credential provider, and includes defensive fixes for potential nil pointer dereferences alongside the usual 1.31 release line version bump.

cloud-provider-aws v1.30.10 mirrors the 1.31.9 line with backported updates to AWS SDK Go v2 (EC2 and load balancers), a Go SDK 1.24.7 security bump, support for the new AWS partition in credential provider logic, improved nil-pointer safety, and includes contributions from a new external maintainer.

cloud-provider-aws v1.29.10 provides a straightforward version bump for the 1.29 branch, while cloud-provider-aws v1.29.9 backports key changes including EC2/load balancer migration to AWS SDK Go v2, the Go SDK 1.24.7 CVE update, and new-partition support in the credential provider to keep older clusters aligned with current AWS environments.

cluster-api v1.12.0-beta.1 continues the v1.12 beta with chained-upgrade Runtime SDK improvements, blocking AfterClusterUpgrade hooks for safer rollouts, new features such as taint propagation in Machine APIs, MachineDeployment in-place update support, clusterctl describe condition filters, and a broad set of bugfixes and dependency bumps (including etcd v3.6.6 and Kubernetes v0.34.2 libraries).

cluster-api-provider-vsphere v1.15.0-beta.1 refreshes CAPV against CAPI v1.12.0-beta.1, upgrades Go to 1.24.10 and core Kubernetes/etcd libraries, and focuses on test and tooling improvements such as enhanced e2e network debugging, junit output from e2e runs, and refined CI configuration ahead of the 1.15 release.

kubebuilder v4.10.1 is a fast follow-up bugfix release that retracts the problematic v4.10.0 Go module, fixes nested JSON tag omitempty handling in generated APIs, stabilizes metrics e2e tests with webhooks, and tightens Go module validation to prevent future module install issues while keeping scaffold auto-update guidance intact.

kubebuilder v4.10.0 (now retracted as a Go module) introduced the new helm/v2-alpha plugin to replace helm/v1-alpha, improved multi-arch support and Go/tooling versions (golangci-lint, controller-runtime, cert-manager), added external plugin enhancements (PluginChain, ProjectConfig access), support for custom webhook paths, and a series of CLI and scaffolding fixes including better handling of directories with spaces.

cluster-api-provider-vsphere v1.15.0-beta.0 introduces the next beta version of CAPV for testing upcoming Cluster API v1.15 functionality on vSphere. This release is intended only for testing and feedback.

vsphere-csi-driver v3.6.0 adds compatibility with Kubernetes v1.34 and brings improvements such as shared session support on vCenter login and enhanced task monitoring. Updated manifests for this release are available under the versioned manifests/vanilla directory.

kustomize kyaml v0.21.0 updates structured data replacement capabilities, upgrades Go to 1.24.6, refreshes dependencies following security alerts, and includes minor YAML handling fixes.

kustomize v5.8.0 enhances YAML/JSON replacement features, fixes namespace propagation for Helm integrations, and adds improvements such as regex support for replacements, new patch argument types, validation fixes, improved error messages, and performance optimizations.

kustomize cmd/config v0.21.0 aligns with kyaml updates, adopts Go 1.24.6, and brings dependency updates based on recent security advisories.

kustomize api v0.21.0 includes structured-data replacement enhancements, regex selector support, patch argument additions, namespace propagation fixes, validation improvements, Go 1.24.6 updates, and dependency refreshes.

etcd v3.6.6 provides a new patch update for the v3.6 series with all changes documented in the linked changelog. Installation steps and supported platform updates are also included.

etcd v3.5.25 delivers maintenance updates for the v3.5 series along with relevant upgrade guidance and support documentation.

etcd v3.4.39 introduces the newest patches for the v3.4 branch with installation instructions and detailed platform support notes.

cri-o v1.34.2 improves GRPC debug log formatting and ships updated, signed release bundles and SPDX SBOMs for all supported architectures.

cri-o v1.33.6 publishes refreshed signed artifacts and SPDX documents for the 1.33 line, with no dependency changes recorded.

cri-o v1.32.10 updates the 1.32 branch with new signed release artifacts and SBOM files, without dependency modifications.

nerdctl v2.2.0 fixes a namestore path issue, adds mount-manager support, introduces checkpoint lifecycle commands, and enhances image conversion through a new estargz helper flag. The full bundle includes updated containerd, runc, BuildKit, and Stargz Snapshotter.

Shoutouts

Danilo Gemoli: Shoutout to @Petr Muller who is trying to gather new contributors in #prow. He arranged a meeting in which we had the possibility to bring to the table several interesting ideas on how to ease the entry barriers for newcomers

via Last Week in Kubernetes Development https://lwkd.info/

November 20, 2025 at 07:59AM

·lwkd.info·
Last Week in Kubernetes Development - Week Ending November 16 2025
Skyway: Cloud cost management for the 9-figure club
Introducing Skyway: contract management for enterprise cloud spend. Built by the team overseeing tens-of-billions in enterprise cloud spend.
·duckbillhq.com·
Skyway: Cloud cost management for the 9-figure club
Cloudflare outage on November 18, 2025
Cloudflare suffered a service outage on November 18, 2025. The outage was triggered by a bug in generation logic for a Bot Management feature file causing many Cloudflare services to be affected.
·blog.cloudflare.com·
Cloudflare outage on November 18, 2025
Kubernetes Cluster Goes Mobile In Pet Carrier
There’s been a bit of a virtualization revolution going on for the last decade or so, where tools like Docker and LXC have made it possible to quickly deploy server applications without worry…
·hackaday.com·
Kubernetes Cluster Goes Mobile In Pet Carrier
Infinite scale: The architecture behind the Azure AI superfactory - The Official Microsoft Blog
Today, we are unveiling the next Fairwater site of Azure AI datacenters in Atlanta, Georgia. This purpose-built datacenter is connected to our first Fairwater site in Wisconsin, prior generations of AI supercomputers and the broader Azure global datacenter footprint to create the world’s first planet-scale AI superfactory. By packing computing power more densely than ever...
·blogs.microsoft.com·
Infinite scale: The architecture behind the Azure AI superfactory - The Official Microsoft Blog
Pouring packages with Homebrew
The Homebrew project is an open-source package-management system that comes with a repository o [...]
·lwn.net·
Pouring packages with Homebrew
How Kubernetes Became the New Linux
Learn why AWS is building core features instead of competing products in this episode of The New Stack Makers.
·thenewstack.io·
How Kubernetes Became the New Linux
Agent Sandbox provides a secure and isolated execution layer to safely deploy autonomous AI agents on Kubernetes that generate and run untrusted code at scale.
Agent Sandbox is a cloud native controller for sandboxes
Agent Sandbox provides a secure and isolated execution layer to safely deploy autonomous AI agents on Kubernetes that generate and run untrusted code at scale.
·agent-sandbox.sigs.k8s.io·
Agent Sandbox provides a secure and isolated execution layer to safely deploy autonomous AI agents on Kubernetes that generate and run untrusted code at scale.
DevOps & AI Toolkit - Ep39 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=tafANChjv3g

Ep39 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=tafANChjv3g

·youtube.com·
DevOps & AI Toolkit - Ep39 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=tafANChjv3g
The Karpenter Effect: Redefining Kubernetes Operations with Tanat Lokejaroenlarb

The Karpenter Effect: Redefining Kubernetes Operations, with Tanat Lokejaroenlarb

https://ku.bz/T6hDSWYhb

Tanat Lokejaroenlarb shares the complete journey of replacing EKS Managed Node Groups and Cluster Autoscaler with AWS Karpenter. He explains how this migration transformed their Kubernetes operations, from eliminating brittle upgrade processes to achieving significant cost savings of €30,000 per month through automated instance selection and AMD adoption.

You will learn:

How to decouple control plane and data plane upgrades using Karpenter's asynchronous node rollout capabilities

Cost optimization strategies including flexible instance selection, automated AMD migration, and the trade-offs between cheapest-first selection versus performance considerations

Scaling and performance tuning techniques such as implementing over-provisioning with low-priority placeholder pods

Policy automation and operational practices, including using Kyverno to simplify the user experience and implementing proper Pod Disruption Budgets

Sponsor

This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization

More info

Find all the links and info for this episode here: https://ku.bz/T6hDSWYhb

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

November 18, 2025 at 05:00AM

·kube.fm·
The Karpenter Effect: Redefining Kubernetes Operations with Tanat Lokejaroenlarb
DevOps & AI Toolkit - AI vs Manual: Kubernetes Troubleshooting Showdown 2025 - https://www.youtube.com/watch?v=UbPyEelCh-I

AI vs Manual: Kubernetes Troubleshooting Showdown 2025

Tired of waking up at 3 AM to troubleshoot Kubernetes issues? This video shows you how to automate the entire incident response process using AI-powered remediation. We walk through the traditional manual troubleshooting workflow—detecting issues through kubectl events, analyzing pods and their controllers, identifying root causes, and validating fixes—then demonstrate how AI agents can handle all four phases automatically. Using the open-source DevOps AI Toolkit with the Model Context Protocol (MCP) and a custom Kubernetes controller, you'll see how AI can detect failing pods, analyze the root cause (like a missing PersistentVolumeClaim), suggest remediation, and validate that the fix worked, all while you stay in bed.

The video breaks down the complete architecture, showing how a Kubernetes controller monitors events defined in RemediationPolicy resources, triggers the MCP server for analysis, and either automatically applies fixes or sends Slack notifications for manual approval based on confidence thresholds and risk levels. You'll learn how the MCP agent loops with an LLM using read-only tools to gather data and analyze issues, while keeping write operations isolated and requiring explicit approval. Whether you want fully automated remediation for low-risk issues or human-in-the-loop approval for everything, this approach gives you intelligent troubleshooting that scales beyond what you can predict and prepare for manually.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: JFrog Fly 🔗 https://jfrog.com/fly_viktor ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#Kubernetes #AIAutomation #DevOps

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-vs-manual-kubernetes-troubleshooting-showdown-2025 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Kubernetes Analysis and Remediation with AI 01:15 JFrog Fly (sponsor) 02:46 Kubernetes Troubleshooting Manual Process 11:37 AI-Powered Kubernetes Remediation 14:38 MCP Architecture and Controller Design 20:49 Key Takeaways and Next Steps

via YouTube https://www.youtube.com/watch?v=UbPyEelCh-I

·youtube.com·
DevOps & AI Toolkit - AI vs Manual: Kubernetes Troubleshooting Showdown 2025 - https://www.youtube.com/watch?v=UbPyEelCh-I
Ingress NGINX Retirement: What You Need to Know

Ingress NGINX Retirement: What You Need to Know

https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/

To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.

We recommend migrating to one of the many alternatives. Consider migrating to Gateway API, the modern replacement for Ingress. If you must continue using Ingress, many alternative Ingress controllers are listed in the Kubernetes documentation. Continue reading for further information about the history and current state of Ingress NGINX, as well as next steps.

About Ingress NGINX

Ingress is the original user-friendly way to direct network traffic to workloads running on Kubernetes. (Gateway API is a newer way to achieve many of the same goals.) In order for an Ingress to work in your cluster, there must be an Ingress controller running. There are many Ingress controller choices available, which serve the needs of different users and use cases. Some are cloud-provider specific, while others have more general applicability.

Ingress NGINX was an Ingress controller, developed early in the history of the Kubernetes project as an example implementation of the API. It became very popular due to its tremendous flexibility, breadth of features, and independence from any particular cloud or infrastructure provider. Since those days, many other Ingress controllers have been created within the Kubernetes project by community groups, and by cloud native vendors. Ingress NGINX has continued to be one of the most popular, deployed as part of many hosted Kubernetes platforms and within innumerable independent users’ clusters.

History and Challenges

The breadth and flexibility of Ingress NGINX has caused maintenance challenges. Changing expectations about cloud native software have also added complications. What were once considered helpful options have sometimes come to be considered serious security flaws, such as the ability to add arbitrary NGINX configuration directives via the "snippets" annotations. Yesterday’s flexibility has become today’s insurmountable technical debt.

Despite the project’s popularity among users, Ingress NGINX has always struggled with insufficient or barely-sufficient maintainership. For years, the project has had only one or two people doing development work, on their own time, after work hours and on weekends. Last year, the Ingress NGINX maintainers announced their plans to wind down Ingress NGINX and develop a replacement controller together with the Gateway API community. Unfortunately, even that announcement failed to generate additional interest in helping maintain Ingress NGINX or develop InGate to replace it. (InGate development never progressed far enough to create a mature replacement; it will also be retired.)

Current State and Next Steps

Currently, Ingress NGINX is receiving best-effort maintenance. SIG Network and the Security Response Committee have exhausted our efforts to find additional support to make Ingress NGINX sustainable. To prioritize user safety, we must retire the project.

In March 2026, Ingress NGINX maintenance will be halted, and the project will be retired. After that time, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. The GitHub repositories will be made read-only and left available for reference.

Existing deployments of Ingress NGINX will not be broken. Existing project artifacts such as Helm charts and container images will remain available.

In most cases, you can check whether you use Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.

We would like to thank the Ingress NGINX maintainers for their work in creating and maintaining this project–their dedication remains impressive. This Ingress controller has powered billions of requests in datacenters and homelabs all around the world. In a lot of ways, Kubernetes wouldn’t be where it is without Ingress NGINX, and we are grateful for so many years of incredible effort.

SIG Network and the Security Response Committee recommend that all Ingress NGINX users begin migration to Gateway API or another Ingress controller immediately. Many options are listed in the Kubernetes documentation: Gateway API, Ingress. Additional options may be available from vendors you work with.

via Kubernetes Blog https://kubernetes.io/

November 11, 2025 at 01:30PM

·kubernetes.io·
Ingress NGINX Retirement: What You Need to Know
Blog: Ingress NGINX Retirement: What You Need to Know

Blog: Ingress NGINX Retirement: What You Need to Know

https://www.kubernetes.dev/blog/2025/11/12/ingress-nginx-retirement/

To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.

We recommend migrating to one of the many alternatives. Consider migrating to Gateway API, the modern replacement for Ingress. If you must continue using Ingress, many alternative Ingress controllers are listed in the Kubernetes documentation. Continue reading for further information about the history and current state of Ingress NGINX, as well as next steps.

About Ingress NGINX

Ingress is the original user-friendly way to direct network traffic to workloads running on Kubernetes. (Gateway API is a newer way to achieve many of the same goals.) In order for an Ingress to work in your cluster, there must be an Ingress controller running. There are many Ingress controller choices available, which serve the needs of different users and use cases. Some are cloud-provider specific, while others have more general applicability.

Ingress NGINX was an Ingress controller, developed early in the history of the Kubernetes project as an example implementation of the API. It became very popular due to its tremendous flexibility, breadth of features, and independence from any particular cloud or infrastructure provider. Since those days, many other Ingress controllers have been created within the Kubernetes project by community groups, and by cloud native vendors. Ingress NGINX has continued to be one of the most popular, deployed as part of many hosted Kubernetes platforms and within innumerable independent users’ clusters.

History and Challenges

The breadth and flexibility of Ingress NGINX has caused maintenance challenges. Changing expectations about cloud native software have also added complications. What were once considered helpful options have sometimes come to be considered serious security flaws, such as the ability to add arbitrary NGINX configuration directives via the “snippets” annotations. Yesterday’s flexibility has become today’s insurmountable technical debt.

Despite the project’s popularity among users, Ingress NGINX has always struggled with insufficient or barely-sufficient maintainership. For years, the project has had only one or two people doing development work, on their own time, after work hours and on weekends. Last year, the Ingress NGINX maintainers announced their plans to wind down Ingress NGINX and develop a replacement controller together with the Gateway API community. Unfortunately, even that announcement failed to generate additional interest in helping maintain Ingress NGINX or develop InGate to replace it. (InGate development never progressed far enough to create a mature replacement; it will also be retired.)

Current State and Next Steps

Currently, Ingress NGINX is receiving best-effort maintenance. SIG Network and the Security Response Committee have exhausted our efforts to find additional support to make Ingress NGINX sustainable. To prioritize user safety, we must retire the project.

In March 2026, Ingress NGINX maintenance will be halted, and the project will be retired. After that time, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. The GitHub repositories will be made read-only and left available for reference.

Existing deployments of Ingress NGINX will not be broken. Existing project artifacts such as Helm charts and container images will remain available.

In most cases, you can check whether you use Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.

We would like to thank the Ingress NGINX maintainers for their work in creating and maintaining this project–their dedication remains impressive. This Ingress controller has powered billions of requests in datacenters and homelabs all around the world. In a lot of ways, Kubernetes wouldn’t be where it is without Ingress NGINX, and we are grateful for so many years of incredible effort.

SIG Network and the Security Response Committee recommend that all Ingress NGINX users begin migration to Gateway API or another Ingress controller immediately. Many options are listed in the Kubernetes documentation: Gateway API, Ingress. Additional options may be available from vendors you work with.

via Kubernetes Contributors – Contributor Blog https://www.kubernetes.dev/blog/

November 12, 2025 at 12:00PM

·kubernetes.dev·
Blog: Ingress NGINX Retirement: What You Need to Know
Building Kubernetes (a lite version) from scratch in Go with Owumi Festus

Building Kubernetes (a lite version) from scratch in Go, with Owumi Festus

https://ku.bz/pf5kK9lQF

Festus Owumi walks through his project of building a lightweight version of Kubernetes in Go. He removed etcd (replacing it with in-memory storage), skipped containers entirely, dropped authentication, and focused purely on the control plane mechanics. Through this process, he demonstrates how the reconciliation loop, API server concurrency handling, and scheduling logic actually work at their most basic level.

You will learn:

How the reconciliation loop works - The core concept of desired state vs current state that drives all Kubernetes operations

Why the API server is the gateway to etcd - How Kubernetes prevents race conditions using optimistic concurrency control and why centralized validation matters

What the scheduler actually does - Beyond simple round-robin assignment, understanding node affinity, resource requirements, and the complex scoring algorithms that determine pod placement

The complete pod lifecycle - Step-by-step walkthrough from kubectl command to running pod, showing how independent components work together like an orchestra

Sponsor

This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization

More info

Find all the links and info for this episode here: https://ku.bz/pf5kK9lQF

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

November 11, 2025 at 05:00AM

·kube.fm·
Building Kubernetes (a lite version) from scratch in Go with Owumi Festus