
Kubernetes upgrades: beyond the one-click update, with Tanat Lokejaroenlarb
Discover how Adevinta manages Kubernetes upgrades at scale in this episode with Tanat Lokejaroenlarb. Tanat shares his team's journey from time-consuming blue-green deployments to efficient in-place upgrades for their multi-tenant Kubernetes platform SHIP, detailing the engineering decisions and operational challenges they overcame.
You will learn:
How to transition from blue-green to in-place Kubernetes upgrades while maintaining service reliability
Techniques for tracking and addressing API deprecations using tools like Pluto and Kube-no-trouble
Strategies for minimizing SLO impact during node rebuilds through serialized approaches and proper PDB configuration (see the PodDisruptionBudget sketch after this list)
Why a phased upgrade approach with "cluster waves" provides safer production deployments even with thorough testing
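Since proper PDB configuration comes up in the episode, here is a minimal, hedged sketch of a PodDisruptionBudget that keeps a hypothetical workload (label app: checkout) available while nodes are drained and rebuilt one at a time:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: checkout-pdb
spec:
  minAvailable: 2        # a drain never evicts below two ready pods
  selector:
    matchLabels:
      app: checkout      # hypothetical label; match it to your own workload

With a budget like this in place, a serialized node rebuild waits for evicted pods to be rescheduled and become ready before the next node is drained, which is what keeps the SLO impact bounded.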
Sponsor
This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/VVHFfXGl_
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
May 06, 2025 at 06:00AM
Kubernetes v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA
I am thrilled to announce that the feature to prevent PersistentVolume (or PVs for short) leaks when deleting out of order has graduated to General Availability (GA) in Kubernetes v1.33! This improvement, initially introduced as a beta feature in Kubernetes v1.31, ensures that your storage resources are properly reclaimed, preventing unwanted leaks.
How did reclaim work in previous Kubernetes releases?
PersistentVolumeClaim (or PVC for short) is a user's request for storage. A PV and PVC are considered Bound once a matching PV is found for the claim or a new PV is provisioned for it. The PVs themselves are backed by volumes allocated by the storage backend.
Normally, if the volume is to be deleted, then the expectation is to delete the PVC for a bound PV-PVC pair. However, there are no restrictions on deleting a PV before deleting a PVC.
For a Bound PV-PVC pair, the ordering of PV-PVC deletion determines whether the PV reclaim policy is honored. The reclaim policy is honored if the PVC is deleted first; however, if the PV is deleted prior to deleting the PVC, then the reclaim policy is not exercised. As a result of this behavior, the associated storage asset in the external infrastructure is not removed.
PV reclaim policy with Kubernetes v1.33
With the graduation to GA in Kubernetes v1.33, this issue is now resolved. Kubernetes now reliably honors the configured Delete reclaim policy, even when PVs are deleted before their bound PVCs. This is achieved through the use of finalizers, ensuring that the storage backend releases the allocated storage resource as intended.
How does it work?
For CSI volumes, the new behavior is achieved by adding the finalizer external-provisioner.volume.kubernetes.io/finalizer to new and existing PVs. The finalizer is only removed after the storage from the backend is deleted. Addition and removal of the finalizer are handled by the external-provisioner.
An example of a PV with the finalizer; notice the new finalizer in the finalizers list:
kubectl get pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 -o yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi.example.driver.com
  creationTimestamp: "2021-11-17T19:28:56Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-provisioner.volume.kubernetes.io/finalizer
  name: pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53
  resourceVersion: "194711"
  uid: 087f14f2-4157-4e95-8a70-8294b039d30e
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: example-vanilla-block-pvc
    namespace: default
    resourceVersion: "194677"
    uid: a7b7e3ba-f837-45ba-b243-dec7d8aaed53
  csi:
    driver: csi.example.driver.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1637110610497-8081-csi.example.driver.com
      type: CNS Block Volume
    volumeHandle: 2dacf297-803f-4ccc-afc7-3d3c3f02051e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: example-vanilla-block-sc
  volumeMode: Filesystem
status:
  phase: Bound
The finalizer prevents this PersistentVolume from being removed from the cluster. As stated previously, the finalizer is only removed from the PV object after it is successfully deleted from the storage backend. To learn more about finalizers, please refer to Using Finalizers to Control Deletion.
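As an illustration of the new behavior, here is a hedged kubectl sketch of the out-of-order case, using the PV and PVC names from the example above (exact states and timing will vary by cluster and driver):

# Delete the PV first (out of order). The object is not removed immediately;
# the finalizer keeps it around, typically in a Terminating state.
kubectl delete pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 --wait=false

# Delete the bound PVC. The Delete reclaim policy is still honored: the
# external-provisioner removes the backend volume, drops the finalizer,
# and only then does the PV object disappear.
kubectl delete pvc example-vanilla-block-pvc
kubectl get pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53   # eventually returns NotFound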
Similarly, the finalizer kubernetes.io/pv-controller is added to dynamically provisioned in-tree plugin volumes.
Important note
The fix does not apply to statically provisioned in-tree plugin volumes.
How to enable new behavior?
To take advantage of the new behavior, you must have upgraded your cluster to the v1.33 release of Kubernetes and be running CSI external-provisioner version 5.0.1 or later. The feature was released as beta in the v1.31 release of Kubernetes, where it was enabled by default.
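One hedged way to check the sidecar version is to list the images of your CSI controller pods and look for a csi-provisioner tag of v5.0.1 or newer (the namespace and label below are hypothetical; adjust them to your driver's deployment):

kubectl get pods -n kube-system -l app=csi-example-controller \
  -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | grep csi-provisioner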
References
KEP-2644
Volume leak issue
Beta Release Blog
How do I get involved?
The Kubernetes Slack and the standard SIG Storage communication channels are great mediums to reach out to the SIG Storage and migration working group teams.
Special thanks to the following people for their insightful reviews, thorough consideration, and valuable contributions:
Fan Baofa (carlory)
Jan Šafránek (jsafrane)
Xing Yang (xing-yang)
Matthew Wong (wongma7)
Join the Kubernetes Storage Special Interest Group (SIG) if you're interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system. We’re rapidly growing and always welcome new contributors.
via Kubernetes Blog https://kubernetes.io/
May 05, 2025 at 02:30PM
Kubernetes v1.33: Mutable CSI Node Allocatable Count
https://kubernetes.io/blog/2025/05/02/kubernetes-1-33-mutable-csi-node-allocatable-count/
Scheduling stateful applications reliably depends heavily on accurate information about resource availability on nodes. Kubernetes v1.33 introduces an alpha feature called mutable CSI node allocatable count, allowing Container Storage Interface (CSI) drivers to dynamically update the reported maximum number of volumes that a node can handle. This capability significantly enhances the accuracy of pod scheduling decisions and reduces scheduling failures caused by outdated volume capacity information.
Background
Traditionally, Kubernetes CSI drivers report a static maximum volume attachment limit when initializing. However, actual attachment capacities can change during a node's lifecycle for various reasons, such as:
Manual or external operations attaching/detaching volumes outside of Kubernetes control.
Dynamically attached network interfaces or specialized hardware (GPUs, NICs, etc.) consuming available slots.
Multi-driver scenarios, where one CSI driver’s operations affect available capacity reported by another.
Static reporting can cause Kubernetes to schedule pods onto nodes that appear to have capacity but don't, leading to pods stuck in a ContainerCreating state.
Dynamically adapting CSI volume limits
With the new feature gate MutableCSINodeAllocatableCount, Kubernetes enables CSI drivers to dynamically adjust and report node attachment capacities at runtime. This ensures that the scheduler has the most accurate, up-to-date view of node capacity.
How it works
When this feature is enabled, Kubernetes supports two mechanisms for updating the reported node volume limits:
Periodic Updates: CSI drivers specify an interval to periodically refresh the node's allocatable capacity.
Reactive Updates: An immediate update triggered when a volume attachment fails due to exhausted resources (ResourceExhausted error).
Enabling the feature
To use this alpha feature, you must enable the MutableCSINodeAllocatableCount feature gate in these components:
kube-apiserver
kubelet
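For example, on a kubeadm-managed cluster (an assumption; adapt the mechanism to however you run your control plane and kubelets), the gate could be enabled roughly like this:

# ClusterConfiguration excerpt: enable the gate on the kube-apiserver.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
apiServer:
  extraArgs:
    feature-gates: "MutableCSINodeAllocatableCount=true"
---
# KubeletConfiguration excerpt: enable the same gate on each node's kubelet.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MutableCSINodeAllocatableCount: true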
Example CSI driver configuration
Below is an example of configuring a CSI driver to enable periodic updates every 60 seconds:
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.k8s.io
spec:
  nodeAllocatableUpdatePeriodSeconds: 60
This configuration directs Kubelet to periodically call the CSI driver's NodeGetInfo method every 60 seconds, updating the node’s allocatable volume count. Kubernetes enforces a minimum update interval of 10 seconds to balance accuracy and resource usage.
Immediate updates on attachment failures
In addition to periodic updates, Kubernetes now reacts to attachment failures. Specifically, if a volume attachment fails with a ResourceExhausted error (gRPC code 8), an immediate update is triggered to correct the allocatable count promptly.
This proactive correction prevents repeated scheduling errors and helps maintain cluster health.
Getting started
To experiment with mutable CSI node allocatable count in your Kubernetes v1.33 cluster:
Enable the feature gate MutableCSINodeAllocatableCount on the kube-apiserver and kubelet components.
Update your CSI driver configuration by setting nodeAllocatableUpdatePeriodSeconds.
Monitor and observe improvements in scheduling accuracy and pod placement reliability.
Next steps
This feature is currently in alpha and the Kubernetes community welcomes your feedback. Test it, share your experiences, and help guide its evolution toward beta and GA stability.
Join discussions in the Kubernetes Storage Special Interest Group (SIG-Storage) to shape the future of Kubernetes storage capabilities.
via Kubernetes Blog https://kubernetes.io/
May 02, 2025 at 02:30PM
Kubernetes v1.33: New features in DRA
https://kubernetes.io/blog/2025/05/01/kubernetes-v1-33-dra-updates/
Kubernetes Dynamic Resource Allocation (DRA) was originally introduced as an alpha feature in the v1.26 release, and then went through a significant redesign for Kubernetes v1.31. The main DRA feature went to beta in v1.32, and the project hopes it will be generally available in Kubernetes v1.34.
The basic feature set of DRA provides a far more powerful and flexible API for requesting devices than Device Plugin. And while DRA remains a beta feature for v1.33, the DRA team has been hard at work implementing a number of new features and UX improvements. One feature has been promoted to beta, while a number of new features have been added in alpha. The team has also made progress towards getting DRA ready for GA.
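For readers who have not tried DRA yet, here is a minimal, hedged sketch of requesting a device through the v1beta1 API (the DeviceClass name example.com-gpu and the container image are placeholders, not real artifacts):

apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: single-gpu
spec:
  devices:
    requests:
    - name: gpu
      deviceClassName: example.com-gpu   # hypothetical DeviceClass published by a DRA driver
---
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  containers:
  - name: app
    image: registry.example/app:latest   # placeholder image
    resources:
      claims:
      - name: gpu                        # references the entry in resourceClaims below
  resourceClaims:
  - name: gpu
    resourceClaimName: single-gpu

Compared to the Device Plugin model, the claim is a first-class API object, so drivers can attach structured parameters and status to it rather than exposing devices only as opaque extended-resource counters.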
Features promoted to beta
Driver-owned Resource Claim Status was promoted to beta. This allows the driver to report driver-specific device status data for each allocated device in a resource claim, which is particularly useful for supporting network devices.
New alpha features
Partitionable Devices lets a driver advertise several overlapping logical devices (“partitions”), and the driver can reconfigure the physical device dynamically based on the actual devices allocated. This makes it possible to partition devices on-demand to meet the needs of the workloads and therefore increase the utilization.
Device Taints and Tolerations allow devices to be tainted and for workloads to tolerate those taints. This makes it possible for drivers or cluster administrators to mark devices as unavailable. Depending on the effect of the taint, this can prevent devices from being allocated or cause eviction of pods that are using the device.
Prioritized List lets users specify a list of acceptable devices for their workloads, rather than just a single type of device. So while the workload might run best on a single high-performance GPU, it might also be able to run on 2 mid-level GPUs. The scheduler will attempt to satisfy the alternatives in the list in order, so the workload will be allocated the best set of devices available in the cluster.
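A hedged sketch of what such a request might look like (field names follow the v1.33 alpha and may still change; the DeviceClass names are hypothetical):

apiVersion: resource.k8s.io/v1beta1
kind: ResourceClaim
metadata:
  name: gpu-or-gpus
spec:
  devices:
    requests:
    - name: accelerator
      firstAvailable:                    # alternatives, tried in order
      - name: one-big-gpu
        deviceClassName: example.com-large-gpu
        count: 1
      - name: two-mid-gpus
        deviceClassName: example.com-mid-gpu
        count: 2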
Admin Access has been updated so that only users with access to a namespace carrying the resource.k8s.io/admin-access: "true" label are authorized to create ResourceClaim or ResourceClaimTemplate objects with the adminAccess field within that namespace. This grants administrators access to in-use devices and may enable additional permissions when making the device available in a container. This ensures that non-admin users cannot misuse the feature.
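For example, a cluster administrator might dedicate a namespace to such claims by labeling it accordingly (a minimal sketch; the namespace name is arbitrary):

apiVersion: v1
kind: Namespace
metadata:
  name: dra-admin
  labels:
    resource.k8s.io/admin-access: "true"   # only claims in labeled namespaces may set adminAccess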
Preparing for general availability
A new v1beta2 API has been added to simplify the user experience and to prepare for additional features being added in the future. The RBAC rules for DRA have been improved and support has been added for seamless upgrades of DRA drivers.
What’s next?
The plan for v1.34 is even more ambitious than for v1.33. Most importantly, we (the Kubernetes device management working group) hope to bring DRA to general availability, which will make it available by default on all v1.34 Kubernetes clusters. This also means that many, perhaps all, of the DRA features that are still beta in v1.34 will become enabled by default, making it much easier to use them.
The alpha features that were added in v1.33 will be brought to beta in v1.34.
Getting involved
A good starting point is joining the WG Device Management Slack channel and meetings, which happen at US/EU and EU/APAC friendly time slots.
Not all enhancement ideas are tracked as issues yet, so come talk to us if you want to help or have some ideas yourself! We have work to do at all levels, from difficult core changes to usability enhancements in kubectl, which could be picked up by newcomers.
Acknowledgments
A huge thanks to everyone who has contributed:
Cici Huang (cici37)
Ed Bartosh (bart0sh)
John Belamaric (johnbelamaric)
Jon Huhn (nojnhuh)
Kevin Klues (klueska)
Morten Torkildsen (mortent)
Patrick Ohly (pohly)
Rita Zhang (ritazh)
Shingo Omura (everpeace)
via Kubernetes Blog https://kubernetes.io/
May 01, 2025 at 02:30PM