1_r/devopsish

Beyond the Repository: Best practices for open source ecosystems researchers
Beyond the Repository: Best practices for open source ecosystems researchers
Much of the existing research about open source elects to study software repositories instead of ecosystems. An open source repository most often refers to the artifacts recorded in a version control system and occasionally includes interactions around ...
·dl.acm.org·
Beyond the Repository: Best practices for open source ecosystems researchers
Marcus Noble's tips on giving technical talks
Marcus Noble's tips on giving technical talks
I've been giving talks at meetups and conferences for a few years now. I started off after the encouragement of my friends giving their own talks and looking so cool doing it! It's taken a while but I think I'm at a stage now where I'm not only good at it (at least I hope so 😅) but I feel confident and comfortable while doing it. I want everyone to have that same confidence and I want to hear ALL OF YOU giving talks too! You have stories to tell, lessons to share and experience to pass on. So here are my learnings on how I approach giving a talk in front of a crowd of techies, mainly focussed on technical talks, but most of this should apply to public speaking in general.
·marcusnoble.co.uk·
Marcus Noble's tips on giving technical talks
The valley of engineering despair
The valley of engineering despair
I have delivered a lot of successful engineering projects. When I start on a project, I’m now very (perhaps unreasonably) confident that I will ship it…
·seangoedecke.com·
The valley of engineering despair
A New Kali Linux Archive Signing Key | Kali Linux Blog
A New Kali Linux Archive Signing Key | Kali Linux Blog
TL;DR Bad news for Kali Linux users! In the coming day(s), apt update is going to fail for pretty much everyone out there: Missing key 827C8569F2518CC677FECA1AED65462EC8D5E4C5, which is needed to verify signature. Reason is, we had to roll a new signing key for the Kali repository. You need to download and install the new key manually, here’s the one-liner:
·kali.org·
A New Kali Linux Archive Signing Key | Kali Linux Blog
CNCF and Synadia Reach an Agreement on NATS
CNCF and Synadia Reach an Agreement on NATS
For a minute there, it looked like we were in for an ugly, legal fight over control of the NATS messaging system. But Synadia has backed off, and all's well now.
·thenewstack.io·
CNCF and Synadia Reach an Agreement on NATS
Kubernetes upgrades: beyond the one-click update with Tanat Lokejaroenlarb
Kubernetes upgrades: beyond the one-click update with Tanat Lokejaroenlarb

Kubernetes upgrades: beyond the one-click update, with Tanat Lokejaroenlarb

https://ku.bz/VVHFfXGl_

Discover how Adevinta manages Kubernetes upgrades at scale in this episode with Tanat Lokejaroenlarb. Tanat shares his team's journey from time-consuming blue-green deployments to efficient in-place upgrades for their multi-tenant Kubernetes platform SHIP, detailing the engineering decisions and operational challenges they overcame.

You will learn:

How to transition from blue-green to in-place Kubernetes upgrades while maintaining service reliability

Techniques for tracking and addressing API deprecations using tools like Pluto and Kube-no-trouble

Strategies for minimizing SLO impact during node rebuilds through serialized approaches and proper PDB configuration (see the example configuration after this list)

Why a phased upgrade approach with "cluster waves" provides safer production deployments even with thorough testing
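
For readers unfamiliar with PodDisruptionBudgets, here is a minimal sketch of the kind of PDB configuration the episode touches on; the application name, labels, and budget value are assumptions, not taken from the episode:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-frontend-pdb        # hypothetical name
spec:
  maxUnavailable: 1             # at most one pod of this app may be evicted at a time during node drains/rebuilds
  selector:
    matchLabels:
      app: web-frontend         # hypothetical label; must match the pods you want to protect

Combined with serialized node rebuilds, a budget like this bounds how much of a workload can be disrupted at any point during an upgrade.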

Sponsor

This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.

More info

Find all the links and info for this episode here: https://ku.bz/VVHFfXGl_

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

May 06, 2025 at 06:00AM

·kube.fm·
Kubernetes upgrades: beyond the one-click update with Tanat Lokejaroenlarb
Kubernetes v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA
Kubernetes v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA

Kubernetes v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA

https://kubernetes.io/blog/2025/05/05/kubernetes-v1-33-prevent-persistentvolume-leaks-when-deleting-out-of-order-graduate-to-ga/

I am thrilled to announce that the feature to prevent PersistentVolume (or PVs for short) leaks when deleting out of order has graduated to General Availability (GA) in Kubernetes v1.33! This improvement, initially introduced as a beta feature in Kubernetes v1.31, ensures that your storage resources are properly reclaimed, preventing unwanted leaks.

How did reclaim work in previous Kubernetes releases?

PersistentVolumeClaim (or PVC for short) is a user's request for storage. A PV and a PVC are considered Bound once a newly provisioned PV, or an existing PV that matches the claim, is bound to the PVC. The PVs themselves are backed by volumes allocated by the storage backend.

Normally, if the volume is to be deleted, then the expectation is to delete the PVC for a bound PV-PVC pair. However, there are no restrictions on deleting a PV before deleting a PVC.

For a Bound PV-PVC pair, the ordering of PV-PVC deletion determines whether the PV reclaim policy is honored. The reclaim policy is honored if the PVC is deleted first; however, if the PV is deleted prior to deleting the PVC, then the reclaim policy is not exercised. As a result of this behavior, the associated storage asset in the external infrastructure is not removed.
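
To make the ordering concrete, here is a rough sketch with hypothetical resource names:

# Ordering that honors the reclaim policy: delete the PVC first
kubectl delete pvc example-pvc   # the Delete reclaim policy runs and the backend volume is removed
# Ordering that, before this fix, leaked the backend volume: delete the PV first
kubectl delete pv example-pv     # reclaim policy was skipped; the storage asset remained in the backend
kubectl delete pvc example-pvc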

PV reclaim policy with Kubernetes v1.33

With the graduation to GA in Kubernetes v1.33, this issue is now resolved. Kubernetes now reliably honors the configured Delete reclaim policy, even when PVs are deleted before their bound PVCs. This is achieved through the use of finalizers, ensuring that the storage backend releases the allocated storage resource as intended.

How does it work?

For CSI volumes, the new behavior is achieved by adding the finalizer external-provisioner.volume.kubernetes.io/finalizer to new and existing PVs. The finalizer is only removed after the storage from the backend is deleted. Addition and removal of the finalizer are handled by the external-provisioner.

Here is an example of a PV with the finalizer; note the new entry in the finalizers list:

kubectl get pv pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53 -o yaml

apiVersion: v1
kind: PersistentVolume
metadata:
  annotations:
    pv.kubernetes.io/provisioned-by: csi.example.driver.com
  creationTimestamp: "2021-11-17T19:28:56Z"
  finalizers:
  - kubernetes.io/pv-protection
  - external-provisioner.volume.kubernetes.io/finalizer
  name: pvc-a7b7e3ba-f837-45ba-b243-dec7d8aaed53
  resourceVersion: "194711"
  uid: 087f14f2-4157-4e95-8a70-8294b039d30e
spec:
  accessModes:
  - ReadWriteOnce
  capacity:
    storage: 1Gi
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: example-vanilla-block-pvc
    namespace: default
    resourceVersion: "194677"
    uid: a7b7e3ba-f837-45ba-b243-dec7d8aaed53
  csi:
    driver: csi.example.driver.com
    fsType: ext4
    volumeAttributes:
      storage.kubernetes.io/csiProvisionerIdentity: 1637110610497-8081-csi.example.driver.com
      type: CNS Block Volume
    volumeHandle: 2dacf297-803f-4ccc-afc7-3d3c3f02051e
  persistentVolumeReclaimPolicy: Delete
  storageClassName: example-vanilla-block-sc
  volumeMode: Filesystem
status:
  phase: Bound

The finalizer prevents this PersistentVolume from being removed from the cluster. As stated previously, the finalizer is only removed from the PV object after it is successfully deleted from the storage backend. To learn more about finalizers, please refer to Using Finalizers to Control Deletion.

Similarly, the finalizer kubernetes.io/pv-controller is added to dynamically provisioned in-tree plugin volumes.
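
A quick way to check which finalizers a given PV carries (the PV name here is hypothetical):

kubectl get pv example-pv -o jsonpath='{.metadata.finalizers}'
# For a dynamically provisioned CSI volume you should see
# external-provisioner.volume.kubernetes.io/finalizer alongside kubernetes.io/pv-protection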

Important note

The fix does not apply to statically provisioned in-tree plugin volumes.

How to enable new behavior?

To take advantage of the new behavior, you must have upgraded your cluster to the v1.33 release of Kubernetes and run the CSI external-provisioner version 5.0.1 or later. The feature was released as beta in the v1.31 release of Kubernetes, where it was enabled by default.
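
One way to confirm which external-provisioner version your CSI driver runs is to inspect the sidecar images; the image naming below is an assumption and depends on how your driver is deployed:

kubectl get pods -A -o jsonpath='{.items[*].spec.containers[*].image}' | tr ' ' '\n' | grep csi-provisioner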

References

KEP-2644

Volume leak issue

Beta Release Blog

How do I get involved?

The SIG Storage communication channels, including the Kubernetes Slack channel, are great mediums for reaching out to the SIG Storage team and the migration working group.

Special thanks to the following people for their insightful reviews, thorough consideration, and valuable contributions:

Fan Baofa (carlory)

Jan Šafránek (jsafrane)

Xing Yang (xing-yang)

Matthew Wong (wongma7)

Join the Kubernetes Storage Special Interest Group (SIG) if you're interested in getting involved with the design and development of CSI or any part of the Kubernetes Storage system. We’re rapidly growing and always welcome new contributors.

via Kubernetes Blog https://kubernetes.io/

May 05, 2025 at 02:30PM

·kubernetes.io·
Kubernetes v1.33: Prevent PersistentVolume Leaks When Deleting out of Order graduates to GA
It's a Trap! The Two Generals' Problem
It's a Trap! The Two Generals' Problem
In distributed systems, coordination is hard—really hard—especially when both parties depend on mutual confirmation to proceed, but there’s no guarantee their messages will arrive. This classic…
·particular.net·
It's a Trap! The Two Generals' Problem
We the builders - we find the truth and tell the truth
We the builders - we find the truth and tell the truth
For decades, we've done our jobs in the background. We helped people file taxes, get veterans' benefits, and apply for financial aid; we helped refugees navigate immigration, everyone find vaccines, and parents find baby formula
·wethebuilders.org·
We the builders - we find the truth and tell the truth
Kubernetes v1.33: Mutable CSI Node Allocatable Count
Kubernetes v1.33: Mutable CSI Node Allocatable Count

Kubernetes v1.33: Mutable CSI Node Allocatable Count

https://kubernetes.io/blog/2025/05/02/kubernetes-1-33-mutable-csi-node-allocatable-count/

Scheduling stateful applications reliably depends heavily on accurate information about resource availability on nodes. Kubernetes v1.33 introduces an alpha feature called mutable CSI node allocatable count, allowing Container Storage Interface (CSI) drivers to dynamically update the reported maximum number of volumes that a node can handle. This capability significantly enhances the accuracy of pod scheduling decisions and reduces scheduling failures caused by outdated volume capacity information.

Background

Traditionally, Kubernetes CSI drivers report a static maximum volume attachment limit when initializing. However, actual attachment capacities can change during a node's lifecycle for various reasons, such as:

Manual or external operations attaching/detaching volumes outside of Kubernetes control.

Dynamically attached network interfaces or specialized hardware (GPUs, NICs, etc.) consuming available slots.

Multi-driver scenarios, where one CSI driver’s operations affect available capacity reported by another.

Static reporting can cause Kubernetes to schedule pods onto nodes that appear to have capacity but don't, leading to pods stuck in a ContainerCreating state.

Dynamically adapting CSI volume limits

With the new feature gate MutableCSINodeAllocatableCount, Kubernetes enables CSI drivers to dynamically adjust and report node attachment capacities at runtime. This ensures that the scheduler has the most accurate, up-to-date view of node capacity.

How it works

When this feature is enabled, Kubernetes supports two mechanisms for updating the reported node volume limits:

Periodic Updates: CSI drivers specify an interval to periodically refresh the node's allocatable capacity.

Reactive Updates: An immediate update triggered when a volume attachment fails due to exhausted resources (ResourceExhausted error).

Enabling the feature

To use this alpha feature, you must enable the MutableCSINodeAllocatableCount feature gate in these components:

kube-apiserver

kubelet
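
As a minimal sketch of enabling the gate, you can pass --feature-gates=MutableCSINodeAllocatableCount=true to the kube-apiserver, and set it in the kubelet configuration; the snippet below shows the kubelet side and assumes you manage the kubelet via a KubeletConfiguration file:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  MutableCSINodeAllocatableCount: true   # alpha in v1.33; off by default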

Example CSI driver configuration

Below is an example of configuring a CSI driver to enable periodic updates every 60 seconds:

apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.k8s.io
spec:
  nodeAllocatableUpdatePeriodSeconds: 60

This configuration directs the kubelet to call the CSI driver's NodeGetInfo method every 60 seconds and update the node's allocatable volume count. Kubernetes enforces a minimum update interval of 10 seconds to balance accuracy and resource usage.

Immediate updates on attachment failures

In addition to periodic updates, Kubernetes now reacts to attachment failures. Specifically, if a volume attachment fails with a ResourceExhausted error (gRPC code 8), an immediate update is triggered to correct the allocatable count promptly.

This proactive correction prevents repeated scheduling errors and helps maintain cluster health.

Getting started

To experiment with mutable CSI node allocatable count in your Kubernetes v1.33 cluster:

Enable the feature gate MutableCSINodeAllocatableCount on the kube-apiserver and kubelet components.

Update your CSI driver configuration by setting nodeAllocatableUpdatePeriodSeconds.

Monitor and observe improvements in scheduling accuracy and pod placement reliability.

Next steps

This feature is currently in alpha and the Kubernetes community welcomes your feedback. Test it, share your experiences, and help guide its evolution toward beta and GA stability.

Join discussions in the Kubernetes Storage Special Interest Group (SIG-Storage) to shape the future of Kubernetes storage capabilities.

via Kubernetes Blog https://kubernetes.io/

May 02, 2025 at 02:30PM

·kubernetes.io·
Kubernetes v1.33: Mutable CSI Node Allocatable Count
FIPS 140: The Best Explanation Ever (Hopefully)
FIPS 140: The Best Explanation Ever (Hopefully)
Cryptography = modern cyber security. Full stop. It is at the core of everyone’s lives from buying a latte with Apple/Google Pay, messaging our friends, and even just checking the online news…
·itnext.io·
FIPS 140: The Best Explanation Ever (Hopefully)
Simplifying HPC: CIQ Releases User-Friendly UI and API for Warewulf - TFiR
Simplifying HPC: CIQ Releases User-Friendly UI and API for Warewulf - TFiR
CIQ has announced the tech preview release of a user-friendly management interface for the Warewulf cluster provisioning system. This new web interface, built on Cockpit and backed by a new, open source API, is designed to simplify and streamline management of high-performance computing (HPC) clusters. This new capability simplifies cluster administration for new and existing
·tfir.io·
Simplifying HPC: CIQ Releases User-Friendly UI and API for Warewulf - TFiR
The Redis saga continues | Redis is now available under the OSI-approved AGPLv3 open source license.
The Redis saga continues | Redis is now available under the OSI-approved AGPLv3 open source license.
The rise of hyperscalers like AWS and GCP has unlocked incredible speed and scale for startups and enterprises alike. But for companies rooted in open source, it has posed a fundamental challenge: how do you keep innovating and investing in OSS projects when cloud providers reap the profits and control the infrastructure without proportional contributions […]
·redis.io·
The Redis saga continues | Redis is now available under the OSI-approved AGPLv3 open source license.
Kubernetes v1.33: New features in DRA
Kubernetes v1.33: New features in DRA

Kubernetes v1.33: New features in DRA

https://kubernetes.io/blog/2025/05/01/kubernetes-v1-33-dra-updates/

Kubernetes Dynamic Resource Allocation (DRA) was originally introduced as an alpha feature in the v1.26 release, and then went through a significant redesign for Kubernetes v1.31. The main DRA feature went to beta in v1.32, and the project hopes it will be generally available in Kubernetes v1.34.

The basic feature set of DRA provides a far more powerful and flexible API for requesting devices than Device Plugin. And while DRA remains a beta feature for v1.33, the DRA team has been hard at work implementing a number of new features and UX improvements. One feature has been promoted to beta, while a number of new features have been added in alpha. The team has also made progress towards getting DRA ready for GA.

Features promoted to beta

Driver-owned Resource Claim Status was promoted to beta. This allows the driver to report driver-specific device status data for each allocated device in a resource claim, which is particularly useful for supporting network devices.

New alpha features

Partitionable Devices lets a driver advertise several overlapping logical devices (“partitions”), and the driver can reconfigure the physical device dynamically based on the actual devices allocated. This makes it possible to partition devices on-demand to meet the needs of the workloads and therefore increase the utilization.

Device Taints and Tolerations allow devices to be tainted and for workloads to tolerate those taints. This makes it possible for drivers or cluster administrators to mark devices as unavailable. Depending on the effect of the taint, this can prevent devices from being allocated or cause eviction of pods that are using the device.

Prioritized List lets users specify a list of acceptable devices for their workloads, rather than just a single type of device. So while the workload might run best on a single high-performance GPU, it might also be able to run on 2 mid-level GPUs. The scheduler will attempt to satisfy the alternatives in the list in order, so the workload will be allocated the best set of devices available in the cluster.

Admin Access has been updated so that only users with access to a namespace with the resource.k8s.io/admin-access: "true" label are authorized to create ResourceClaim or ResourceClaimTemplates objects with the adminAccess field within the namespace. This grants administrators access to in-use devices and may enable additional permissions when making the device available in a container. This ensures that non-admin users cannot misuse the feature.
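
For illustration, a namespace labeled for DRA admin access might look like the sketch below; the namespace name is an assumption, while the label key and value come from the post:

apiVersion: v1
kind: Namespace
metadata:
  name: dra-admin                         # hypothetical namespace name
  labels:
    resource.k8s.io/admin-access: "true"  # only ResourceClaims/ResourceClaimTemplates in namespaces with this label may set adminAccess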

Preparing for general availability

A new v1beta2 API has been added to simplify the user experience and to prepare for additional features being added in the future. The RBAC rules for DRA have been improved and support has been added for seamless upgrades of DRA drivers.

What’s next?

The plan for v1.34 is even more ambitious than for v1.33. Most importantly, we (the Kubernetes device management working group) hope to bring DRA to general availability, which will make it available by default on all v1.34 Kubernetes clusters. This also means that many, perhaps all, of the DRA features that are still beta in v1.34 will become enabled by default, making it much easier to use them.

The alpha features that were added in v1.33 will be brought to beta in v1.34.

Getting involved

A good starting point is joining the WG Device Management Slack channel and meetings, which happen at US/EU and EU/APAC friendly time slots.

Not all enhancement ideas are tracked as issues yet, so come talk to us if you want to help or have some ideas yourself! We have work to do at all levels, from difficult core changes to usability enhancements in kubectl, which could be picked up by newcomers.

Acknowledgments

A huge thanks to everyone who has contributed:

Cici Huang (cici37)

Ed Bartosh (bart0sh)

John Belamaric (johnbelamaric)

Jon Huhn (nojnhuh)

Kevin Klues (klueska)

Morten Torkildsen (mortent)

Patrick Ohly (pohly)

Rita Zhang (ritazh)

Shingo Omura (everpeace)

via Kubernetes Blog https://kubernetes.io/

May 01, 2025 at 02:30PM

·kubernetes.io·
Kubernetes v1.33: New features in DRA
Music Assistant
Music Assistant
Music Assistant is a music library manager for local and streaming providers
·music-assistant.io·
Music Assistant
AirBorne: Wormable Zero-Click RCE in Apple AirPlay Puts Billions of Devices at Risk | Oligo Security | Oligo Security
AirBorne: Wormable Zero-Click RCE in Apple AirPlay Puts Billions of Devices at Risk | Oligo Security | Oligo Security
Oligo Security reveals AirBorne, a new set of vulnerabilities in Apple’s AirPlay protocol and SDK. Learn how zero-click RCEs, ACL bypasses, and wormable exploits could endanger Apple and IoT devices worldwide — and how to protect yourself.
·oligo.security·
AirBorne: Wormable Zero-Click RCE in Apple AirPlay Puts Billions of Devices at Risk | Oligo Security | Oligo Security