
1_r/devopsish
Image Compatibility In Cloud Native Environments
https://kubernetes.io/blog/2025/06/25/image-compatibility-in-cloud-native-environments/
In industries where systems must run very reliably and meet strict performance criteria, such as telecommunications, high-performance computing, or AI computing, containerized applications often need a specific operating system configuration or the presence of specific hardware. It is common practice to require specific versions of the kernel, its configuration, device drivers, or system components. Despite the existence of the Open Container Initiative (OCI), a governing community that defines standards and specifications for container images, there has been a gap in expressing such compatibility requirements. The need to address this issue has led to different proposals and, ultimately, an implementation in Kubernetes' Node Feature Discovery (NFD).
NFD is an open source Kubernetes project that automatically detects and reports hardware and system features of cluster nodes. This information helps users to schedule workloads on nodes that meet specific system requirements, which is especially useful for applications with strict hardware or operating system dependencies.
The need for image compatibility specification
Dependencies between containers and host OS
A container image is built on a base image, which provides a minimal runtime environment, often a stripped-down Linux userland, completely empty or distroless. When an application requires certain features from the host OS, compatibility issues arise. These dependencies can manifest in several ways:
Drivers: Host driver versions must match the supported range of a library version inside the container to avoid compatibility problems. Examples include GPUs and network drivers.
Libraries or Software: The container must come with a specific version or range of versions for a library or software to run optimally in the environment. Examples from high performance computing are MPI, EFA, or Infiniband.
Kernel Modules or Features: Specific kernel features or modules must be present. Examples include support for write-protected huge page faults, or the presence of VFIO.
And more…
While containers in Kubernetes are the most likely unit of abstraction for these needs, the definition of compatibility can extend further to include other container technologies such as Singularity and other OCI artifacts such as binaries from a spack binary cache.
Multi-cloud and hybrid cloud challenges
Containerized applications are deployed across various Kubernetes distributions and cloud providers, where different host operating systems introduce compatibility challenges. These operating systems often have to be pre-configured before workload deployment, or are immutable. For instance, different cloud providers offer different operating systems, such as:
RHCOS/RHEL
Photon OS
Amazon Linux 2
Container-Optimized OS
Azure Linux OS
And more...
Each OS comes with unique kernel versions, configurations, and drivers, making compatibility a non-trivial issue for applications requiring specific features. It must be possible to quickly assess a container for its suitability to run on any specific environment.
Image compatibility initiative
An effort was made within the Open Container Initiative Image Compatibility working group to introduce a standard for image compatibility metadata. A specification for compatibility would allow container authors to declare required host OS features, making compatibility requirements discoverable and programmable. The specification implemented in Kubernetes Node Feature Discovery is one of the discussed proposals. It aims to:
Define a structured way to express compatibility in OCI image manifests.
Support a compatibility specification alongside container images in image registries.
Allow automated validation of compatibility before scheduling containers.
The concept has since been implemented in the Kubernetes Node Feature Discovery project.
Implementation in Node Feature Discovery
The solution integrates compatibility metadata into Kubernetes via NFD features and the NodeFeatureGroup API. This interface enables the user to match containers to nodes based on the hardware and software features that the nodes expose, allowing for intelligent scheduling and workload optimization.
Compatibility specification
The compatibility specification is a structured list of compatibility objects containing Node Feature Groups. These objects define image requirements and facilitate validation against host nodes. The feature requirements are described by using the list of available features from the NFD project. The schema has the following structure:
version (string) - Specifies the API version.
compatibilities (array of objects) - List of compatibility sets.
  rules (object) - Specifies NodeFeatureGroup to define image requirements.
  weight (int, optional) - Node affinity weight.
  tag (string, optional) - Categorization tag.
  description (string, optional) - Short description.
An example might look like the following:
version: v1alpha1
compatibilities:
- description: "My image requirements"
  rules:
  - name: "kernel and cpu"
    matchFeatures:
    - feature: kernel.loadedmodule
      matchExpressions:
        vfio-pci: {op: Exists}
    - feature: cpu.model
      matchExpressions:
        vendor_id: {op: In, value: ["Intel", "AMD"]}
  - name: "one of available nics"
    matchAny:
    - matchFeatures:
      - feature: pci.device
        matchExpressions:
          vendor: {op: In, value: ["0eee"]}
          class: {op: In, value: ["0200"]}
    - matchFeatures:
      - feature: pci.device
        matchExpressions:
          vendor: {op: In, value: ["0fff"]}
          class: {op: In, value: ["0200"]}
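To make the matching semantics of the example more concrete, here is a minimal Python sketch that evaluates those rules against a simplified description of a node. It is not NFD's implementation. The node model (every feature is a list of attribute dictionaries), the helper names, and the sample node are assumptions made for illustration, and only the Exists and In operators used in the example are handled. The sketch reports a result per rule and deliberately does not decide how rule results are combined.

```python
from typing import Any

# Simplified node model (an assumption of this sketch): each feature name maps
# to a list of instances, and each instance is a dict of attribute -> value.
Node = dict[str, list[dict[str, str]]]

# The example specification from the text, written as a plain Python dict.
SPEC: dict[str, Any] = {
    "version": "v1alpha1",
    "compatibilities": [{
        "description": "My image requirements",
        "rules": [
            {"name": "kernel and cpu", "matchFeatures": [
                {"feature": "kernel.loadedmodule",
                 "matchExpressions": {"vfio-pci": {"op": "Exists"}}},
                {"feature": "cpu.model",
                 "matchExpressions": {
                     "vendor_id": {"op": "In", "value": ["Intel", "AMD"]}}},
            ]},
            {"name": "one of available nics", "matchAny": [
                {"matchFeatures": [{"feature": "pci.device", "matchExpressions": {
                    "vendor": {"op": "In", "value": ["0eee"]},
                    "class": {"op": "In", "value": ["0200"]}}}]},
                {"matchFeatures": [{"feature": "pci.device", "matchExpressions": {
                    "vendor": {"op": "In", "value": ["0fff"]},
                    "class": {"op": "In", "value": ["0200"]}}}]},
            ]},
        ],
    }],
}


def expression_matches(instance: dict[str, str], key: str, expr: dict[str, Any]) -> bool:
    """Evaluate one expression against a single feature instance."""
    if expr["op"] == "Exists":
        return key in instance
    if expr["op"] == "In":
        return instance.get(key) in expr["value"]
    raise ValueError(f"operator not covered by this sketch: {expr['op']}")


def term_matches(node: Node, term: dict[str, Any]) -> bool:
    """A term matches if one instance satisfies all of its expressions."""
    expressions = term["matchExpressions"]
    return any(
        all(expression_matches(inst, key, expr) for key, expr in expressions.items())
        for inst in node.get(term["feature"], [])
    )


def rule_matches(node: Node, rule: dict[str, Any]) -> bool:
    """matchFeatures terms must all match; matchAny needs one matching alternative."""
    required = all(term_matches(node, t) for t in rule.get("matchFeatures", []))
    alternatives = rule.get("matchAny", [])
    any_ok = not alternatives or any(
        all(term_matches(node, t) for t in alt["matchFeatures"])
        for alt in alternatives
    )
    return required and any_ok


def evaluate_rules(node: Node, compatibility: dict[str, Any]) -> dict[str, bool]:
    """Return a per-rule result, keyed by rule name."""
    return {rule["name"]: rule_matches(node, rule) for rule in compatibility["rules"]}


# Hypothetical node: vfio-pci loaded, Intel CPU, one matching network device.
GOOD_NODE: Node = {
    "kernel.loadedmodule": [{"vfio-pci": ""}],
    "cpu.model": [{"vendor_id": "Intel"}],
    "pci.device": [{"vendor": "0fff", "class": "0200"}],
}
```

Calling evaluate_rules(GOOD_NODE, SPEC["compatibilities"][0]) returns True for both rules. Because a term is evaluated per instance, a node whose PCI devices only satisfy the vendor and class expressions on different devices does not match the NIC rule.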
Client implementation for node validation
To streamline compatibility validation, we implemented a client tool that allows for node validation based on an image's compatibility artifact. In this workflow, the image author would generate a compatibility artifact that points to the image it describes in a registry via the referrers API. When a need arises to assess the fit of an image to a host, the tool can discover the artifact and verify compatibility of an image to a node before deployment. The client can validate nodes both inside and outside a Kubernetes cluster, extending the utility of the tool beyond the single Kubernetes use case. In the future, image compatibility could play a crucial role in creating specific workload profiles based on image compatibility requirements, aiding in more efficient scheduling. Additionally, it could potentially enable automatic node configuration to some extent, further optimizing resource allocation and ensuring seamless deployment of specialized workloads.
Examples of usage
Define image compatibility metadata
A container image can have metadata that describes its requirements based on features discovered from nodes, like kernel modules or CPU models. The previous compatibility specification example in this article exemplified this use case.
Attach the artifact to the image
The image compatibility specification is stored as an OCI artifact. You can attach this metadata to your container image using the oras tool. The registry only needs to support OCI artifacts; support for arbitrary types is not required. Keep in mind that the container image and the artifact must be stored in the same registry. Use the following command to attach the artifact to the image:
oras attach \
  --artifact-type application/vnd.nfd.image-compatibility.v1alpha1 <image-url> \
  <path-to-spec>.yaml:application/vnd.nfd.image-compatibility.spec.v1alpha1+yaml
Validate image compatibility
After attaching the compatibility specification, you can validate whether a node meets the image's requirements. This validation can be done using the nfd client:
nfd compat validate-node --image <image-url>
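The attach and validate steps can also be scripted, for example in a CI job. The following Python sketch only assembles the two commands shown above and hands them to a runner. The function names, the injectable run parameter, and the image and file names in the usage note are hypothetical. The sketch assumes that the oras and nfd binaries are on the PATH and that a non-zero exit code signals failure, which the source does not state.

```python
import shlex
import subprocess
from typing import Callable, Sequence

# Media types copied from the oras command shown in the text.
ARTIFACT_TYPE = "application/vnd.nfd.image-compatibility.v1alpha1"
SPEC_MEDIA_TYPE = "application/vnd.nfd.image-compatibility.spec.v1alpha1+yaml"


def oras_attach_args(image: str, spec_path: str) -> list[str]:
    """Build the argument list that attaches the spec file to the image."""
    return [
        "oras", "attach",
        "--artifact-type", ARTIFACT_TYPE,
        image,
        f"{spec_path}:{SPEC_MEDIA_TYPE}",
    ]


def validate_node_args(image: str) -> list[str]:
    """Build the argument list that validates the current node for the image."""
    return ["nfd", "compat", "validate-node", "--image", image]


def run_checked(args: Sequence[str]) -> None:
    """Default runner: execute the command and raise on a non-zero exit code."""
    subprocess.run(list(args), check=True)


def attach_and_validate(
    image: str,
    spec_path: str,
    run: Callable[[Sequence[str]], None] = run_checked,
) -> list[str]:
    """Run both steps in order and return the shell-quoted commands for logging."""
    commands = [oras_attach_args(image, spec_path), validate_node_args(image)]
    for args in commands:
        run(args)
    return [shlex.join(args) for args in commands]
```

A call such as attach_and_validate("registry.example.com/team/app:v1", "compat.yaml") would run both commands in sequence. Passing a custom run callable makes it possible to dry-run or log the commands without executing them.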
Read the output from the client
Finally, you can read the report generated by the tool, or use your own tools to act on the generated JSON report.
Conclusion
The addition of image compatibility to Kubernetes through Node Feature Discovery underscores the growing importance of addressing compatibility in cloud native environments. It is only a start, as further work is needed to integrate compatibility into scheduling of workloads within and outside of Kubernetes. However, by integrating this feature into Kubernetes, mission-critical workloads can now define and validate host OS requirements more efficiently. Moving forward, the adoption of compatibility metadata within Kubernetes ecosystems will significantly enhance the reliability and performance of specialized containerized applications, ensuring they meet the stringent requirements of industries like telecommunications, high-performance computing or any environment that requires special hardware or host OS configuration.
Get involved
Join the Kubernetes Node Feature Discovery project if you're interested in getting involved with the design and development of Image Compatibility API and tools. We always welcome new contributors.
via Kubernetes Blog https://kubernetes.io/
June 24, 2025 at 08:00PM
Dear friend, you have built a Kubernetes, with Mac Chaffee
Mac Chaffee, a platform engineer and security champion, examines why developers often underestimate the complexity of running modern applications and how overconfidence leads to expensive technical mistakes.
You will learn:
Why teams reject Kubernetes then rebuild it piece by piece - understanding the psychological factors, like overconfidence, that drive initial rejection of complex but proven tools
How to identify the tipping point when DIY solutions become more complex than adopting established orchestration tools, especially around scaling and high availability challenges
The right approach to abstracting Kubernetes complexity - why hiding the Kubernetes API often backfires and how to build effective guardrails instead of reinventing interfaces
Why mentorship gaps lead to poor technical decisions - how the lack of proper apprenticeship programs in tech results in teams making expensive mistakes when building infrastructure
Sponsor
This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/9nFPmG85f
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
June 24, 2025 at 06:00AM
Ep25 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=6UZnp38Txf4
My Workflow With AI: How I Code, Test, and Deploy Faster Than Ever
Discover how AI transforms software development workflows in this hands-on demonstration. Follow along as we explore a streamlined approach where AI agents generate detailed Product Requirement Documents (PRDs), manage tasks, write and test code, and automate complex development processes. Learn about the powerful combination of models, agents, MCP servers, and customized instructions that enable one developer to effectively orchestrate an entire AI-powered team.
This video showcases a practical, AI-driven workflow designed to simplify and accelerate software development, highlighting tools like Cursor IDE, Taskmaster, GitHub integration, and Memory MCP. Whether you're curious about AI-assisted coding or looking to enhance your own development practices, join us to see how Artificial Intelligence is reshaping the way we build software.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Blacksmith 🔗 https://blacksmith.sh ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#AIWorkflow #CursorIDE #AutomatedDevelopment
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/my-workflow-with-ai-how-i-code-test-and-deploy-faster-than-ever 🎬 The Missing Link: How MCP Servers Supercharge Your AI Coding Assistant: https://youtu.be/n0dCFY6wMeI 🎬 From Shame to Fame: How I Fixed My Lazy Vibe Coding Habits with Taskmaster: https://youtu.be/0WtCBbIHoKE
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬
00:00 Development Workflow with AI
01:26 blacksmith (sponsor)
02:33 Create PRDs Using AI
06:10 Get PRDs with AI
08:23 Implement PRD Code and Tests
09:42 Execute Final PRD Tasks and Workflows
12:07 How Does It All Work?
via YouTube https://www.youtube.com/watch?v=2E610yzqQwg
Ep25 - Ask Me Anything About Anything with Kostis Kapelonis and Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, Cloud, Kubernetes, Platform Engineering, containers, or anything else. We'll have special guests Kostis Kapelonis and Scott Rosenberg to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Codefresh 🔗 GitOps Argo CD Certifications: https://learning.codefresh.io (use "viktor" for a 50% discount) ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
via YouTube https://www.youtube.com/watch?v=wOcQqdUfMbI
Week Ending June 15, 2025
https://lwkd.info/2025/20250617
Developer News
Kubernetes Slack is downgrading to a regular free account. Not only does this change how we use Slack, but community members also need to take action to preserve some things that are not part of regular backups.
The Go team fixed a symlink race condition in os.RemoveAll in Go versions 1.21.11 and 1.22.4. The Kubernetes Security Response Committee confirmed this vulnerability can allow file deletion on a Node. This issue will be fixed in the patch releases coming out on Wednesday.
Release Schedule
Next Deadline: Enhancements Freeze, June 20
Hopefully everyone has their PRRs started, and this Friday is the deadline for opt-in for Enhancements. Get your 1.34 changes listed.
Kubernetes v1.34.0-alpha.1 has been built and pushed. Please review the changes and test the release.
Patch releases are due out on June 18th.
Featured PRs
132007: Fix: HPA suppresses FailedRescale event on successful conflict retry
This PR modifies the HPA controller to emit a FailedRescale event only if a scaling operation still fails after being retried due to a conflict; if the retry succeeds, it emits a SuccessfulRescale event instead. This change ensures that transient conflicts do not generate unnecessary failure events and reduces noise in the event logs.
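The event behavior described for this PR can be illustrated with a small Python sketch. It is not the HPA controller's code, which is written in Go. ConflictError, rescale_with_retry, and the max_attempts parameter are hypothetical names, and the sketch only models which event is recorded after conflict retries.

```python
from typing import Callable


class ConflictError(Exception):
    """Stand-in for an update conflict reported by the API server."""


def rescale_with_retry(update_scale: Callable[[], None], max_attempts: int = 2) -> list[str]:
    """Try to update the scale and return the events that would be emitted.

    A conflict on an early attempt emits nothing. FailedRescale is recorded
    only when the final attempt also conflicts. Other errors propagate.
    """
    events: list[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            update_scale()
        except ConflictError:
            if attempt == max_attempts:
                events.append("FailedRescale")
            continue
        events.append("SuccessfulRescale")
        break
    return events


def conflict_once() -> Callable[[], None]:
    """Build an update function that conflicts on its first call only."""
    state = {"calls": 0}

    def update() -> None:
        state["calls"] += 1
        if state["calls"] == 1:
            raise ConflictError("object was modified")

    return update


def always_conflict() -> None:
    """Update function that conflicts on every call."""
    raise ConflictError("object was modified")
```

With these helpers, rescale_with_retry(conflict_once()) yields only a SuccessfulRescale event, while rescale_with_retry(always_conflict) yields a single FailedRescale event.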
132251: kubectl delete: update interactive delete to break on new line
This PR updates kubectl delete interactive mode to treat an empty newline as “No”. Previously, pressing “Enter” on an empty line would send a new line. With this update, pressing “Enter” now automatically responds with “No”, improving safety and ensuring that empty inputs don’t result in unintended actions.
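The safer default can be sketched in a few lines of Python. This is not kubectl's implementation. The function name and the set of answers treated as "Yes" are assumptions for illustration. The only behavior taken from the description is that an empty line counts as "No".

```python
def confirm_delete(answer: str) -> bool:
    """Interpret a line typed at an interactive delete prompt.

    An empty line (just pressing Enter) is treated as "No". The accepted
    affirmative answers are an assumption of this sketch.
    """
    normalized = answer.strip().lower()
    if not normalized:
        return False
    return normalized in ("y", "yes")
```

For example, confirm_delete("\n") returns False, so a stray Enter never confirms a deletion, while confirm_delete("y\n") returns True.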
KEP of the Week
KEP 2837: Pod Level Resource Specifications
The KEP extends the Pod API to support Pod-level resource limits and requests for non-extended resources in addition to existing container-level resource allocation. Previously, resource requests and limits could be set only at the container level, which limited flexibility and ease of resource management for the pod as a whole. The existing behaviour was problematic for users who wanted to limit the overall resource consumption of the entire pod.
This KEP is tracked for beta in v1.34.
Other Merges
kubernetes.io/initial-events-list-blueprint annotation removed from “Bookmark” event for watch stream requests
Missing conformance coverage for servicecidr read status endpoint
Go version for publishing bot rules updated
Support for API streaming from the rest client removed
Incorrect reference to JoinConfigurationKind in error message removed
Deprecated encryption config controller metrics removed
validation-gen code generator now generates validation code that supports validation ratcheting
Kubernetes is now built using Go 1.24.4
DRA kubelet: logging now uses driverName like the rest of the Kubernetes components
e2e tests for PodLifecycleSleepAction fixed to avoid flakes
Promotions
PreferSameTrafficDistribution to beta
NodeLocalCRISocket to beta
SeparateTaintEvictionController to stable
Subprojects and Dependency Updates
containerd v2.1.2 updates grpc to v1.72.2, fixes erofs error checks, improves mount error messages, updates image transfer logic, and prevents shim leaks
Shoutouts
No shoutouts this week. Want to thank someone for special efforts to improve Kubernetes? Tag them in the #shoutouts channel.
via Last Week in Kubernetes Development https://lwkd.info/
June 17, 2025 at 07:00PM
Beyond Kubernetes: Serverless Execution Models for Variable Workloads, with Marc Campora
Marc Campora, a systems consultant with experience in high-throughput platforms, shares his analysis of a real customer deployment with 500+ microservices. He breaks down the cost implications, technical constraints, and operational trade-offs between Kubernetes containers and AWS Lambda functions based on actual production data and migration assessments.
You will learn:
Cost analysis frameworks for comparing Lambda vs Kubernetes across different traffic patterns, including specific examples of 3x savings potential and the 80/20 rule for service utilization
Migration complexity factors when moving existing microservices to Lambda, including cold start issues, runtime model changes, and why it's often a complete rewrite rather than a simple port
Decision criteria for choosing between platforms based on traffic consistency, computational requirements, and operational overhead tolerance
Sponsor
This episode is sponsored by Learnk8s — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/5gMTkzLhV
via KubeFM https://kube.fm
June 17, 2025 at 06:00AM