1_r/devopsish

More Kubernetes Than I Bargained For with Amos Wenger

More Kubernetes Than I Bargained For, with Amos Wenger

https://ku.bz/6Ll_7slr9

Amos Wenger walks through his production incident where adding a home computer as a Kubernetes node caused TLS certificate renewals to fail. The discussion covers debugging techniques using tools like netshoot and K9s, and explores the unexpected interactions between Kubernetes overlay networks and consumer routers.

You will learn:

How Kubernetes networking assumptions break when mixing cloud VMs with nodes behind consumer routers, and why cert-manager challenges fail in NAT environments

The differences between CNI plugins like Flannel and Calico, particularly how they handle IPv6 translation

Debugging techniques for network issues using tools like netshoot, K9s, and iproute2

Best practices for mixed infrastructure including proper node labeling, taints, and scheduling controls

Sponsor

This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.

More info

Find all the links and info for this episode here: https://ku.bz/6Ll_7slr9

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

November 25, 2025 at 05:00AM

·kube.fm·
More Kubernetes Than I Bargained For with Amos Wenger
Kubernetes Configuration Good Practices

Kubernetes Configuration Good Practices

https://kubernetes.io/blog/2025/11/25/configuration-good-practices/

Configuration is one of those things in Kubernetes that seems small until it's not. Configuration is at the heart of every Kubernetes workload. A missing quote, a wrong API version or a misplaced YAML indent can ruin your entire deploy.

This blog brings together tried-and-tested configuration best practices. The small habits that make your Kubernetes setup clean, consistent and easier to manage. Whether you are just starting out or already deploying apps daily, these are the little things that keep your cluster stable and your future self sane.

This blog is inspired by the original Configuration Best Practices page, which has evolved through contributions from many members of the Kubernetes community.

General configuration practices

Use the latest stable API version

Kubernetes evolves fast. Older APIs eventually get deprecated and stop working. So, whenever you are defining resources, make sure you are using the latest stable API version. You can always check with

kubectl api-resources

This simple step saves you from future compatibility issues.
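For example, a quick way to see which API version your cluster currently serves for a resource (column layout may vary slightly between kubectl versions):

kubectl api-resources | grep -i deployments

The APIVERSION column (apps/v1 for Deployments on current clusters) is the value that belongs in the apiVersion field of your manifest.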

Store configuration in version control

Never apply manifest files directly from your desktop. Always keep them in a version control system like Git; it's your safety net. If something breaks, you can instantly roll back to a previous commit, compare changes, or recreate your cluster setup without panic.

Write configs in YAML not JSON

Write your configuration files using YAML rather than JSON. Both work technically, but YAML is just easier for humans: it's cleaner to read, less noisy, and widely used in the community.

YAML has some sneaky gotchas with boolean values: use only true or false. Don't write yes, no, on, or off; they might work in one version of YAML but break in another. To be safe, quote anything that looks like a boolean (for example "yes").
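As an illustration, here is a minimal ConfigMap sketch (the names and values are made up) showing the quoting habit in practice:

apiVersion: v1
kind: ConfigMap
metadata:
  name: feature-flags          # hypothetical name
data:
  enable-cache: "true"         # quoted, so YAML keeps it a string
  legacy-mode: "no"            # unquoted no/yes/on/off may be parsed as booleans by some YAML parsers

ConfigMap data values must be strings, so leaving these unquoted can cause the apply to fail with a type error, depending on the parser.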

Keep configuration simple and minimal

Avoid setting default values that are already handled by Kubernetes. Minimal manifests are easier to debug, cleaner to review and less likely to break things later.

Group related objects together

If your Deployment, Service and ConfigMap all belong to one app, put them in a single manifest file.

It's easier to track changes and apply them as a unit. See the Guestbook all-in-one.yaml file for an example of this syntax.
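As a rough sketch of this layout (names and values are placeholders), one file can hold a ConfigMap and the Service for the same app, separated by ---:

apiVersion: v1
kind: ConfigMap
metadata:
  name: myapp-config
data:
  LOG_LEVEL: "info"
---
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    app.kubernetes.io/name: myapp
  ports:
  - port: 80
    targetPort: 8080

The Deployment for the same app can be appended after another --- separator, keeping everything the app needs reviewable in one place.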

You can even apply entire directories with:

kubectl apply -f configs/

One command and boom, everything in that folder gets deployed.

Add helpful annotations

Manifest files are not just for machines; they are for humans too. Use annotations to describe why something exists or what it does. A quick one-liner can save hours of debugging later and makes collaboration easier.

The most helpful annotation to set is kubernetes.io/description. It's like using a comment, except that it gets copied into the API, so everyone else can see it even after you deploy.
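A minimal sketch of how that looks on an object (the name and description text are placeholders):

metadata:
  name: checkout-api
  annotations:
    kubernetes.io/description: "Handles checkout requests; owned by the payments team"

Anyone running kubectl describe on the object later sees that context without having to dig through Git history.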

Managing Workloads: Pods, Deployments, and Jobs

A common early mistake in Kubernetes is creating Pods directly. Pods work, but they don't reschedule themselves if something goes wrong.

Naked Pods (Pods not managed by a controller, such as a Deployment or a StatefulSet) are fine for testing, but in real setups they are risky.

Why? Because if the node hosting that Pod dies, the Pod dies with it and Kubernetes won't bring it back automatically.

Use Deployments for apps that should always be running

A Deployment, which creates a ReplicaSet to ensure that the desired number of Pods is always available and specifies a strategy for replacing Pods (such as RollingUpdate), is almost always preferable to creating Pods directly. You can roll out a new version, and if something breaks, roll back instantly.
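A minimal sketch of such a Deployment, with the replacement strategy spelled out explicitly (the names and image are placeholders):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app.kubernetes.io/name: web
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1    # at most one Pod down during a rollout
      maxSurge: 1          # at most one extra Pod created during a rollout
  template:
    metadata:
      labels:
        app.kubernetes.io/name: web
    spec:
      containers:
      - name: web
        image: nginx:1.27

If a rollout goes wrong, kubectl rollout undo deployment/web takes you back to the previous revision.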

Use Jobs for tasks that should finish

A Job is perfect when you need something to run once and then stop, like a database migration or a batch-processing task. It will retry if the Pod fails and report success when it's done.
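A rough sketch of a one-off Job (the image and command are hypothetical):

apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate
spec:
  backoffLimit: 3          # retry a failed Pod up to 3 times
  template:
    spec:
      restartPolicy: Never # Jobs require Never or OnFailure
      containers:
      - name: migrate
        image: myapp/migrations:1.2.0
        command: ["./migrate", "--up"]

Once the Pod exits successfully, the Job is marked Complete and nothing gets restarted.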

Service Configuration and Networking

Services are how your workloads talk to each other inside (and sometimes outside) your cluster. Without them, your Pods exist but have no stable way to find or reach each other. Let's make sure that doesn't happen.

Create Services before workloads that use them

When Kubernetes starts a Pod, it automatically injects environment variables for existing Services. So, if a Pod depends on a Service, create a Service before its corresponding backend workloads (Deployments or StatefulSets), and before any workloads that need to access it.

For example, if a Service named foo exists, all containers will get the following variables in their initial environment:

FOO_SERVICE_HOST=<the host the Service runs on>
FOO_SERVICE_PORT=<the port the Service runs on>

DNS-based discovery doesn't have this problem, but it's a good habit to follow anyway.

Use DNS for Service discovery

If your cluster has the DNS add-on (most do), every Service automatically gets a DNS entry. That means you can access it by name instead of IP:

curl http://my-service.default.svc.cluster.local

It's one of those features that makes Kubernetes networking feel magical.

Avoid hostPort and hostNetwork unless absolutely necessary

You'll sometimes see these options in manifests:

hostPort: 8080
hostNetwork: true

But here's the thing: they tie your Pods to specific nodes, making them harder to schedule and scale, because each <hostIP, hostPort, protocol> combination must be unique. If you don't specify the hostIP and protocol explicitly, Kubernetes will use 0.0.0.0 as the default hostIP and TCP as the default protocol. Unless you're debugging or building something like a network plugin, avoid them.

If you just need local access for testing, try kubectl port-forward:

kubectl port-forward deployment/web 8080:80

See Use Port Forwarding to access applications in a cluster to learn more. Or if you really need external access, use a type: NodePort Service. That's the safer, Kubernetes-native way.

Use headless Services for internal discovery

Sometimes, you don't want Kubernetes to load balance traffic. You want to talk directly to each Pod. That's where headless Services come in.

You create one by setting clusterIP: None. Instead of a single IP, DNS gives you the list of individual Pod IPs, which is perfect for apps that manage connections themselves.
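A minimal sketch of a headless Service (names and port are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: myapp-headless
spec:
  clusterIP: None
  selector:
    app.kubernetes.io/name: myapp
  ports:
  - port: 5432

A DNS lookup for myapp-headless.<namespace>.svc.cluster.local then returns the individual Pod IPs instead of a single virtual IP.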

Working with labels effectively

Labels are key/value pairs that are attached to objects such as Pods. Labels help you organize, query, and group your resources. They don't do anything by themselves, but they make everything else, from Services to Deployments, work together smoothly.

Use semantic labels

Good labels help you understand what's what, even months later. Define and use labels that identify semantic attributes of your application or Deployment. For example:

labels:
  app.kubernetes.io/name: myapp
  app.kubernetes.io/component: web
  tier: frontend
  phase: test

app.kubernetes.io/name: what the app is

tier: which layer it belongs to (frontend/backend)

phase: which stage it's in (test/prod)

You can then use these labels to make powerful selectors. For example:

kubectl get pods -l tier=frontend

This will list all frontend Pods across your cluster, no matter which Deployment they came from. Basically you are not manually listing Pod names; you are just describing what you want. See the guestbook app for examples of this approach.

Use common Kubernetes labels

Kubernetes actually recommends a set of common labels. It's a standardized way to name things across your different workloads or projects. Following this convention makes your manifests cleaner, and it means that tools such as Headlamp, dashboard, or third-party monitoring systems can all automatically understand what's running.

Manipulate labels for debugging

Since controllers (like ReplicaSets or Deployments) use labels to manage Pods, you can remove a label to “detach” a Pod temporarily.

Example:

kubectl label pod mypod app-

The app- part removes the label key app. Once that happens, the controller won’t manage that Pod anymore. It’s like isolating it for inspection, a “quarantine mode” for debugging. To interactively remove or add labels, use kubectl label.

You can then check logs, exec into it, and once you're done, delete it manually. Meanwhile, the controller notices that a Pod matching its selector is missing and creates a replacement, so your app keeps running while you debug. That's a super underrated trick every Kubernetes engineer should know.

Handy kubectl tips

These small tips make life much easier when you are working with multiple manifest files or clusters.

Apply entire directories

Instead of applying one file at a time, apply the whole folder. Using server-side apply is also a good practice:

kubectl apply -f configs/ --server-side

This command looks for .yaml, .yml and .json files in that folder and applies them all together. It's faster, cleaner and helps keep things grouped by app.

Use label selectors to get or delete resources

You don't always need to type out resource names one by one. Instead, use selectors to act on entire groups at once:

kubectl get pods -l app=myapp
kubectl delete pod -l phase=test

It's especially useful in CI/CD pipelines, where you want to clean up test resources dynamically.

Quickly create Deployments and Services

For quick experiments, you don't always need to write a manifest. You can spin up a Deployment right from the CLI:

kubectl create deployment webapp --image=nginx

Then expose it as a Service:

kubectl expose deployment webapp --port=80

This is great when you just want to test something before writing full manifests. Also, see Use a Service to Access an Application in a cluster for an example.

Conclusion

Cleaner configuration leads to calmer cluster administrators. If you stick to a few simple habits (keep configuration simple and minimal, version-control everything, use consistent labels, and avoid relying on naked Pods), you'll save yourself hours of debugging down the road.

The best part? Clean configurations stay readable. Even after months, you or anyone on your team can come back to them and quickly understand what's going on.

·kubernetes.io·
Kubernetes Configuration Good Practices
KubeCon Wrap Up by Chris Short, Head of Open Source at CIQ
Last week, CIQ participated in KubeCon + CloudNativeCon North America 2025 in Atlanta, Georgia. CIQ's very own Chris Short hosted a talk during the event, participated in the CNCF Maintainers Summit…
·ciq.com·
KubeCon Wrap Up by Chris Short, Head of Open Source at CIQ
DevOps & AI Toolkit - Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second - https://www.youtube.com/watch?v=AUoqr5r1pBY

Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second

Gemini 3 is undeniably fast and impressive on benchmarks, but after a full week of real-world software engineering work, the reality is more complicated. While everyone's been hyping its capabilities based on day-one reviews and marketing materials, this video digs into what actually matters: how Gemini 3 performs with coding agents on real projects, not just one-shot Tetris games or simple websites. The speed is remarkable at 128 tokens per second, but it comes with serious trade-offs that affect daily pair programming work.

The core issues are frustrating: Gemini 3 is nearly impossible to redirect once it commits to a plan, suffers from an 88% hallucination rate (nearly double Sonnet 4.5's 48%), and confidently claims tasks are complete when they're not. It ignores context from earlier in conversations, struggles with complex multi-step instructions, and dismisses suggestions like a grumpy coder who thinks they know best. While it excels at one-shot code generation, it falls short as a collaborative partner for serious software development. Gemini 3 is genuinely one of the best models available (probably second place behind Sonnet 4.5) but it's not the massive leap forward that the hype suggests, and the gap between Claude Code and Gemini CLI remains significant.

#Gemini3 #AIcoding #SoftwareEngineering

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/gemini-3-is-fast-but-gaslights-you-at-128-tokens-second 🔗 Gemini 3: https://deepmind.google/models/gemini

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Gemini 3 with Gemini CLI 00:25 Gemini 3 Real-World Testing 02:54 Gemini 3's Biggest Problems 10:10 Is Gemini 3 Worth It?

via YouTube https://www.youtube.com/watch?v=AUoqr5r1pBY

·youtube.com·
DevOps & AI Toolkit - Gemini 3 Is Fast But Gaslights You at 128 Tokens/Second - https://www.youtube.com/watch?v=AUoqr5r1pBY
How we built a 130,000-node GKE cluster | Google Cloud Blog
Learn about the architectural innovations we used to build a 130,000-node Kubernetes cluster, and the trends driving demand for these environments.
·cloud.google.com·
How we built a 130,000-node GKE cluster | Google Cloud Blog
Last Week in Kubernetes Development - Week Ending November 16 2025

Week Ending November 16, 2025

https://lwkd.info/2025/20251120

Developer News

Kubernetes SIG Network and the Security Response Committee have announced the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026.

Release Schedule

Next Deadline: Feature blogs ready for review, November 24th

We are in Code Freeze. Release lead Drew Hagen shared the state of the release.

The Feature blog is a great way to highlight and share information about your enhancement with the community. Feature blogs are especially encouraged for high-visibility changes as well as deprecations and removals. The official deadline has passed, but opt-ins are still welcome. If you are interested in writing a blog for your enhancement, please create a placeholder PR and contact your lead ASAP.

Kubernetes v1.35.0-beta.0 and patch releases v1.32.10, v1.31.14, v1.33.6, and v1.34.2 are now live!

KEP of the Week

KEP-5067: Pod Generation

This KEP introduces proper use of metadata.generation and a new status.observedGeneration field to show which PodSpec version the kubelet has actually processed. This helps eliminate uncertainty when multiple updates occur, making Pod status tracking consistent with other Kubernetes resources.

This KEP is tracked for stable in v1.35

Other Merges

Implement opportunistic batching to speed up pod scheduling

Allow constraining impersonation for specific resources

NominatedNodeName has integration tests

DRA device health check timeouts are configurable

Distinguish between nil and not present in validation ratcheting

You can mutate job directives even if they’re suspended

Volume Group Snapshots are now v1beta2 API

Overhaul Device Taint Eviction in DRA

ScheduleAsyncAPICalls has been re-enabled by default after debugging

Device class selection is deterministic

StatefulSets won’t trigger a rollout when upgrading to 1.34

Don’t schedule pods that need storage to a node with no CSI

kuberc gets view and set commands

v1alpha1 structured response for /flagz

Pod statuses stay the same after kubelet restart

Let’s schedule the whole darned gang through the new workload API

DRA: prioritized list scoring and Extended Resource Metrics and extended resource quota

Operators get more tolerations

Mutate persistent volume node affinity

Auto-restart of all containers in a pod when one of them exits

Promotions

KubeletEnsureSecretPulledImages is Beta

Image Volume Source to Beta

PodTopologyLabelsAdmission to Beta

NominatedNodeNameForExpectation and ClearingNominatedNodeNameAfterBinding to Beta

SupplementalGroupsPolicy to GA

JobManagedBy to GA

InPlacePodVerticalScaling tests to Conformance

KubeletCrashLoopBackOffMax to Beta

Pod Certificates to Beta

EnvFiles to Beta

WatchListClient to Beta

Deprecations

Drop networking v1beta1 Ingress from kubectl

AggregatedDiscoveryRemoveBetaType gate removed

Version Updates

go to v1.25.4

CoreDNS to 1.13.1

Subprojects and Dependency Updates

prometheus v3.8.0-rc.0 stabilizes native histograms (now an optional stable feature via scrape_native_histogram), tightens validation for custom-bounds histograms, adds detailed target relabeling views in the UI, improves OTLP target_info de-duplication, expands alerting and promtool support (including Remote-Write 2.0 for promtool push metrics), and delivers multiple PromQL and UI performance fixes for large rule/alert pages.

cloud-provider-aws v1.31.9 bumps the AWS Go SDK to 1.24.7 for CVE coverage, completes migration to AWS SDK v2 for EC2, ELB and ELBV2, adds support for a new AWS partition in the credential provider, and includes defensive fixes for potential nil pointer dereferences alongside the usual 1.31 release line version bump.

cloud-provider-aws v1.30.10 mirrors the 1.31.9 line with backported updates to AWS SDK Go v2 (EC2 and load balancers), a Go SDK 1.24.7 security bump, support for the new AWS partition in credential provider logic, improved nil-pointer safety, and includes contributions from a new external maintainer.

cloud-provider-aws v1.29.10 provides a straightforward version bump for the 1.29 branch, while cloud-provider-aws v1.29.9 backports key changes including EC2/load balancer migration to AWS SDK Go v2, the Go SDK 1.24.7 CVE update, and new-partition support in the credential provider to keep older clusters aligned with current AWS environments.

cluster-api v1.12.0-beta.1 continues the v1.12 beta with chained-upgrade Runtime SDK improvements, blocking AfterClusterUpgrade hooks for safer rollouts, new features such as taint propagation in Machine APIs, MachineDeployment in-place update support, clusterctl describe condition filters, and a broad set of bugfixes and dependency bumps (including etcd v3.6.6 and Kubernetes v0.34.2 libraries).

cluster-api-provider-vsphere v1.15.0-beta.1 refreshes CAPV against CAPI v1.12.0-beta.1, upgrades Go to 1.24.10 and core Kubernetes/etcd libraries, and focuses on test and tooling improvements such as enhanced e2e network debugging, junit output from e2e runs, and refined CI configuration ahead of the 1.15 release.

kubebuilder v4.10.1 is a fast follow-up bugfix release that retracts the problematic v4.10.0 Go module, fixes nested JSON tag omitempty handling in generated APIs, stabilizes metrics e2e tests with webhooks, and tightens Go module validation to prevent future module install issues while keeping scaffold auto-update guidance intact.

kubebuilder v4.10.0 (now retracted as a Go module) introduced the new helm/v2-alpha plugin to replace helm/v1-alpha, improved multi-arch support and Go/tooling versions (golangci-lint, controller-runtime, cert-manager), added external plugin enhancements (PluginChain, ProjectConfig access), support for custom webhook paths, and a series of CLI and scaffolding fixes including better handling of directories with spaces.

cluster-api-provider-vsphere v1.15.0-beta.0 introduces the next beta version of CAPV for testing upcoming Cluster API v1.15 functionality on vSphere. This release is intended only for testing and feedback.

vsphere-csi-driver v3.6.0 adds compatibility with Kubernetes v1.34 and brings improvements such as shared session support on vCenter login and enhanced task monitoring. Updated manifests for this release are available under the versioned manifests/vanilla directory.

kustomize kyaml v0.21.0 updates structured data replacement capabilities, upgrades Go to 1.24.6, refreshes dependencies following security alerts, and includes minor YAML handling fixes.

kustomize v5.8.0 enhances YAML/JSON replacement features, fixes namespace propagation for Helm integrations, and adds improvements such as regex support for replacements, new patch argument types, validation fixes, improved error messages, and performance optimizations.

kustomize cmd/config v0.21.0 aligns with kyaml updates, adopts Go 1.24.6, and brings dependency updates based on recent security advisories.

kustomize api v0.21.0 includes structured-data replacement enhancements, regex selector support, patch argument additions, namespace propagation fixes, validation improvements, Go 1.24.6 updates, and dependency refreshes.

etcd v3.6.6 provides a new patch update for the v3.6 series with all changes documented in the linked changelog. Installation steps and supported platform updates are also included.

etcd v3.5.25 delivers maintenance updates for the v3.5 series along with relevant upgrade guidance and support documentation.

etcd v3.4.39 introduces the newest patches for the v3.4 branch with installation instructions and detailed platform support notes.

cri-o v1.34.2 improves GRPC debug log formatting and ships updated, signed release bundles and SPDX SBOMs for all supported architectures.

cri-o v1.33.6 publishes refreshed signed artifacts and SPDX documents for the 1.33 line, with no dependency changes recorded.

cri-o v1.32.10 updates the 1.32 branch with new signed release artifacts and SBOM files, without dependency modifications.

nerdctl v2.2.0 fixes a namestore path issue, adds mount-manager support, introduces checkpoint lifecycle commands, and enhances image conversion through a new estargz helper flag. The full bundle includes updated containerd, runc, BuildKit, and Stargz Snapshotter.

Shoutouts

Danilo Gemoli: Shoutout to @Petr Muller who is trying to gather new contributors in #prow. He arranged a meeting in which we had the possibility to bring to the table several interesting ideas on how to ease the entry barriers for newcomers

via Last Week in Kubernetes Development https://lwkd.info/

November 20, 2025 at 07:59AM

·lwkd.info·
Last Week in Kubernetes Development - Week Ending November 16 2025
Skyway: Cloud cost management for the 9-figure club
Introducing Skyway: contract management for enterprise cloud spend. Built by the team overseeing tens-of-billions in enterprise cloud spend.
·duckbillhq.com·
Skyway: Cloud cost management for the 9-figure club
Cloudflare outage on November 18, 2025
Cloudflare suffered a service outage on November 18, 2025. The outage was triggered by a bug in generation logic for a Bot Management feature file causing many Cloudflare services to be affected.
·blog.cloudflare.com·
Cloudflare outage on November 18, 2025
Kubernetes Cluster Goes Mobile In Pet Carrier
There’s been a bit of a virtualization revolution going on for the last decade or so, where tools like Docker and LXC have made it possible to quickly deploy server applications without worry…
·hackaday.com·
Kubernetes Cluster Goes Mobile In Pet Carrier
Infinite scale: The architecture behind the Azure AI superfactory - The Official Microsoft Blog
Today, we are unveiling the next Fairwater site of Azure AI datacenters in Atlanta, Georgia. This purpose-built datacenter is connected to our first Fairwater site in Wisconsin, prior generations of AI supercomputers and the broader Azure global datacenter footprint to create the world’s first planet-scale AI superfactory. By packing computing power more densely than ever...
·blogs.microsoft.com·
Infinite scale: The architecture behind the Azure AI superfactory - The Official Microsoft Blog
Pouring packages with Homebrew
The Homebrew project is an open-source package-management system that comes with a repository o [...]
·lwn.net·
Pouring packages with Homebrew
How Kubernetes Became the New Linux
Learn why AWS is building core features instead of competing products in this episode of The New Stack Makers.
·thenewstack.io·
How Kubernetes Became the New Linux
Agent Sandbox provides a secure and isolated execution layer to safely deploy autonomous AI agents on Kubernetes that generate and run untrusted code at scale.
Agent Sandbox is a cloud native controller for sandboxes
Agent Sandbox provides a secure and isolated execution layer to safely deploy autonomous AI agents on Kubernetes that generate and run untrusted code at scale.
·agent-sandbox.sigs.k8s.io·
Agent Sandbox provides a secure and isolated execution layer to safely deploy autonomous AI agents on Kubernetes that generate and run untrusted code at scale.
DevOps & AI Toolkit - Ep39 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=tafANChjv3g

Ep39 - Ask Me Anything About Anything with Scott Rosenberg

There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

via YouTube https://www.youtube.com/watch?v=tafANChjv3g

·youtube.com·
DevOps & AI Toolkit - Ep39 - Ask Me Anything About Anything with Scott Rosenberg - https://www.youtube.com/watch?v=tafANChjv3g
The Karpenter Effect: Redefining Kubernetes Operations with Tanat Lokejaroenlarb

The Karpenter Effect: Redefining Kubernetes Operations, with Tanat Lokejaroenlarb

https://ku.bz/T6hDSWYhb

Tanat Lokejaroenlarb shares the complete journey of replacing EKS Managed Node Groups and Cluster Autoscaler with AWS Karpenter. He explains how this migration transformed their Kubernetes operations, from eliminating brittle upgrade processes to achieving significant cost savings of €30,000 per month through automated instance selection and AMD adoption.

You will learn:

How to decouple control plane and data plane upgrades using Karpenter's asynchronous node rollout capabilities

Cost optimization strategies including flexible instance selection, automated AMD migration, and the trade-offs between cheapest-first selection versus performance considerations

Scaling and performance tuning techniques such as implementing over-provisioning with low-priority placeholder pods

Policy automation and operational practices, including using Kyverno to simplify the user experience and implementing proper Pod Disruption Budgets

Sponsor

This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization

More info

Find all the links and info for this episode here: https://ku.bz/T6hDSWYhb

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

November 18, 2025 at 05:00AM

·kube.fm·
The Karpenter Effect: Redefining Kubernetes Operations with Tanat Lokejaroenlarb
DevOps & AI Toolkit - AI vs Manual: Kubernetes Troubleshooting Showdown 2025 - https://www.youtube.com/watch?v=UbPyEelCh-I

AI vs Manual: Kubernetes Troubleshooting Showdown 2025

Tired of waking up at 3 AM to troubleshoot Kubernetes issues? This video shows you how to automate the entire incident response process using AI-powered remediation. We walk through the traditional manual troubleshooting workflow—detecting issues through kubectl events, analyzing pods and their controllers, identifying root causes, and validating fixes—then demonstrate how AI agents can handle all four phases automatically. Using the open-source DevOps AI Toolkit with the Model Context Protocol (MCP) and a custom Kubernetes controller, you'll see how AI can detect failing pods, analyze the root cause (like a missing PersistentVolumeClaim), suggest remediation, and validate that the fix worked, all while you stay in bed.

The video breaks down the complete architecture, showing how a Kubernetes controller monitors events defined in RemediationPolicy resources, triggers the MCP server for analysis, and either automatically applies fixes or sends Slack notifications for manual approval based on confidence thresholds and risk levels. You'll learn how the MCP agent loops with an LLM using read-only tools to gather data and analyze issues, while keeping write operations isolated and requiring explicit approval. Whether you want fully automated remediation for low-risk issues or human-in-the-loop approval for everything, this approach gives you intelligent troubleshooting that scales beyond what you can predict and prepare for manually.

▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: JFrog Fly 🔗 https://jfrog.com/fly_viktor ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬

#Kubernetes #AIAutomation #DevOps

Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join

▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/ai/ai-vs-manual-kubernetes-troubleshooting-showdown-2025 🔗 DevOps AI Toolkit: https://github.com/vfarcic/dot-ai

▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).

▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/

▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox

▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 Kubernetes Analysis and Remediation with AI 01:15 JFrog Fly (sponsor) 02:46 Kubernetes Troubleshooting Manual Process 11:37 AI-Powered Kubernetes Remediation 14:38 MCP Architecture and Controller Design 20:49 Key Takeaways and Next Steps

via YouTube https://www.youtube.com/watch?v=UbPyEelCh-I

·youtube.com·
DevOps & AI Toolkit - AI vs Manual: Kubernetes Troubleshooting Showdown 2025 - https://www.youtube.com/watch?v=UbPyEelCh-I
Ingress NGINX Retirement: What You Need to Know

Ingress NGINX Retirement: What You Need to Know

https://kubernetes.io/blog/2025/11/11/ingress-nginx-retirement/

To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.

We recommend migrating to one of the many alternatives. Consider migrating to Gateway API, the modern replacement for Ingress. If you must continue using Ingress, many alternative Ingress controllers are listed in the Kubernetes documentation. Continue reading for further information about the history and current state of Ingress NGINX, as well as next steps.

About Ingress NGINX

Ingress is the original user-friendly way to direct network traffic to workloads running on Kubernetes. (Gateway API is a newer way to achieve many of the same goals.) In order for an Ingress to work in your cluster, there must be an Ingress controller running. There are many Ingress controller choices available, which serve the needs of different users and use cases. Some are cloud-provider specific, while others have more general applicability.

Ingress NGINX was an Ingress controller, developed early in the history of the Kubernetes project as an example implementation of the API. It became very popular due to its tremendous flexibility, breadth of features, and independence from any particular cloud or infrastructure provider. Since those days, many other Ingress controllers have been created within the Kubernetes project by community groups, and by cloud native vendors. Ingress NGINX has continued to be one of the most popular, deployed as part of many hosted Kubernetes platforms and within innumerable independent users’ clusters.

History and Challenges

The breadth and flexibility of Ingress NGINX has caused maintenance challenges. Changing expectations about cloud native software have also added complications. What were once considered helpful options have sometimes come to be considered serious security flaws, such as the ability to add arbitrary NGINX configuration directives via the "snippets" annotations. Yesterday’s flexibility has become today’s insurmountable technical debt.

Despite the project’s popularity among users, Ingress NGINX has always struggled with insufficient or barely-sufficient maintainership. For years, the project has had only one or two people doing development work, on their own time, after work hours and on weekends. Last year, the Ingress NGINX maintainers announced their plans to wind down Ingress NGINX and develop a replacement controller together with the Gateway API community. Unfortunately, even that announcement failed to generate additional interest in helping maintain Ingress NGINX or develop InGate to replace it. (InGate development never progressed far enough to create a mature replacement; it will also be retired.)

Current State and Next Steps

Currently, Ingress NGINX is receiving best-effort maintenance. SIG Network and the Security Response Committee have exhausted our efforts to find additional support to make Ingress NGINX sustainable. To prioritize user safety, we must retire the project.

In March 2026, Ingress NGINX maintenance will be halted, and the project will be retired. After that time, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. The GitHub repositories will be made read-only and left available for reference.

Existing deployments of Ingress NGINX will not be broken. Existing project artifacts such as Helm charts and container images will remain available.

In most cases, you can check whether you use Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.

We would like to thank the Ingress NGINX maintainers for their work in creating and maintaining this project–their dedication remains impressive. This Ingress controller has powered billions of requests in datacenters and homelabs all around the world. In a lot of ways, Kubernetes wouldn’t be where it is without Ingress NGINX, and we are grateful for so many years of incredible effort.

SIG Network and the Security Response Committee recommend that all Ingress NGINX users begin migration to Gateway API or another Ingress controller immediately. Many options are listed in the Kubernetes documentation: Gateway API, Ingress. Additional options may be available from vendors you work with.

via Kubernetes Blog https://kubernetes.io/

November 11, 2025 at 01:30PM

·kubernetes.io·
Ingress NGINX Retirement: What You Need to Know
Blog: Ingress NGINX Retirement: What You Need to Know

Blog: Ingress NGINX Retirement: What You Need to Know

https://www.kubernetes.dev/blog/2025/11/12/ingress-nginx-retirement/

To prioritize the safety and security of the ecosystem, Kubernetes SIG Network and the Security Response Committee are announcing the upcoming retirement of Ingress NGINX. Best-effort maintenance will continue until March 2026. Afterward, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. Existing deployments of Ingress NGINX will continue to function and installation artifacts will remain available.

We recommend migrating to one of the many alternatives. Consider migrating to Gateway API, the modern replacement for Ingress. If you must continue using Ingress, many alternative Ingress controllers are listed in the Kubernetes documentation. Continue reading for further information about the history and current state of Ingress NGINX, as well as next steps.

About Ingress NGINX

Ingress is the original user-friendly way to direct network traffic to workloads running on Kubernetes. (Gateway API is a newer way to achieve many of the same goals.) In order for an Ingress to work in your cluster, there must be an Ingress controller running. There are many Ingress controller choices available, which serve the needs of different users and use cases. Some are cloud-provider specific, while others have more general applicability.

Ingress NGINX was an Ingress controller, developed early in the history of the Kubernetes project as an example implementation of the API. It became very popular due to its tremendous flexibility, breadth of features, and independence from any particular cloud or infrastructure provider. Since those days, many other Ingress controllers have been created within the Kubernetes project by community groups, and by cloud native vendors. Ingress NGINX has continued to be one of the most popular, deployed as part of many hosted Kubernetes platforms and within innumerable independent users’ clusters.

History and Challenges

The breadth and flexibility of Ingress NGINX has caused maintenance challenges. Changing expectations about cloud native software have also added complications. What were once considered helpful options have sometimes come to be considered serious security flaws, such as the ability to add arbitrary NGINX configuration directives via the “snippets” annotations. Yesterday’s flexibility has become today’s insurmountable technical debt.

Despite the project’s popularity among users, Ingress NGINX has always struggled with insufficient or barely-sufficient maintainership. For years, the project has had only one or two people doing development work, on their own time, after work hours and on weekends. Last year, the Ingress NGINX maintainers announced their plans to wind down Ingress NGINX and develop a replacement controller together with the Gateway API community. Unfortunately, even that announcement failed to generate additional interest in helping maintain Ingress NGINX or develop InGate to replace it. (InGate development never progressed far enough to create a mature replacement; it will also be retired.)

Current State and Next Steps

Currently, Ingress NGINX is receiving best-effort maintenance. SIG Network and the Security Response Committee have exhausted our efforts to find additional support to make Ingress NGINX sustainable. To prioritize user safety, we must retire the project.

In March 2026, Ingress NGINX maintenance will be halted, and the project will be retired. After that time, there will be no further releases, no bugfixes, and no updates to resolve any security vulnerabilities that may be discovered. The GitHub repositories will be made read-only and left available for reference.

Existing deployments of Ingress NGINX will not be broken. Existing project artifacts such as Helm charts and container images will remain available.

In most cases, you can check whether you use Ingress NGINX by running kubectl get pods --all-namespaces --selector app.kubernetes.io/name=ingress-nginx with cluster administrator permissions.

We would like to thank the Ingress NGINX maintainers for their work in creating and maintaining this project–their dedication remains impressive. This Ingress controller has powered billions of requests in datacenters and homelabs all around the world. In a lot of ways, Kubernetes wouldn’t be where it is without Ingress NGINX, and we are grateful for so many years of incredible effort.

SIG Network and the Security Response Committee recommend that all Ingress NGINX users begin migration to Gateway API or another Ingress controller immediately. Many options are listed in the Kubernetes documentation: Gateway API, Ingress. Additional options may be available from vendors you work with.

via Kubernetes Contributors – Contributor Blog https://www.kubernetes.dev/blog/

November 12, 2025 at 12:00PM

·kubernetes.dev·
Blog: Ingress NGINX Retirement: What You Need to Know
Building Kubernetes (a lite version) from scratch in Go with Owumi Festus

Building Kubernetes (a lite version) from scratch in Go, with Owumi Festus

https://ku.bz/pf5kK9lQF

Festus Owumi walks through his project of building a lightweight version of Kubernetes in Go. He removed etcd (replacing it with in-memory storage), skipped containers entirely, dropped authentication, and focused purely on the control plane mechanics. Through this process, he demonstrates how the reconciliation loop, API server concurrency handling, and scheduling logic actually work at their most basic level.

You will learn:

How the reconciliation loop works - The core concept of desired state vs current state that drives all Kubernetes operations

Why the API server is the gateway to etcd - How Kubernetes prevents race conditions using optimistic concurrency control and why centralized validation matters

What the scheduler actually does - Beyond simple round-robin assignment, understanding node affinity, resource requirements, and the complex scoring algorithms that determine pod placement

The complete pod lifecycle - Step-by-step walkthrough from kubectl command to running pod, showing how independent components work together like an orchestra

Sponsor

This episode is sponsored by StormForge by CloudBolt — automatically rightsize your Kubernetes workloads with ML-powered optimization

More info

Find all the links and info for this episode here: https://ku.bz/pf5kK9lQF

Interested in sponsoring an episode? Learn more.

via KubeFM https://kube.fm

November 11, 2025 at 05:00AM

·kube.fm·
Building Kubernetes (a lite version) from scratch in Go with Owumi Festus