1_r/devopsish
Week Ending November 30, 2025
https://lwkd.info/2025/20251203
Developer News
CVE-2025-13281: the in-tree Portworx CSI driver exposes a security hole in the kube-controller-manager, which was patched for other storage drivers but not for Portworx. Users who still haven't migrated to the external CSI StorageClass are the ones vulnerable.
SIG-Scheduling has published their technical plan for Kubernetes 1.36.
Wei Fu was nominated as SIG-Etcd Tech Lead.
Release Schedule
Next Deadline: Release Highlights Complete, Dec. 9
We are in Code Freeze. Release highlight items need to be finished and fully edited by next week. Also, please be on the lookout for any blocking test failures, and get them debugged quickly so we can release on time.
Friday is the cherry-pick deadline for the next set of patch releases.
Other Merges
Allow relaxed Ingress defaultBackend service names with RelaxedServiceNameValidation
Eliminate spurious warning log messages about enabled alpha APIs while starting API server
Prevent spurious namespace-not-found errors in admission
Version Updates
Go to 1.24.10 and distroless iptables for 1.32
Subprojects and Dependency Updates
cri-o v1.34.3 adds support for the external crio-credential-provider plugin, fixes CVE-2025-58183 by updating github.com/vbatts/tar-split to v0.12.2, introduces a new housekeeping option for the irq-load-balancing.crio.io annotation (surfacing housekeeping CPUs via OPENSHIFT_HOUSEKEEPING_CPUS and adjusting IRQ affinity behaviour), and refreshes core dependencies including the Kubernetes 0.34.1 stack and new Podman image/storage libraries.
cri-o v1.33.7 and v1.32.11 are focused patch releases that backport the CVE-2025-58183 tar-split update across the 1.33 and 1.32 lines, with v1.32.11 additionally fixing network cleanup failures when the network namespace path is empty on server teardown.
kops v1.35.0-alpha.1 advances the 1.35 line with etcd 3.5.23/3.5.24 updates, containerd v2.1.5, refreshed CNI plugin sources, AWS Karpenter v1.8.1 plus configurable feature gates, expanded scale and GCE/Azure testing, initial Ubuntu 25.10 support, tighter AWS IAM permissions, and deeper ClusterAPI integration including new toolbox commands and CAPI-oriented nodeup refactors.
cluster-autoscaler 1.34.2, 1.33.3, and 1.32.5 align the 1.34, 1.33 and 1.32 branches with common fixes: more robust proactive scale-up handling for scheduling-gated pods, a SimulateNodeRemoval panic fix for missing node info, Azure LTS test updates and refreshed static SKU lists, CI/lint cleanups, and Kubernetes dependency bumps to v1.34.2, v1.33.6, and v1.32.10 respectively.
cluster-api v1.12.0-rc.1 continues the v1.12 line toward GA with in-place update support for KCP and MachineDeployments, chained multi-minor Kubernetes upgrades for managed topologies, new InPlaceUpdates, MachineTaintPropagation, and ReconcilerRateLimiting feature gates, MachineHealthCheck condition-based health checks, plus a round of bugfixes across webhooks, e2e tests, runtime SDK, and condition handling on top of Go 1.24 and Kubernetes 0.34.x library bumps.
cluster-api-provider-vsphere v1.15.0-rc.0 tracks CAPI v1.12 and Kubernetes v1.35/cloud-provider-vsphere v1.35, introduces a dedicated CAPV ServiceAccount, and adds govmomi flags to tune CPU and memory shares, reservations, and limits, while also updating etcd/Kubernetes dependencies, bumping CPI/autoscaler versions, and hardening tests and CI (including network debug improvements and flake-focused timeouts).
prometheus v3.8.0 is the first release to mark Native Histograms as a stable opt-in feature via the new scrape_native_histogram config knob, updates Remote Write v2 to the 2.0-rc.4 spec, adds unified AWS service discovery (EC2, Lightsail, ECS), introduces OAuth2 JWT-bearer grant support, extends promtool with Remote Write 2.0 pushes, and delivers a broad set of PromQL, TSDB, and UI performance fixes (including faster large alerts/rules pages and improved NHCB handling).
Shoutouts
Petr Mullar – Shoutout for organizing a meeting to support new contributors in Prow, gathering ideas to improve onboarding and reduce entry barriers for newcomers.
via Last Week in Kubernetes Development https://lwkd.info/
December 03, 2025 at 05:00PM
Will Agentic AI Pay Off? Cybersecurity Shifts and EU Cloud Pressure | TSG Ep. 973
https://chrisshort.net/video/techstrong-gang-ep973/
The gang then looks at how AI is about to transform cybersecurity before examining why the European Union is investigating Amazon Web Services and Microsoft.
via Chris Short https://chrisshort.net/
November 24, 2025
Ep41 - Ask Me Anything About Anything with Scott Rosenberg 📱
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=mKvsQW6GBRg
Ep41 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=L4RZXAKGb6M
A Journey Through Kafkian SplitDNS in a Multitenant Kubernetes, with Fabián Sellés Rosa
Fabián Sellés Rosa, Tech Lead of the Runtime team at Adevinta, walks through a real engineering investigation that started with a simple request: allowing tenants to use third-party Kafka services. What seemed straightforward turned into a complex DNS resolution problem that required testing seven different approaches before a working solution was found.
You will learn:
Why Kafka's multi-step DNS resolution creates unique challenges in multi-tenant environments, where bootstrap servers and dynamic broker lists complicate standard DNS approaches
The iterative debugging process from Route 53 split DNS through Kubernetes native pod DNS config, custom DNS servers, Kafka proxies, and CoreDNS solutions
How to implement the final solution using node-local DNS and CoreDNS templating with practical details including ndots configuration and Kyverno automation
Platform engineering evaluation criteria for assessing solutions based on maintainability, self-service capability, and evolvability in multi-tenant environments
Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/NsBZ-FwcJ
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
December 02, 2025 at 07:11AM
Deploy AI Agents and MCPs to Kubernetes: Is kagent and kmcp Worth It?
This video explores kagent and kmcp, two tools that promise to bring AI agents and MCP servers into Kubernetes using cloud-native principles. kagent lets you define AI agents as custom resources with YAML manifests, connect them to MCP servers for tools, and manage them like any other Kubernetes workload. kmcp deploys MCP servers to Kubernetes clusters using simple custom resources. The concept sounds appealing for platform engineers: create agents declaratively, give them specific tools, let them communicate through open protocols like A2A, all running in your existing infrastructure.
However, the reality reveals significant gaps. While kagent successfully deploys agents to Kubernetes and connects them to MCP tools, its web interface is severely lacking compared to modern coding agents like Claude Code or Cursor. Tool execution is unreliable, there's no built-in user confirmation before calling tools, and the choice to expose agents via the A2A protocol instead of MCP limits integration with existing coding tools. kmcp works for deploying MCP servers but offers limited value beyond what standard Kubernetes manifests or Helm charts already provide. The video demonstrates both tools in action—creating agents, connecting to MCP servers, and troubleshooting Kubernetes issues—while honestly examining whether these projects solve real problems or just add unnecessary complexity to workflows that modern coding agents already handle better.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: RavenDB 🔗 Meet the new AI Agent in RavenDB: https://ravendb.net/ai-agent-creator?utm_source=youtube&utm_medium=influencers&utm_campaign=devops_toolkit 🔗 Visit RavenDB's homepage: https://ravendb.net/?utm_source=youtube&utm_medium=influencers&utm_campaign=devops_toolkit ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
#KubernetesAI #MCPServers #AIAgents
Consider joining the channel: https://www.youtube.com/c/devopstoolkit/join
▬▬▬▬▬▬ 🔗 Additional Info 🔗 ▬▬▬▬▬▬ ➡ Transcript and commands: https://devopstoolkit.live/kubernetes/deploy-ai-agents-and-mcps-to-k8s-is-kagent-and-kmcp-worth-it 🔗 kagent: https://kagent.dev
▬▬▬▬▬▬ 💰 Sponsorships 💰 ▬▬▬▬▬▬ If you are interested in sponsoring this channel, please visit https://devopstoolkit.live/sponsor for more information. Alternatively, feel free to contact me over Twitter or LinkedIn (see below).
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
▬▬▬▬▬▬ ⏱ Timecodes ⏱ ▬▬▬▬▬▬ 00:00 AI Agent and MCPs in Kubernetes 01:01 RavenDB (sponsor) 02:25 Kubernetes AI Agents with kagent 11:42 Integrating External MCP Servers 16:23 Deploying MCP Servers with kmcp 22:31 Should You Use kagent and kmcp?
via YouTube https://www.youtube.com/watch?v=3jkGJvmUMYE
Kubernetes v1.35 Sneak Peek
https://kubernetes.io/blog/2025/11/26/kubernetes-v1-35-sneak-peek/
As the release of Kubernetes v1.35 approaches, the Kubernetes project continues to evolve. Features may be deprecated, removed, or replaced to improve the project's overall health. This blog post outlines planned changes for the v1.35 release that the release team believes you should be aware of to ensure the continued smooth operation of your Kubernetes cluster(s), and to keep you up to date with the latest developments. The information below is based on the current status of the v1.35 release and is subject to change before the final release date.
Deprecations and removals for Kubernetes v1.35
cgroup v1 support
On Linux nodes, container runtimes typically rely on cgroups (short for "control groups"). Support for using cgroup v2 has been stable in Kubernetes since v1.25, providing an alternative to the original v1 cgroup support. While cgroup v1 provided the initial resource control mechanism, it suffered from well-known inconsistencies and limitations. Adding support for cgroup v2 allowed use of a unified control group hierarchy, improved resource isolation, and served as the foundation for modern features, making legacy cgroup v1 support ready for removal. The removal of cgroup v1 support will only impact cluster administrators running nodes on older Linux distributions that do not support cgroup v2; on those nodes, the kubelet will fail to start. Administrators must migrate their nodes to systems with cgroup v2 enabled. More details on compatibility requirements will be available in a blog post soon after the v1.35 release.
To learn more, read about cgroup v2; you can also track the switchover work via KEP-5573: Remove cgroup v1 support.
Deprecation of ipvs mode in kube-proxy
Many releases ago, the Kubernetes project implemented an ipvs mode in kube-proxy. It was adopted as a way to provide high-performance service load balancing, with better performance than the existing iptables mode. However, maintaining feature parity between ipvs and other kube-proxy modes became difficult, due to technical complexity and diverging requirements. This created significant technical debt and made the ipvs backend impractical to support alongside newer networking capabilities.
The Kubernetes project intends to deprecate kube-proxy ipvs mode in the v1.35 release, to streamline the kube-proxy codebase. For Linux nodes, the recommended kube-proxy mode is already nftables.
You can find more in KEP-5495: Deprecate ipvs mode in kube-proxy
Kubernetes is deprecating containerd v1.y support
While Kubernetes v1.35 still supports containerd 1.7 and other LTS releases of containerd, as a consequence of automated cgroup driver detection, the Kubernetes SIG Node community has formally agreed upon a final support timeline for containerd v1.X. Kubernetes v1.35 is the last release to offer this support (aligned with containerd 1.7 EOL).
This is a final warning: if you are using containerd 1.X, you must switch to 2.0 or later before upgrading Kubernetes to the next version. You can monitor the kubelet_cri_losing_support metric to determine whether any nodes in your cluster are running a containerd version that will soon be unsupported.
You can find more in the official blog post or in KEP-4033: Discover cgroup driver from CRI
Featured enhancements of Kubernetes v1.35
The following enhancements are some of those likely to be included in the v1.35 release. This is not a commitment, and the release content is subject to change.
Node declared features
When scheduling Pods, Kubernetes uses node labels, taints, and tolerations to match workload requirements with node capabilities. However, managing feature compatibility becomes challenging during cluster upgrades due to version skew between the control plane and nodes. This can lead to Pods being scheduled on nodes that lack required features, resulting in runtime failures.
The node declared features framework will introduce a standard mechanism for nodes to declare their supported Kubernetes features. With the new alpha feature enabled, a Node reports the features it can support, publishing this information to the control plane through a new .status.declaredFeatures field. Then, the kube-scheduler, admission controllers and third-party components can use these declarations. For example, you can enforce scheduling and API validation constraints, ensuring that Pods run only on compatible nodes.
This approach reduces manual node labeling, improves scheduling accuracy, and prevents incompatible pod placements proactively. It also integrates with the Cluster Autoscaler for informed scale-up decisions. Feature declarations are temporary and tied to Kubernetes feature gates, enabling safe rollout and cleanup.
Targeting alpha in v1.35, node declared features aims to solve version skew scheduling issues by making node capabilities explicit, enhancing reliability and cluster stability in heterogeneous version environments.
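Before the official docs land, a quick way to peek at what a node reports would be something like the following, assuming the alpha gate is on (the field name comes from the KEP; the node name is a placeholder, and the alpha API shape may still change before release):
kubectl get node worker-1 -o jsonpath='{.status.declaredFeatures}'   # worker-1 is a placeholder node name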
To learn more about this before the official documentation is published, you can read KEP-5328.
In-place update of Pod resources
Kubernetes is graduating in-place updates of Pod resources to General Availability (GA). This feature allows users to adjust cpu and memory resources without restarting Pods or Containers. Previously, such modifications required recreating Pods, which could disrupt workloads, particularly for stateful or batch applications. Recent Kubernetes releases already allowed you to change resource settings (requests and limits) on existing Pods as a beta feature. This allows for smoother vertical scaling, improves efficiency, and can also simplify solution development.
The Container Runtime Interface (CRI) has also been improved, extending the UpdateContainerResources API for Windows and future runtimes while allowing ContainerStatus to report real-time resource configurations. Together, these changes make scaling in Kubernetes faster, more flexible, and disruption-free. The feature was introduced as alpha in v1.27, graduated to beta in v1.33, and is targeting graduation to stable in v1.35.
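As a rough sketch of what this enables: on a cluster with the feature enabled and a recent kubectl, you can change a running Pod's requests through the resize subresource instead of recreating it. The Pod and container names below are illustrative:
# "myapp-0" and "app" are placeholder Pod/container names
kubectl patch pod myapp-0 --subresource resize \
  -p '{"spec":{"containers":[{"name":"app","resources":{"requests":{"cpu":"750m","memory":"512Mi"}}}]}}'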
You can find more in KEP-1287: In-place Update of Pod Resources
Pod certificates
When running microservices, Pods often require a strong cryptographic identity to authenticate with each other using mutual TLS (mTLS). While Kubernetes provides Service Account tokens, these are designed for authenticating to the API server, not for general-purpose workload identity.
Before this enhancement, operators had to rely on complex, external projects like SPIFFE/SPIRE or cert-manager to provision and rotate certificates for their workloads. But what if you could issue a unique, short-lived certificate to your Pods natively and automatically? KEP-4317 is designed to enable such native workload identity. It opens up various possibilities for securing pod-to-pod communication by allowing the kubelet to request and mount certificates for a Pod via a projected volume.
This provides a built-in mechanism for workload identity, complete with automated certificate rotation, significantly simplifying the setup of service meshes and other zero-trust network policies. This feature was introduced as alpha in v1.34 and is targeting beta in v1.35.
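A rough sketch of what the projected volume could look like, based on the alpha API described in KEP-4317 (the signer name is made up, and field names may change on the way to beta):
volumes:
- name: workload-identity
  projected:
    sources:
    - podCertificate:
        signerName: example.com/internal-ca   # hypothetical signer
        keyType: ED25519
        credentialBundlePath: credentials.pem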
You can find more in KEP-4317: Pod Certificates
Numeric values for taints
Kubernetes is enhancing taints and tolerations by adding numeric comparison operators, such as Gt (Greater Than) and Lt (Less Than).
Previously, tolerations supported only exact (Equal) or existence (Exists) matches, which were not suitable for numeric properties such as reliability SLAs.
With this change, a Pod can use a toleration to "opt-in" to nodes that meet a specific numeric threshold. For example, a Pod can require a Node with an SLA taint value greater than 950 (operator: Gt, value: "950").
This approach is more powerful than Node Affinity because it supports the NoExecute effect, allowing Pods to be automatically evicted if a node's numeric value drops below the tolerated threshold.
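A toleration using the new operators might look roughly like this (the taint key is made up; see the KEP below for the authoritative API shape):
tolerations:
- key: example.com/sla      # hypothetical numeric taint key
  operator: Gt
  value: "950"
  effect: NoExecute         # evict the Pod if the node's value no longer satisfies the toleration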
You can find more in KEP-5471: Enable SLA-based Scheduling
User namespaces
When running Pods, you can use securityContext to drop privileges, but containers inside the pod often still run as root (UID 0). This poses a significant challenge, because that container UID 0 maps directly to the host's root user.
Before this enhancement, a container breakout vulnerability could grant an attacker full root access to the node. But what if you could dynamically remap the container's root user to a safe, unprivileged user on the host? KEP-127 specifically allows such native support for Linux User Namespaces. It opens up various possibilities for pod security by isolating container and host user/group IDs. This allows a process to have root privileges (UID 0) within its namespace, while running as a non-privileged, high-numbered UID on the host.
Released as alpha in v1.25 and beta in v1.30, this feature continues to progress through beta maturity, paving the way for truly "rootless" containers that drastically reduce the attack surface for a whole class of security vulnerabilities.
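Opting in is a single field on the Pod spec; here's a minimal, illustrative Pod:
apiVersion: v1
kind: Pod
metadata:
  name: userns-demo          # illustrative name
spec:
  hostUsers: false           # run this Pod in its own user namespace
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # illustrative image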
You can find more in KEP-127: User Namespaces
Support for mounting OCI images as volumes
When provisioning a Pod, you often need to bundle data, binaries, or configuration files for your containers. Before this enhancement, people often included that kind of data directly into the main container image, or required a custom init container to download and unpack files into an emptyDir. You can still take either of those approaches, of course.
But what if you could populate a volume directly from a data-only artifact in an OCI registry, just like pulling a container image? Kubernetes v1.31 added support for the image volume type, allowing Pods to pull and unpack OCI container image artifacts into a volume declaratively.
This allows for seamless distribution of data, binaries, or ML models.
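A minimal, illustrative Pod using the image volume type (the artifact reference is hypothetical):
apiVersion: v1
kind: Pod
metadata:
  name: image-volume-demo    # illustrative name
spec:
  containers:
  - name: app
    image: registry.k8s.io/pause:3.9   # illustrative image
    volumeMounts:
    - name: dataset
      mountPath: /data
      readOnly: true
  volumes:
  - name: dataset
    image:
      reference: registry.example.com/datasets/model:1.0   # hypothetical OCI artifact, pulled like an image
      pullPolicy: IfNotPresent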
Ep40 - Ask Me Anything About Anything with Scott Rosenberg
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=nomAGBszjQo
Ep40 - Ask Me Anything About Anything with Scott Rosenberg 📱
There are no restrictions in this AMA session. You can ask anything about DevOps, AI, Cloud, Kubernetes, Platform Engineering, containers, or anything else. Scott Rosenberg, a regular guest, will be here to help us out.
▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬ Sponsor: Octopus 🔗 Enterprise Support for Argo: https://octopus.com/support/enterprise-argo-support ▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬▬
▬▬▬▬▬▬ 👋 Contact me 👋 ▬▬▬▬▬▬ ➡ BlueSky: https://vfarcic.bsky.social ➡ LinkedIn: https://www.linkedin.com/in/viktorfarcic/
▬▬▬▬▬▬ 🚀 Other Channels 🚀 ▬▬▬▬▬▬ 🎤 Podcast: https://www.devopsparadox.com/ 💬 Live streams: https://www.youtube.com/c/DevOpsParadox
via YouTube https://www.youtube.com/watch?v=0TtOJbMOVbs
More Kubernetes Than I Bargained For, with Amos Wenger
Amos Wenger walks through his production incident where adding a home computer as a Kubernetes node caused TLS certificate renewals to fail. The discussion covers debugging techniques using tools like netshoot and K9s, and explores the unexpected interactions between Kubernetes overlay networks and consumer routers.
You will learn:
How Kubernetes networking assumptions break when mixing cloud VMs with nodes behind consumer routers, and why cert-manager challenges fail in NAT environments
The differences between CNI plugins like Flannel and Calico, particularly how they handle IPv6 translation
Debugging techniques for network issues using tools like netshoot, K9s, and iproute2
Best practices for mixed infrastructure including proper node labeling, taints, and scheduling controls
Sponsor
This episode is sponsored by LearnKube — get started on your Kubernetes journey through comprehensive online, in-person or remote training.
More info
Find all the links and info for this episode here: https://ku.bz/6Ll_7slr9
Interested in sponsoring an episode? Learn more.
via KubeFM https://kube.fm
November 25, 2025 at 05:00AM
Kubernetes Configuration Good Practices
https://kubernetes.io/blog/2025/11/25/configuration-good-practices/
Configuration is one of those things in Kubernetes that seems small until it's not. It sits at the heart of every Kubernetes workload: a missing quote, a wrong API version, or a misplaced YAML indent can ruin your entire deployment.
This blog brings together tried-and-tested configuration best practices: the small habits that make your Kubernetes setup clean, consistent, and easier to manage. Whether you are just starting out or already deploying apps daily, these are the little things that keep your cluster stable and your future self sane.
This blog is inspired by the original Configuration Best Practices page, which has evolved through contributions from many members of the Kubernetes community.
General configuration practices
Use the latest stable API version
Kubernetes evolves fast. Older APIs eventually get deprecated and stop working. So, whenever you are defining resources, make sure you are using the latest stable API version. You can always check with
kubectl api-resources
This simple step saves you from future compatibility issues.
Store configuration in version control
Never apply manifest files directly from your desktop. Always keep them in a version control system like Git; it's your safety net. If something breaks, you can instantly roll back to a previous commit, compare changes, or recreate your cluster setup without panic.
Write configs in YAML not JSON
Write your configuration files using YAML rather than JSON. Both work technically, but YAML is just easier for humans: it's cleaner to read, less noisy, and widely used in the community.
YAML has some sneaky gotchas with boolean values: Use only true or false. Don't write yes, no, on or off. They might work in one version of YAML but break in another. To be safe, quote anything that looks like a Boolean (for example "yes").
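For instance (the keys here are invented, just to show the quoting):
debugEnabled: true       # canonical YAML boolean
legacyToggle: "yes"      # quoted, so it stays the string "yes" instead of becoming a boolean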
Keep configuration simple and minimal
Avoid setting default values that are already handled by Kubernetes. Minimal manifests are easier to debug, cleaner to review and less likely to break things later.
Group related objects together
If your Deployment, Service and ConfigMap all belong to one app, put them in a single manifest file.
It's easier to track changes and apply them as a unit. See the Guestbook all-in-one.yaml file for an example of this syntax.
You can even apply entire directories with:
kubectl apply -f configs/
One command and, boom, everything in that folder gets deployed.
Add helpful annotations
Manifest files are not just for machines, they are for humans too. Use annotations to describe why something exists or what it does. A quick one-liner can save hours when debugging later and also allows better collaboration.
The most helpful annotation to set is kubernetes.io/description. It's like writing a comment, except that it gets copied into the API so that everyone else can see it even after you deploy.
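For example (the description text is just illustrative):
metadata:
  annotations:
    kubernetes.io/description: "Checkout frontend; scaled manually during sale events"   # illustrative value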
Managing Workloads: Pods, Deployments, and Jobs
A common early mistake in Kubernetes is creating Pods directly. Pods work, but they don't reschedule themselves if something goes wrong.
Naked Pods (Pods not managed by a controller, such as Deployment or a StatefulSet) are fine for testing, but in real setups, they are risky.
Why? Because if the node hosting that Pod dies, the Pod dies with it and Kubernetes won't bring it back automatically.
Use Deployments for apps that should always be running
A Deployment, which both creates a ReplicaSet to ensure that the desired number of Pods is always available, and specifies a strategy to replace Pods (such as RollingUpdate), is almost always preferable to creating Pods directly. You can roll out a new version, and if something breaks, roll back instantly.
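A minimal Deployment sketch, with illustrative names and image:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web                  # illustrative name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - name: web
        image: nginx:1.27    # illustrative image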
Use Jobs for tasks that should finish
A Job is perfect when you need something to run once and then stop, like a database migration or a batch-processing task. It will retry if the Pod fails and report success when it's done.
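A minimal Job sketch (names, image, and command are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrate           # illustrative name
spec:
  backoffLimit: 3            # retry failed Pods up to three times
  template:
    spec:
      restartPolicy: Never   # Jobs require Never or OnFailure
      containers:
      - name: migrate
        image: busybox:1.36  # illustrative image
        command: ["sh", "-c", "echo running migration"]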
Service Configuration and Networking
Services are how your workloads talk to each other inside (and sometimes outside) your cluster. Without them, your pods exist but can't reach anyone. Let's make sure that doesn't happen.
Create Services before workloads that use them
When Kubernetes starts a Pod, it automatically injects environment variables for existing Services. So, if a Pod depends on a Service, create a Service before its corresponding backend workloads (Deployments or StatefulSets), and before any workloads that need to access it.
For example, if a Service named foo exists, all containers will get the following variables in their initial environment:
FOO_SERVICE_HOST=<the host the Service runs on>
FOO_SERVICE_PORT=<the port the Service runs on>
DNS-based discovery doesn't have this problem, but it's a good habit to follow anyway.
Use DNS for Service discovery
If your cluster has the DNS add-on (most do), every Service automatically gets a DNS entry. That means you can access it by name instead of IP:
curl http://my-service.default.svc.cluster.local
It's one of those features that makes Kubernetes networking feel magical.
Avoid hostPort and hostNetwork unless absolutely necessary
You'll sometimes see these options in manifests:
hostPort: 8080
hostNetwork: true
But here's the thing: they tie your Pods to specific nodes, making them harder to schedule and scale, because each <hostIP, hostPort, protocol> combination must be unique. If you don't specify the hostIP and protocol explicitly, Kubernetes will use 0.0.0.0 as the default hostIP and TCP as the default protocol. Unless you're debugging or building something like a network plugin, avoid them.
If you just need local access for testing, try kubectl port-forward:
kubectl port-forward deployment/web 8080:80
See Use Port Forwarding to access applications in a cluster to learn more. Or if you really need external access, use a type: NodePort Service. That's the safer, Kubernetes-native way.
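For example, exposing the same illustrative web Deployment through a NodePort Service:
kubectl expose deployment web --type=NodePort --port=80   # "web" is the illustrative Deployment name from above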
Use headless Services for internal discovery
Sometimes, you don't want Kubernetes to load balance traffic. You want to talk directly to each Pod. That's where headless Services come in.
You create one by setting clusterIP: None. Instead of a single IP, DNS gives you the list of all the Pod IPs, which is perfect for apps that manage connections themselves.
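A minimal headless Service sketch (name, selector, and port are illustrative):
apiVersion: v1
kind: Service
metadata:
  name: my-db                # illustrative name
spec:
  clusterIP: None            # headless: DNS returns the individual Pod IPs
  selector:
    app: my-db
  ports:
  - port: 5432               # illustrative port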
Working with labels effectively
Labels are key/value pairs that are attached to objects such as Pods. Labels help you organize, query and group your resources. They don't do anything by themselves, but they make everything else from Services to Deployments work together smoothly.
Use semantic labels
Good labels help you understand what's what, even months later. Define and use labels that identify semantic attributes of your application or Deployment. For example:
labels:
  app.kubernetes.io/name: myapp
  app.kubernetes.io/component: web
  tier: frontend
  phase: test
app.kubernetes.io/name: what the app is
tier: which layer it belongs to (frontend/backend)
phase: which stage it's in (test/prod)
You can then use these labels to make powerful selectors. For example:
kubectl get pods -l tier=frontend
This will list all frontend Pods across your cluster, no matter which Deployment they came from. Basically you are not manually listing Pod names; you are just describing what you want. See the guestbook app for examples of this approach.
Use common Kubernetes labels
Kubernetes actually recommends a set of common labels. It's a standardized way to name things across your different workloads or projects. Following this convention makes your manifests cleaner, and it means that tools such as Headlamp, dashboard, or third-party monitoring systems can all automatically understand what's running.
Manipulate labels for debugging
Since controllers (like ReplicaSets or Deployments) use labels to manage Pods, you can remove a label to “detach” a Pod temporarily.
Example:
kubectl label pod mypod app-
The app- part removes the label key app. Once that happens, the controller won’t manage that Pod anymore. It’s like isolating it for inspection, a “quarantine mode” for debugging. To interactively remove or add labels, use kubectl label.
You can then check logs, exec into it and once done, delete it manually. That’s a super underrated trick every Kubernetes engineer should know.
Handy kubectl tips
These small tips make life much easier when you are working with multiple manifest files or clusters.
Apply entire directories
Instead of applying one file at a time, apply the whole folder:
kubectl apply -f configs/
Using server-side apply is also a good practice:
kubectl apply -f configs/ --server-side
This looks for .yaml, .yml, and .json files in that folder and applies them all together. It's faster, cleaner, and helps keep things grouped by app.
Use label selectors to get or delete resources
You don't always need to type out resource names one by one. Instead, use selectors to act on entire groups at once:
kubectl get pods -l app=myapp
kubectl delete pod -l phase=test
It's especially useful in CI/CD pipelines, where you want to clean up test resources dynamically.
Quickly create Deployments and Services
For quick experiments, you don't always need to write a manifest. You can spin up a Deployment right from the CLI:
kubectl create deployment webapp --image=nginx
Then expose it as a Service:
kubectl expose deployment webapp --port=80
This is great when you just want to test something before writing full manifests. Also, see Use a Service to Access an Application in a cluster for an example.
Conclusion
Cleaner configuration leads to calmer cluster administrators. If you stick to a few simple habits (keep configuration simple and minimal, version-control everything, use consistent labels, and avoid relying on naked Pods), you'll save yourself hours of debugging down the road.
The best part? Clean configurations stay readable. Even after months, you or anyone on your team can read them and understand what's going on.