Fresh Swap Features for Linux Users in Kubernetes 1.32
https://kubernetes.io/blog/2025/03/25/swap-linux-improvements/
Swap is a fundamental and invaluable Linux feature.
It offers numerous benefits, such as effectively increasing a node’s memory by
swapping out unused data,
shielding nodes from system-level memory spikes,
preventing Pods from crashing when they hit their memory limits,
and much more.
As a result, the node special interest group within the Kubernetes project
has invested significant effort into supporting swap on Linux nodes.
The 1.22 release introduced Alpha support
for configuring swap memory usage for Kubernetes workloads running on Linux on a per-node basis.
Later, in release 1.28, support for swap on Linux nodes graduated to Beta, along with many
new improvements.
In subsequent Kubernetes releases, more improvements were made, paving the way
to GA in the near future.
Prior to version 1.22, Kubernetes did not provide support for swap memory on Linux systems.
This was due to the inherent difficulty in guaranteeing and accounting for pod memory utilization
when swap memory was involved. As a result, swap support was deemed out of scope in the initial
design of Kubernetes, and the default behavior of a kubelet was to fail to start if swap memory
was detected on a node.
In version 1.22, the swap feature for Linux was first introduced in its Alpha stage.
This gave Linux users the opportunity to experiment with the swap feature for the first time.
However, as an Alpha version, it was not fully developed and worked only partially, and only in limited environments.
In version 1.28 swap support on Linux nodes was promoted to Beta.
The Beta version was a drastic leap forward.
Not only did it fix a large number of bugs and make swap work in a stable way,
but it also brought cgroup v2 support and introduced a wide variety of tests,
including complex scenarios such as node-level pressure.
It also brought many exciting new capabilities, such as the LimitedSwap behavior,
which sets an auto-calculated swap limit for containers, OpenMetrics instrumentation
support (through the /metrics/resource endpoint), Summary API support for
VerticalPodAutoscalers (through the /stats/summary endpoint), and more.
Today we are working on more improvements, paving the way for GA.
Currently, the focus is on ensuring node stability,
improving debuggability, addressing user feedback,
and polishing the feature to make it stable.
For example, in order to increase stability, containers in high-priority pods
cannot access swap, which ensures the memory they need is ready to use.
In addition, the UnlimitedSwap behavior was removed since it might compromise
the node's health.
Secret content protection against swapping has also been introduced
(see relevant security-risk section for more info).
To conclude, compared to previous releases, the kubelet's support for running with swap enabled
is more stable and robust, more user-friendly, and addresses many known shortcomings.
That said, the NodeSwap feature introduces basic swap support, and this is just the beginning.
In the near future, additional features are planned to enhance swap functionality in various ways,
such as improving evictions, extending the API, increasing customizability, and more!
How do I use it?
In order for the kubelet to initialize on a swap-enabled node, the failSwapOn
field must be set to false in the kubelet's configuration file, or the deprecated
--fail-swap-on command line flag must be set to false.
It is possible to configure the memorySwap.swapBehavior option to define the
manner in which a node utilizes swap memory.
For instance, this fragment goes into the kubelet's configuration file:
memorySwap:
  swapBehavior: LimitedSwap
The currently available configuration options for swapBehavior are:
NoSwap (default): Kubernetes workloads cannot use swap. However, processes
outside of Kubernetes' scope, like system daemons (such as kubelet itself!) can utilize swap.
This behavior is beneficial for protecting the node from system-level memory spikes,
but it does not safeguard the workloads themselves from such spikes.
LimitedSwap: Kubernetes workloads can utilize swap memory, but with certain limitations.
The amount of swap available to a Pod is determined automatically,
based on the proportion of the memory requested relative to the node's total memory.
Only non-high-priority Pods under the Burstable
Quality of Service (QoS) tier are permitted to use swap.
For more details, see the section below.
If configuration for memorySwap is not specified,
by default the kubelet will apply the same behavior as the NoSwap setting.
On Linux nodes, Kubernetes only supports running with swap enabled for hosts that use cgroup v2.
On cgroup v1 systems, no Kubernetes workloads are allowed to use swap memory.
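Because swap for workloads depends on cgroup v2, it can help to check which cgroup version a node uses before going further. The following is a minimal sketch, assuming a Linux node with GNU stat; the helper function cgroup_version is a hypothetical name introduced only for this illustration, and it simply interprets the filesystem type mounted at /sys/fs/cgroup.

```shell
# Hypothetical helper: map the filesystem type of /sys/fs/cgroup
# to a cgroup version label.
cgroup_version() {
  case "$1" in
    cgroup2fs) echo "v2" ;;   # unified hierarchy (cgroup v2)
    tmpfs)     echo "v1" ;;   # legacy hierarchy (cgroup v1)
    *)         echo "unknown" ;;
  esac
}

# Query the filesystem type and report the cgroup version of this node.
cgroup_version "$(stat -fc %T /sys/fs/cgroup/)"
```

If this reports v1, Kubernetes workloads on that node will not be able to use swap, regardless of the swapBehavior setting.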
Install a swap-enabled cluster with kubeadm
Before you begin
It is required for this demo that the kubeadm tool be installed, following the steps outlined in the
kubeadm installation guide.
If swap is already enabled on the node, cluster creation may proceed.
If swap is not enabled, please refer to the provided instructions for enabling swap.
Create a swap file and turn swap on
I'll demonstrate creating 4GiB of swap, in both the encrypted and the unencrypted case.
Setting up unencrypted swap
An unencrypted swap file can be set up as follows.
# Allocate storage and restrict access
fallocate --length 4GiB /swapfile
chmod 600 /swapfile
# Format the file as swap space
mkswap /swapfile
# Activate the swap space for paging
swapon /swapfile
Setting up encrypted swap
An encrypted swap file can be set up as follows.
Bear in mind that this example uses the cryptsetup binary (which is available
on most Linux distributions).
# Allocate storage and restrict access
fallocate --length 4GiB /swapfile
chmod 600 /swapfile
# Create an encrypted device backed by the allocated storage
cryptsetup --type plain --cipher aes-xts-plain64 --key-size 256 -d /dev/urandom open /swapfile cryptswap
# Format the encrypted device as swap space
mkswap /dev/mapper/cryptswap
# Activate the swap space for paging
swapon /dev/mapper/cryptswap
Verify that swap is enabled
You can verify that swap is enabled with either the swapon -s command or the free command:
swapon -s
Filename     Type        Size      Used   Priority
/dev/dm-0    partition   4194300   0      -2
free -h
               total        used        free      shared  buff/cache   available
Mem:           3.8Gi       1.3Gi       249Mi        25Mi       2.5Gi       2.5Gi
Swap:          4.0Gi          0B       4.0Gi
Enable swap on boot
After setting up swap, to start the swap file at boot time,
you either set up a systemd unit to activate (encrypted) swap, or you
add a line similar to /swapfile swap swap defaults 0 0 into /etc/fstab.
Set up a Kubernetes cluster that uses swap-enabled nodes
To make things clearer, here is an example kubeadm configuration file, kubeadm-config.yaml, for the swap-enabled cluster.
---
apiVersion: "kubeadm.k8s.io/v1beta3"
kind: InitConfiguration
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
memorySwap:
  swapBehavior: LimitedSwap
Then create a single-node cluster using kubeadm init --config kubeadm-config.yaml.
During init, a warning is printed that swap is enabled on the node, which matters in case the kubelet's
failSwapOn is set to true. We plan to remove this warning in a future release.
How is the swap limit being determined with LimitedSwap?
The configuration of swap memory, including its limitations, presents a significant
challenge. Not only is it prone to misconfiguration, but as a system-level property, any
misconfiguration could potentially compromise the entire node rather than just a specific
workload. To mitigate this risk and ensure the health of the node, we have implemented
swap with automatically configured limits.
With LimitedSwap, Pods that do not fall under the Burstable QoS classification (i.e.
BestEffort/Guaranteed QoS Pods) are prohibited from utilizing swap memory.
BestEffort QoS Pods exhibit unpredictable memory consumption patterns and lack
information regarding their memory usage, making it difficult to determine a safe
allocation of swap memory.
Conversely, Guaranteed QoS Pods are typically employed for applications that rely on the
precise allocation of resources specified by the workload, with memory being immediately available.
To maintain the aforementioned security and node health guarantees,
these Pods are not permitted to use swap memory when LimitedSwap is in effect.
In addition, high-priority pods are not permitted to use swap, in order to ensure that the memory
they consume always resides in RAM and is therefore ready to use.
Prior to detailing the calculation of the swap limit, it is necessary to define the following terms:
nodeTotalMemory: The total amount of physical memory available on the node.
totalPodsSwapAvailable: The total amount of swap memory on the node that is available for use by Pods (some swap memory may be reserved for system use).
containerMemoryRequest: The container's memory request.
The swap limit for a container is calculated as:
(containerMemoryRequest / nodeTotalMemory) × totalPodsSwapAvailable
In other words, the amount of swap that a container is able to use is proportional to its
memory request as a fraction of the node's total physical memory, scaled by the total amount
of swap memory on the node that is available for use by Pods.
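To make the formula concrete, here is a small worked example as a shell sketch. The function name swap_limit_mib and the node sizes are hypothetical, chosen only for illustration; the sketch works in whole MiB with integer arithmetic and ignores any rounding the kubelet may apply, so treat it as an approximation of the calculation rather than the kubelet's actual code.

```shell
# Hypothetical helper: swap_limit_mib REQUEST_MIB NODE_TOTAL_MIB PODS_SWAP_MIB
# Computes (containerMemoryRequest / nodeTotalMemory) * totalPodsSwapAvailable in MiB.
# The multiplication is done first so that integer division does not truncate early.
swap_limit_mib() {
  echo $(( $1 * $3 / $2 ))
}

# Assumed node: 8192 MiB of physical memory, 4096 MiB of swap available to Pods.
swap_limit_mib 2048 8192 4096   # container requesting 2 GiB   -> prints 1024
swap_limit_mib 512 8192 4096    # container requesting 512 MiB -> prints 256
```

On this assumed node, a container requesting a quarter of the node's memory is allowed a quarter of the swap available to Pods.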
It is important to note that, for containers within Burstable QoS Pods, it is possible to
opt out of swap usage by specifying memory requests that are equal to memory limits.
Containers configured in this manner will not have access to swap memory.
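As an illustration of the opt-out described above, the sketch below writes a hypothetical Pod manifest with two containers. The Pod name, container names, file name, and memory sizes are assumptions made up for this example. The Pod is Burstable because the first container's memory request is lower than its limit; the second container sets its memory request equal to its limit and, per the rule above, would not have access to swap.

```shell
# Write a hypothetical Burstable Pod manifest to a local file.
cat > burstable-swap-demo.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: burstable-swap-demo
spec:
  containers:
  - name: may-swap            # request < limit: eligible for swap under LimitedSwap
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "512Mi"
  - name: no-swap             # request == limit: opts out of swap
    image: registry.k8s.io/pause:3.9
    resources:
      requests:
        memory: "256Mi"
      limits:
        memory: "256Mi"
EOF

# On a machine with access to the cluster, the manifest could then be applied with:
#   kubectl apply -f burstable-swap-demo.yaml
```

Note that this Pod must also not be high-priority for the first container to be eligible for swap.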
How does it work?
There are a number of possible ways that one could envision swap use on a node.
When swap is already provisioned and available on a node,
the kubelet can be configured so that:
It can start with swap on.
It will direct the Container Runtime Interface to allocate zero swap memory
to Kubernetes workloads by default.
Swap configuration on a node is exposed to a clust