End-to-end (E2E) testing in Kubernetes is how the project validates
functionality with real clusters. Contributors sooner or later encounter it
when asked to write E2E tests for new features or to help with debugging test
failures. Cluster admins or vendors might run the conformance tests, a subset
of all tests in the E2E test
suite.
The underlying E2E
framework
for writing these E2E tests has been around for a long
time. Functionality was added to it as needed, leading to code that became hard
to maintain and use. The testing commons
WG
started cleaning it up, but dissolved before completely achieving its
goals.
After the migration to Ginkgo
v2 in Kubernetes 1.25, I
picked up several of the loose ends and started untangling them. This blog post
is a summary of those changes. Some of this content is also found in the
Kubernetes contributor document about writing good E2E
tests
and gets reproduced here to raise awareness that the document has been updated.
Overall architecture
At the moment, the framework is used in-tree for testing against a cluster
(test/e2e), testing kubeadm (test/e2e_kubeadm), and the kubelet
(test/e2e_node). The goal is to make the core test/e2e/framework a package
that has no dependencies on internal code and that can be used in different E2E
suites without polluting them with features or options that make no sense for
them. This is currently only a technical goal. There are no plans anymore to
actually move the code into a staging repository.
The framework acts like a normal client of an apiserver and thus doesn’t need
much more than client-go. Since the sub-package
refactoring, additional
sub-packages like test/e2e/framework/pod depend on the framework, not the
other way around. Those other sub-packages therefore can still use internal
code. The import boss configuration enforces these
constraints.
What’s left to clean up is that the framework contains a TestContext with
fields that are used only by some tests or some test suites. The configuration
for test/e2e_node
is the last remaining dependency on internal code. Such settings should get
moved into the different test suites and/or tests. Besides avoiding such
dependencies, the advantage will be that an option only shows up in the command
line of a suite when it really has an effect there.
Debuggability
If your test fails, it should report the reasons for the failure in as much
detail as possible in its failure message. The failure message is the string that gets
passed (directly or indirectly) to ginkgo.Fail or framework.Failf. That text is what gets
shown in the overview of failed tests for a Prow job and what gets aggregated
by https://go.k8s.io/triage.
A good failure message:
identifies the test failure
has enough details to provide some initial understanding of what went wrong
It’s okay for it to contain information that changes during each test
run. Aggregation simplifies the failure message with regular
expressions
before looking for similar failures.
Helper libraries like Gomega or
testify can be used to
produce informative failure messages. Gomega is a bit easier to use in
combination with Ginkgo.
The E2E framework itself only has one helper function for assertions that is
still recommended. The others are deprecated. Compared to
gomega.Expect(err).NotTo(gomega.HaveOccurred()),
framework.ExpectNoError(err) is shorter and produces better failure
messages because it logs the full error and then includes only the shorter
err.Error() in the failure message.
As with any other assertion, it is recommended to include additional context in
cases where the parameters being checked by an assertion helper lack relevant
information:
framework.ExpectNoError(err, "tried creating %d foobars, only created %d", foobarsReqd, foobarsCreated)
Use assertions that match the check in the test. Using Go
code to evaluate some condition and then checking the result often isn’t
informative. For example this check should be avoided:
gomega.Expect(strings.Contains(actualStr, expectedSubStr)).To(gomega.Equal(true))
Comparing a boolean
like this against true or false with gomega.Equal or
framework.ExpectEqual is not useful because dumping the actual and expected
value just distracts from the underlying failure reason.
Better pass the actual values to Gomega, which will automatically include them in the
failure message. Add an annotation that explains what the assertion is about:
gomega.Expect(actualStr).To(gomega.ContainSubstring("xyz"), "checking log output")
This produces the following failure message:
[FAILED] checking log output
Expected
string: hello world
to contain substring
string: xyz
If there is no suitable Gomega assertion, call framework.Failf directly:
import (
    "k8s.io/kubernetes/test/e2e/framework"
    "k8s.io/kubernetes/test/utils/format"
)
ok := someCustomCheck(abc)
if !ok {
    framework.Failf("check xyz failed for object:\n%s", format.Object(abc, 1 /* indent one level */))
}
It is good practice to include details like the object that failed some
assertion in the failure message because then a) the information is available
when analyzing a failure that occurred in the CI and b) it only gets logged
when some assertion fails. Always dumping objects via log messages can make the
test output very large and may distract from the relevant information.
Dumping structs with format.Object is recommended. Starting with Kubernetes
1.26, format.Object will pretty-print Kubernetes API objects or structs as
YAML and omit unset
fields, which is more
readable than other alternatives like fmt.Sprintf("%+v") .
import (
    "fmt"

    "k8s.io/api/core/v1"
    "k8s.io/kubernetes/test/utils/format"
)

var pod v1.Pod
fmt.Printf("Printf: %+v\n\n", pod)
fmt.Printf("format.Object:\n%s", format.Object(pod, 1 /* indent one level */))
Output:
Printf: {TypeMeta:{Kind: APIVersion:} ObjectMeta:{Name: GenerateName: Namespace: SelfLink: UID: ResourceVersion: Generation:0 CreationTimestamp:0001-01-01 00:00:00 +0000 UTC DeletionTimestamp:nil DeletionGracePeriodSeconds:nil Labels:map[] Annotations:map[] OwnerReferences:[] Finalizers:[] ManagedFields:[]} Spec:{Volumes:[] InitContainers:[] Containers:[] EphemeralContainers:[] RestartPolicy: TerminationGracePeriodSeconds:nil ActiveDeadlineSeconds:nil DNSPolicy: NodeSelector:map[] ServiceAccountName: DeprecatedServiceAccount: AutomountServiceAccountToken:nil NodeName: HostNetwork:false HostPID:false HostIPC:false ShareProcessNamespace:nil SecurityContext:nil ImagePullSecrets:[] Hostname: Subdomain: Affinity:nil SchedulerName: Tolerations:[] HostAliases:[] PriorityClassName: Priority:nil DNSConfig:nil ReadinessGates:[] RuntimeClassName:nil EnableServiceLinks:nil PreemptionPolicy:nil Overhead:map[] TopologySpreadConstraints:[] SetHostnameAsFQDN:nil OS:nil HostUsers:nil SchedulingGates:[] ResourceClaims:[]} Status:{Phase: Conditions:[] Message: Reason: NominatedNodeName: HostIP: PodIP: PodIPs:[] StartTime:nil InitContainerStatuses:[] ContainerStatuses:[] QOSClass: EphemeralContainerStatuses:[] Resize:}}
format.Object:
v1.Pod:
metadata:
creationTimestamp: null
spec:
containers: null
status: {}
Recovering from test failures
All tests should ensure that a cluster is restored to the state that it was in
before the test ran. ginkgo.DeferCleanup
is recommended for
this because it can be called similar to defer directly after setting up
something. It is better than defer because Ginkgo will show additional
details about which cleanup code is running and (if possible) handle timeouts
for that code (see next section). It is better than ginkgo.AfterEach because
it is not necessary to define additional variables and because
ginkgo.DeferCleanup executes code in the more useful last-in-first-out order,
i.e. things that get set up first get removed last.
Objects created in the test namespace do not need to be deleted because
deleting the namespace will also delete them. However, if deleting an object
may fail, then explicitly cleaning it up is better because then failures or
timeouts related to it will be more obvious.
In cases where the test may have removed the object, framework.IgnoreNotFound
can be used to ignore the “not found” error:
podClient := f.ClientSet.CoreV1().Pods(f.Namespace.Name)
pod, err := podClient.Create(ctx, testPod, metav1.CreateOptions{})
framework.ExpectNoError(err, "create test pod")
ginkgo.DeferCleanup(framework.IgnoreNotFound(podClient.Delete), pod.Name, metav1.DeleteOptions{})
Interrupting tests
When aborting a manual ginkgo ./test/e2e invocation with CTRL-C or a signal,
the currently running test(s) should stop immediately. This gets achieved by
accepting a ctx context.Context as first parameter in the Ginkgo callback
function and then passing that context through to all code that might
block. When Ginkgo notices that it needs to shut down, it will cancel that
context and all code trying to use it will immediately return with a context canceled error. Cleanup callbacks get a new context which will time out
eventually to ensure that tests don’t get stuck. For a detailed description,
see https://onsi.github.io/ginkgo/#interrupting-aborting-and-timing-out-suites.
Most of the E2E tests were updated to use the Ginkgo
context at the start of
the 1.27 development cycle.
There are some gotchas:
Don’t use the ctx passed into ginkgo.It in a ginkgo.DeferCleanup
callback because the context will be canceled when the cleanup code
runs. This is wrong:
ginkgo.It("something", func(ctx context.Context) {
    ...
    ginkgo.DeferCleanup(func() {
        // do something with ctx
    })
})
Instead, register a function which accepts a new context:
ginkgo.DeferCleanup(func(ctx context.Context) {
    // do something with the new ctx
})
Anonymous functions can be avoided by passing some existing function and its
parameters directly to ginkgo.DeferCleanup. Again, beware not to pass the
wrong ctx. This is wrong:
ginkgo.It("something", func(ctx context.Context) {
    ...
    ginkgo.DeferCleanup(myDeleteFunc, ctx, objName)
})
Authors: Kubernetes v1.27 Release Team
Announcing the release of Kubernetes v1.27, the first release of 2023!
This release consists of 60 enhancements. 18 of those enhancements are entering Alpha, 29 are graduating to Beta, and 13 are graduating to Stable.
Release theme and logo
Kubernetes v1.27: Chill Vibes
The theme for Kubernetes v1.27 is Chill Vibes.
It's a little silly, but there were some important shifts in this release that helped inspire the theme. Throughout a typical Kubernetes release cycle, there are several deadlines that features need to meet to remain included. If a feature misses any of these deadlines, there is an exception process they can go through. Handling these exceptions is a very normal part of the release. But v1.27 is the first release that anyone can remember where we didn't receive a single exception request after the enhancements freeze. Even as the release progressed, things remained much calmer than any of us are used to.
There's a specific reason we were able to enjoy a more calm release this time around, and that's all the work that folks put in behind the scenes to improve how we manage the release. That's what this theme celebrates, people putting in the work to make things better for the community.
Special thanks to Britnee Laverack for creating the logo. Britnee also designed the logo for Kubernetes 1.24: Stargazer.
What's New (Major Themes)
Freeze k8s.gcr.io image registry
The old image registry, k8s.gcr.io, is being replaced by registry.k8s.io, which has been generally available for several months. The Kubernetes project created and runs the registry.k8s.io image registry, which is fully controlled by the community.
This means that the old registry k8s.gcr.io will be frozen and no further images for Kubernetes and related sub-projects will be published to the old registry.
What does this change mean for contributors?
If you are a maintainer of a sub-project, you will need to update your manifests and Helm charts to use the new registry. For more information, check out this project.
What does this change mean for end users?
The Kubernetes v1.27 release will not be published to the k8s.gcr.io registry.
Patch releases for v1.24, v1.25, and v1.26 will no longer be published to the old registry after April.
Starting in v1.25, the default image registry has been set to registry.k8s.io. This value is overridable in kubeadm and kubelet, but setting it to k8s.gcr.io will fail for new releases after April as they won’t be present in the old registry.
If you want to increase the reliability of your cluster and remove dependency on the community-owned registry or you are running Kubernetes in networks where external traffic is restricted, you should consider hosting local image registry mirrors. Some cloud vendors may offer hosted solutions for this.
SeccompDefault graduates to stable
To use seccomp profile defaulting, you must run the kubelet with the --seccomp-default command line flag enabled for each node where you want to use it.
If enabled, the kubelet will use the RuntimeDefault seccomp profile by default, which is defined by the container runtime, instead of using the Unconfined (seccomp disabled) mode. The default profiles aim to provide a strong set of security defaults while preserving the functionality of the workload. It is possible that the default profiles differ between container runtimes and their release versions.
You can find detailed information about a possible upgrade and downgrade strategy in the related Kubernetes Enhancement Proposal (KEP): Enable seccomp by default .
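As a sketch, the same default can also be enabled through the kubelet configuration file instead of the command line flag (the file path in the comment is only an example):

```yaml
# KubeletConfiguration fragment, e.g. in /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
seccompDefault: true
```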
Mutable scheduling directives for Jobs graduates to GA
This was introduced in v1.22 as a beta-level feature and is now stable. In most cases a parallel job will want the pods to run with constraints, like all in the same zone, or all either on GPU model x or y but not a mix of both. The suspend field is the first step towards achieving those semantics. suspend allows a custom queue controller to decide when a job should start. However, once a job is unsuspended, a custom queue controller has no influence on where the pods of a job will actually land.
This feature allows updating a Job's scheduling directives before it starts, which gives custom queue controllers
the ability to influence pod placement while at the same time offloading actual pod-to-node assignment to
kube-scheduler. This is allowed only for suspended Jobs that have never been unsuspended before.
The fields in a Job's pod template that can be updated are node affinity, node selector,
tolerations, labels, annotations, and scheduling gates.
Find more details in the KEP:
Allow updating scheduling directives of jobs.
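As a minimal sketch, a suspended Job might look like the following (all names and values are hypothetical); a custom queue controller could patch the nodeSelector while spec.suspend is still true and then unsuspend the Job, at which point kube-scheduler takes over pod-to-node assignment:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: example-job        # hypothetical name
spec:
  suspend: true            # scheduling directives may be updated while suspended
  template:
    spec:
      nodeSelector:
        topology.kubernetes.io/zone: zone-a   # a queue controller may change this
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox
        command: ["sleep", "1"]
```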
DownwardAPIHugePages graduates to stable
In Kubernetes v1.20, support for requests.hugepages-pagesize and limits.hugepages-pagesize was added
to the downward API to be consistent with other resources like cpu, memory, and ephemeral storage.
This feature graduates to stable in this release. You can find more details in the KEP:
Downward API HugePages.
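A minimal sketch of how a container might consume its own hugepages limit through the downward API (pod and env var names are hypothetical):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-demo       # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    resources:
      requests:
        hugepages-2Mi: 128Mi
        memory: 64Mi
      limits:
        hugepages-2Mi: 128Mi
        memory: 64Mi
    env:
    - name: HUGEPAGES_2MI_LIMIT
      valueFrom:
        resourceFieldRef:
          containerName: app
          resource: limits.hugepages-2Mi
```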
Pod Scheduling Readiness goes to beta
Upon creation, Pods are ready for scheduling. The Kubernetes scheduler does its due diligence to find nodes to place all pending Pods. However, in a real-world case, some Pods may stay in a missing-essential-resources state for a long period. These Pods actually churn the scheduler (and downstream integrators like Cluster Autoscaler) in an unnecessary manner.
By specifying/removing a Pod's .spec.schedulingGates, you can control when a Pod is ready to be considered for scheduling.
The schedulingGates field contains a list of strings, and each string literal is treated as a criterion that must be satisfied before a Pod is considered schedulable. This field can be initialized only when a Pod is created (either by the client, or mutated during admission). After creation, each schedulingGate can be removed in an arbitrary order, but addition of a new scheduling gate is disallowed.
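A minimal sketch of a gated Pod (the pod name and gate name are hypothetical); the Pod stays unschedulable until a controller removes the gate from .spec.schedulingGates:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gated-pod                      # hypothetical name
spec:
  schedulingGates:
  - name: example.com/wait-for-quota   # hypothetical gate
  containers:
  - name: app
    image: busybox
```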
Node log access via Kubernetes API
This feature helps cluster administrators debug issues with services running on nodes by allowing them to query service logs. To use this feature, ensure that the NodeLogQuery feature gate is enabled on that node, and that the kubelet configuration options enableSystemLogHandler and enableSystemLogQuery are both set to true.
On Linux, we assume that service logs are available via journald. On Windows, we assume that service logs are available in the application log provider. You can also fetch logs from the /var/log/ and C:\var\log directories on Linux and Windows, respectively.
A cluster administrator can try out this alpha feature across all nodes of their cluster, or on a subset of them.
ReadWriteOncePod PersistentVolume access mode goes to beta
Kubernetes v1.22 introduced a new access mode ReadWriteOncePod for PersistentVolumes (PVs) and PersistentVolumeClaims (PVCs). This access mode enables you to restrict volume access to a single pod in the cluster, ensuring that only one pod can write to the volume at a time. This can be particularly useful for stateful workloads that require single-writer access to storage.
The ReadWriteOncePod beta adds support for scheduler preemption
of pods that use ReadWriteOncePod PVCs.
Scheduler preemption allows higher-priority pods to preempt lower-priority pods. For example, when a pod (A) with a ReadWriteOncePod PVC is scheduled, if another pod (B) is found using the same PVC and pod (A) has higher priority, the scheduler will return an Unschedulable status and attempt to preempt pod (B).
For more context, see the KEP: ReadWriteOncePod PersistentVolume AccessMode .
Respect PodTopologySpread after rolling upgrades
matchLabelKeys is a list of pod label keys used to select the pods over which spreading will be calculated. The keys are used to look up values from the pod labels. Those key-value labels are ANDed with labelSelector to select the group of existing pods over which spreading will be calculated for the incoming pod. Keys that don't exist in the pod labels will be ignored. A null or empty list means only match against the labelSelector.
With matchLabelKeys, users don't need to update the pod.spec between different revisions. The controller/operator just needs to set different values to the same label key for different revisions. The scheduler will assume the values automatically based on matchLabelKeys. For example, if users use Deployment, they can use the label keyed with pod-template-hash, which is added automatically by the Deployment controller, to distinguish between different revisions in a single Deployment.
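A sketch of what this could look like in a Deployment's pod template (the app label is hypothetical; pod-template-hash is the label the Deployment controller adds automatically):

```yaml
# pod template fragment of a Deployment
topologySpreadConstraints:
- maxSkew: 1
  topologyKey: topology.kubernetes.io/zone
  whenUnsatisfiable: DoNotSchedule
  labelSelector:
    matchLabels:
      app: frontend          # hypothetical label
  matchLabelKeys:
  - pod-template-hash        # distinguishes revisions of the Deployment
```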
Faster SELinux volume relabeling using mounts
In this release, how SELinux labels are applied to volumes used by Pods is graduating to beta. This feature speeds up container startup by mounting volumes with the correct SELinux label instead of changing each file on the volumes recursively. A Linux kernel with SELinux support allows the first mount of a volume to set the SELinux label on the whole volume using the -o context= mount option. This way, all files are assigned the given label in constant time, without recursively walking through the whole volume.
The context mount option cannot be applied to bind mounts or re-mounts of already mounted volumes.
For CSI storage, a CSI driver does the first mount of a volume, and so it must be the CSI driver that actually
applies this mount option. We added a new field SELinuxMount to CSIDriver objects, so that drivers can
announce whether they support the -o context mount option.
If Kubernetes knows the SELinux label of a Pod and the CSI driver responsible for a pod's volume
announces SELinuxMount: true and the volume has access mode ReadWriteOncePod, then it
will ask the CSI driver to mount the volume with mount option context= and it will tell the container
runtime not to relabel content of the volume (because all files already have the right label).
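A sketch of how a driver could announce this support (the driver name is hypothetical):

```yaml
apiVersion: storage.k8s.io/v1
kind: CSIDriver
metadata:
  name: example.csi.vendor.com   # hypothetical driver name
spec:
  seLinuxMount: true             # driver supports the -o context mount option
```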
Get more inform...