Hachyderm has reached 30,000 users, which still makes it a 'small sized' service in terms of scale. In the process, however, we have hit very familiar 'medium sized' scale problems, which forced us to migrate our services out of my basement. This is the outage report, post mortem, and high level overview of the migration to Hetzner in Germany, from observation to production fixes. This is the story.
Blog: Forensic container checkpointing in Kubernetes
Authors: Adrian Reber (Red Hat)
Forensic container checkpointing is based on Checkpoint/Restore In
Userspace (CRIU) and allows the creation of stateful copies
of a running container without the container knowing that it is being
checkpointed. The copy of the container can be analyzed and restored in a
sandbox environment multiple times without the original container being aware
of it. Forensic container checkpointing was introduced as an alpha feature in
Kubernetes v1.25.
How does it work?
With the help of CRIU it is possible to checkpoint and restore containers.
CRIU is integrated in runc, crun, CRI-O and containerd and forensic container
checkpointing as implemented in Kubernetes uses these existing CRIU
integrations.
Why is it important?
With the help of CRIU and the corresponding integrations it is possible to get
all information and state about a running container on disk for later forensic
analysis. Forensic analysis might be important to inspect a suspicious
container without stopping or influencing it. If the container is really under
attack, the attacker might detect attempts to inspect the container. Taking a
checkpoint and analysing the container in a sandboxed environment offers the
possibility to inspect the container without the original container and maybe
attacker being aware of the inspection.
In addition to the forensic container checkpointing use case, it is also
possible to migrate a container from one node to another node without losing
the internal state. Especially for stateful containers with long initialization
times restoring from a checkpoint might save time after a reboot or enable much
faster startup times.
How do I use container checkpointing?
The feature is behind a feature gate, so
make sure to enable the ContainerCheckpoint gate before you can use the new
feature.
The runtime must also support container checkpointing:
containerd: support is currently under discussion. See containerd
pull request #6965 for more details.
CRI-O: v1.25 has support for forensic container checkpointing.
Usage example with CRI-O
To use forensic container checkpointing in combination with CRI-O, the runtime
needs to be started with the command-line option --enable-criu-support=true.
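If you configure CRI-O through its configuration files rather than command-line flags, the same switch can be set via a drop-in file. The option name below is an assumption based on CRI-O's usual flag-to-config naming, so verify it against your CRI-O version:
cat /etc/crio/crio.conf.d/05-enable-criu-support.conf
[crio.runtime]
# assumed to mirror the --enable-criu-support command-line flag
enable_criu_support = true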
For Kubernetes, you need to run your cluster with the ContainerCheckpoint
feature gate enabled. As the checkpointing functionality is provided by CRIU, it
is also necessary to install CRIU. Usually runc or crun depends on CRIU and
therefore it is installed automatically.
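Because the checkpoint API is served by the kubelet, one way to enable the gate is through the kubelet configuration; alternatively, you can pass --feature-gates=ContainerCheckpoint=true on the kubelet command line. A minimal sketch:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # enables the alpha checkpoint API described in this post
  ContainerCheckpoint: true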
It is also important to mention that, at the time of writing, the checkpointing
functionality is considered an alpha-level feature in CRI-O and Kubernetes, and the
security implications are still under consideration.
Once containers and pods are running it is possible to create a checkpoint.
Checkpointing
is currently only exposed on the kubelet level. To checkpoint a container,
you can run curl on the node where that container is running, and trigger a
checkpoint:
curl -X POST "https://localhost:10250/checkpoint/namespace/podId/container"
For a container named counter in a pod named counters in a namespace named
default the kubelet API endpoint is reachable at:
curl -X POST "https://localhost:10250/checkpoint/default/counters/counter"
For completeness, the following curl command-line options are necessary to
have curl accept the kubelet's self-signed certificate and authorize the
use of the kubelet checkpoint API:
--insecure --cert /var/run/kubernetes/client-admin.crt --key /var/run/kubernetes/client-admin.key
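Putting the endpoint and those options together, a complete checkpoint request for the example above looks like this (the certificate paths come from the local example cluster and will differ in your environment):
curl -X POST --insecure \
  --cert /var/run/kubernetes/client-admin.crt \
  --key /var/run/kubernetes/client-admin.key \
  "https://localhost:10250/checkpoint/default/counters/counter"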
Triggering this kubelet API will request the creation of a checkpoint from
CRI-O. CRI-O requests a checkpoint from your low-level runtime (for example,
runc). Seeing that request, runc invokes the criu tool
to do the actual checkpointing.
Once the checkpointing has finished, the checkpoint should be available at
/var/lib/kubelet/checkpoints/checkpoint-pod-name_namespace-name-container-name-timestamp.tar
You could then use that tar archive to restore the container somewhere else.
Restore a checkpointed container outside of Kubernetes (with CRI-O)
With the checkpoint tar archive it is possible to restore the container outside
of Kubernetes in a sandboxed instance of CRI-O. For better user experience
during restore, I recommend that you use the latest version of CRI-O from the
main CRI-O GitHub branch. If you're using CRI-O v1.25, you'll need to
manually create certain directories Kubernetes would create before starting the
container.
The first step to restore a container outside of Kubernetes is to create a pod sandbox
using crictl:
crictl runp pod-config.json
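The pod-config.json passed to crictl runp is a regular CRI pod sandbox configuration. A minimal sketch could look like the following; the name, namespace and uid values are placeholders chosen for this illustration:
{
  "metadata": {
    "name": "counters",
    "namespace": "default",
    "uid": "restore-demo-uid",
    "attempt": 0
  },
  "linux": {}
}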
Then you can restore the previously checkpointed container into the newly created pod sandbox:
crictl create POD_ID container-config.json pod-config.json
Instead of specifying a container image in a registry in container-config.json
you need to specify the path to the checkpoint archive that you created earlier:
{
  "metadata": {
    "name": "counter"
  },
  "image": {
    "image": "/var/lib/kubelet/checkpoints/checkpoint-archive.tar"
  }
}
Next, run crictl start CONTAINER_ID to start that container, and then a
copy of the previously checkpointed container should be running.
Restore a checkpointed container within Kubernetes
To restore the previously checkpointed container directly in Kubernetes it is
necessary to convert the checkpoint archive into an image that can be pushed to
a registry.
One possible way to convert the local checkpoint archive consists of the
following steps with the help of buildah:
newcontainer=$(buildah from scratch)
buildah add $newcontainer /var/lib/kubelet/checkpoints/checkpoint-pod-name_namespace-name-container-name-timestamp.tar /
buildah config --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=container-name $newcontainer
buildah commit $newcontainer checkpoint-image:latest
buildah rm $newcontainer
The resulting image is not standardized and only works in combination with
CRI-O. Please consider this image format as pre-alpha. There are ongoing
discussions to standardize the format of checkpoint
images like this. It is important to remember that this not-yet-standardized image
format only works if CRI-O has been started with --enable-criu-support=true.
The security implications of starting CRI-O with CRIU support are not yet clear
and therefore the functionality as well as the image format should be used with
care.
Now, you'll need to push that image to a container image registry. For example:
buildah push localhost/checkpoint-image:latest container-image-registry.example/user/checkpoint-image:latest
To restore this checkpoint image (container-image-registry.example/user/checkpoint-image:latest), the
image needs to be listed in the specification for a Pod. Here's an example
manifest:
apiVersion: v1
kind: Pod
metadata:
  namePrefix: example-
spec:
  containers:
  - name: container-name
    image: container-image-registry.example/user/checkpoint-image:latest
  nodeName: destination-node
Kubernetes schedules the new Pod onto a node. The kubelet on that node
instructs the container runtime (CRI-O in this example) to create and start a
container based on an image specified as registry/user/checkpoint-image:latest.
CRI-O detects that registry/user/checkpoint-image:latest
is a reference to checkpoint data rather than a container image. Then,
instead of the usual steps to create and start a container,
CRI-O fetches the checkpoint data and restores the container from that
specified checkpoint.
The application in that Pod would continue running as if the checkpoint had not been taken;
within the container, the application looks and behaves like any other container that had been
started normally and not restored from a checkpoint.
With these steps, it is possible to replace a Pod running on one node
with a new equivalent Pod that is running on a different node,
and without losing the state of the containers in that Pod.
How do I get involved?
You can reach SIG Node by several means:
Slack: #sig-node
Mailing list
It seems like this only solves one part of the problem, though. I'm seeing that this post above did not go through to twitter, because "User is over daily status update limit.", which is not the case for the crossposter.
I'm thinking they are enforcing tighter limits around automated tools, so this might be the issue.
I'll get in touch with their support, but given the :hot_shit: they seem to be going through, I would not expect a swift response.
-- @renatolond
Acorn interests me. But, compose is a spec. Why do I need to rewrite anything? There should be a tool in Acorn’s wheelhouse to do it for me. — Converting Docker Compose file to Acornfile | Acorn Labs
Disk performance of lightweight macOS VMs on Apple silicon
Writing to the Data volume in a VM is dismally slow. Is using shared storage any quicker? What happens when you copy a VM to an external SSD, or to another Mac?
Blog: Finding suspicious syscalls with the seccomp notifier
Authors: Sascha Grunert
Debugging software in production is one of the biggest challenges we have to
face in our containerized environments. Being able to understand the impact of
the available security options, especially when it comes to configuring our
deployments, is one of the key aspects to make the default security in
Kubernetes stronger. We already have all that logging, tracing and metrics data
at hand, but how do we assemble the information it provides into something
human readable and actionable?
Seccomp is one of the standard mechanisms to protect a Linux-based
Kubernetes application from malicious actions by interfering with its system
calls. This allows us to restrict the application to a defined set of
actionable items, like modifying files or responding to HTTP requests. Linking
the knowledge of which set of syscalls is required to, for example, modify a
local file, back to the actual source code is similarly non-trivial. Seccomp
profiles for Kubernetes have to be written in JSON and can be understood
as an architecture-specific allow-list with superpowers, for example:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "defaultErrno": "ENOSYS",
  "syscalls": [
    {
      "names": ["chmod", "chown", "open", "write"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
The above profile errors by default by specifying a defaultAction of
SCMP_ACT_ERRNO. This means we have to allow a set of syscalls via
SCMP_ACT_ALLOW, otherwise the application would not be able to do anything at
all. Okay cool, so to allow file operations, all we have to do is
add a bunch of file-specific syscalls like open or write, and probably
also allow changing permissions via chmod and chown, right?
Basically yes, but there are issues with the simplicity of that approach:
Seccomp profiles need to include the minimum set of syscalls required to start
the application. This also includes some syscalls from the lower level
Open Container Initiative (OCI) container runtime, for example
runc or crun. Besides that, we can only guarantee the required
syscalls for a very specific version of the runtimes and our application,
because the code can change between releases. The same applies to the
termination of the application as well as the target architecture we're
deploying on. Features like executing commands within containers also require
another subset of syscalls. Not to mention that there are multiple versions of
syscalls doing slightly different things, and that seccomp profiles are able to
filter on their arguments. It's also not always clearly visible to developers
which syscalls are used by the code they write themselves, because they rely on
programming language abstractions or frameworks.
How can we know which syscalls are even required then? Who should create and
maintain those profiles during the application's development life-cycle?
Well, recording and distributing seccomp profiles is one of the problem domains
of the Security Profiles Operator, which is already solving that. The
operator is able to record seccomp, SELinux and even
AppArmor profiles into a Custom Resource Definition (CRD),
reconciles them to each node and makes them available for use.
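As an illustration of what such a reconciled profile can look like, here is a rough sketch of a SeccompProfile custom resource. The apiVersion and field names reflect the operator's v1beta1 API as far as I recall it, so treat the exact schema as an assumption and double check the Security Profiles Operator documentation:
apiVersion: security-profiles-operator.x-k8s.io/v1beta1  # schema assumed, verify against the operator docs
kind: SeccompProfile
metadata:
  name: allow-basic-file-operations
  namespace: default
spec:
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
  - action: SCMP_ACT_ALLOW
    names:
    - chmod
    - chown
    - open
    - write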
The biggest challenge in creating security profiles is catching all code
paths which execute syscalls. We could achieve that by having 100% logical
coverage of the application when running an end-to-end test suite. You can see the
problem with that statement: it's too idealistic to ever be fulfilled,
even before taking all the moving parts of application development and
deployment into account.
Missing a syscall in the seccomp profile's allow list can have a tremendously
negative impact on the application. It's not only that we can encounter crashes,
which are trivially detectable. It can also happen that they slightly change
logical paths, change the business logic, make parts of the application
unusable, slow down performance or even expose security vulnerabilities. We're
simply not able to see the whole impact of that, especially because blocked
syscalls via SCMP_ACT_ERRNO do not provide any additional audit
logging on the system.
Does that mean we're lost? Is it just not realistic to dream about a Kubernetes
where everyone uses the default seccomp profile ? Should we
stop striving towards maximum security in Kubernetes and accept that it's not
meant to be secure by default?
Definitely not. Technology evolves over time and there are many folks
working behind the scenes of Kubernetes to indirectly deliver features to
address such problems. One of the mentioned features is the seccomp notifier ,
which can be used to find suspicious syscalls in Kubernetes.
The seccomp notify feature consists of a set of changes introduced in Linux 5.9.
It makes the kernel capable of communicating seccomp related events to the user
space. That allows applications to act based on the syscalls and opens up a
wide range of possible use cases. We not only need the right kernel version,
but also at least runc v1.1.0 (or crun v0.19) to be able to make the notifier
work at all. The Kubernetes container runtime CRI-O gets support for
the seccomp notifier in v1.26.0. The new feature allows us to
identify possibly malicious syscalls in our application, and therefore makes it
possible to verify profiles for consistency and completeness. Let's give that a
try.
First of all, we need to run the latest main version of CRI-O, because v1.26.0
had not been released yet at the time of writing. You can do that by either
compiling it from the source code or by using the pre-built binary
bundle via the get-script. The seccomp notifier feature of CRI-O is
guarded by an annotation, which has to be explicitly allowed, for example by
using a configuration drop-in like this:
cat /etc/crio/crio.conf.d/02-runtimes.conf
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes.runc]
allowed_annotations = [ "io.kubernetes.cri-o.seccompNotifierAction" ]
If CRI-O is up and running, then it should indicate that the seccomp notifier is
available as well:
sudo ./bin/crio --enable-metrics
…
INFO[…] Starting seccomp notifier watcher
INFO[…] Serving metrics on :9090 via HTTP
…
We also enable the metrics, because they provide additional telemetry data about
the notifier. Now we need a running Kubernetes cluster for demonstration
purposes. For this demo, we mainly stick to the
hack/local-up-cluster.sh approach to locally spawn a single node
Kubernetes cluster.
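If you want to reproduce this on top of the locally built CRI-O, pointing local-up-cluster.sh at the CRI-O socket is usually enough. The variable below is an assumption about the script's current interface and may change between Kubernetes versions:
# assumption: local-up-cluster.sh honors CONTAINER_RUNTIME_ENDPOINT for the CRI socket
CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/crio/crio.sock \
  ./hack/local-up-cluster.sh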
If everything is up and running, then we need a seccomp profile
for testing purposes. But we do not have to create our own; we can just use the
RuntimeDefault profile which gets shipped with each container runtime. For
example the RuntimeDefault profile for CRI-O can be found in the
containers/common library.
Now we need a test container, which can be a simple nginx pod like
this:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    io.kubernetes.cri-o.seccompNotifierAction: "stop"
spec:
  restartPolicy: Never
  containers:
  - name: nginx
    image: nginx:1.23.2
    securityContext:
      seccompProfile:
        type: RuntimeDefault
Please note the annotation io.kubernetes.cri-o.seccompNotifierAction, which
enables the seccomp notifier for this workload. The value of the annotation can
be either stop, which stops the workload, or anything else, which does nothing
beyond logging and emitting metrics. Because the workload may be terminated, we also
set restartPolicy: Never so that the container is not automatically recreated on
failure.
Let's run the pod and check if it works:
kubectl apply -f nginx.yaml
kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP          NODE        NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          3m39s   10.85.0.3   127.0.0.1   <none>           <none>
We can also test if the web server itself works as intended:
curl 10.85.0.3
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
…
While everything is now up and running, CRI-O also indicates that it has started
the seccomp notifier:
…
INFO[…] Injecting seccomp notifier into seccomp profile of container 662a3bb0fdc7dd1bf5a88a8aa8ef9eba6296b593146d988b4a9b85822422febb
…
If we now run a forbidden syscall inside of the container, then we can
expect that the workload gets terminated. Let's give that a try by running
chroot in the container's namespaces:
kubectl exec -it nginx -- bash
root@nginx:/# chroot /tmp
chroot: cannot change root directory to '/tmp': Function not implemented
root@nginx:/# command terminated with exit code 137
The exec session got terminated, so it looks like the container is not running
any more:
kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 0/1 seccomp killed 0 96s
Alright, the container got killed by seccomp. Do we get any more information
about what was going on?
kubectl describe pod nginx
Name: nginx
…
Containers:
nginx:
…
State: Terminated
Reason: seccomp killed
Message: Used forbidden syscalls: chroot (1x)
Exit Code: 137
Started: Mon, 14 Nov 2022 12:19:46 +0100
Finished: Mon, 14 Nov 2022 12:20:26 +0100
…
The seccomp notifier feature of CRI-O correctly set the termination reason and
message, including which forbidden syscall has been used and how often (1x). How
often? Yes, the notifier gives the application up to 5 seconds after the last
seen syscall before it starts the termination. This means that it's possible to
catch multiple forbidden syscalls within one test, avoiding time-consuming
trial and error.
kubectl exec -it nginx -- chroot /tmp
chroot: cannot change root directory to '/tmp': Function not implemented
command terminated with exit code 125
kubectl exec -it nginx -- chroot /tmp
chroot: cannot change root directory to '/tmp': Function not implemented
command terminated with exit code 125
kubectl exec -it nginx -- swapoff -a
command terminated with exit ...
re:Invent 2022 - Recap Best Practices for Building a Container Management Platform | Jessica Deen | Deen of DevOps
Recap: Best practices for a container management platform. In this chalk talk, we learned about five best practices for building a container platform management system. This talk covered key steps that help drive efficiency and effectiveness, from assembling the right teams to innovate, design, and operate your container management platform to choosing the right AWS and ISV tools.
Behind the scenes: Making the diagrams. I used a tool called Excalidraw to make the diagrams we used in our chalk talk.
Blog: Boosting Kubernetes container runtime observability with OpenTelemetry
Authors: Sascha Grunert
When speaking about observability in the cloud native space, probably everyone
will mention OpenTelemetry (OTEL) at some point in the
conversation. That's great, because the community needs standards to rely on
for developing all cluster components in the same direction. OpenTelemetry
enables us to combine logs, metrics, traces and other contextual information
(called baggage) into a single resource. Cluster administrators or software
engineers can use this resource to get a viewport of what is going on in the
cluster over a defined period of time. But how can Kubernetes itself make use of
this technology stack?
Kubernetes consists of multiple components where some are independent and others
are stacked together. Looking at the architecture from a container runtime
perspective, there are, from top to bottom:
kube-apiserver : Validates and configures data for the API objects
kubelet : Agent running on each node
CRI runtime : Container Runtime Interface (CRI) compatible container runtime
like CRI-O or containerd
OCI runtime : Lower level Open Container Initiative (OCI) runtime
like runc or crun
Linux kernel or Microsoft Windows : Underlying operating system
That means if we encounter a problem with running containers in Kubernetes, then
we start looking at one of those components. Finding the root cause for problems
is one of the most time-consuming actions we face with the increased
architectural complexity of today's cluster setups. Even if we know the
component which seems to cause the issue, we still have to take the others into
account to maintain a mental timeline of events which are going on. How do we
achieve that? Well, most folks will probably stick to scraping logs, filtering
them and assembling them together across component borders. We also have
metrics, right? Correct, but correlating metric values with plain
logs makes it even harder to track what is going on. Some metrics are also not
made for debugging purposes. They have been defined based on the end user
perspective of the cluster for linking usable alerts and not for developers
debugging a cluster setup.
OpenTelemetry to the rescue: the project aims to combine signals such as
traces , metrics and logs together to maintain the
right viewport on the cluster state.
What is the current state of OpenTelemetry tracing in Kubernetes? From an API
server perspective, we have alpha support for tracing since Kubernetes v1.22,
which will graduate to beta in one of the upcoming releases. Unfortunately the
beta graduation has missed the v1.26 Kubernetes release. The design proposal can
be found in the API Server Tracing Kubernetes Enhancement Proposal
(KEP) which provides more information about it.
The kubelet tracing part is tracked in another KEP , which was
implemented in an alpha state in Kubernetes v1.25. A beta graduation is not
planned at the time of writing, but more may come in the v1.27 release cycle.
There are other side efforts going on besides both KEPs, for example klog is
considering OTEL support, which would boost the observability by
linking log messages to existing traces. Within SIG Instrumentation and SIG Node,
we're also discussing how to link the
kubelet traces together, because right now they're focused on the
gRPC calls between the kubelet and the CRI container runtime.
CRI-O has featured OpenTelemetry tracing support since v1.23.0 and is
continuously working on improving it, for example by attaching the logs to the
traces or extending the spans to logical parts of the
application. This helps users of the traces to gain the same
information as they would from parsing the logs, but with enhanced capabilities of
scoping and filtering for other OTEL signals. The CRI-O maintainers are also working on a
container monitoring replacement for conmon, which is called
conmon-rs and is purely written in Rust. One benefit of
having a Rust implementation is being able to add features like OpenTelemetry
support, because the crates (libraries) for those already exist. This allows a
tight integration with CRI-O and lets consumers see the lowest level of tracing
data from their containers.
The containerd folks added tracing support in v1.6.0, which is
available by using a plugin. Lower level OCI runtimes like
runc or crun feature no support for OTEL at all, and there does not
seem to be a plan for that. We always have to consider that there is a
performance overhead when collecting the traces as well as when exporting them to a
data sink. I still think it would be worth evaluating how extended
telemetry collection could look in OCI runtimes. Let's see if the Rust OCI
runtime youki considers something like that in the future.
I'll show you how to give it a try. For my demo I'll stick to a stack with a single local node
that has runc, conmon-rs, CRI-O, and a kubelet. To enable tracing in the kubelet, I need to
apply the following KubeletConfiguration:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletTracing: true
tracing:
  samplingRatePerMillion: 1000000
A samplingRatePerMillion equal to one million will internally translate to
sampling everything. A similar configuration has to be applied to CRI-O; I can
either start the crio binary with --enable-tracing and
--tracing-sampling-rate-per-million 1000000, or use a drop-in configuration
like this:
cat /etc/crio/crio.conf.d/99-tracing.conf
[crio.tracing]
enable_tracing = true
tracing_sampling_rate_per_million = 1000000
To configure CRI-O to use conmon-rs, you require at least the latest CRI-O
v1.25.x and conmon-rs v0.4.0. Then a configuration drop-in like this can be used
to make CRI-O use conmon-rs:
cat /etc/crio/crio.conf.d/99-runtimes.conf
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes.runc]
runtime_type = "pod"
monitor_path = "/path/to/conmonrs" # or will be looked up in $PATH
That's it; the default configuration will point to an OpenTelemetry
collector gRPC endpoint of localhost:4317, which has to be up and
running as well. There are multiple ways to run a collector, as described in the
docs, but it's also possible to kubectl proxy into an existing
instance running within Kubernetes.
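For a quick local setup, one option is Jaeger's all-in-one container, which can accept OTLP over gRPC on port 4317 and serves its UI on port 16686. A minimal sketch, assuming Docker is available and that the image still supports the COLLECTOR_OTLP_ENABLED switch:
# assumption: recent jaegertracing/all-in-one images accept OTLP when COLLECTOR_OTLP_ENABLED is set
docker run --rm \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 -p 16686:16686 \
  jaegertracing/all-in-one:latest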
If everything is set up, then the collector should log that there are incoming
traces:
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope go.opentelemetry.io/otel/sdk/tracer
Span #0
Trace ID : 71896e69f7d337730dfedb6356e74f01
Parent ID : a2a7714534c017e6
ID : 1d27dbaf38b9da8b
Name : github.com/cri-o/cri-o/server.(*Server).filterSandboxList
Kind : SPAN_KIND_INTERNAL
Start time : 2022-11-15 09:50:20.060325562 +0000 UTC
End time : 2022-11-15 09:50:20.060326291 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Span #1
Trace ID : 71896e69f7d337730dfedb6356e74f01
Parent ID : a837a005d4389579
ID : a2a7714534c017e6
Name : github.com/cri-o/cri-o/server.(*Server).ListPodSandbox
Kind : SPAN_KIND_INTERNAL
Start time : 2022-11-15 09:50:20.060321973 +0000 UTC
End time : 2022-11-15 09:50:20.060330602 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Span #2
Trace ID : fae6742709d51a9b6606b6cb9f381b96
Parent ID : 3755d12b32610516
ID : 0492afd26519b4b0
Name : github.com/cri-o/cri-o/server.(*Server).filterContainerList
Kind : SPAN_KIND_INTERNAL
Start time : 2022-11-15 09:50:20.0607746 +0000 UTC
End time : 2022-11-15 09:50:20.060795505 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Events:
SpanEvent #0
- Name: log
- Timestamp: 2022-11-15 09:50:20.060778668 +0000 UTC
- DroppedAttributesCount: 0
- Attributes::
- id: Str(adf791e5-2eb8-4425-b092-f217923fef93)
- log.message: Str(No filters were applied, returning full container list)
- log.severity: Str(DEBUG)
- name: Str(/runtime.v1.RuntimeService/ListContainers)
I can see that the spans have a trace ID and typically have a parent attached.
Events such as logs are part of the output as well. In the above case, the kubelet is
periodically triggering a ListPodSandbox RPC to CRI-O caused by the Pod
Lifecycle Event Generator (PLEG). Displaying those traces can be done via,
for example, Jaeger. When running the tracing stack locally, a Jaeger
instance should be exposed at http://localhost:16686 by default.
The ListPodSandbox requests are directly visible within the Jaeger UI:
That's not too exciting, so I'll run a workload directly via kubectl:
kubectl run -it --rm --restart=Never --image=alpine alpine -- echo hi
hi
pod "alpine" deleted
Looking now at Jaeger, we can see that we have traces for conmonrs, crio as
well as the kubelet for the RunPodSandbox and CreateContainer CRI RPCs:
The kubelet and CRI-O spans are connected to each other to make investigation
easier. If we now take a closer look at the spans, we can see that CRI-O's
logs are correctly associated with the corresponding functionality. For example, we
can extract the container user from the traces like this:
The lower level spans of conmon-rs are also part of this trace. For example
conmon-rs maintains an internal read_loop for handling IO between the
container and the end user. The logs for reading and writing bytes are part of
the span. The same applies to the wait_for_exit_code span, which tells us that
the container exited successfully with code 0:
Having all that information at hand, side by side with the filtering capabilities
of Jaeger, makes the whole stack a great solution for debugging container issues!
Mentioning the "whole stack" also reveals the biggest downside of the overall
approach: compared to parsing logs, it adds a noticeable overhead on top of the
cluster setup. Users have to maintain a sink like Elasticsearch to
persist the data, expose the Jaeger UI and possibly take the performance
drawback into account. Anyway, it's still one of the best ways to increase the
observability...
Cloudscape offers user interface guidelines, front-end components, design resources, and development tools for building intuitive, engaging, and inclusive user experiences at scale.
kubesphere/kubeeye: KubeEye aims to find various problems on Kubernetes, such as application misconfiguration, unhealthy cluster components and node problems.
ahmetb/kubectl-foreach: Run kubectl commands in all/some contexts in parallel (similar to GNU xargs+parallel)
I’ve noticed a lot of container and Kubernetes users don’t realize there are very real standards orgs in the ecosystem, like OCI, which are vitally important for capability and consistency | Introduction to CRI
FWIW I have uninstalled Telegram on almost all my devices, for some reason it got real spammy | Telegram Discloses Personal Details of Pirating Users Following Court Order * TorrentFreak
Telegram has complied with an order from the High Court in Delhi by sharing the personal details of pirating users with rightsholders.