Hachyderm has reached 30,000 users, which still makes it a 'small sized' service in terms of scale. In the process, however, we have hit very familiar 'medium sized' scale problems, which forced us to migrate our services out of my basement. This is the outage report, post mortem, and high level overview of the migration to Hetzner in Germany, from observation to production fixes. This is the story.
Blog: Forensic container checkpointing in Kubernetes
Authors: Adrian Reber (Red Hat)
Forensic container checkpointing is based on Checkpoint/Restore In
Userspace (CRIU) and allows the creation of stateful copies
of a running container without the container knowing that it is being
checkpointed. The copy of the container can be analyzed and restored in a
sandbox environment multiple times without the original container being aware
of it. Forensic container checkpointing was introduced as an alpha feature in
Kubernetes v1.25.
How does it work?
With the help of CRIU it is possible to checkpoint and restore containers.
CRIU is integrated in runc, crun, CRI-O and containerd and forensic container
checkpointing as implemented in Kubernetes uses these existing CRIU
integrations.
Why is it important?
With the help of CRIU and the corresponding integrations it is possible to get
all information and state about a running container on disk for later forensic
analysis. Forensic analysis might be important to inspect a suspicious
container without stopping or influencing it. If the container is really under
attack, the attacker might detect attempts to inspect the container. Taking a
checkpoint and analysing the container in a sandboxed environment offers the
possibility to inspect the container without the original container and maybe
attacker being aware of the inspection.
In addition to the forensic container checkpointing use case, it is also
possible to migrate a container from one node to another node without losing
the internal state. Especially for stateful containers with long initialization
times restoring from a checkpoint might save time after a reboot or enable much
faster startup times.
How do I use container checkpointing?
The feature is behind a feature gate, so
make sure to enable the ContainerCheckpoint gate before you can use the new
feature.
The runtime must also support container checkpointing:
containerd: support is currently under discussion. See containerd
pull request #6965 for more details.
CRI-O: v1.25 has support for forensic container checkpointing.
Usage example with CRI-O
To use forensic container checkpointing in combination with CRI-O, the runtime
needs to be started with the command-line option --enable-criu-support=true.
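If you configure CRI-O through its configuration files rather than command-line flags, the same switch can be set via a drop-in file. The option name below is an assumption based on CRI-O's usual flag-to-config naming, so verify it against your CRI-O version:
cat /etc/crio/crio.conf.d/05-enable-criu-support.conf
[crio.runtime]
# assumed to mirror the --enable-criu-support command-line flag
enable_criu_support = true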
For Kubernetes, you need to run your cluster with the ContainerCheckpoint
feature gate enabled. As the checkpointing functionality is provided by CRIU, it
is also necessary to install CRIU. Usually runc or crun depends on CRIU and
therefore it is installed automatically.
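Because the checkpoint API is served by the kubelet, one way to enable the gate is through the kubelet configuration; alternatively, you can pass --feature-gates=ContainerCheckpoint=true on the kubelet command line. A minimal sketch:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  # enables the alpha checkpoint API described in this post
  ContainerCheckpoint: true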
It is also important to mention that, at the time of writing, the checkpointing
functionality is considered an alpha-level feature in CRI-O and Kubernetes, and the
security implications are still under consideration.
Once containers and pods are running it is possible to create a checkpoint.
Checkpointing
is currently only exposed on the kubelet level. To checkpoint a container,
you can run curl on the node where that container is running, and trigger a
checkpoint:
curl -X POST "https://localhost:10250/checkpoint/namespace/podId/container"
For a container named counter in a pod named counters in a namespace named
default the kubelet API endpoint is reachable at:
curl -X POST "https://localhost:10250/checkpoint/default/counters/counter"
For completeness, the following curl command-line options are necessary to
have curl accept the kubelet's self-signed certificate and authorize the
use of the kubelet checkpoint API:
--insecure --cert /var/run/kubernetes/client-admin.crt --key /var/run/kubernetes/client-admin.key
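Putting the endpoint and those options together, a complete checkpoint request for the example above looks like this (the certificate paths come from the local example cluster and will differ in your environment):
curl -X POST --insecure \
  --cert /var/run/kubernetes/client-admin.crt \
  --key /var/run/kubernetes/client-admin.key \
  "https://localhost:10250/checkpoint/default/counters/counter"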
Triggering this kubelet API will request the creation of a checkpoint from
CRI-O. CRI-O requests a checkpoint from your low-level runtime (for example,
runc). Seeing that request, runc invokes the criu tool
to do the actual checkpointing.
Once the checkpointing has finished, the checkpoint should be available at
/var/lib/kubelet/checkpoints/checkpoint-pod-name_namespace-name-container-name-timestamp.tar
You could then use that tar archive to restore the container somewhere else.
Restore a checkpointed container outside of Kubernetes (with CRI-O)
With the checkpoint tar archive it is possible to restore the container outside
of Kubernetes in a sandboxed instance of CRI-O. For better user experience
during restore, I recommend that you use the latest version of CRI-O from the
main CRI-O GitHub branch. If you're using CRI-O v1.25, you'll need to
manually create certain directories Kubernetes would create before starting the
container.
The first step to restore a container outside of Kubernetes is to create a pod sandbox
using crictl:
crictl runp pod-config.json
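The pod-config.json passed to crictl runp is a regular CRI pod sandbox configuration. A minimal sketch could look like the following; the name, namespace and uid values are placeholders chosen for this illustration:
{
  "metadata": {
    "name": "counters",
    "namespace": "default",
    "uid": "restore-demo-uid",
    "attempt": 0
  },
  "linux": {}
}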
Then you can restore the previously checkpointed container into the newly created pod sandbox:
crictl create POD_ID container-config.json pod-config.json
Instead of specifying a container image in a registry in container-config.json
you need to specify the path to the checkpoint archive that you created earlier:
{
  "metadata": {
    "name": "counter"
  },
  "image": {
    "image": "/var/lib/kubelet/checkpoints/checkpoint-archive.tar"
  }
}
Next, run crictl start CONTAINER_ID to start that container, and then a
copy of the previously checkpointed container should be running.
Restore a checkpointed container within Kubernetes
To restore the previously checkpointed container directly in Kubernetes it is
necessary to convert the checkpoint archive into an image that can be pushed to
a registry.
One possible way to convert the local checkpoint archive consists of the
following steps with the help of buildah:
newcontainer=$(buildah from scratch)
buildah add $newcontainer /var/lib/kubelet/checkpoints/checkpoint-pod-name_namespace-name-container-name-timestamp.tar /
buildah config --annotation=io.kubernetes.cri-o.annotations.checkpoint.name=container-name $newcontainer
buildah commit $newcontainer checkpoint-image:latest
buildah rm $newcontainer
The resulting image is not standardized and only works in combination with
CRI-O. Please consider this image format as pre-alpha. There are ongoing
discussions to standardize the format of checkpoint
images like this. It is important to remember that this not-yet-standardized image
format only works if CRI-O has been started with --enable-criu-support=true.
The security implications of starting CRI-O with CRIU support are not yet clear
and therefore the functionality as well as the image format should be used with
care.
Now, you'll need to push that image to a container image registry. For example:
buildah push localhost/checkpoint-image:latest container-image-registry.example/user/checkpoint-image:latest
To restore this checkpoint image (container-image-registry.example/user/checkpoint-image:latest), the
image needs to be listed in the specification for a Pod. Here's an example
manifest:
apiVersion: v1
kind: Pod
metadata:
  namePrefix: example-
spec:
  containers:
  - name: container-name
    image: container-image-registry.example/user/checkpoint-image:latest
  nodeName: destination-node
Kubernetes schedules the new Pod onto a node. The kubelet on that node
instructs the container runtime (CRI-O in this example) to create and start a
container based on an image specified as registry/user/checkpoint-image:latest.
CRI-O detects that registry/user/checkpoint-image:latest
is a reference to checkpoint data rather than a container image. Then,
instead of the usual steps to create and start a container,
CRI-O fetches the checkpoint data and restores the container from that
specified checkpoint.
The application in that Pod would continue running as if the checkpoint had not been taken;
within the container, the application looks and behaves like any other container that had been
started normally and not restored from a checkpoint.
With these steps, it is possible to replace a Pod running on one node
with a new equivalent Pod that is running on a different node,
and without losing the state of the containers in that Pod.
How do I get involved?
You can reach SIG Node by several means:
Slack: #sig-node
Mailing list
It seems like this only solves one part of the problem, though. I'm seeing that this post above did not go through to twitter, because "User is over daily status update limit.", which is not the case for the crossposter.
I'm thinking they are enforcing tighter limits around automated tools, so this might be the issue.
I'll get in touch with their support, but given the :hot_shit: they seem to be going through, I would not expect a swift response.
-- @renatolond
Acorn interests me. But, compose is a spec. Why do I need to rewrite anything? There should be a tool in Acorn’s wheelhouse to do it for me. — Converting Docker Compose file to Acornfile | Acorn Labs
Disk performance of lightweight macOS VMs on Apple silicon
Writing to the Data volume in a VM is dismally slow. Is using shared storage any quicker? What happens when you copy a VM to an external SSD, or to another Mac?
Blog: Finding suspicious syscalls with the seccomp notifier
Authors: Sascha Grunert
Debugging software in production is one of the biggest challenges we have to
face in our containerized environments. Being able to understand the impact of
the available security options, especially when it comes to configuring our
deployments, is one of the key aspects to make the default security in
Kubernetes stronger. We already have all that logging, tracing and metrics data
at hand, but how do we assemble the information it provides into something
human readable and actionable?
Seccomp is one of the standard mechanisms to protect a Linux-based
Kubernetes application from malicious actions by interfering with its system
calls. This allows us to restrict the application to a defined set of
actionable items, like modifying files or responding to HTTP requests. Linking
the knowledge of which set of syscalls is required to, for example, modify a
local file, back to the actual source code is similarly non-trivial. Seccomp
profiles for Kubernetes have to be written in JSON and can be understood
as an architecture-specific allow-list with superpowers, for example:
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "defaultErrnoRet": 38,
  "defaultErrno": "ENOSYS",
  "syscalls": [
    {
      "names": ["chmod", "chown", "open", "write"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
The above profile errors by default by specifying a defaultAction of
SCMP_ACT_ERRNO. This means we have to allow a set of syscalls via
SCMP_ACT_ALLOW, otherwise the application would not be able to do anything at
all. Okay cool, so to allow file operations, all we have to do is
add a bunch of file-specific syscalls like open or write, and probably
also allow changing permissions via chmod and chown, right?
Basically yes, but there are issues with the simplicity of that approach:
Seccomp profiles need to include the minimum set of syscalls required to start
the application. This also includes some syscalls from the lower level
Open Container Initiative (OCI) container runtime, for example
runc or crun. Besides that, we can only guarantee the required
syscalls for a very specific version of the runtimes and our application,
because the code can change between releases. The same applies to the
termination of the application as well as the target architecture we're
deploying on. Features like executing commands within containers also require
another subset of syscalls. Not to mention that there are multiple versions of
syscalls doing slightly different things, and that seccomp profiles are able to
filter on their arguments. It's also not always clearly visible to developers
which syscalls are used by the code they write themselves, because they rely on
programming language abstractions or frameworks.
How can we know which syscalls are even required then? Who should create and
maintain those profiles during the application's development life-cycle?
Well, recording and distributing seccomp profiles is one of the problem domains
of the Security Profiles Operator, which is already solving that. The
operator is able to record seccomp, SELinux and even
AppArmor profiles into a Custom Resource Definition (CRD),
reconciles them to each node and makes them available for use.
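As an illustration of what such a reconciled profile can look like, here is a rough sketch of a SeccompProfile custom resource. The apiVersion and field names reflect the operator's v1beta1 API as far as I recall it, so treat the exact schema as an assumption and double check the Security Profiles Operator documentation:
apiVersion: security-profiles-operator.x-k8s.io/v1beta1  # schema assumed, verify against the operator docs
kind: SeccompProfile
metadata:
  name: allow-basic-file-operations
  namespace: default
spec:
  defaultAction: SCMP_ACT_ERRNO
  syscalls:
  - action: SCMP_ACT_ALLOW
    names:
    - chmod
    - chown
    - open
    - write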
The biggest challenge in creating security profiles is catching all code
paths which execute syscalls. We could achieve that by having 100% logical
coverage of the application when running an end-to-end test suite. You can see the
problem with that statement: it's too idealistic to ever be fulfilled,
even before taking all the moving parts of application development and
deployment into account.
Missing a syscall in the seccomp profile's allow list can have a tremendously
negative impact on the application. It's not only that we can encounter crashes,
which are trivially detectable. It can also happen that they slightly change
logical paths, change the business logic, make parts of the application
unusable, slow down performance or even expose security vulnerabilities. We're
simply not able to see the whole impact of that, especially because blocked
syscalls via SCMP_ACT_ERRNO do not provide any additional audit
logging on the system.
Does that mean we're lost? Is it just not realistic to dream about a Kubernetes
where everyone uses the default seccomp profile ? Should we
stop striving towards maximum security in Kubernetes and accept that it's not
meant to be secure by default?
Definitely not. Technology evolves over time and there are many folks
working behind the scenes of Kubernetes to indirectly deliver features to
address such problems. One of the mentioned features is the seccomp notifier ,
which can be used to find suspicious syscalls in Kubernetes.
The seccomp notify feature consists of a set of changes introduced in Linux 5.9.
It makes the kernel capable of communicating seccomp related events to the user
space. That allows applications to act based on the syscalls and opens up a
wide range of possible use cases. We not only need the right kernel version,
but also at least runc v1.1.0 (or crun v0.19) to be able to make the notifier
work at all. The Kubernetes container runtime CRI-O gets support for
the seccomp notifier in v1.26.0. The new feature allows us to
identify possibly malicious syscalls in our application, and therefore makes it
possible to verify profiles for consistency and completeness. Let's give that a
try.
First of all, we need to run the latest main version of CRI-O, because v1.26.0
had not been released yet at the time of writing. You can do that by either
compiling it from the source code or by using the pre-built binary
bundle via the get-script. The seccomp notifier feature of CRI-O is
guarded by an annotation, which has to be explicitly allowed, for example by
using a configuration drop-in like this:
cat /etc/crio/crio.conf.d/02-runtimes.conf
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes.runc]
allowed_annotations = [ "io.kubernetes.cri-o.seccompNotifierAction" ]
If CRI-O is up and running, then it should indicate that the seccomp notifier is
available as well:
sudo ./bin/crio --enable-metrics
…
INFO[…] Starting seccomp notifier watcher
INFO[…] Serving metrics on :9090 via HTTP
…
We also enable the metrics, because they provide additional telemetry data about
the notifier. Now we need a running Kubernetes cluster for demonstration
purposes. For this demo, we mainly stick to the
hack/local-up-cluster.sh approach to locally spawn a single node
Kubernetes cluster.
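If you want to reproduce this on top of the locally built CRI-O, pointing local-up-cluster.sh at the CRI-O socket is usually enough. The variable below is an assumption about the script's current interface and may change between Kubernetes versions:
# assumption: local-up-cluster.sh honors CONTAINER_RUNTIME_ENDPOINT for the CRI socket
CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/crio/crio.sock \
  ./hack/local-up-cluster.sh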
If everything is up and running, then we need a seccomp profile
for testing purposes. But we do not have to create our own; we can just use the
RuntimeDefault profile which gets shipped with each container runtime. For
example the RuntimeDefault profile for CRI-O can be found in the
containers/common library.
Now we need a test container, which can be a simple nginx pod like
this:
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  annotations:
    io.kubernetes.cri-o.seccompNotifierAction: "stop"
spec:
  restartPolicy: Never
  containers:
  - name: nginx
    image: nginx:1.23.2
    securityContext:
      seccompProfile:
        type: RuntimeDefault
Please note the annotation io.kubernetes.cri-o.seccompNotifierAction, which
enables the seccomp notifier for this workload. The value of the annotation can
be either stop, which stops the workload, or anything else, which does nothing
beyond logging and emitting metrics. Because the workload may be terminated, we also
set restartPolicy: Never so that the container is not automatically recreated on
failure.
Let's run the pod and check if it works:
kubectl apply -f nginx.yaml
kubectl get pods -o wide
NAME    READY   STATUS    RESTARTS   AGE     IP          NODE        NOMINATED NODE   READINESS GATES
nginx   1/1     Running   0          3m39s   10.85.0.3   127.0.0.1   <none>           <none>
We can also test if the web server itself works as intended:
curl 10.85.0.3
<!DOCTYPE html>
<html>
<head>
<title>Welcome to nginx!</title>
…
While everything is now up and running, CRI-O also indicates that it has started
the seccomp notifier:
…
INFO[…] Injecting seccomp notifier into seccomp profile of container 662a3bb0fdc7dd1bf5a88a8aa8ef9eba6296b593146d988b4a9b85822422febb
…
If we now run a forbidden syscall inside of the container, then we can
expect that the workload gets terminated. Let's give that a try by running
chroot in the container's namespaces:
kubectl exec -it nginx -- bash
root@nginx:/# chroot /tmp
chroot: cannot change root directory to '/tmp': Function not implemented
root@nginx:/# command terminated with exit code 137
The exec session got terminated, so it looks like the container is not running
any more:
kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx 0/1 seccomp killed 0 96s
Alright, the container got killed by seccomp. Do we get any more information
about what was going on?
kubectl describe pod nginx
Name: nginx
…
Containers:
nginx:
…
State: Terminated
Reason: seccomp killed
Message: Used forbidden syscalls: chroot (1x)
Exit Code: 137
Started: Mon, 14 Nov 2022 12:19:46 +0100
Finished: Mon, 14 Nov 2022 12:20:26 +0100
…
The seccomp notifier feature of CRI-O correctly set the termination reason and
message, including which forbidden syscall has been used and how often (1x). How
often? Yes, the notifier gives the application up to 5 seconds after the last
seen syscall before it starts the termination. This means that it's possible to
catch multiple forbidden syscalls within one test, avoiding time-consuming
trial and error.
kubectl exec -it nginx -- chroot /tmp
chroot: cannot change root directory to '/tmp': Function not implemented
command terminated with exit code 125
kubectl exec -it nginx -- chroot /tmp
chroot: cannot change root directory to '/tmp': Function not implemented
command terminated with exit code 125
kubectl exec -it nginx -- swapoff -a
command terminated with exit ...
re:Invent 2022 - Recap Best Practices for Building a Container Management Platform | Jessica Deen | Deen of DevOps
Recap: Best practices for a container management platform. In this chalk talk, we learned about five best practices for building a container platform management system. This talk covered key steps that help drive efficiency and effectiveness, from assembling the right teams to innovate, design, and operate your container management platform to choosing the right AWS and ISV tools.
Behind the scenes: Making the diagrams. I used a tool called Excalidraw to make the diagrams we used in our chalk talk.
Blog: Boosting Kubernetes container runtime observability with OpenTelemetry
Authors: Sascha Grunert
When speaking about observability in the cloud native space, probably everyone
will mention OpenTelemetry (OTEL) at some point in the
conversation. That's great, because the community needs standards to rely on
for developing all cluster components in the same direction. OpenTelemetry
enables us to combine logs, metrics, traces and other contextual information
(called baggage) into a single resource. Cluster administrators or software
engineers can use this resource to get a viewport of what is going on in the
cluster over a defined period of time. But how can Kubernetes itself make use of
this technology stack?
Kubernetes consists of multiple components where some are independent and others
are stacked together. Looking at the architecture from a container runtime
perspective, there are, from top to bottom:
kube-apiserver : Validates and configures data for the API objects
kubelet : Agent running on each node
CRI runtime : Container Runtime Interface (CRI) compatible container runtime
like CRI-O or containerd
OCI runtime : Lower level Open Container Initiative (OCI) runtime
like runc or crun
Linux kernel or Microsoft Windows : Underlying operating system
That means if we encounter a problem with running containers in Kubernetes, then
we start looking at one of those components. Finding the root cause for problems
is one of the most time-consuming actions we face with the increased
architectural complexity of today's cluster setups. Even if we know the
component which seems to cause the issue, we still have to take the others into
account to maintain a mental timeline of events which are going on. How do we
achieve that? Well, most folks will probably stick to scraping logs, filtering
them and assembling them together across component borders. We also have
metrics, right? Correct, but correlating metric values with plain
logs makes it even harder to track what is going on. Some metrics are also not
made for debugging purposes. They have been defined based on the end user
perspective of the cluster for linking usable alerts and not for developers
debugging a cluster setup.
OpenTelemetry to the rescue: the project aims to combine signals such as
traces , metrics and logs together to maintain the
right viewport on the cluster state.
What is the current state of OpenTelemetry tracing in Kubernetes? From an API
server perspective, we have alpha support for tracing since Kubernetes v1.22,
which will graduate to beta in one of the upcoming releases. Unfortunately the
beta graduation has missed the v1.26 Kubernetes release. The design proposal can
be found in the API Server Tracing Kubernetes Enhancement Proposal
(KEP) which provides more information about it.
The kubelet tracing part is tracked in another KEP , which was
implemented in an alpha state in Kubernetes v1.25. A beta graduation is not
planned at the time of writing, but more may come in the v1.27 release cycle.
There are other side efforts going on besides both KEPs, for example klog is
considering OTEL support, which would boost the observability by
linking log messages to existing traces. Within SIG Instrumentation and SIG Node,
we're also discussing how to link the
kubelet traces together, because right now they're focused on the
gRPC calls between the kubelet and the CRI container runtime.
CRI-O has featured OpenTelemetry tracing support since v1.23.0 and is
continuously working on improving it, for example by attaching the logs to the
traces or extending the spans to logical parts of the
application. This helps users of the traces to gain the same
information as they would from parsing the logs, but with enhanced capabilities of
scoping and filtering for other OTEL signals. The CRI-O maintainers are also working on a
container monitoring replacement for conmon, which is called
conmon-rs and is purely written in Rust. One benefit of
having a Rust implementation is being able to add features like OpenTelemetry
support, because the crates (libraries) for those already exist. This allows a
tight integration with CRI-O and lets consumers see the lowest level of tracing
data from their containers.
The containerd folks added tracing support in v1.6.0, which is
available by using a plugin. Lower level OCI runtimes like
runc or crun feature no support for OTEL at all, and there does not
seem to be a plan for that. We always have to consider that there is a
performance overhead when collecting the traces as well as when exporting them to a
data sink. I still think it would be worth evaluating how extended
telemetry collection could look in OCI runtimes. Let's see if the Rust OCI
runtime youki considers something like that in the future.
I'll show you how to give it a try. For my demo I'll stick to a stack with a single local node
that has runc, conmon-rs, CRI-O, and a kubelet. To enable tracing in the kubelet, I need to
apply the following KubeletConfiguration:
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
featureGates:
  KubeletTracing: true
tracing:
  samplingRatePerMillion: 1000000
A samplingRatePerMillion equal to one million will internally translate to
sampling everything. A similar configuration has to be applied to CRI-O; I can
either start the crio binary with --enable-tracing and
--tracing-sampling-rate-per-million 1000000, or use a drop-in configuration
like this:
cat /etc/crio/crio.conf.d/99-tracing.conf
[crio.tracing]
enable_tracing = true
tracing_sampling_rate_per_million = 1000000
To configure CRI-O to use conmon-rs, you require at least the latest CRI-O
v1.25.x and conmon-rs v0.4.0. Then a configuration drop-in like this can be used
to make CRI-O use conmon-rs:
cat /etc/crio/crio.conf.d/99-runtimes.conf
[crio.runtime]
default_runtime = "runc"
[crio.runtime.runtimes.runc]
runtime_type = "pod"
monitor_path = "/path/to/conmonrs" # or will be looked up in $PATH
That's it; the default configuration will point to an OpenTelemetry
collector gRPC endpoint of localhost:4317, which has to be up and
running as well. There are multiple ways to run a collector, as described in the
docs, but it's also possible to kubectl proxy into an existing
instance running within Kubernetes.
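For a quick local setup, one option is Jaeger's all-in-one container, which can accept OTLP over gRPC on port 4317 and serves its UI on port 16686. A minimal sketch, assuming Docker is available and that the image still supports the COLLECTOR_OTLP_ENABLED switch:
# assumption: recent jaegertracing/all-in-one images accept OTLP when COLLECTOR_OTLP_ENABLED is set
docker run --rm \
  -e COLLECTOR_OTLP_ENABLED=true \
  -p 4317:4317 -p 16686:16686 \
  jaegertracing/all-in-one:latest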
If everything is set up, then the collector should log that there are incoming
traces:
ScopeSpans #0
ScopeSpans SchemaURL:
InstrumentationScope go.opentelemetry.io/otel/sdk/tracer
Span #0
Trace ID : 71896e69f7d337730dfedb6356e74f01
Parent ID : a2a7714534c017e6
ID : 1d27dbaf38b9da8b
Name : github.com/cri-o/cri-o/server.(*Server).filterSandboxList
Kind : SPAN_KIND_INTERNAL
Start time : 2022-11-15 09:50:20.060325562 +0000 UTC
End time : 2022-11-15 09:50:20.060326291 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Span #1
Trace ID : 71896e69f7d337730dfedb6356e74f01
Parent ID : a837a005d4389579
ID : a2a7714534c017e6
Name : github.com/cri-o/cri-o/server.(*Server).ListPodSandbox
Kind : SPAN_KIND_INTERNAL
Start time : 2022-11-15 09:50:20.060321973 +0000 UTC
End time : 2022-11-15 09:50:20.060330602 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Span #2
Trace ID : fae6742709d51a9b6606b6cb9f381b96
Parent ID : 3755d12b32610516
ID : 0492afd26519b4b0
Name : github.com/cri-o/cri-o/server.(*Server).filterContainerList
Kind : SPAN_KIND_INTERNAL
Start time : 2022-11-15 09:50:20.0607746 +0000 UTC
End time : 2022-11-15 09:50:20.060795505 +0000 UTC
Status code : STATUS_CODE_UNSET
Status message :
Events:
SpanEvent #0
- Name: log
- Timestamp: 2022-11-15 09:50:20.060778668 +0000 UTC
- DroppedAttributesCount: 0
- Attributes::
- id: Str(adf791e5-2eb8-4425-b092-f217923fef93)
- log.message: Str(No filters were applied, returning full container list)
- log.severity: Str(DEBUG)
- name: Str(/runtime.v1.RuntimeService/ListContainers)
I can see that the spans have a trace ID and typically have a parent attached.
Events such as logs are part of the output as well. In the above case, the kubelet is
periodically triggering a ListPodSandbox RPC to CRI-O caused by the Pod
Lifecycle Event Generator (PLEG). Displaying those traces can be done via,
for example, Jaeger. When running the tracing stack locally, a Jaeger
instance should be exposed at http://localhost:16686 by default.
The ListPodSandbox requests are directly visible within the Jaeger UI:
That's not too exciting, so I'll run a workload directly via kubectl:
kubectl run -it --rm --restart=Never --image=alpine alpine -- echo hi
hi
pod "alpine" deleted
Looking now at Jaeger, we can see that we have traces for conmonrs, crio as
well as the kubelet for the RunPodSandbox and CreateContainer CRI RPCs:
The kubelet and CRI-O spans are connected to each other to make investigation
easier. If we now take a closer look at the spans, we can see that CRI-O's
logs are correctly associated with the corresponding functionality. For example, we
can extract the container user from the traces like this:
The lower level spans of conmon-rs are also part of this trace. For example
conmon-rs maintains an internal read_loop for handling IO between the
container and the end user. The logs for reading and writing bytes are part of
the span. The same applies to the wait_for_exit_code span, which tells us that
the container exited successfully with code 0:
Having all that information at hand, side by side with the filtering capabilities
of Jaeger, makes the whole stack a great solution for debugging container issues!
Mentioning the "whole stack" also reveals the biggest downside of the overall
approach: compared to parsing logs, it adds a noticeable overhead on top of the
cluster setup. Users have to maintain a sink like Elasticsearch to
persist the data, expose the Jaeger UI and possibly take the performance
drawback into account. Anyway, it's still one of the best ways to increase the
observability...
Cloudscape offers user interface guidelines, front-end components, design resources, and development tools for building intuitive, engaging, and inclusive user experiences at scale.
kubesphere/kubeeye: KubeEye aims to find various problems on Kubernetes, such as application misconfiguration, unhealthy cluster components and node problems.
ahmetb/kubectl-foreach: Run kubectl commands in all/some contexts in parallel (similar to GNU xargs+parallel)
I’ve noticed a lot of container and Kubernetes users don’t realize there are very real standards orgs in the ecosystem, like OCI, which are vitally important for capability and consistency | Introduction to CRI
FWIW I have uninstalled Telegram on almost all my devices, for some reason it got real spammy | Telegram Discloses Personal Details of Pirating Users Following Court Order * TorrentFreak
Telegram has complied with an order from the High Court in Delhi by sharing the personal details of pirating users with rightsholders.