Social Media Is Probably Doomed – Lauren Weinstein's Blog

Suggested Reads
How to market your dev tool on Hacker News: Learnings from Tailscale
Deep dive into how Tailscale successfully markets its products on Hacker News. And learnings on how to write product articles for the developer audience.
Blog: Kubernetes 1.26: Non-Graceful Node Shutdown Moves to Beta
Author: Xing Yang (VMware), Ashutosh Kumar (VMware)
Kubernetes v1.24 introduced an alpha quality implementation of improvements
for handling a non-graceful node shutdown.
In Kubernetes v1.26, this feature moves to beta. This feature allows stateful workloads to fail over to a different node after the original node is shut down or in a non-recoverable state, such as a hardware failure or a broken OS.
What is a node shutdown in Kubernetes?
In a Kubernetes cluster, it is possible for a node to shut down. This could happen either in a planned way or it could happen unexpectedly. You may plan for a security patch or a kernel upgrade and need to reboot the node, or it may shut down due to preemption of VM instances. A node may also shut down due to a hardware failure or a software problem.
To trigger a node shutdown, you could run a shutdown or poweroff command in a shell,
or physically press a button to power off a machine.
A node shutdown could lead to workload failure if the node is not drained before the shutdown.
In the following, we will describe what a graceful node shutdown is and what a non-graceful node shutdown is.
What is a graceful node shutdown?
The kubelet's handling for a graceful node shutdown
allows the kubelet to detect a node shutdown event, properly terminate the pods on that node,
and release resources before the actual shutdown.
Critical pods
are terminated after all the regular pods are terminated, to ensure that the
essential functions of an application can continue to work as long as possible.
What is a non-graceful node shutdown?
A node shutdown can be graceful only if the kubelet's node shutdown manager can
detect the upcoming node shutdown action. However, there are cases where the kubelet
does not detect a node shutdown action. This could happen because the shutdown
command does not trigger the Inhibitor Locks mechanism used by the kubelet on Linux, or because of a user error, for example if
the shutdownGracePeriod and shutdownGracePeriodCriticalPods settings are not
configured correctly for that node.
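For reference, those settings live in the kubelet configuration file; a minimal sketch of a graceful-shutdown configuration is shown below (the durations are illustrative, choose values that fit your workloads):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# total time the kubelet delays a detected shutdown to terminate pods
shutdownGracePeriod: 30s
# portion of shutdownGracePeriod reserved for critical pods
shutdownGracePeriodCriticalPods: 10s
Both values default to zero, which leaves graceful node shutdown inactive, and shutdownGracePeriodCriticalPods must be smaller than shutdownGracePeriod.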
When a node is shut down (or crashes), and that shutdown was not detected by the kubelet
node shutdown manager, it becomes a non-graceful node shutdown. Non-graceful node shutdown
is a problem for stateful apps.
If a node containing a pod that is part of a StatefulSet is shut down in a non-graceful way, the Pod
will be stuck in Terminating status indefinitely, and the control plane cannot create a replacement
Pod for that StatefulSet on a healthy node.
You can delete the failed Pods manually, but this is not ideal for a self-healing cluster.
Similarly, Pods created by a ReplicaSet (for example, as part of a Deployment) that were bound to the
now-shutdown node stay in Terminating status indefinitely.
If you have set a horizontal scaling limit, even those terminating Pods count against the limit,
so your workload may struggle to self-heal if it was already at maximum scale.
(By the way: if the node that had done a non-graceful shutdown comes back up, the kubelet does delete
the old Pod, and the control plane can make a replacement.)
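For reference, the manual workaround mentioned above is a force deletion; the Pod name and namespace below are hypothetical:
# bypass graceful deletion for a Pod stuck in Terminating on the dead node
kubectl delete pod web-0 --namespace my-app --grace-period=0 --force
A force deletion removes the Pod object from the API server without waiting for confirmation from the kubelet, so only use it once you are sure the node is really down.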
What's new for the beta?
For Kubernetes v1.26, the non-graceful node shutdown feature is beta and enabled by default.
The NodeOutOfServiceVolumeDetach
feature gate is enabled by default
on kube-controller-manager instead of being opt-in; you can still disable it if needed
(please also file an issue to explain the problem).
On the instrumentation side, the kube-controller-manager reports two new metrics.
force_delete_pods_total
number of pods that are being forcibly deleted (resets on Pod garbage collection controller restart)
force_delete_pod_errors_total
number of errors encountered when attempting forcible Pod deletion (also resets on Pod garbage collection controller restart)
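If you want to watch these counters, they appear on the kube-controller-manager metrics endpoint alongside the existing controller metrics. A minimal sketch, assuming you can reach that endpoint on its default secure port (10257) with a token authorized to read /metrics (the host and token here are placeholders):
curl -sk -H "Authorization: Bearer ${TOKEN}" \
  https://<control-plane-host>:10257/metrics | grep force_delete_pod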
How does it work?
In the case of a node shutdown, if a graceful shutdown is not working or the node is in a
non-recoverable state due to hardware failure or broken OS, you can manually add an out-of-service
taint on the Node. For example, this can be node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
or node.kubernetes.io/out-of-service=nodeshutdown:NoSchedule. This taint triggers pods on the node to
be forcefully deleted if there are no matching tolerations on the pods. Persistent volumes attached to the shutdown node will be detached, and new pods will be created successfully on a different running node.
kubectl taint nodes node-name node.kubernetes.io/out-of-service=nodeshutdown:NoExecute
Note: Before applying the out-of-service taint, you must verify that a node is already in shutdown
or power-off state (not in the middle of restarting), either because the user intentionally shut it down
or the node is down due to hardware failures, OS issues, etc.
Once all the workload Pods that are linked to the out-of-service node have moved to a new running node, and the shutdown node has been recovered, you should remove that taint from the affected node.
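Removing the taint uses the same key with a trailing dash, for example:
kubectl taint nodes node-name node.kubernetes.io/out-of-service=nodeshutdown:NoExecute-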
What’s next?
Depending on feedback and adoption, the Kubernetes team plans to push the Non-Graceful Node Shutdown implementation to GA in either 1.27 or 1.28.
This feature requires a user to manually add a taint to the node to trigger the failover of workloads and remove the taint after the node is recovered.
The cluster operator can automate this process by automatically applying the out-of-service taint
if there is a programmatic way to determine that the node is really shut down and there isn’t IO between
the node and storage. The cluster operator can then automatically remove the taint after the workload
fails over successfully to another running node and the shutdown node has been recovered.
In the future, we plan to find ways to automatically detect and fence nodes that are shut down or in a non-recoverable state and fail their workloads over to another node.
How can I learn more?
To learn more, read Non Graceful node shutdown in the Kubernetes documentation.
How to get involved?
We offer a huge thank you to all the contributors who helped with design, implementation, and review of this feature:
Michelle Au (msau42)
Derek Carr (derekwaynecarr)
Danielle Endocrimes (endocrimes)
Tim Hockin (thockin)
Ashutosh Kumar (sonasingh46)
Hemant Kumar (gnufied)
Yuiko Mouri (YuikoTakada)
Mrunal Patel (mrunalp)
David Porter (bobbypage)
Yassine Tijani (yastij)
Jing Xu (jingxu97)
Xing Yang (xing-yang)
There are many people who have helped review the design and implementation along the way. We want to thank everyone who has contributed to this effort, including the roughly 30 people who have reviewed the KEP and implementation over the last couple of years.
This feature is a collaboration between SIG Storage and SIG Node. For those interested in getting involved with the design and development of any part of the Kubernetes Storage system, join the Kubernetes Storage Special Interest Group (SIG). For those interested in getting involved with the design and development of the components that support the controlled interactions between pods and host resources, join the Kubernetes Node SIG.
Ex-Twitter employee sentenced to over 3 years in prison for spying for Saudi Arabia
Ahmad Abouammo was found guilty for his part in a scheme to acquire the personal information of Twitter users for a Saudi government agent, NBC News reports.
Announcing Rust 1.66.0 | Rust Blog
Empowering everyone to build reliable and efficient software.
Hello, Mastodon
I finally decided to create an account on Mastodon. You can follow me at @jsq@mastodon.social. I put this off for so long because I was skeptical and I did n...
Twitter manually reviewed all accounts that posted links to ElonJet -exec
Twitter Inc's head of trust and safety told Reuters the company manually reviewed "any and all accounts" that violated its new privacy policy by posting links to a Twitter account called ElonJet that tracked Elon Musk's private jet using information in the public domain.
CISA Alert: Veeam Backup and Replication Vulnerabilities Being Exploited in Attacks
U.S. cybersecurity agency CISA has added two critical vulnerabilities in Veeam Backup & Replication software to its list of known exploited flaws.
Big Ideas in Tech for 2023: An a16z Omnibus | Andreessen Horowitz
From entertainment franchise games to the precision delivery of medicines, the a16z team highlights over 40 builder-worthy pursuits for the coming year.
I sold mine because it was far too heavy and exacerbated issues in my arm and elbow. It’s not an accessible device IMHO | Valve: No Performance Upgrades for the Next-Gen Steam Deck
The next generation of Steam Decks will likely focus on better displays and battery life, say designers.
What is the OpenSSF? - Brian Behlendorf, OpenSSF
📢 GitOpsCon relocates to Open Source Summit | OpenGitOps
Tesla Has Been Denied A Retrial Following Multimillion-Dollar Racism Verdict
Tesla's request for a retrial following a Black worker's racism verdict has been denied by District Judge William Orrick.
Leaving Twitter's Walled Garden | Electronic Frontier Foundation
Introducing tailnet lock: use Tailscale without trusting our infrastructure! · Tailscale
Amazon EKS add-ons: Advanced configuration | Containers
Never did I ever think RSS would live on forever yet at the same time I see how it came to fall out of favor | How to rebuild social media on top of RSS
We should look for ways to make reading, publishing, and community services all play nicely together. I'm calling this model "the unbundled web," and I think RSS should be the primary method of interop.
Building and using a macOS 13.1 VM on Apple silicon from an external drive
Step-by-step guide to installing, configuring and using a Ventura 13.1 virtual machine on an Apple silicon Mac.
GNU/Linux shell related internals | Viacheslav Biriukov
QEMU version 7.2.0 released - QEMU
How Hachyderm leveraged DigitalOcean Spaces to scale their Mastodon community
Interesting | Why using Alpine Docker images and Python is probably bad for your project (right now)
Alpine Linux is a distribution that is designed to be lightweight. In particular, it’s seen a lot of use in Docker images because the resulting image bundles are considerably smaller than those generated by other minimal distros. However, in the context of building a Docker image for a Python application, it’s worth thinking carefully before using Alpine, as it can often result in slower builds and counterintuitively it can even result in larger images occasionally.
Raising the bar for software security: next steps for GitHub.com 2FA | The GitHub Blog
Blog: Kubernetes 1.26: Alpha API For Dynamic Resource Allocation
Authors: Patrick Ohly (Intel), Kevin Klues (NVIDIA)
Dynamic resource allocation is a new API for requesting resources. It is a
generalization of the persistent volumes API for generic resources, making it possible to:
access the same resource instance in different pods and containers,
attach arbitrary constraints to a resource request to get the exact resource
you are looking for,
initialize a resource according to parameters provided by the user.
Third-party resource drivers are responsible for interpreting these parameters
as well as tracking and allocating resources as requests come in.
Dynamic resource allocation is an alpha feature and only enabled when the
DynamicResourceAllocation feature
gate and the
resource.k8s.io/v1alpha1 API group are enabled. For details, see the
--feature-gates and --runtime-config kube-apiserver
parameters.
The kube-scheduler, kube-controller-manager and kubelet components all need
the feature gate enabled as well.
The default configuration of kube-scheduler enables the DynamicResources
plugin if and only if the feature gate is enabled. Custom configurations may
have to be modified to include it.
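A rough sketch of what that enablement looks like as command-line flags (how you set these depends on how your control plane is deployed, for example static Pod manifests or systemd units, so treat this as illustrative):
# kube-apiserver: enable both the feature gate and the alpha API group
kube-apiserver --feature-gates=DynamicResourceAllocation=true \
  --runtime-config=resource.k8s.io/v1alpha1=true ...
# kube-controller-manager, kube-scheduler and kubelet: enable the feature gate
kube-controller-manager --feature-gates=DynamicResourceAllocation=true ...
kube-scheduler --feature-gates=DynamicResourceAllocation=true ...
kubelet --feature-gates=DynamicResourceAllocation=true ...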
Once dynamic resource allocation is enabled, resource drivers can be installed
to manage certain kinds of hardware. Kubernetes has a test driver that is used
for end-to-end testing, but also can be run manually. See
below for step-by-step instructions.
API
The new resource.k8s.io/v1alpha1 API group provides four new types:
ResourceClass
Defines which resource driver handles a certain kind of
resource and provides common parameters for it. ResourceClasses
are created by a cluster administrator when installing a resource
driver.
ResourceClaim
Defines a particular resource instance that is required by a
workload. Created by a user (lifecycle managed manually, can be shared
between different Pods) or for individual Pods by the control plane based on
a ResourceClaimTemplate (automatic lifecycle, typically used by just one
Pod).
ResourceClaimTemplate
Defines the spec and some metadata for creating
ResourceClaims. Created by a user when deploying a workload.
PodScheduling
Used internally by the control plane and resource drivers
to coordinate pod scheduling when ResourceClaims need to be allocated
for a Pod.
Parameters for ResourceClass and ResourceClaim are stored in separate objects,
typically using the type defined by a CRD that was created when
installing a resource driver.
With this alpha feature enabled, the spec of a Pod defines the ResourceClaims that are needed for the Pod
to run: this information goes into a new
resourceClaims field. Entries in that list reference either a ResourceClaim
or a ResourceClaimTemplate. When referencing a ResourceClaim, all Pods using
this .spec (for example, inside a Deployment or StatefulSet) share the same
ResourceClaim instance. When referencing a ResourceClaimTemplate, each Pod gets
its own ResourceClaim instance.
For a container defined within a Pod, the resources.claims list
defines whether that container gets
access to these resource instances, which makes it possible to share resources
between one or more containers inside the same Pod. For example, an init container could
set up the resource before the application uses it.
Here is an example of a fictional resource driver. Two ResourceClaim objects
will get created for this Pod and each container gets access to one of them.
Assuming a resource driver called resource-driver.example.com was installed
together with the following resource class:
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClass
metadata:
  name: resource.example.com
driverName: resource-driver.example.com
An end-user could then allocate two specific resources of type
resource.example.com as follows:
---
apiVersion: cats.resource.example.com/v1
kind: ClaimParameters
metadata:
  name: large-black-cats
spec:
  color: black
  size: large
---
apiVersion: resource.k8s.io/v1alpha1
kind: ResourceClaimTemplate
metadata:
  name: large-black-cats
spec:
  spec:
    resourceClassName: resource.example.com
    parametersRef:
      apiGroup: cats.resource.example.com
      kind: ClaimParameters
      name: large-black-cats
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-cats
spec:
  containers: # two example containers; each container claims one cat resource
  - name: first-example
    image: ubuntu:22.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-0
  - name: second-example
    image: ubuntu:22.04
    command: ["sleep", "9999"]
    resources:
      claims:
      - name: cat-1
  resourceClaims:
  - name: cat-0
    source:
      resourceClaimTemplateName: large-black-cats
  - name: cat-1
    source:
      resourceClaimTemplateName: large-black-cats
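If you apply manifests like these on a cluster with the alpha API enabled, you can inspect what the control plane created for the Pod; a sketch of the relevant read-only commands (output depends on your resource driver):
# ResourceClaims generated from the ResourceClaimTemplate
kubectl get resourceclaims
# the Pod's resourceClaims and per-container claims
kubectl get pod pod-with-cats -o yaml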
Scheduling
In contrast to native resources (such as CPU or RAM) and
extended resources
(managed by a
device plugin, advertised by kubelet), the scheduler has no knowledge of what
dynamic resources are available in a cluster or how they could be split up to
satisfy the requirements of a specific ResourceClaim. Resource drivers are
responsible for that. Drivers mark ResourceClaims as allocated once resources
for them are reserved. This also then tells the scheduler where in the cluster a
claimed resource is actually available.
ResourceClaims can get resources allocated as soon as the ResourceClaim
is created (immediate allocation), without considering which Pods will use
the resource. The default (wait for first consumer) is to delay allocation until
a Pod that relies on the ResourceClaim becomes eligible for scheduling.
This design with two allocation options is similar to how Kubernetes handles
storage provisioning with PersistentVolumes and PersistentVolumeClaims.
In the wait for first consumer mode, the scheduler checks all ResourceClaims needed
by a Pod. If the Pod has any ResourceClaims, the scheduler creates a PodScheduling object
(a special object that requests scheduling details on behalf of the Pod). The PodScheduling
object has the same name and namespace as the Pod, with the Pod as its owner.
Using its PodScheduling, the scheduler informs the resource drivers
responsible for those ResourceClaims about nodes that the scheduler considers
suitable for the Pod. The resource drivers respond by excluding nodes that
don't have enough of the driver's resources left.
Once the scheduler has that resource
information, it selects one node and stores that choice in the PodScheduling
object. The resource drivers then allocate resources based on the relevant
ResourceClaims so that the resources will be available on that selected node.
Once that resource allocation is complete, the scheduler attempts to schedule the Pod
to a suitable node. Scheduling can still fail at this point; for example, a different Pod could
be scheduled to the same node in the meantime. If this happens, already allocated
ResourceClaims may get deallocated to enable scheduling onto a different node.
As part of this process, ResourceClaims also get reserved for the
Pod. Currently, ResourceClaims can either be used exclusively by a single Pod or
by an unlimited number of Pods.
One key feature is that Pods do not get scheduled to a node unless all of
their resources are allocated and reserved. This avoids the scenario where
a Pod gets scheduled onto one node and then cannot run there, which is bad
because such a pending Pod also blocks all other resources like RAM or CPU that were
set aside for it.
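For debugging, the intermediate objects described above are visible through the API; a sketch, noting that resource names may still change while the API is alpha and the claim name below is hypothetical:
# PodScheduling objects the scheduler created for pending Pods
kubectl get podschedulings.resource.k8s.io --all-namespaces
# allocation and reservation status of a generated claim
kubectl describe resourceclaim pod-with-cats-cat-0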
Limitations
The scheduler plugin must be involved in scheduling Pods which use
ResourceClaims. Bypassing the scheduler by setting the nodeName field leads
to Pods that the kubelet refuses to start because the ResourceClaims are not
reserved or not even allocated. It may be possible to remove this
limitation in the
future.
Writing a resource driver
A dynamic resource allocation driver typically consists of two separate-but-coordinating
components: a centralized controller, and a DaemonSet of node-local kubelet
plugins. Most of the work required by the centralized controller to coordinate
with the scheduler can be handled by boilerplate code. Only the business logic
required to actually allocate ResourceClaims against the ResourceClasses owned
by the plugin needs to be customized. As such, Kubernetes provides
the following package, including APIs for invoking this boilerplate code as
well as a Driver interface that you can implement to provide your custom
business logic:
k8s.io/dynamic-resource-allocation/controller
Likewise, boilerplate code can be used to register the node-local plugin with
the kubelet, as well as start a gRPC server to implement the kubelet plugin
API. For drivers written in Go, the following package is recommended:
k8s.io/dynamic-resource-allocation/kubeletplugin
It is up to the driver developer to decide how these two components
communicate. The KEP outlines an approach using
CRDs.
Within SIG Node, we also plan to provide a complete example
driver that can serve
as a template for other drivers.
Running the test driver
The following steps bring up a local, one-node cluster directly from the
Kubernetes source code. As a prerequisite, your cluster must have nodes with a container
runtime that supports the
Container Device Interface
(CDI). For example, you can run CRI-O v1.23.2 or later.
Once containerd v1.7.0 is released, we expect that you can run that or any later version.
In the example below, we use CRI-O.
First, clone the Kubernetes source code. Inside that directory, run:
$ hack/install-etcd.sh
...
$ RUNTIME_CONFIG=resource.k8s.io/v1alpha1 \
FEATURE_GATES=DynamicResourceAllocation=true \
DNS_ADDON="coredns" \
CGROUP_DRIVER=systemd \
CONTAINER_RUNTIME_ENDPOINT=unix:///var/run/crio/crio.sock \
LOG_LEVEL=6 \
ENABLE_CSI_SNAPSHOTTER=false \
API_SECURE_PORT=6444 \
ALLOW_PRIVILEGED=1 \
PATH=$(pwd)/third_party/etcd:$PATH \
./hack/local-up-cluster.sh -O
...
To start using your cluster, you...
WebAssembly vs. Kubernetes
WebAssembly, or Wasm, was shown to be a very practical way to run code on a web browser, serving as a compiler of sorts. Eventually, it dawned on developers that Wasm could run on server operating systems as well, and its use now extends across hardware platforms, leading some to view it as an alternative to Kubernetes.
Week Ending December 11, 2022
Developer News
Space debris expert: Orbits will be lost—and people will die—later this decade
"Flexing geopolitical muscles in space to harm others has already happened."
Zoë Schiffer on Twitter
NEW: Twitter currently does not have admin access to some of its GitHub repos. These repos contain Twitter source code (much of it is open source; some is not). This includes code for companies Twitter acquired, like Smyte. 1/— Zoë Schiffer (@ZoeSchiffer) December 13, 2022
I think the big takeaway is the US gov’t got a Discord Voice chat of the scheme | SEC says social media influencers used Twitter and Discord to manipulate stocks
The regulatory agency charged them in what it says was a $100 million securities fraud scheme run by people who portrayed themselves as successful stock traders.
Elon Musk is using the Twitter Files to discredit foes and push conspiracy theories
The Twitter CEO's selective release of internal communications largely corroborates what is already known about the messy business of policing a large social network.