SRE

240 bookmarks

OpenSLO/openslo-backstage-plugins: Backstage plugins for OpenSLO

Backstage plugins for OpenSLO. Contribute to OpenSLO/openslo-backstage-plugins development by creating an account on GitHub.

#open_slo #slo

·github.com·Oct 14, 2022

OpenSLO/slogen: tool to create and manage content for reliability tracking from logs/event data.

Tool to create and manage content for reliability tracking from logs/event data.

#slo #open_slo #tools

·github.com·Oct 14, 2022

/bin/bash based SSL/TLS tester: testssl.sh

TLS/SSL security testing with Open Source Software

#tools #cli #ssl/tls

·testssl.sh·Oct 12, 2022

dastergon/awesome-sre: A curated list of Site Reliability and Production Engineering resources.

A curated list of Site Reliability and Production Engineering resources.

#awesome_list #sre

·github.com·Oct 11, 2022

iximiuz/pq

Parse and Query log files as time series. Contribute to iximiuz/pq development by creating an account on GitHub.

#monitoring #query #logging #log

·github.com·Jul 9, 2021

How To Read The SSL Certificate Info From the CLI

This guide will show you how to read the SSL Certificate Information from a text-file on your server or from a remote server by connecting to it with the OpenSSL client.

#ssl/tls #tools #cli

·ma.ttias.be·Oct 7, 2021

How We Saved 70K Cores Across 30 Mission-Critical Services (Large-Scale, Semi-Automated Go GC Tuning @Uber)

As part of Uber engineering’s wide efforts to reach profitability, recently our team was focused on reducing cost of compute capacity by improving efficiency. Some of the most impactful work was around GOGC optimization. In this blog we want to share our experience with a highly effective, low-risk, large-scale, semi-automated Go GC tuning mechanism. Uber’s tech stack is composed of thousands of microservices, backed by a cloud-native, scheduler-based infrastructure. Most of these services are written in Go. Our team, Maps Production Engineering, has previously played an instrumental role in significantly improving the efficiency of multiple Java services by tuning

#golang #gc #performance

·eng.uber.com·Jan 2, 2022

Day 23 - What is eBPF?

By: Ania Kapuścińska (@lambdanis). Edited by: Shaun Mouton (@sdmouton). Like many engineers, for a long time I’ve thought ...

#ebpf #bpf #linux

·sysadvent.blogspot.com·Jan 2, 2022

victoria.dev

#sqlite #production

·victoria.dev·Jan 29, 2022

Appropriate Uses For SQLite

#production #sqlite

·sqlite.org·Jan 29, 2022

Kit “SLOconf is May 9-12 2022" Merker on Twitter

I've had a repeated conversation recently about SLO Adoption. The question I get is "Which services should I start with?" And there is a counterintuitive idea I want to share. 🧵 — Kit “SLOconf is May 9-12 2022" Merker (@KitMerker) March 17, 2022

#slo

·twitter.com·Mar 18, 2022

James Eastham on Twitter

Finally. Trace of a request into my #serverless event driven system, API Gateway - Dynamo - Dynamo Streams - Lambda - SQS x 2 - Event Bridge. One consistent trace through the entire flow. Written in .NET, traced with @opentelemetry, observed in @honeycombio #dotnet #o11y pic.twitter.com/YfvpAYPiTD — James Eastham (@plantpowerjames) October 8, 2022

#observability

·twitter.com·Oct 8, 2022

How HashiCorp Does Site Reliability Engineering - The New Stack

The company's SRE journey started three years ago, and it now has reliability teams focused on infrastructure, products and developer productivity. #SRE #reliability

·thenewstack.io·Oct 8, 2022

Isolates, microVMs, and WebAssembly

I’ve been thinking about WebAssembly a lot lately.

#webassembly #wasm #cloud

·notes.crmarsh.com·Sep 29, 2022

Signals · mperham/sidekiq Wiki

TSTP (get ready to be shut down). TERM (Heroku default is 25 seconds, elsewhere defined by config / timeout switch).

#timeout #magic_numbers

·github.com·Sep 27, 2022

Introducing LiteFS

We are building a distributed file system for your SQLite databases. Kinda weird, huh?

#sqlite #performance #database

·fly.io·Sep 22, 2022

Felix Geisendörfer on Twitter

🎉 Announcing fgtrace, a new profiler/tracer for #golang. It captures wallclock timeline views for each goroutine and it's really simple to use: defer fgtrace.Config{}.Start().Stop() Check it out & let me know what you think https://t.co/Ttdm5hl0Vi pic.twitter.com/4iP9SNVypD — Felix Geisendörfer (@felixge) September 19, 2022

#profilers #datadog #golang

·twitter.com·Sep 21, 2022

Profile Data Formats

Profiler's output file formats.

#profilers #debug

·profilerpedia.markhansen.co.nz·Sep 21, 2022

What is eBPF? | An Introduction and Practical Tips

Addr: https://ebpf.xyz/post/an_introduction_and_practical_tips (March 23, 2022). This article introduces developers to eBPF and explains how it can be used to add security, networking, and other capabilities in the Linux kernel space. In Linux architecture, memory is separated into kernel space and user space. The kernel space is used to run the core kernel code and the device drivers. Processes running in kernel space have unrestricted access to all hardware, including CPU, memory, and disks.

#ebpf

·ebpf.xyz·Sep 21, 2022

From Critical User Journey to SLO/SLIs

We often look at a Service and wonder where to start monitoring the thing, let alone what the SLOs should be. Critical User Journeys help…

#slo #sla #critical_user_journeys

·medium.com·Sep 12, 2022

GitHub - res-eng/resilience-for-software: Introduction to resilience engineering concepts for software engineers

Introduction to resilience engineering concepts for software engineers - GitHub - res-eng/resilience-for-software: Introduction to resilience engineering concepts for software engineers

#resiliency_engineering

·github.com·Sep 10, 2022

Hollnagel: What is Resilience Engineering? - Resilience Engineering Association

From Erik Hollnagel: “A system is resilient if it can adjust its functioning prior to, during, or following events (changes, disturbances, and opportunities), and […]

#resiliency_engineering

·resilience-engineering-association.org·Sep 10, 2022

Resilience Engineering and Strange Loops

My notes and takeaways from a long read on anomalies and system complexity called the STELLA Report from the SNAFUcatchers Workshop on Coping With Complexity, 2017. Via Matt. This paper is one of t…

#resiliency_engineering

·sensible.blog·Sep 10, 2022

Who Destroyed Three Mile Island? - Nickolas Means | #LeadDevLondon 2018

Check out the latest from The Lead Developer at theleaddeveloper.com. On March 28, 1979, at exactly 4 o’clock in the morning, control rods slammed into the reactor core of Three Mile Island Unit #2, halting the nuclear reaction because of a fault in the reactor cooling system. At 4:02, the automated emergency cooling system activated as the reactor core temperature continued to rise. At 4:04, one of the plant operators made the befuddling decision to switch off the emergency cooling system, dooming the reactor to partial meltdown. Why? When something bad happens, it’s easy to just blame someone and move on. Taking the time to find the systemic causes, though, will not only help keep the problem from repeating, it will enable you to build the psychological safety necessary for your team to truly collaborate. Let’s let the story of Three Mile Island teach us how to make our teams stronger through systems thinking and just culture.

#human_error #failure #human_factor #incident_management

·youtu.be·Sep 9, 2022

SLI, SLO, SLA explained in a way your kids will understand… maybe

Imagine you are in a remote meeting using terms like SLI, SLO, or SLA, and your kid asks you what they mean. How would you explain them? Or maybe you need to explain them to your boss or a colleague. In this article, I will try to put SLI, SLO, and SLA in a way even your kids would understand… maybe.

#analogy #slo #sli #sla

·thrownewexception.com·Sep 6, 2022

[PUBLIC] The Art of SLOs – Slides

Self link: https://cre.page.link/art-of-slos-slides Participant Handbook: https://cre.page.link/art-of-slos-handbook Facilitator Handbook: https://cre.page.link/art-of-slos-howto SLO Worksheet: https://cre.page.link/art-of-slos-worksheet Errors in the content? https://cre.page.link/art-of-slos-bu...

#reference #slo #sli

·docs.google.com·Sep 6, 2022

Focus on Readiness: It's a Good Day for a Game Day! - DZone DevOps

Learn about Game Days, a DevOps method for software development and testing teams to build up their readiness to achieve better Incident Management.

#gameday #resiliency #readiness

·dzone.com·Sep 6, 2022

Chaos Gamedays: A Step-by-Step Guide to Chaos - DZone DevOps

Chaos engineering tests the robustness of your application and the readiness of your team to handle application failures.

#gameday #resiliency #readiness

·dzone.com·Sep 6, 2022

Improving Incident Management through Role Assignments and Game Days

John Arundel, principal consultant at Bitfield Consulting, shared his thoughts on how to ensure incidents are handled smoothly and quickly. He suggests assigning specific roles to each team member responding to the incident. Red team versus blue team exercises can also be leveraged to ensure the team is prepared to respond accurately and quickly.

#gameday #resiliency #readiness

·infoq.com·Sep 6, 2022

EventCatalog: Discover, Explore and Document your Event Driven Architectures.

An open source tool powered by markdown to document your Event Driven Architecture.

#servicecatalogue #events

·eventcatalog.dev·Sep 5, 2022
