SRE

237 bookmarks
PagerDuty Incident Response Documentation
A collection of information about the PagerDuty incident response process. It covers not only how to prepare new employees for on-call responsibilities, but also how to handle major incidents, both in preparation and in after-work.
·response.pagerduty.com·
Monitoring in the Kubernetes era
Learn about the key components in a Kubernetes architecture and how container orchestration changes your approach to monitoring.
·datadoghq.com·
hot-shots
Node.js client for StatsD, DogStatsD, and Telegraf. Latest version: 10.0.0, last published: 2 months ago. Start using hot-shots in your project by running `npm i hot-shots`. There are 502 other projects in the npm registry using hot-shots.
·npmjs.com·
9 insights on real world container use
Our latest report examines more than 1.5 billion containers run by tens of thousands of Datadog customers to understand the state of the container ecosystem.
·datadoghq.com·
Datadog On Reliability Engineering
There are many different ways to implement Site Reliability Engineering (SRE). From team structures to roles and responsibilities to planning and prioritizat...
·youtube.com·
Container Lifecycle Hooks
This page describes how kubelet-managed Containers can use the Container lifecycle hook framework to run code triggered by events during their management lifecycle. Overview: Analogous to many programming language frameworks that have component lifecycle hooks, such as Angular, Kubernetes provides Containers with lifecycle hooks. The hooks enable Containers to be aware of events in their management lifecycle and run code implemented in a handler when the corresponding lifecycle hook is executed.
·kubernetes.io·
On Rake Collections and Software Engineering
Illustration by Furryviza. Matthew posted on Twitter a metaphor about rakes and software engineering – well, software development but at this point I would argue anyone arguing over these distinctio…
·flameeyes.blog·
SLOconf 2022: Leo Vasiliou- Perform How many Nines Depends on Accumulation
Meet the powerful analytic for performance-based SLOs. This talk starts from the fact that most instructional SLO discussions use an internal, non-cumulative endpoint (e.g. how many successful GET requests to /API) to illustrate SLO concepts. It then arrives at the fact that when setting an SLO for cumulative endpoints (e.g. an app or page consisting of many distributed requests), the number of nines for the objective must be adjusted accordingly. In other words, three or four nines may be acceptable for /API, but three or four nines for an experience-based (cumulative) endpoint is not practical. This session discusses the various adjustments needed for experience-based (cumulative) endpoints through both an availability and a performance lens. It then expands on the performance lens and discusses semi-advanced distribution functions for analyzing them, with the ultimate goal being reliable, resilient experiences to better serve self, team, and business.
·youtu.be·
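Note: the point about nines and cumulative endpoints can be illustrated with a rough sketch. It is not taken from the talk. It assumes every request in a page load fails independently and with the same availability, which real systems rarely satisfy, and the function names and the request counts (1, 10, 50) are hypothetical.

```python
def cumulative_availability(per_request: float, n_requests: int) -> float:
    """Chance that all n requests succeed, assuming independent failures."""
    return per_request ** n_requests


def required_per_request(target: float, n_requests: int) -> float:
    """Per-request availability needed for n requests to meet the target."""
    return target ** (1.0 / n_requests)


# Three nines per request erodes as more requests make up one experience.
for n in (1, 10, 50):
    print(n, cumulative_availability(0.999, n))

# Per-request availability needed for a hypothetical 50-request page
# to reach three nines overall.
print(required_per_request(0.999, 50))
```

Under these assumptions, a page built from many requests ends up below the per-request figure, so a cumulative objective has to be set lower than the per-request one or backed by stricter per-request targets.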
SLOconf 2022: Stephen Townshend & Gwen De Leon- Defining SLOs When You Dont Know Anything About SLOs
In this talk we walk through our SLO definition workshop, a facilitated session that we used at IAG as an experiment to help teams embed customer focus. We talk openly about what did and did not work, and the experimentation and adjustments we made along the way.
·youtu.be·
What does an SRE do?
Are you a software engineering director in charge of some Site Reliability Engineers (SRE) and wondering what they’re doing - or should do? Then read on!
·stanza.systems·
Wassim Chegham 🇲🇦 on Twitter
“Ever wondered what happens when you type in a URL in an address bar in a browser? Here is a brief overview... #programming #web #sketchnotes”
·twitter.com·
Observability Anti-Patterns | Lightstep Blog
Avoid committing "crimes against Observability" and get your Observability practice off the ground the right way, by avoiding these common Observability pitfalls!
·lightstep.com·