Technology Commentary

How Generative AI Can Support DevOps and SRE Workflows
Of course, engineers love new tech. But how can they harness the revolution in AI for their day-to-day work, right now? Here are six ideas.
·thenewstack.io·
Heat-It: Because Removing Itches Is So Hot Right Now
Lessons on a summer with a smartphone-enabled itch-killing device. Yes—it works, even if a smartphone is a weird vessel for the Heat-It.
·tedium.co·
Platform Engineering Helps a Scale-up Tame DevOps Complexity
Capillary Technologies experienced growing pains as it doubled its customer base and moved into new markets. A partnership with Facets.cloud helped improve its uptime and developer productivity.
·thenewstack.io·
Formal Methods can't fix everything and that's okay
Finally done with conferences and glad to be back in Chicago. Next big event is the TLA+ workshop on October 16, which still has slots available! The faster...
·buttondown.email·
On APIs and their responses - Dmitry Kudryavtsev
Since the dawn of the web, humans created CRUD APIs. And we were instructed that modification verbs should return the modified resource in response. But, should they?
·yieldcode.blog·
Trust but Verify: To Get AI Right, Its Adoption Requires Guardrails
To responsibly adopt AI, organizations must look for ways to align it with their goals, while also considering what updates to security and privacy policies may be required.
·thenewstack.io·
70 The Mars Rover Isn't Stealing Our Data, with Janet Vertesi - Initiative for Digital Public Infrastructure
Today on Reimagining, we welcome our first conscientious objector to Google—and our first ever NASA alum. Janet Vertesi joins for a fascinating conversation about her project to keep any data about her children off the web, and ties it in to tales about her old job as in-house ethnographer for the Mars Rover missions.
·publicinfrastructure.org·
Paper: The Strategic Agility Gap
This week I read David Woods' The Strategic Agility Gap: How Organizations Are Slow and Stale to Adapt in Turbulent Worlds [https://link.springer.com/chapter/10.1007/978-3-030-25639-5_11], an open access chapter that surveys and ties together many of the concepts he has written about in the past, particularly the need for organizations to balance growth in capabilities with the ability to adjust to the changes they enable.
The idea here is that growth in capability, often due to better technology, brings rapid changes at a societal level: new opportunities are found, complexity grows, and new threats emerge. New capabilities generally mean growth, expansion, bigger scales, and more interactions, which means more surprises. On the other hand, organizations are generally slow and stale when it comes to adapting to these threats or seizing these opportunities: As capability grows to improve performance on some criteria, interdependencies become more extensive and produce surprising anomalies as the systems also become more brittle.
The strategic agility gap is the difference between the rate at which an organization adapts to change and the rate at which new unexpected challenges arise at a larger industry/society scale. It is a mismatch between velocities of change and velocities of adaptation.
This figure is attached: Figure 1: The strategic agility gap. A graph where the x-axis is technical progress (and time) and the y-axis is quality/impact. An arrow curving upwards and to the right is labelled 'pace of societal change/needs'; one in the same direction but curving far more smoothly below it is labelled 'current trajectory'. In between both arrows is an 'accelerated trajectory' dotted arrow curving upwards. The space between the bottom and middle arrow is the strategic agility; the space between the middle arrow and the top arrow is the strategic agility gap. 
[https://s3.us-east-2.amazonaws.com/ferd.ca/cohost/strategic-ability-gap.png]
Because the risks are difficult to see ahead and the growth is continuous, disturbances and challenges risk cascading; this requires anticipating challenges and building a "readiness to respond" to avoid having to generate and deploy responses while the challenge is taking place. Here the text seems to intend something different from just having a plan for specific challenges; the words used are "organizations need to coordinate and synchronize activities over changing tempos, otherwise decisions will be slow and stale". This hints at overall response patterns and reorganization more than having a runbook with specific scenarios.
To provide an example of a failing and a successful case, Woods covers the Knight Capital Collapse from 2012 [https://michaelhamilton.quora.com/How-a-software-bug-made-Knight-Capital-lose-500M-in-a-day-almost-go-bankrupt] (other great link [https://www.kitchensoap.com/2013/10/29/counterfactuals-knight-capital/]) and the case of a transport company dealing with Hurricane Sandy [https://journals.sagepub.com/doi/10.1177/1541931213571072] (illegal source [https://sci-hub.ru/10.1177/1541931213571072]).
In the case of Knight Capital, they rolled out code that reused an old feature flag that had been repurposed, and the deployment failed on one of eight servers. When it went live, it produced unexpected behavior that ran more transactions than expected; rolling it back produced even more anomalous behavior due to the flag. People involved struggled to understand the issue. Woods mentions that it took a while before upper management was informed and then authorized a stop to trading. By then, less than an hour had passed, but it was too late: the company was left nearly bankrupt by its now untenable market position. 
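The flag reuse described above can be sketched in a few lines. This is only an illustration of the mechanism as summarized here, not a reconstruction of Knight Capital's actual systems: the flag name `shared_flag`, the server names, and the three code paths are all hypothetical.

```python
from dataclasses import dataclass

FLAG = "shared_flag"  # hypothetical name for the reused feature flag


@dataclass
class Server:
    name: str
    version: str  # "new" if the deployment landed, "old" otherwise


def code_path(server: Server, flags: dict) -> str:
    """Return which behavior a server runs; the same flag means
    different things depending on the code version deployed."""
    if not flags.get(FLAG, False):
        return "default"
    return "new_feature" if server.version == "new" else "legacy_feature"


def fleet(size: int, old_on: set) -> list:
    """Build a fleet where the servers numbered in `old_on` still run old code."""
    return [
        Server(f"srv{i}", "old" if i in old_on else "new")
        for i in range(1, size + 1)
    ]


def legacy_servers(servers: list, flags: dict) -> list:
    """Names of servers that end up running the legacy behavior."""
    return [s.name for s in servers if code_path(s, flags) == "legacy_feature"]


flags = {FLAG: True}  # the flag is switched on for the new feature

# Deployment failed on one of eight servers: that one runs the legacy path.
after_deploy = legacy_servers(fleet(8, old_on={8}), flags)

# Rolling the code back everywhere while the flag stays on spreads the
# legacy behavior to the whole fleet.
after_rollback = legacy_servers(fleet(8, old_on=set(range(1, 9))), flags)

print(after_deploy)
print(len(after_rollback))
```

In this toy model the flag is the coupling point: the rollback restores old code but not the old meaning of the flag, which is one plausible reading of why the rollback made things worse.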
The author picked it as an example that shows that:
* small problems interact and can escalate quickly
* as effects cascade, roles struggle to understand the situation and figure out how to react
* non-routine responses are more difficult to get authorization for
* this requires more coordination, which slows things down while effects still amplify
* response can't keep pace with events, particularly when communications are serialized vertically through the organization
The comparative case, a large transportation firm that reconfigured itself during Hurricane Sandy, has the following elements named as being behind its effective adaptation. Quoted literally from the text, they:
* re-prioritized over multiple conflicting goals,
* sacrificed cost control processes in the face of safety risks,
* valued timely responsive decisions and actions,
* coordinated horizontally across functions to reduce the risk of missing critical information or side effects when replanning under time pressure,
* controlled the cost of coordination to avoid overloading already busy people and communication channels,
* pushed initiative and authority down to the lowest unit of action in the situation to increase the readiness to respond when unanticipated challenges arose.
This, Woods mentions, helped balance what is called the efficiency-thoroughness tradeoff. Also known as ETTO [https://en.wikipedia.org/wiki/Efficiency%E2%80%93thoroughness_trade-off_principle], this is a principle that states that needs for safety tend to reduce efficiency, and demands for productivity tend to reduce thoroughness. This is because people have limited time and these two values are in tension. Specifically, they sacrificed economics and standard processes to keep up with events, by using patterns that already existed within the organization, given that adapting to surprises was a normal experience.
In comparing both cases, the author mentions that following a plan is not enough in these situations. 
There's a need for anticipation and initiative, particularly when events challenge existing plans. The difference between both organizations is that for the transportation company: From facing surprises in the past, the varying roles/levels had opportunities to exercise their coordinative ‘muscles,’ even though this specific event presented unique difficulties. In the strategic agility gap, the challenge for organizations is to develop new forms of coordination across functional, spatial, and temporal scales; otherwise organizations will be slow, stale and fragmented as they inevitably confront surprising challenges.
While I personally feel the time scales of the two cases are too different for a clean comparison, they probably do a decent job of demonstrating the types of behaviors on each side of the accelerated trajectory line.
The paper shifts toward a "Systems are messy" section, recalling the old WWII term SNAFU, standing for "Situation Normal: All Fucked Up". Standard plans inevitably break down, and some people in some roles do "SNAFU catching", often in hard-to-detect ways: all organizations are adaptive systems, consist of a network of adaptive systems, and exist in a web of adaptive systems (i.e., the resilience engineering paradigm). All human adaptive systems make trade-offs to cope with finite resource and all live in a changing world. The pace of change is accelerated by past successes, as growth stimulates more adaptation by more players in a more interconnected system.
The point here is that operating within the strategic agility gap is unavoidable. Organizations love to rationalize this away:
* Since SNAFUs occur rarely, this is a low priority issue
* There's a record of improvement that reduces the challenge SNAFUs represent
* Poor response when SNAFUs occur is due to people who fail to follow the plan and design
Woods states directly that these rationalizations are wrong empirically, technically, and theoretically. 
When surprises are framed as deviations from the established plan, the compliance pressure that follows undermines the system's adaptive capacities. A sudden collapse against a background of improvements surprises and confuses people within the system. The argument here is that this is normal: as scale and interdependencies increase, performance increases, but so does the proportion of large collapses and failures.
The Resilience Engineering statement here is that we shouldn't be surprised by the failures, but by how few of them we have. One of Woods' favorite laws is the fluency law, which states: well adapted activity occurs with a facility that belies the difficulty of the demands resolved and the dilemmas balanced. The reason we see so few failures is that adapting to SNAFUs continually takes place, and that it is nearly invisible. It is, in fact, one of the tenets of resilience engineering.
Past successes in these situations drive effective leaders to take advantage of improvements and drive the systems to do even more, and this creates adaptive cycles which accelerate the strategic agility gap. Organizations end up living in that strategic agility gap, and to thrive in there they need to develop and sustain the ability to continuously adapt.
Resilience Engineering researchers turn to web operations in order to study this: outages and near-misses are incredibly common even in the best organizations, and things change so fast that they provide a great laboratory to study constraints and shifting opportunities and risks. 
The key ingredients identified are:
* anticipation: seeing signs of trouble and starting adaptation before it becomes definitive
* contingent synchronization: based on pacing, roles at different levels coordinate differently
* readiness to respond: developing and mobilizing response capability before surprises
* proactive learning: studying how surprises are caught and resolved before major collapses or accidents
To express and apply initiative, there's a need to push it down closer to the action; this can be miscalibrated in a way that fragments efforts and makes units work at cross-purposes. Since we can't just enforce plans harder, resilience engineering seeks system architectures that can adjust the expres
·cohost.org·
Why I Pay for Email (and Domains)
In a world where you can get free email accounts seemingly anywhere, I recently decided to pay for an email service. This doesn’t mean that I don’t still have Gmail, Microsoft/Hotmail, …
·genxjamerican.com·
The Future of Remote Work
The remote work debate is complex and hard to discuss rationally. This post discusses the future of remote work and provides a framework for thinking through the remote revolution.
·staysaasy.com·
A Brief, Incomplete and Mostly Wrong Devops Glossary
You’ve seen them—the pristine glossaries, endorsed by industry titans like the CNCF, with terms that sound like they’re straight out of a sci-fi mo...
·earthly.dev·