Software engineering

142 bookmarks

AsyncAPI Initiative for event-driven APIs

·asyncapi.com·Mar 23, 2024

Domain analysis for microservices - Azure Architecture Center

·learn.microsoft.com·Mar 20, 2024

`async: false` is the worst. Here's how to never need it again.

Elixir #tests

·saltycrackers.dev·Mar 13, 2024

Hardhat — hardhat v1.0.2

Yet Another HTTP Client lmao

Elixir

·hexdocs.pm·Mar 12, 2024

Bluefin

·projectbluefin.io·Mar 1, 2024

Turb(l)o(g) » Fabriquer son internet

Network

·blog.spyou.org·Mar 1, 2024

A powerful, flexible, Markdown-based authoring framework

·markdoc.dev·Feb 23, 2024

Fast and flexible observability with canonical log lines

Logs that aggregate as much metadata as possible and are emitted once per request, ideally at the end of the request's handling (?)
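
A minimal sketch of the idea in the note above, not Stripe's implementation: each stage of request handling adds fields to one collector, and a single line is emitted when the request finishes. The `CanonicalLogLine` class, the `handle_request` function, and every field name are hypothetical, chosen only for illustration.

```python
import json
import time


class CanonicalLogLine:
    """Collects request metadata and renders it as one key=value line."""

    def __init__(self, **initial_fields):
        self.fields = dict(initial_fields)
        self._start = time.monotonic()

    def add(self, **fields):
        """Attach more metadata; later values overwrite earlier ones."""
        self.fields.update(fields)

    def render(self):
        """Return the single line, with the elapsed time added as a field."""
        elapsed_ms = (time.monotonic() - self._start) * 1000
        self.fields["duration_ms"] = round(elapsed_ms, 3)
        pairs = (
            f"{key}={json.dumps(value)}"
            for key, value in sorted(self.fields.items())
        )
        return "canonical-log-line " + " ".join(pairs)


def handle_request(method, path, user_id):
    """Hypothetical handler: stages add fields, one line is emitted at the end."""
    log = CanonicalLogLine(http_method=method, http_path=path)
    try:
        log.add(user_id=user_id)   # e.g. set by an authentication step
        log.add(db_queries=3)      # e.g. set by a data-access layer
        log.add(http_status=200)   # set once the response is known
    finally:
        # Emitting in `finally` means a failed request still gets its line.
        line = log.render()
        print(line)
    return line


handle_request("GET", "/v1/charges", "usr_123")
```

Because every field lands on one line, a single query over these lines can filter and aggregate on any combination of fields.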

·stripe.com·Feb 23, 2024

Designing accessible color systems

Frontend

·stripe.com·Feb 23, 2024

tompave/fun_with_flags: Feature Flags/Toggles for Elixir

Elixir

·github.com·Feb 20, 2024

Friday Deploy Freezes Are Exactly Like Murdering Puppies

Process

Never accept a diff if there’s no explanation for the question, “how will you know when this code breaks? how will you know if the deploy is not behaving as planned?” Instrument every commit so you can answer this question in production.

DevOps

·charity.wtf·Feb 20, 2024

Surviving Continuous Deployment in Distributed Systems

DevOps #distributed programming

·oooops.dev·Feb 16, 2024

Karpenter

DevOps

·karpenter.sh·Feb 16, 2024

(Almost) Every infrastructure decision I endorse or regret after 4 years running infrastructure at a startup

Karpenter

DevOps

·cep.dev·Feb 16, 2024

Software Sprawl, The Golden Path, and Scaling Teams With Agency

·charity.wtf·Feb 2, 2024

How We Use Golden Paths to Solve Fragmentation in Our Software Ecosystem - Spotify Engineering

Clearly defined audience

One main purpose

But primarily, we write with new engineers in mind, which ensures the instructions are clear for everyone.

the best way to do something now (versus how they did it five years ago).

“The Golden Path is the opinionated and supported path to build your system and the Golden Path tutorial walks you through this path.”

Step-by-step-by-step

Also, step-by-step is a good way to get clear sight on how long your actual Golden Path is.

The tutorials just reflect the actual Golden Path. That’s what we need to shorten.

we need to make the actual Golden Path easier to follow and with fewer steps.

And the Golden Path tutorials are also about education. So, yes, we need to make it easier for engineers to build stuff — but it’s important to keep in mind that the tutorials are there to educate (in particular, new hires).

Going by the data, the Golden Path tutorials are Spotify’s most read and most used technical documentation.

and you only have a limited time for working on documentation — work on that.

they will use the set of tutorials as the basis for building an end-to-end product.

And, of course, it’s not just feedback on the tutorial — it’s feedback on the Golden Path itself

Improved ownership model

A Golden State is a list of checks that engineers can use to know if their systems are following the Golden Path.

End-to-end product Golden Path tutorial The way that the tutorials are set up currently is to serve individual contributors. But individual contributors are in teams and teams build products (features, run experiments, company bets) and are multi-disciplinary. It is a challenging task, but we are definitely excited by the idea of creating a series of Golden Path tutorials for teams that provide blueprints for building various types of products.

·engineering.atspotify.com·Feb 2, 2024

Measuring Developer Productivity: Real-World Examples

You can see there’s a wide range of metrics in use, including: Ease of Delivery (Amplitude, GoodRx, Intercom, Postman, Lattice); Experiment Velocity (Etsy); Stability of Services / Apps (DoorDash); SPACE metrics (Microsoft); Weekly focus time per engineer (Uber).

Developer Net User Satisfaction (NSAT) measures how happy developers are overall with LinkedIn’s development systems. It’s measured on a quarterly basis.
Developer Build Time (P50 and P90) measures in seconds how long developers spend waiting for their builds to finish locally during development.
Code Reviewer Response Time (P50 and P90) measures how long it takes, in business hours, for code reviewers to respond to each code review update from the author.
Post-Commit CI Speed (P50 and P90) measures how long it takes, in minutes, for each commit to get through the continuous integration (CI) pipeline.
CI Determinism is the opposite of test flakiness. It’s the likelihood a test suite’s result will be valid and not a flake.
Deployment Success Rate measures how often deployments to production succeed.
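
As a rough illustration of what "P50 and P90" mean for a metric such as build time, the sketch below computes both percentiles with Python's standard library. The build times are made-up sample data, and the inclusive interpolation method is an assumption; the article does not say how LinkedIn computes its percentiles.

```python
import statistics

# Hypothetical local build durations in seconds (made-up sample data).
build_times = [30, 32, 35, 38, 41, 45, 50, 58, 75, 120, 240]

# With n=10, quantiles() returns the nine cut points P10, P20, ..., P90.
deciles = statistics.quantiles(build_times, n=10, method="inclusive")
p50 = deciles[4]  # fifth cut point: the median
p90 = deciles[8]  # ninth cut point: 90% of builds finish at or below this

print(f"P50 = {p50:.0f}s, P90 = {p90:.0f}s")
```

Reporting both values shows the typical experience (P50) alongside the slow tail (P90), which an average would blur together.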

“Even if the quantitative metrics say that everyone’s builds are fantastic, if developers are saying ‘I hate my builds,’ you should probably listen to that.” – Grant Jenks, Senior Tech Lead for the developer insights platform

“We don't track the Developer Experience Index over time. We reserve the right to change the aggregation and the weightings behind it at any time. We tell people not to ever put this metric into an OKR, either.”

Engagement: Developer Satisfaction Score
Velocity: Time to 1st and 10th PR for all new hires, Lead Time, Deployment Frequency
Quality: % of PRs under 250 lines, Line Coverage, Change Failure Rate
Stability: Time to Restore Services

1. Ease of Delivery (moveable). Most of these companies measure ease of delivery, a qualitative measure of how easy or difficult developers feel it is to do their job.

2. Engagement. Most of these companies also track engagement, a measure of how excited and stimulated developers feel about their work. While engagement is commonly measured in HR engagement surveys, DevProd teams also cited focusing on Engagement for these reasons:
Developer engagement and productivity are closely linked. In other words, “happy developers are productive developers,” and so developer engagement can be viewed as an indicator of productivity.
A real benefit of measuring engagement is to counterbalance other metrics which emphasize speed. Delivering software faster is good, but not at the expense of developer happiness decreasing.

3. Time Loss (moveable). GoodRx and Postman pay attention to the average amount of lost time. This is measured by the percentage of developers’ time lost to obstacles in the work environment. This metric is similar to ease of delivery, in that it provides DevProd teams a moveable metric which their work can directly impact.

This metric can be translated into dollars: a major benefit! This makes Time Loss easy for business leaders to understand. For example, if an organization with $10M in engineering payroll costs reduces time loss from 20% to 10% through an initiative, that translates into $1M of savings.
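
The arithmetic behind that example can be written out as a small sketch. The function name and its parameters are illustrative only; the figures are the ones quoted in the highlight above.

```python
def time_loss_savings(payroll_dollars, loss_before_pct, loss_after_pct):
    """Payroll dollars recovered when the share of time lost to obstacles drops.

    Percentages are whole percentage points (20 means 20%).
    """
    return payroll_dollars * (loss_before_pct - loss_after_pct) / 100


# Figures from the example above: $10M payroll, time loss cut from 20% to 10%.
savings = time_loss_savings(10_000_000, 20, 10)
print(f"${savings:,.0f}")  # prints $1,000,000
```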

4. Change Failure Rate. This is one of the four key metrics from the DORA research program. It’s a top-level metric tracked by several companies, including Amplitude and Lattice. The DORA team defines the change failure rate like this:

“The percentage of changes to production or releases to users that result in degraded service (for example, lead to service impairment or service outage) and subsequently require remediation (for example, require a hotfix, rollback, fix forward, patch).”

Lattice measures change failure rate as the number of PagerDuty incidents divided by the number of deployments. Amplitude measures it as the P0s (priority zeros) – the most important priorities – over production deploys. The P0 count goes through PagerDuty, and the deploy count is from their continuous delivery service, Spinnaker.
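
Both company definitions reduce to the same ratio: failures counted from an incident source, divided by deployments counted from a delivery pipeline. A minimal sketch, with hypothetical counts standing in for the PagerDuty and deployment numbers:

```python
def change_failure_rate(failed_changes, deployments):
    """Fraction of deployments that led to an incident needing remediation."""
    if deployments <= 0:
        raise ValueError("deployments must be a positive count")
    return failed_changes / deployments


# Hypothetical month: 3 incidents attributed to changes, 60 production deploys.
rate = change_failure_rate(failed_changes=3, deployments=60)
print(f"{rate:.1%}")  # prints 5.0%
```

What counts as a "failed change" (any incident, or only P0s) is the part each team has to define for itself, as the two examples above show.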

A big emphasis on “focus time”

I was surprised by how many companies track “focus time” as a top-level metric. Although research has shown “deep work” to be an important factor in developer productivity, I didn’t expect as much attention on it as I found. Stripe and Uber shared specific metrics, such as “Number of Days with Sufficient Focus Time,” and “Weekly Focus Time Per Engineer,” while other companies mentioned deep work as a topic they measure in their developer survey programs. The Pragmatic Engineer previously covered how Uber measures engineering productivity.

Adoption Rate (DoorDash, GoodRx, and Spotify.)

Design Docs Generated per Engineer (Uber.)

Experiment Velocity (Etsy.)

Metrics include how many experiments start each week, how many have been stopped, and how many have a positive hit rate. For context, the ultimate goal is to measure learning velocity.

I always recommend borrowing Google’s Goals, Signals, Metrics (GSM) framework to help guide metric selection. Too often, teams jump to metrics before thinking through what they actually want to understand or track. The GSM framework can help teams identify what their goal is, and then work backwards to pick metrics that serve this.

“We always encourage people to follow the goal, signals, metrics approach. We ask them to first write down your goals. What is your goal for speed? What is your goal for ease? What's your goal for quality? Write those down first and then ask your question of: ‘what are the signals that would let you know that you've achieved your goal?’ Regardless of whether they're measurable. Signals are not metrics. What would be true of the world if you've achieved your goal? At that point, try to figure out what are the right metrics.”

Start by defining your charter. Why does your DevProd team exist? Here are three examples of DevProd team charters:
Google: “Make it fast and easy for developers to deliver great products.”
Slack: “Make the development experience seamless for all engineers.”
Stripe: “Make software engineering easier.”

How easy it is for developers to deliver software
How quickly developers deliver software
The quality of software delivered

For each category, define metrics to help track how it’s going. For example:
Speed = Perceived Delivery Speed, Perceived Productivity
Ease = Ease of Delivery, Deployment Lead Time, Build Failure Rate
Quality = Incident frequency, Perceived Software Quality

Use similar top-level metrics for your DevProd team to convey the value and impact of your efforts. With the right metrics, you can keep everyone aligned within and outside of your team.

Operational metrics include:
Developer satisfaction with specific tools
Adoption rate of a particular service
Granular measurements of developers’ workflows
… and many others!

1. Business impact. You should report on current or planned projects, alongside data that addresses questions like: Why are these the right things to build now? How does this project make the business money, or otherwise support its goals? Is this project on track or delayed?

2. System performance.

3. Engineering effectiveness

·newsletter.pragmaticengineer.com·Feb 2, 2024

Maxjourney: Pushing Discord’s Limits with a Million+ Online Users in a Single Server

Elixir

·discord.com·Feb 2, 2024

How Discord Creates Insights from Trillions of Data Points

Data science & engineering

·discord.com·Feb 2, 2024

Consistent Hash Rings Explained Simply

Consistent hash rings are beautiful structures, yet often poorly explained. Implementations tend to focus on clever language-specific tricks, and theoretical approaches insist on befuddling it with math and tangents irrelevant. This is an attempt at explanation - and a Python implementation - accessible to an ordinary high-schooler.
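
For quick reference, here is a minimal sketch of the structure described above. It is not the article's implementation: the `HashRing` class, the use of SHA-256, and the default of 100 virtual nodes per server are assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Maps keys to nodes so that adding or removing a node moves few keys."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per real node
        self._positions = []      # sorted hash positions on the ring
        self._owners = {}         # position -> node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode("utf-8")).hexdigest()
        return int(digest, 16)

    def add_node(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            if position in self._owners:
                continue  # extremely unlikely collision: keep the first owner
            self._owners[position] = node
            bisect.insort(self._positions, position)

    def remove_node(self, node):
        for i in range(self.replicas):
            position = self._hash(f"{node}#{i}")
            if self._owners.get(position) == node:
                del self._owners[position]
                self._positions.remove(position)

    def get_node(self, key):
        if not self._positions:
            raise LookupError("the ring has no nodes")
        # First position clockwise from the key's hash, wrapping around.
        index = bisect.bisect(self._positions, self._hash(key))
        return self._owners[self._positions[index % len(self._positions)]]


ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {key: ring.get_node(key) for key in keys}

ring.add_node("node-d")
moved = [key for key in keys if ring.get_node(key) != before[key]]
print(f"{len(moved)} of {len(keys)} keys moved after adding node-d")
```

The property to look for is that every key that moves lands on the new node; keys never shuffle between the existing nodes.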

#distributed programming

·akshatm.svbtle.com·Jan 27, 2024

Efficient Software Project Management at its Roots

Clarity from the start
Milestones that are directional
Transparency on an ongoing basis
Dependency and Risk Management in a pragmatic way

A common tool I see used is via a project kickoff with all stakeholders present and involved.

Given this is an expensive meeting, the person leading the meeting typically prepares an overview about the background ("why"), goals ("what"), suggested approach ("how") and end state. Showing off designs or other visuals is a great thing at this stage in order to get through to people who are more visual than textual types.

"Does everyone understand why we are doing this project, how we will get there and what your role will be to help? Raise your hand if the answer is "no" or "maybe" to any of these."

However, people and teams are often late to admit to stakeholders when they come across trouble that they cannot fully mitigate.

A very simple tool I see help in creating this transparency is having a regular, no-BS update on where the team really is.

All team members take part in delivering/writing this update. This serves as a continuous reality check on what is actually happening, close to where the real work is going on: the engineers themselves.

In a team where everyone is aware of how good (or bad) things are going, people will pick up work that can help the team the most.

A lot of business stakeholders don't have much understanding or appreciation of what is easy or hard about software development. Exposing them to more granular details and helping them understand what tradeoffs the team is continuously making helps build empathy and more realistic expectations on both ends.

poor dependency management

This is a symptom of poor risk management.

People practicing things like Scrum or Kanban tend to also think a lot less about these areas and don't do it early enough.

Discovery. Figure out who your dependencies are and what they need to do.
Agreement. Talk to them and agree what they will do and by when.
Check-in before the due date. For teams that you don't have a good track record working with, do more frequent check-ins, to make sure they are on track.
Give feedback and/or escalate. Once the work is complete, give feedback. If your dependent team did a great job, call this out clearly. If they did not do a good job, consider understanding why. If they are really late, consider escalating earlier, rather than later.

For risk management, have a culture that rewards raising concerns early on and be pragmatic in tradeoffs to mitigate risk.

So when they come across a problem, they see a challenge to solve, not a potential delay to the project.

To tackle this, create a culture of talking about interesting challenges coming up on a day to day basis. Start rewarding people who flag things that might take longer to do and bring tradeoffs to the table.

Whenever risk comes up, consider reacting to it ahead of time.

When we eventually do get things done, looking back and figuring out where we can do better next time is a key part of individual and team growth. And of course, celebrating a big achievement

This last step really helps build a cohesive team who will be ready and hungry to deliver on the next, more complex project, in a way even better than this last one.

Org

·blog.pragmaticengineer.com·Jan 27, 2024

An Engineering Team where Everyone is a Leader

A group where every member has the skills, confidence, and empowerment to take initiative, make decisions, and lead others.

As an engineering manager, I am the one accountable and responsible for my team delivering projects. I delegated the responsibility - deciding how to do things - but kept the accountability. If the project would fail, and someone would get in trouble, it would still be me, not the project lead.

Collaboration. Set up a framework for collaboration.
Milestones. Break down the project into milestones & provide estimates on these.
Communication. Communicate project status to stakeholders.
Risks. Manage and call out risks.
Delegate. Help the team ship and delegate (both to the team and upwards).
Motivation. Motivate the team on the way.
Quality. Ensure the overall quality and reliability of the shipped product.

One of the most powerful tools I've found for leads and teams to hold themselves accountable was a short email status update sent out by the team every week. The update would summarise progress towards the next milestone, how this process changed from last time, and progress the previous week. Risks and delays would explicitly be called out, along with plans to mitigate. This update would be emailed to me, key stakeholders, and all of the team members.

Stakeholders typically care about milestone estimates and evidence of the progress being made towards those estimates. In the case of risks and scope changes, they care about what changes in scope mean for the business. Finally, stakeholders ended up often pinging the project lead directly. This forced the lead to strengthen their stakeholder management skills.

First-time project leads needed to strengthen leadership skills before being thrown into deep water. There are multiple things a project lead needs to do, from facilitating meetings, reporting, calling out risks, coming up with mitigation strategies, and others. Could they start to practice a few of these skills on a project they are not formally leading?

For example, a more junior member started to facilitate the regular standup, getting feedback from the project lead afterward. Preparing for planning meetings, or leading certain stakeholder meetings started to be done by less experienced members - after plenty of preparation, and the project lead being present to support.

Even better, the project lead was strengthening their ability to mentor well

I took a more "prescriptive" approach with first-time project leads, going forward. I suggested they follow certain processes to the T - kickoff meeting following a template, daily standups, weekly emails based on a template. I asked them to humor this for the first time, and said that on their next project they would be free to choose their tools more freely. Just experience how these "standard" tools work, for the duration of the whole project. I put the Checklist for first-time projects part in the guiding document in place at this time.

The perception of the team improved greatly. Stakeholders started to appreciate - and depend on - the weekly status update emails, and loved the transparency these updates provided. Turns out that unexpected delays are easier to work through, when stakeholders trust the team, and understand what happens under the hood.

The approach of engineers owning features end-to-end became more sustainable across the team. In a sprint-based environment, most engineers tend to "forget" about a feature, after development is complete.

This is despite the project far from being complete: rollout, A/B testing and user feedback are still to come - and all these parts carry additional project risk.

As much of the team transitioned to the new project, the project lead was still engaged, looking at usage numbers, figuring out if something needed fixing.

Members of the team saw themselves as leaders, even when not being assigned a project lead role. When interacting with stakeholders, they made decisions on the spot, informing relevant parties.

Likely related to professional growth, very few people decided to leave the team. Those who did moved to teams owning domains they had more interest in, quickly becoming a go-to person on their new teams as well.

Smaller, one-person side-projects were also an area I experimented with. For those who were eager to lead, I suggested we treat one of the smaller things they worked on as a project. I assigned a mentor to them, to make it a two-person team, and asked them to follow the usual expectations, from having a kickoff, incremental milestones, and weekly updates. You might think this was overkill. Perhaps so, but the people doing this loved it - and improved their leadership skills on a small, non-critical project.

This was the point where I began suggesting that people take on ownership on parts of the project: specifically, project leads delegating smaller parts.

I also tried to "mix and match" parallel projects, so larger and smaller efforts would be better balanced.

The time-consuming part of planning and resourcing projects was the main downside of this approach. I found myself and our product manager becoming the bottleneck in planning out who will work on what project, next, and who will be the lead. Initially, I did not mind: the payoff and professional growth for team members made up for a bit of extra time spent here. As the team is growing, we'll have to decide if we keep this structure, with smaller teams, or not.

"But I want to code, not do project management…" Early on, a few engineers expressed worry that I'm asking them to do project management. "Isn't that what project managers are for?" - they asked.

Or they do the project management - doing so with autonomy, and learn a new skill. Do as little project management as they'd like to, as long as we have a way to know where we are, and if we are on track.

Org

·blog.pragmaticengineer.com·Jan 27, 2024

How to plan?

Instead, start working on new things when you aren’t Planning. Document what you want to do/build/etc. Share the proposal with people. Get their explicit buy-in to support the new thing. Test your assumptions and estimates with people in your organization with expertise that are different from yours. Get feedback from leadership about whether your idea is aligned with the direction of the organization (or if they’re willing to change directions). Tell people how much it’s going to cost: in terms of people assigned to it, time to iterate on it, to break through the noise in the market and educate your target customers, etc. Solicit internal interest for people who would be interested to join the effort if it gets approved. This is a Funding Proposal, and it’s a great process to run outside of Planning.

The other option that also works well for a large class of ideas: just try them. Don’t ask for resources, don’t ask for support from other teams, do the quick and dirty test to validate or disprove your idea, build a thing knowing that you’ll probably fail and be ready to back out your changes and throw it away. That’s a great alternative to doing a real plan. It’s the middle zone, where we pretend to plan, but do it badly, that gets us into trouble.

% of the company who adopts a product once it’s available, etc.

An investment portfolio approach is a framework. 20% of our effort should be going to brand new efforts that are high risk and high reward. The number of people it takes to operate our mature products should decrease 20% YoY even as usage grows.

they need to be constraints people know going into Planning.

These are things you should be tracking throughout the year, but if you aren’t, setting aside some time before planning to get these answers is important.

As the person designing Planning, encourage people to build their plans on the foundation of things that are already working and shipped. New capabilities, and building support and excitement for them should be done via Funding Proposals outside this process (see “Planning is the wrong time to introduce anything new.”)

Org

·kellanem.com·Jan 27, 2024

dashbitco/nimble_ownership

Tracking ownership of resources in different processes

Elixir #distributed programming

·github.com·Jan 27, 2024

Phoenix LiveView: Multi-step forms

Elixir

·bernheisel.com·Jan 25, 2024

Aurae Runtime

·aurae.io·Jan 21, 2024

Polar – A creator platform for developers

#community

·polar.sh·Jan 15, 2024

Goal Oriented Action Planning for a Smarter AI | Envato Tuts+

Game Dev

·gamedevelopment.tutsplus.com·Jan 14, 2024

Project Roadmap | mise-en-place

·mise.jdx.dev·Jan 14, 2024

8 Top Docker Tips & Tricks for 2024 | Docker

Another lifesaver is using RUN --mount=type=cache when installing packages. This little gem keeps your package cache intact between builds. No more re-downloading the entire internet every time you build your image. It’s especially handy when you’re working with large dependencies. Implement this, and watch your build efficiency go through the roof.

DevOps #docker

·docker.com·Jan 11, 2024
