The benefits of schema driven development and single source of truth for microservices or API or event systems w.r.t productivity, maintainability & agility
This article focuses on Schema Driven Development (SDD) and Single Source of Truth (STT) paradigms as two first principles every team must follow. It is an essential read for CTOs, tech leaders and every aspiring 10X engineer out there. While I will touch on SDD mainly, I will talk in brief also about the 8 practices I believe are essential, and why we need them. Later in the blog you will see pratical examples of SDD and STT with screenshots and code snippets as applicable.
What is Schema Driven Development?
SDD is about using a single schema definition as the single source of truth, and letting that determine or generate everything else that depends on the schema. For ex. generating CRUD APIs for multiple kinds of event sources and protocols, doing input/output validations in producer and consumer, generating API documentation & Postman collection, starting mock servers and parallel development, generating basic test cases. And as well - sigh of relief that changing in one place will reflect change everywhere else automatically (Single Source of Truth)
SDD helps to speedily kickstart and smoothly manage parallel development across teams, without writing a single custom line of code by hand . It is not only useful for kickstarting the project, but also seamlessly upgrading along with the source schema updates. For ex. If you have a database schema, you can generate CRUD API, Swagger, Postman, Test cases, Graphql API, API clients etc. from the source database schema. Can you imagine the effort and errors saved in this approach? Hint: I once worked in a team of three backend engineers who for three months, only wrote CRUD APIs, validations, documentation and didn't get time to write test cases. We used to share Postman collection over emails.
What are the signs that your team doesn't use SDD?
Such teams don't have an "official" source schema. They manually create and manage dependent schemas, APIs, test cases, documentation, API clients etc. as independent activities (while they should be dependent on the source schema). For ex. They handcraft Postman collections and share over email. They handcraft the CRUD APIs for their Graphql, REST, gRpc services.
In this approach you will have
Multiple sources of Truth (your DB schema, the user.schema.js file maintained separately, the Express routes & middlewares maintained separately, the Swagger and Postman collections maintained separately, the test cases maintained separately and the API clients created separately. So much redundant effort and increased chances of mistakes!
Coupling of schema with code, with event sources setup (Express, Graphql etc).
Non-reusability of the effort already done.
Lack of standardisation and maintainability - Also every developer may implement this differently based on their style or preference of coding. This means more chaos, inefficiencies and mistakes! And also difficulty to switch between developers.
You will be
Writing repetitive validation code in your event source controllers, middleware and clients
Creating boilerplatefor authentication & authorisation
Manually creating Swagger specs & Postman collection (and maintaining often varying versions across developers and teams, shared across emails)
Manually creating CRUD APIs (for database access)
Manually writing integration test cases
Manually creating API clients
Whether we listen on (sync or async) events, query a database, call an API or return data from our sync event calls (http, graphql, grpc etc) - in all such cases you will be witnessing
Redundant effort in maintaining SST derivatives & shipping upgrades
Gaps in API, documentation, test cases, client versions
Increased work means increase in the probability of errors by 10X
Increased work means increased areas to look into when errors happen (like finding needle in haystack) - Imagine wrong data flowing from one microservice to another, and breaking things across a distributed system! You would need to look across all to identify the source of error.
When not following SST, there is no real source of truth
This means whenever a particular API has a new field or changed schema, we need to make manual change in five places - service, client(s), service, swagger, postman collection, integration test cases. What if the developer forgets to update the shared Postman collection? Or write validation for the new field in the APIs? Do you now see how versions and shared API collections can often get out of sync without a single source of truth? Can you imagine the risk, chaos, bugs and inefficiencies this can now bring?
Before we resume back to studying more about SDD and SST, lets have a quick detour to first understand some basic best practices which I believe are critically important for tech orgs, and why they are important?
The 8 best practices
In upcoming articles we will touch upon these 8 best practices.
Schema Driven Development & Single Source of Truth (topic of this post)
Configure Over Code
Security & compliance
Decoupled (Modular) Architecture
Shift Left Approach
Essential coding practices
Efficient SDLC: Issue management, documentation, test automation, code reviews, productivity measurement, source control and version management
Observability for fast resolution
Why should you care about ensuring best practices?
As a tech leader should your main focus be limited to hustling together an MVP and taking it to market?
But MVP is just a small first step of a long journey. This journey includes multiple iterations for finding PMF, and then growth, optimisation and sustainability. There you face dynamic and unpredictable situations like changing teams, customer needs, new compliance, new competition etc. Given this, should you lay your foundation keeping in mind the future as well? For ex. maintainability, agility, quality, democratisation & avoiding risks?