Data Engineering

107 bookmarks
What the Heck is a Data Mesh?! | Chris Riccomini
I got sucked into a data mesh Twitter thread this weekend (it’s worth a read if you haven’t seen it). Data meshes have clearly struck a nerve. Some don’t understand them, while others believe they’r
·cnr.sh·
System Design Interview Tutorial – The Beginner's Guide to System Design
System Design is an important topic to understand if you want to advance further in your career as a software engineer. Even if you are just beginning your coding journey, it's a good idea to get a head start on learning about system design. Early in your career you will
·freecodecamp.org·
How to make data pipelines idempotent
“Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application.”
For a data pipeline, this means that running it multiple times with the same input will always produce the same output.
A common way to make your data pipeline idempotent is to use the delete-write pattern.
·startdataengineering.com·
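The delete-write pattern mentioned in this entry can be sketched roughly as follows. This is a minimal illustration, not the linked article's code: the `daily_sales` table, its columns, and the `load_partition` helper are hypothetical names chosen for the example, and SQLite stands in for whatever store a real pipeline writes to.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Delete-write: drop any rows already loaded for run_date, then insert fresh ones.

    `daily_sales` and its columns are hypothetical, used only for illustration.
    """
    with conn:  # single transaction: the delete and the inserts commit together
        conn.execute("DELETE FROM daily_sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (run_date, item, amount) VALUES (?, ?, ?)",
            [(run_date, item, amount) for item, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (run_date TEXT, item TEXT, amount REAL)")

rows = [("widget", 10.0), ("gadget", 5.5)]
load_partition(conn, "2021-01-01", rows)
load_partition(conn, "2021-01-01", rows)  # rerun with the same input

# The rerun replaced the partition instead of appending to it.
count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(count)
```

Running the load twice leaves the table in the same state as running it once, because each run first clears the slice of data it is about to write.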
GitHub OCTO | Flat Data
OCTO Project: Flat explores how to make it easy to work with data in git and GitHub. It builds on the [“git scraping” approach pioneered by Simon Willison](https://simonwillison.net/2020/Oct/9/git-scraping/) to offer a simple pattern for bringing working datasets into your repositories and versioning them, because developing against local datasets is faster and easier than working with data over the wire.
·octo.github.com·
Top Python ETL Tools for 2021
List of the 6 best Python ETL tools for 2021 with a detailed comparison of features and capabilities. What is the best Python tool for ETL workflows?
·xplenty.com·
What is a Data Warehouse
This post goes over what the term data warehousing means. It presents a simple e-commerce relational data model and shows how it has to be changed to fit analytical queries. It also covers the reasoning behind wanting to use a data warehouse and how to choose an appropriate database for your project.
·startdataengineering.com·
What is DataOps? - Gradient Flow
The rise of tools and processes to manage and control data. By Assaf Araki and Ben Lorica. Data has emerged as an imperative foundational asset for all organizations. Data fuels significant initiatives such as digital transformation and the adoption of analytics, machine learning, and AI. Organizations that are able to tame, manage, and unlock their
·gradientflow.com·
How to trigger a spark job from AWS Lambda
Wondering how to execute a Spark job on an AWS EMR cluster, based on a file upload event on S3? Then this post is for you. In this post we go over how to trigger Spark jobs on an AWS EMR cluster, using AWS Lambda. The Lambda function will execute in response to an S3 upload event. We will go over this event-driven pattern with code snippets and set up a fully functioning pipeline.
·startdataengineering.com·
Data Engineering Project: Stream Edition · Start Data Engineering
Data engineering project for beginners, stream edition. In this post we design and build a simple data streaming pipeline using Apache Kafka, Apache Flink and PostgreSQL DB. We will also review the design and understand some common issues to avoid while building distributed stream processing systems.
·startdataengineering.com·
Uber's Journey Toward Better Data Culture From First Principles
Data powers Uber: Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science that powers everything that Uber does, such as better pricing and matching, fraud detection, lowering ETAs, and experimentation. Petabytes of data are collected and processed per day and thousands of users derive insights and make decisions from this data to build/improve these products. Problems beyond scale: While we are able to scale our data systems, we previously didn’t focus enough
·eng.uber.com·