Data Engineering

107 bookmarks

What the Heck is a Data Mesh?! | Chris Riccomini

I got sucked into a data mesh Twitter thread this weekend (it’s worth a read if you haven’t seen it). Data meshes have clearly struck a nerve. Some don’t understand them, while others believe they’r

Data Architect

·cnr.sh·Jun 16, 2021

System Design Interview Tutorial – The Beginner's Guide to System Design

System Design is an important topic to understand if you want to advance further in your career as a software engineer. Even if you are just beginning your coding journey, it's a good idea to get a head start on learning about system design. Early in your career you will

To-Read

·freecodecamp.org·Jun 15, 2021

How to make data pipelines idempotent

A common way to make your data pipeline idempotent is to use the delete-write pattern.

“Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application”

For a data pipeline, idempotence means that running it multiple times with the same input will always produce the same output.

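As a rough illustration of the delete-write pattern mentioned above, here is a minimal sketch using Python's built-in sqlite3 module. The table name `daily_sales`, the `run_date` partition column, and the helper `load_partition` are hypothetical names chosen for this example and are not taken from the linked post. Each run first deletes the rows for its partition and then writes them again, so a rerun with the same input leaves the table unchanged.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Delete-write: replace every row for run_date in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        # Delete whatever a previous run wrote for this partition...
        conn.execute("DELETE FROM daily_sales WHERE run_date = ?", (run_date,))
        # ...then write the fresh rows for the same partition.
        conn.executemany(
            "INSERT INTO daily_sales (run_date, item, amount) VALUES (?, ?, ?)",
            [(run_date, item, amount) for item, amount in rows],
        )


# Hypothetical in-memory table, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (run_date TEXT, item TEXT, amount REAL)")

rows = [("widget", 9.5), ("gadget", 20.0)]
load_partition(conn, "2021-05-26", rows)
load_partition(conn, "2021-05-26", rows)  # rerun with the same input

row_count = conn.execute("SELECT COUNT(*) FROM daily_sales").fetchone()[0]
print(row_count)  # 2, not 4: the rerun did not duplicate rows
```

Running the delete and the insert in one transaction matters here: if the write fails partway, the rollback restores the rows that the delete removed.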

Patterns

·startdataengineering.com·May 26, 2021

GitHub OCTO | Flat Data

OCTO Project: Flat explores how to make it easy to work with data in git and GitHub. It builds on the [“git scraping” approach pioneered by Simon Willison](https://simonwillison.net/2020/Oct/9/git-scraping/) to offer a simple pattern for bringing working datasets into your repositories and versioning them, because developing against local datasets is faster and easier than working with data over the wire.

·octo.github.com·May 26, 2021

Top Python ETL Tools for 2021

List of the 6 best Python ETL tools for 2021 with a detailed comparison of features and capabilities. What is the best Python tool for ETL workflows?

Tools and Packages

·xplenty.com·May 5, 2021

Hadoop Vs SQL | Find Out The Top 6 Most Successful Differences

A guide to Hadoop vs. SQL. It covers a head-to-head comparison of Hadoop and SQL and their key differences, along with infographics and a comparison table.

Tutorials

·educba.com·May 2, 2021

What is a Data Warehouse

This post goes over what the term data warehousing means. This post provides a simple e-commerce relational data model and how it has to be changed to fit analytical queries. It also covers the reasoning behind wanting to use a data warehouse and how to choose an appropriate database for your project.

Data Architect

·startdataengineering.com·Apr 22, 2021

A proven approach to land a Data Engineering job

Proven approach to get usable experience and land a data engineering job

Patterns

·startdataengineering.com·Apr 22, 2021

What is DataOps? - Gradient Flow

The rise of tools and processes to manage and control data. By Assaf Araki and Ben Lorica. Data has emerged as an imperative foundational asset for all organizations. Data fuels significant initiatives such as digital transformation and the adoption of analytics, machine learning, and AI. Organizations that are able to tame, manage, and unlock their

Patterns

·gradientflow.com·Apr 21, 2021

What Skills Do Data Engineers Need - The Data Engineering Skill Pyramid

With an extensive background in data science, analytics, and cloud computing, I am consistently asked...

Roadmaps

·dev.to·Apr 18, 2021

How to trigger a spark job from AWS Lambda

Wondering how to execute a Spark job on an AWS EMR cluster, based on a file upload event on S3? Then this post is for you. In this post we go over how to trigger Spark jobs on an AWS EMR cluster using AWS Lambda. The Lambda function will execute in response to an S3 upload event. We will go over this event-driven pattern with code snippets and set up a fully functioning pipeline.
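The event-driven pattern described above could look roughly like the following sketch. It is not the linked post's code. `CLUSTER_ID`, `SCRIPT_URI`, the bucket names, and the helper `build_spark_step` are hypothetical placeholders, and the handler assumes boto3 is available in the Lambda runtime and that the function's role is allowed to add steps to the EMR cluster.

```python
from urllib.parse import unquote_plus

# Hypothetical placeholders: replace with a real EMR cluster id and script location.
CLUSTER_ID = "j-XXXXXXXXXXXXX"
SCRIPT_URI = "s3://example-bucket/scripts/process.py"


def build_spark_step(event, script_uri):
    """Turn an S3 upload event into an EMR step that runs spark-submit."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys arrive URL-encoded in S3 event notifications.
    key = unquote_plus(s3_info["object"]["key"])
    return {
        "Name": f"process {key}",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                script_uri,
                f"s3://{bucket}/{key}",  # uploaded file passed to the Spark script
            ],
        },
    }


def lambda_handler(event, context):
    """Lambda entry point: submit one Spark step per upload event."""
    import boto3  # imported here so build_spark_step can be tried without AWS access

    step = build_spark_step(event, SCRIPT_URI)
    return boto3.client("emr").add_job_flow_steps(JobFlowId=CLUSTER_ID, Steps=[step])


# Local check of the step construction with a trimmed-down sample event.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "demo-bucket"}, "object": {"key": "input/my+file.csv"}}}
    ]
}
sample_step = build_spark_step(sample_event, SCRIPT_URI)
print(sample_step["HadoopJarStep"]["Args"])
```

Keeping the step construction in a plain function separate from the boto3 call makes it possible to check locally before deploying anything.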

Tutorials

·startdataengineering.com·Apr 8, 2021

Data Engineering Project: Stream Edition · Start Data Engineering

Data engineering project for beginners, stream edition. In this post we design and build a simple data streaming pipeline using Apache Kafka, Apache Flink and PostgreSQL DB. We will also review the design and understand some common issues to avoid while building distributed stream processing systems.

Tutorials

·startdataengineering.com·Mar 29, 2021

Become a Data Engineer with this Complete List of Resources

Want to know how to become a data engineer? Here is a list of resources, certifications and other important links that will help you to get started with it.

Roadmaps

·analyticsvidhya.com·Mar 24, 2021

What Skills Do You Need to Become a Data Engineer?

What skills do you need to become a data engineer? Learn how to grow your data engineer skillset with this introductory guide.

Roadmaps

·springboard.com·Mar 24, 2021

Uber's Journey Toward Better Data Culture From First Principles

Data powers Uber: Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science that powers everything that Uber does, such as better pricing and matching, fraud detection, lowering ETAs, and experimentation. Petabytes of data are collected and processed per day and thousands of users derive insights and make decisions from this data to build/improve these products. Problems beyond scale: While we are able to scale our data systems, we previously didn’t focus enough

·eng.uber.com·Mar 23, 2021
