Data Engineering

103 bookmarks
6 Key Concepts, to Master Window Functions
In this post, we go over 6 key concepts to help you master window functions. Window functions are one of the most powerful features of SQL. They are very useful in analytics and for performing operations that cannot be done easily with the standard GROUP BY, subqueries, and filters. Despite this, window functions are not used frequently. If you have ever thought 'window functions are confusing', then this post is for you.
·startdataengineering.com·
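As a rough illustration of an operation that is awkward with GROUP BY alone, the sketch below computes a running total per customer while keeping every row. It is not taken from the linked post: the `orders` table, its columns, and the sample rows are hypothetical, and it assumes Python's built-in `sqlite3` module linked against SQLite 3.25 or newer (the first version with window functions).

```python
import sqlite3

# Hypothetical in-memory table, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, order_date TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("alice", "2021-01-01", 10.0),
        ("alice", "2021-01-05", 30.0),
        ("bob", "2021-01-02", 20.0),
        ("bob", "2021-01-03", 5.0),
    ],
)

# SUM(...) OVER (...) is a window function: it aggregates over the rows of
# each customer up to the current order, but does not collapse the rows
# the way GROUP BY would.
running = conn.execute(
    """
    SELECT customer,
           order_date,
           amount,
           SUM(amount) OVER (
               PARTITION BY customer
               ORDER BY order_date
           ) AS running_total
    FROM orders
    ORDER BY customer, order_date
    """
).fetchall()

for row in running:
    print(row)
```

Each input row appears once in the output with its own running total, which is the property that separates a window function from a grouped aggregate.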
What are Common Table Expressions(CTEs) and when to use them?
You have heard of Common Table Expressions (CTEs), but are not sure what they are and when to use them. What if you knew exactly what CTEs were and when to use them? In this post, we go over what CTEs are, and their performance comparisons against subqueries, derived tables, and temp tables to help decide when to use them.
·startdataengineering.com·
Designing a Data Project to Impress Hiring Managers
Frustrated that hiring managers are not reading your GitHub projects? Then this post is for you. In this post, we discuss a way to impress hiring managers by hosting a live dashboard with near real-time data. We will also go over coding best practices such as project structure, automated formatting, and testing to make your code professional. By the end of this post, you will have deployed a live dashboard that you can link to your resume and LinkedIn.
·startdataengineering.com·
datastacktv/data-engineer-roadmap
Roadmap to becoming a data engineer in 2021.
·github.com·
Connect to Azure SQL in Python with MFA Active Directory Interactive Authentication without using Microsoft.IdentityModel.Clients.ActiveDirectory dll
To connect to Azure SQL Database using MFA (which is in SSMS as "Active Directory - Universal") Microsoft recommends and currently only has a tutorial on connecting with C# using Microsoft.Identity...
·stackoverflow.com·
How to make data pipelines idempotent
A common way to make your data pipeline idempotent is to use the delete-write pattern.
“Idempotence is the property of certain operations in mathematics and computer science whereby they can be applied multiple times without changing the result beyond the initial application”
Running an idempotent data pipeline multiple times with the same input will always produce the same output.
·startdataengineering.com·
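A minimal sketch of the delete-write pattern, assuming a load that is partitioned by run date: before writing, the job deletes whatever a previous run wrote for the same date, so a rerun replaces rows instead of duplicating them. The `daily_sales` table, the `load_partition` helper, and the sample rows are hypothetical and are not taken from the linked post; SQLite stands in for a real warehouse.

```python
import sqlite3


def load_partition(conn, run_date, rows):
    """Delete-write: clear the rows for run_date, then insert them again."""
    # "with conn" wraps both statements in one transaction, so a failure
    # between the delete and the write rolls back instead of losing data.
    with conn:
        conn.execute("DELETE FROM daily_sales WHERE run_date = ?", (run_date,))
        conn.executemany(
            "INSERT INTO daily_sales (run_date, store, amount) VALUES (?, ?, ?)",
            [(run_date, store, amount) for store, amount in rows],
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE daily_sales (run_date TEXT, store TEXT, amount REAL)")

rows = [("north", 120.0), ("south", 80.5)]
load_partition(conn, "2021-06-01", rows)
load_partition(conn, "2021-06-01", rows)  # rerun with the same input

result = conn.execute(
    "SELECT run_date, store, amount FROM daily_sales ORDER BY store"
).fetchall()
print(result)
```

After the second call the table still holds two rows for the date, which is the "same input, same output" property described above. A plain append without the delete would leave four.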
GitHub OCTO | Flat Data
OCTO Project: Flat explores how to make it easy to work with data in git and GitHub. It builds on the “git scraping” approach pioneered by Simon Willison (https://simonwillison.net/2020/Oct/9/git-scraping/) to offer a simple pattern for bringing working datasets into your repositories and versioning them, because developing against local datasets is faster and easier than working with data over the wire.
·octo.github.com·
Data Engineering Project: Stream Edition · Start Data Engineering
Data engineering project for beginners, stream edition. In this post we design and build a simple data streaming pipeline using Apache Kafka, Apache Flink and PostgreSQL DB. We will also review the design and understand some common issues to avoid while building distributed stream processing systems.
·startdataengineering.com·
Uber's Journey Toward Better Data Culture From First Principles
Data powers Uber: Uber has revolutionized how the world moves by powering billions of rides and deliveries connecting millions of riders, businesses, restaurants, drivers, and couriers. At the heart of this massive transportation platform is Big Data and Data Science that powers everything that Uber does, such as better pricing and matching, fraud detection, lowering ETAs, and experimentation. Petabytes of data are collected and processed per day and thousands of users derive insights and make decisions from this data to build/improve these products. Problems beyond scale: While we are able to scale our data systems, we previously didn’t focus enough
·eng.uber.com·