Data Engineering

109 bookmarks
Data Engineering Interview Series #2: System Design
System design interviews are usually vague and depend on you (as the interviewee) to guide the interviewer. If you are thinking: How do I prepare for data engineering system design interviews? I struggle to think of questions you would ask in a system design interview for data engineering; I don't have enough interview experience to know what companies ask. Is data engineering "system design" more than choosing between technologies like Spark and Airflow? This post is for you! Imagine being able to solve any data systems design interview systematically. You'll be able to showcase your abilities and demonstrate clear thinking to your interviewer. By the end of this post, you will have a list of questions, ordered by concept, that you can use to approach any data systems design interview.
·startdataengineering.com·
Building Cost Efficient Data Pipelines with Python & DuckDB
Imagine working for a company that processes a few GBs of data every day but spends hours configuring/debugging large-scale data processing systems! Whoever set up the data infrastructure copied it from some blog/talk by big tech. Now, the responsibility of managing the data team's expenses has fallen on your shoulders. You're under pressure to scrutinize every system expense, no matter how small, in an effort to save some money for the organization. It can be frustrating when data vendors charge you a lot and will gladly charge you more if you are not careful with usage. Imagine if your data processing costs were dirt cheap! Imagine being able to replicate and debug issues quickly on your laptop! In this post, we will discuss how to use the latest advancements in data processing systems and cheap hardware to enable cheap data processing. We will use DuckDB and Python to demonstrate how to process data quickly while improving developer ergonomics.
·startdataengineering.com·
Netflix Data Tech Stack
Learn about the Data Tech Stack used by Netflix to process trillions of events every day.
·junaideffendi.com·
Data Pipeline Design Patterns - #2. Coding patterns in Python
As a data engineer, you might have heard the terms functional data pipeline, factory pattern, singleton pattern, etc. One can quickly look up the implementation, but it can be tricky to understand what they are precisely and when to (& when not to) use them. Blindly following a pattern can help in some cases, but not knowing the caveats of a design will lead to hard-to-maintain and brittle code! While writing clean and easy-to-read code takes years of experience, you can accelerate that by understanding the nuances and reasoning behind each pattern. Imagine being able to design an implementation that provides the best extensibility and maintainability! Your colleagues (& future self) will be extremely grateful, your feature delivery speed will increase, and your boss will highly value your opinion. In this post, we will go over the specific code design patterns used for data pipelines, when and why to use them, and when not to use them. We will also go over a few Python-specific techniques to help you write better pipelines. By the end of this post, you will be able to identify patterns in your data pipelines and apply the appropriate code design patterns. You will also be able to take advantage of Pythonic features to write bug-free, maintainable code that is a joy to work on!
·startdataengineering.com·
Building an End-To-End Analytic solution in Power BI: Part 3 – Level Up with Data Modeling! | LinkedIn
When I talk to people who are not deep into the Power BI world, I often get the impression that they think of Power BI as a visualization tool exclusively. While that is true to a certain extent, it seems to me that they are not seeing the bigger picture – or maybe it’s better to say – they see just
·linkedin.com·
Data Modeling for Mere Mortals – Part 1: What is Data Modeling?! | LinkedIn
In recent years, I’ve run dozens of training sessions on various data platform topics, for all kinds of audiences. When teaching various data platform concepts and techniques, I find one of the concepts particularly intimidating for many business analysts, especially those who are just starting their journe
·linkedin.com·
A Comprehensive Guide to Vector Databases
Section One: What is a Vector Database? Section Two: The Business Value of Vector Databases. Section Three: Vector Databases Use Cases. Section Four: Required Capabilities of Vector Databases. Summary. […]
·kdb.ai·