Data Engineering

103 bookmarks
Netflix Data Tech Stack
Learn about the Data Tech Stack used by Netflix to process trillions of events every day.
·junaideffendi.com·
Data Pipeline Design Patterns - #2. Coding patterns in Python
As a data engineer, you might have heard the terms functional data pipeline, factory pattern, singleton pattern, etc. One can quickly look up an implementation, but it can be tricky to understand precisely what these patterns are and when to (and when not to) use them. Blindly following a pattern can help in some cases, but not knowing the caveats of a design will lead to brittle, hard-to-maintain code. While writing clean and easy-to-read code takes years of experience, you can accelerate that by understanding the nuances and reasoning behind each pattern. In this post, we go over the specific code design patterns used for data pipelines, when and why to use them, and when not to use them. We also go over a few Python-specific techniques to help you write better pipelines. By the end of this post, you will be able to identify patterns in your data pipelines, apply the appropriate code design patterns, and take advantage of Pythonic features to write maintainable code.
·startdataengineering.com·
A Comprehensive Guide to Vector Databases
Sections: What is a Vector Database?; The Business Value of Vector Databases; Vector Databases Use Cases; Required Capabilities of Vector Databases; Summary.
·kdb.ai·
Top Python ETL Tools for 2021
List of the 6 best Python ETL tools for 2021 with a detailed comparison of features and capabilities. What is the best Python tool for ETL workflows?
·xplenty.com·
How to trigger a spark job from AWS Lambda
Wondering how to execute a Spark job on an AWS EMR cluster, based on a file upload event on S3? Then this post is for you. In this post we go over how to trigger Spark jobs on an AWS EMR cluster using AWS Lambda. The Lambda function will execute in response to an S3 upload event. We will go over this event-driven pattern with code snippets and set up a fully functioning pipeline.
·startdataengineering.com·
Course Information - Big Data Platforms, Autumn 2021
The University of Helsinki's free online course, open to everyone, that teaches the basics of programming. The course covers the core ideas of modern programming and the tools used in programming, as well as how to write algorithms. No prior programming knowledge is required to take part.
·big-data-platforms-21.mooc.fi·
The Apache Cassandra Beginner Tutorial
There are lots of data-storage options available today. You have to choose between managed or unmanaged, relational or NoSQL, write- or read-optimized, proprietary or open-source — and it doesn't end there. Once you begin your search, you will end up in the universe that is database marketing. All of the vendors
·freecodecamp.org·
What is a Data Warehouse
This post goes over what the term data warehousing means. It presents a simple e-commerce relational data model and shows how it has to be changed to fit analytical queries. It also covers the reasoning behind wanting to use a data warehouse and how to choose an appropriate database for your project.
·startdataengineering.com·
What is a data warehouse?
The transformations that mold application data into a form better suited for data analysis are done in a data warehouse.
·medium.com·
The Guide to Data Versioning
What is data versioning? When is data versioning appropriate? We review the various tools and use-cases needed for the best implementation.
·lakefs.io·
What is the difference between a data lake and a data warehouse?
Confused by all the "data lake vs data warehouse" articles? Struggling to understand what the differences between data lakes and warehouses are? Then this post is for you. We go over what data lakes and warehouses are. We also cover the key points to consider when choosing your lake and warehouse tools.
·startdataengineering.com·
Building an End-To-End Analytic solution in Power BI: Part 3 – Level Up with Data Modeling!
When I talk to people who are not deep into the Power BI world, I often get the impression that they think of Power BI as a visualization tool exclusively. While that is true to a certain extent, it seems to me that they are not seeing the bigger picture – or maybe it’s better to say – they see just
·linkedin.com·