Data Engineering

107 bookmarks
Data Pipeline Design Patterns - #1. Data flow patterns
Data pipelines built (and added on to) without a solid foundation suffer from poor efficiency, slow development, long triage times for production issues, and poor testability. What if your data pipelines were elegant and enabled you to deliver features quickly? An easy-to-maintain, extendable data pipeline significantly increases developer morale, stakeholder trust, and the business bottom line! Using the correct design pattern speeds up feature delivery, increases developer value (allowing devs to do more in less time), decreases toil during pipeline failures, and builds trust with stakeholders. This post goes over the most commonly used data flow design patterns: what they do, when to use them, and, more importantly, when not to use them. By the end of this post, you will have an overview of the typical data flow patterns and be able to choose the right one for your use case.
·startdataengineering.com·
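The bookmarked post covers data flow patterns in general; as a rough sketch (the function names and sample data are invented here, not taken from the post), one of the most common patterns is a linear extract, transform, load flow built from small, independently testable functions:

```python
# Minimal sketch of a linear extract -> transform -> load data flow.
# Each step is a pure-ish function so it can be tested in isolation
# and recomposed when the pipeline grows.

def extract():
    # Stand-in for reading from an API, file, or database.
    return [
        {"user": "ada", "amount": "10.50"},
        {"user": "bob", "amount": "3.25"},
    ]

def transform(rows):
    # Normalize types; keeping this step side-effect free makes
    # it easy to unit test against fixed inputs.
    return [{"user": r["user"], "amount": float(r["amount"])} for r in rows]

def load(rows, sink):
    # Stand-in for writing to a warehouse table; here we append to a list.
    sink.extend(rows)
    return len(rows)

sink = []
loaded = load(transform(extract()), sink)
print(loaded)             # prints 2
print(sink[0]["amount"])  # prints 10.5
```

Because each stage is a plain function, a failure in one stage can be triaged and retried without re-running the whole flow.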
About this Book — The Data Science Interview Book
Preparing for interviews is a stressful task. There is an enormous amount of resources available on the internet: multiple repositories, and even companies that help students prepare for interviews at the Big Tech companies. The idea here is to create an accessible version of these resources so that people on this journey can benefit from them.
·dipranjan.github.io·
NoSQL databases sample models: MongoDB, Neo4j, Swagger, Cassandra
Get the sample models for MongoDB, Neo4j, Cassandra, Swagger, Avro, Parquet, Glue, and more! After download, open the models using Hackolade, and learn through the examples how to leverage the modeling power of the software.
·hackolade.com·
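The sample models above contrast document and relational approaches to the same data. As a toy illustration (the order record and its fields are invented, not taken from Hackolade's samples), a MongoDB-style document embeds the line items that a relational model would normalize into separate joined tables:

```python
import json

# Hypothetical order record, purely for illustration.
# Document model (MongoDB-style): related line items are embedded,
# so a single read returns the whole order.
order_doc = {
    "_id": "order-1001",
    "customer": {"name": "Ada", "email": "ada@example.com"},
    "items": [
        {"sku": "A-1", "qty": 2, "price": 9.99},
        {"sku": "B-7", "qty": 1, "price": 4.50},
    ],
}

# Relational model: the same data normalized into rows that would live
# in separate tables, re-joined on the order id at query time.
orders_table = [("order-1001", "Ada", "ada@example.com")]
items_table = [
    ("order-1001", "A-1", 2, 9.99),
    ("order-1001", "B-7", 1, 4.50),
]

# The document round-trips cleanly through JSON, one reason document
# stores are convenient for nested data.
assert json.loads(json.dumps(order_doc)) == order_doc

total = sum(i["qty"] * i["price"] for i in order_doc["items"])
print(round(total, 2))  # prints 24.48
```

The embedded form avoids a join for the common "fetch the whole order" access pattern, at the cost of duplicating data if items are shared across orders.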
How to install Apache Spark on Ubuntu using Apache Bigtop
Want to install Apache Spark using Apache Bigtop? This is a step-by-step tutorial. Bigtop is a package manager for Spark, HBase, Hadoop, and other Apache big data projects. This tutorial is for machine learning engineers and data scientists looking for a convenient way to manage the big data components of their ecosystem.
·blog.miz.space·
Hadoop ecosystem with docker-compose
Construct a Hadoop-ecosystem cluster composed of 1 master, 1 DB, and n slaves using docker-compose, and get hands-on experience with the Hadoop map-reduce routine and the Hive, Sqoop, and HBase systems from the Hadoop ecosystem.
·hjben.github.io·
Hadoop Yarn Configuration on Cluster
This post explains how to set up a YARN master on a Hadoop 3.1 cluster and run a map-reduce program. Before you proceed, please make sure you have a Hadoop 3.1 cluster up and running. If you do not have a setup, please follow the link below to set up your cluster and then come back to this page.
·sparkbyexamples.com·
Spark Step-by-Step Setup on Hadoop Yarn Cluster
This post explains how to set up Apache Spark and run Spark applications on Hadoop with the YARN cluster manager, running the Spark examples with deployment mode client and master yarn. You can also try running the Spark application in cluster mode. Prerequisites: if you don't have Hadoop and YARN installed, please install and set up a Hadoop cluster and set up YARN on the cluster before proceeding with this article. Spark install and setup: to install and set up Apache Spark on a Hadoop cluster, access the Apache Spark download site and go to the Download Apache Spark section.
·sparkbyexamples.com·
Apache Hadoop Installation on Ubuntu (multi-node cluster)
Below are the steps for Apache Hadoop installation on a Linux Ubuntu server. If you have a Windows laptop with enough memory, you can create 4 virtual machines using Oracle VirtualBox and install Ubuntu on these VMs; this article assumes you have Ubuntu running and doesn't explain how to create the VMs and install Ubuntu. Apache Hadoop is an open-source distributed storage and processing framework used to execute large data sets on commodity hardware. Hadoop natively runs on the Linux operating system, and this article explains step by step how to install Apache Hadoop version 3.1.1.
·sparkbyexamples.com·
Multinode Hadoop installation steps - DBACLASS
Multi-node cluster in Hadoop 2.x. Here, we take two machines, master and slave; on both machines, a datanode will be running. Let us start with the setup of a multi-node cluster in Hadoop. PREREQUISITES: CentOS 6.5, Hadoop 2.7.3, Java 8, SSH. We have two machines (master and slave) with IP: Master […]
·dbaclass.com·
Hive - Installation
All Hadoop sub-projects such as Hive, Pig, and HBase support the Linux operating system, so you need to install a Linux-flavored OS first.
·tutorialspoint.com·
Onehouse
·onehouse.ai·
What is the difference between a data lake and a data warehouse?
Confused by all the "data lake vs data warehouse" articles? Struggling to understand what the differences between data lakes and warehouses are? Then this post is for you. We go over what data lakes and warehouses are. We also cover the key points to consider when choosing your lake and warehouse tools.
·startdataengineering.com·
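One distinction that usually comes up in lake-vs-warehouse comparisons is schema-on-read versus schema-on-write. As a toy sketch (the events and field names are invented, and the "warehouse" is just an in-memory SQLite table), a lake keeps raw records as-is, while a warehouse-style load step validates the schema before anything is stored:

```python
import json
import sqlite3

# Invented raw event stream; note the second record has a bad value.
raw_events = [
    '{"user": "ada", "clicks": 3}',
    '{"user": "bob", "clicks": "oops"}',  # malformed value slips in
]

# "Lake": store records exactly as received; any schema is applied
# only later, when someone reads and parses them.
lake = list(raw_events)

# "Warehouse": the load step validates types up front, so malformed
# rows are rejected before they reach the table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user TEXT NOT NULL, clicks INTEGER NOT NULL)")

loaded, rejected = 0, 0
for line in lake:
    rec = json.loads(line)
    if isinstance(rec.get("clicks"), int):
        conn.execute("INSERT INTO events VALUES (?, ?)", (rec["user"], rec["clicks"]))
        loaded += 1
    else:
        rejected += 1

print(loaded, rejected)  # prints 1 1
```

The lake kept both records cheaply; the warehouse traded ingestion flexibility for the guarantee that every stored row matches the declared schema.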