How to Build a Spark Cluster with Docker, JupyterLab, and Apache Livy—a REST API for Apache Spark
Read our step-by-step guide to building an Apache Spark cluster in a Docker-based virtual environment, with JupyterLab for notebooks and the Apache Livy REST interface for submitting jobs.
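Once Livy is running, its REST API lets you drive Spark over plain HTTP. Below is a minimal sketch using the documented sessions/statements endpoints; the host `http://localhost:8998` (Livy's default port) and the submitted code snippet are assumptions you would adjust for your own cluster.

```python
import time
import requests

LIVY_URL = "http://localhost:8998"  # assumed Livy endpoint; adjust for your setup
headers = {"Content-Type": "application/json"}

# Create an interactive PySpark session
session = requests.post(f"{LIVY_URL}/sessions",
                        json={"kind": "pyspark"}, headers=headers).json()
session_url = f"{LIVY_URL}/sessions/{session['id']}"

# Wait until the session is idle (ready to accept statements)
while requests.get(session_url, headers=headers).json()["state"] != "idle":
    time.sleep(2)

# Submit a statement and poll for its result
stmt = requests.post(f"{session_url}/statements",
                     json={"code": "sc.parallelize(range(100)).sum()"},
                     headers=headers).json()
stmt_url = f"{session_url}/statements/{stmt['id']}"
while True:
    result = requests.get(stmt_url, headers=headers).json()
    if result["state"] == "available":
        print(result["output"])
        break
    time.sleep(1)

# Clean up the session when done
requests.delete(session_url, headers=headers)
```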
How to install Apache Spark on Ubuntu using Apache Bigtop
Want to install Apache Spark using Apache Bigtop? This step-by-step tutorial shows you how. Bigtop packages Spark, HBase, Hadoop, and other big-data-related Apache projects so they can be installed through your system's package manager. The tutorial is aimed at machine learning engineers and data scientists looking for a convenient way to manage the big data components of their ecosystem.
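After the Bigtop packages are installed, a quick way to confirm the Spark install works is a tiny PySpark job. This is a sketch only, assuming the Bigtop-provided `pyspark` bindings are on your Python path:

```python
from operator import add
from random import random

from pyspark.sql import SparkSession

# Monte Carlo estimate of pi, just enough work to exercise the executors
spark = SparkSession.builder.appName("bigtop-smoke-test").getOrCreate()
n = 1_000_000

def inside(_):
    x, y = random(), random()
    return 1 if x * x + y * y <= 1 else 0

count = spark.sparkContext.parallelize(range(n), 4).map(inside).reduce(add)
print(f"Pi is roughly {4.0 * count / n}")
spark.stop()
```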
Construct a Hadoop-ecosystem cluster composed of one master, one DB, and n slaves using docker-compose, and gain hands-on experience with the Hadoop MapReduce workflow as well as Hive, Sqoop, and HBase from the Hadoop ecosystem.
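Once the containers are up, you can check the cluster from the host over WebHDFS. A minimal sketch using the `hdfs` Python package; the NameNode hostname and port 9870 (the Hadoop 3.x web UI default) are assumptions that depend on how your docker-compose file publishes ports.

```python
from hdfs import InsecureClient  # pip install hdfs

# Assumed: the NameNode's WebHDFS port (9870 on Hadoop 3.x) is published to the host
client = InsecureClient("http://localhost:9870", user="hdfs")

# Write a small file into HDFS and list the directory to confirm the cluster responds
client.makedirs("/tmp/smoke-test")
client.write("/tmp/smoke-test/hello.txt", data=b"hello from the host", overwrite=True)
print(client.list("/tmp/smoke-test"))
```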
This post explains how to set up a YARN master on a Hadoop 3.1 cluster and run a MapReduce program. Before you proceed, please make sure you have a Hadoop 3.1 cluster up and running; if you do not, follow the link below to set up your cluster and then come back to this page.
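For the MapReduce part, a word-count job is the usual first run. The sketch below uses the `mrjob` library instead of a Java jar; it assumes `mrjob` is installed and that your Hadoop client configuration points at the YARN cluster.

```python
from mrjob.job import MRJob


class MRWordCount(MRJob):
    """Classic word count: emit (word, 1) per token, sum the counts in the reducer."""

    def mapper(self, _, line):
        for word in line.split():
            yield word.lower(), 1

    def reducer(self, word, counts):
        yield word, sum(counts)


if __name__ == "__main__":
    MRWordCount.run()
```

Run it against the cluster with `python wordcount.py -r hadoop hdfs:///user/you/input -o hdfs:///user/you/output` (both paths are placeholders).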
This post explains how to set up Apache Spark and run Spark applications on a Hadoop cluster with YARN as the cluster manager, running the Spark examples in client deploy mode with the master set to yarn. You can also try running the Spark application in cluster mode. Prerequisites: if you don't have Hadoop and YARN installed, please install and set up a Hadoop cluster and configure YARN on it before proceeding with this article. Spark install and setup: to install Apache Spark on the Hadoop cluster, visit the Apache Spark download site and go to the Download Apache Spark section.
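Once Spark is unpacked and `HADOOP_CONF_DIR` (or `YARN_CONF_DIR`) points at your Hadoop configuration, an application can ask YARN to be its master directly. A minimal PySpark sketch in client deploy mode; the executor sizing is illustrative only:

```python
from pyspark.sql import SparkSession

# Client deploy mode: the driver runs locally, executors are scheduled by YARN.
# Requires HADOOP_CONF_DIR (or YARN_CONF_DIR) to point at the cluster's config files.
spark = (
    SparkSession.builder
    .appName("yarn-client-example")
    .master("yarn")
    .config("spark.submit.deployMode", "client")
    .config("spark.executor.instances", "2")   # illustrative sizing
    .config("spark.executor.memory", "1g")
    .getOrCreate()
)

# A trivial job to confirm executors come up on the cluster
df = spark.range(0, 1_000_000)
print("row count:", df.count())
spark.stop()
```

The same thing can be expressed on the command line as `spark-submit --master yarn --deploy-mode client your_app.py`.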
Apache Hadoop Installation on Ubuntu (multi-node cluster).
Below are the steps for installing Apache Hadoop on a Linux Ubuntu server. If you have a Windows laptop with enough memory, you can create four virtual machines using Oracle VirtualBox and install Ubuntu on them; this article assumes you already have Ubuntu running and does not explain how to create VMs or install Ubuntu. Apache Hadoop is an open-source distributed storage and processing framework used to process large data sets on commodity hardware. Hadoop natively runs on the Linux operating system, and in this article I explain the Apache Hadoop installation step by step for version 3.1.1.
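After the daemons are started on every node, one quick sanity check is to ask the NameNode how many DataNodes have registered. A sketch that reads the NameNode's JMX endpoint; the hostname and the 9870 web port are assumptions for a default Hadoop 3.1.x install.

```python
import requests

# Assumed: NameNode web UI reachable on the default Hadoop 3.x port 9870
NAMENODE = "http://master-node:9870"

resp = requests.get(
    f"{NAMENODE}/jmx",
    params={"qry": "Hadoop:service=NameNode,name=FSNamesystemState"},
    timeout=10,
)
state = resp.json()["beans"][0]

print("Live DataNodes:", state["NumLiveDataNodes"])
print("Dead DataNodes:", state["NumDeadDataNodes"])
print("Capacity used (bytes):", state["CapacityUsed"])
```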
Install and Configuration of Apache Hive on multi-node Hadoop cluster
Apache Hive is a data warehouse system. This guide covers how to install and configure the latest version of Apache Hive on top of an existing multi-node Hadoop cluster.
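With HiveServer2 running on the cluster, you can query Hive from Python. A minimal sketch using PyHive; the host, port 10000 (the HiveServer2 default), username, and table name are all assumptions.

```python
from pyhive import hive  # pip install 'pyhive[hive]'

# Assumed: HiveServer2 listening on the default port 10000 of the master node
conn = hive.Connection(host="master-node", port=10000, username="hadoop", database="default")
cursor = conn.cursor()

# Create a small table and query it to confirm Hive is wired to HDFS and the metastore
cursor.execute("CREATE TABLE IF NOT EXISTS smoke_test (id INT, name STRING)")
cursor.execute("INSERT INTO smoke_test VALUES (1, 'hello'), (2, 'hive')")
cursor.execute("SELECT id, name FROM smoke_test")
for row in cursor.fetchall():
    print(row)

cursor.close()
conn.close()
```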
Multi Node Cluster in Hadoop 2.x. Here we take two machines, master and slave, and a DataNode will run on both. Let us start with the setup of a multi-node cluster in Hadoop. Prerequisites: CentOS 6.5, Hadoop 2.7.3, Java 8, SSH. We have two machines (master and slave) with IP: Master […]
Hive - Installation. All Hadoop sub-projects such as Hive, Pig, and HBase support the Linux operating system, so you need a Linux-flavored OS installed before proceeding.
Learn the key steps of deploying databases and stateful workloads in Kubernetes, and meet the cloud-native technologies that can streamline running Apache Cassandra on K8s.
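Once Cassandra is running in the cluster (for example behind a Service you port-forward to localhost), connecting from Python looks like the sketch below; the contact point, port, and keyspace settings are assumptions.

```python
from cassandra.cluster import Cluster  # pip install cassandra-driver

# Assumed: the Cassandra Service is port-forwarded to localhost:9042, e.g.
#   kubectl port-forward svc/cassandra 9042:9042
cluster = Cluster(["127.0.0.1"], port=9042)
session = cluster.connect()

# Create a keyspace and table, then round-trip a row to confirm connectivity
session.execute(
    "CREATE KEYSPACE IF NOT EXISTS demo "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)
session.execute("CREATE TABLE IF NOT EXISTS demo.users (id int PRIMARY KEY, name text)")
session.execute("INSERT INTO demo.users (id, name) VALUES (%s, %s)", (1, "ada"))
for row in session.execute("SELECT id, name FROM demo.users"):
    print(row.id, row.name)

cluster.shutdown()
```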
15+ Data Engineering Projects for Beginners with Source Code
Explore the top 15 real-world data engineering project ideas for beginners, with source code, to gain hands-on experience with diverse data engineering skills.
Starting your journey with Microsoft Azure Data Factory
In this article, we will go through the Microsoft Azure Data Factory service, which can be used to ingest, copy, and transform data generated from various data sources.
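Beyond the portal UI, pipelines can also be triggered programmatically. A sketch with the `azure-mgmt-datafactory` SDK; the subscription ID, resource group, factory name, and pipeline name are placeholders, and it assumes the pipeline already exists.

```python
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

# All of these identifiers are placeholders for your own environment
SUBSCRIPTION_ID = "<subscription-id>"
RESOURCE_GROUP = "<resource-group>"
FACTORY_NAME = "<data-factory-name>"
PIPELINE_NAME = "<pipeline-name>"

credential = DefaultAzureCredential()
adf_client = DataFactoryManagementClient(credential, SUBSCRIPTION_ID)

# Kick off a run of an existing pipeline and print the run id for tracking
run = adf_client.pipelines.create_run(
    RESOURCE_GROUP, FACTORY_NAME, PIPELINE_NAME, parameters={}
)
print("Started pipeline run:", run.run_id)

# Check the run's status afterwards
status = adf_client.pipeline_runs.get(RESOURCE_GROUP, FACTORY_NAME, run.run_id)
print("Run status:", status.status)
```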
Preparing for a data engineering interview and overwhelmed by all the tools and concepts? Then this post is for you: we go over the most common tools and concepts you need to know to ace your data engineering interviews.
This post goes over what the ETL and ELT data pipeline paradigms are, addresses the inconsistency in naming conventions and how to understand what they really mean, and ends with a comparison of the two paradigms and how to use these concepts to build efficient and scalable data pipelines.
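To make the contrast concrete, here is a small PySpark sketch of the same load done both ways; the file paths, column names, and table names are purely illustrative.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-vs-elt").getOrCreate()

orders = spark.read.csv("/data/raw/orders.csv", header=True, inferSchema=True)

# ETL: transform in the pipeline, then load only the shaped result
daily_revenue = (
    orders.filter(F.col("status") == "completed")
          .groupBy("order_date")
          .agg(F.sum("amount").alias("revenue"))
)
daily_revenue.write.mode("overwrite").parquet("/warehouse/daily_revenue")

# ELT: load the raw data first, then transform inside the warehouse with SQL
orders.write.mode("overwrite").saveAsTable("raw_orders")
spark.sql("""
    CREATE OR REPLACE VIEW daily_revenue_elt AS
    SELECT order_date, SUM(amount) AS revenue
    FROM raw_orders
    WHERE status = 'completed'
    GROUP BY order_date
""")
```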