Data-Lab

11 bookmarks

Create Your Very Own Apache Spark/Hadoop Cluster....then do something with it? - Confessions of a Data Guy

I’ve never seen so many posts about Apache Spark before; not sure if it’s 3.0 or because the world is burning down. I’ve written about Spark a few times, even 2 years ago, but it still seems to be steadily increasing in popularity, albeit still missing from many companies’ tech stacks. With the continued rise […]

·confessionsofadataguy.com·Sep 8, 2022

panovvv/hadoop-hive-spark-docker: Base Docker image with just essentials: Hadoop, Hive and Spark.

Base Docker image with just essentials: Hadoop, Hive and Spark.

·github.com·Sep 19, 2022

spark_hive_test/src/main/scala/tests/SparkHiveTest.scala at master · arempter/spark_hive_test · GitHub

Example for article Running Spark 3 with standalone Hive Metastore 3.0

·github.com·Jul 5, 2023

Running Spark 3 with standalone Hive Metastore 3.0

Intro

·medium.com·Jul 5, 2023

pyspark connect to aws s3a filesystem

jar dependencies are very finicky

·codelovingyogi.medium.com·Jun 29, 2023

Reading and Writing Data from/to MinIO using Spark

MinIO is high-performance, S3-compatible cloud object storage. Native to Kubernetes, MinIO is the only object storage suite…

·medium.com·Jun 29, 2023

java.lang.NoClassDefFoundError: org/apache/hadoop/fs/StorageStatistics

I'm trying to run a simple spark to s3 app from a server but I keep getting the below error because the server has hadoop 2.7.3 installed and it looks like it doesn't include the GlobalStorageStati...

·stackoverflow.com·Jun 29, 2023

cookbook/docs/apache-spark-with-minio.md at master · nitisht/cookbook · GitHub

Collection of Minio recipes. Contribute to nitisht/cookbook development by creating an account on GitHub.

·github.com·Jun 29, 2023

Add Jar to standalone pyspark

I'm launching a pyspark program: $ export SPARK_HOME= $ export PYTHONPATH=$SPARK_HOME/python:$SPARK_HOME/python/lib/py4j-0.9-src.zip $ python And the py code: from pyspark import SparkContext,

.config('spark.jars.packages', 'org.apache.spark:spark-sql-kafka-0-10_2.12:3.0.1')

·stackoverflow.com·Jun 29, 2023

Adding some MinIO to your standalone Apache Spark cluster

Disaggregated compute and storage for the apprentice Data Engineer

·fithis2001.medium.com·Jun 29, 2023

DataOps 02: Spawn up Apache Spark infrastructure by using Docker

When working on real data products, we will register an account on cloud providers such as Amazon, Azure, or Google so that we are able to…

·medium.com·Jun 29, 2023