Tutorials

Faster PySpark Unit Tests
TL;DR: A PySpark unit test setup for pytest that uses efficient default settings (such as spark.sql.shuffle.partitions) and utilizes all CPU cores via pytest-xdist is available…
·medium.com·
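A minimal sketch of the kind of setup the article describes, assuming a session-scoped pytest fixture with reduced shuffle partitions; the fixture name and the exact config values are illustrative, not the article's.

```python
# conftest.py — a sketch of a fast PySpark test fixture, not the article's exact setup.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Fewer shuffle partitions and no UI make each SparkSession cheap to create;
    # pytest-xdist (run with `pytest -n auto`) then spreads test files over all CPU cores.
    session = (
        SparkSession.builder
        .master("local[1]")  # one core per xdist worker
        .config("spark.sql.shuffle.partitions", "1")
        .config("spark.ui.enabled", "false")
        .config("spark.sql.session.timeZone", "UTC")
        .appName("pyspark-unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()
```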
How to connect to remote hive server from spark
I'm running Spark locally and want to access Hive tables located in a remote Hadoop cluster. I'm able to access the Hive tables by launching beeline under SPARK_HOME [ml@master spa...
·stackoverflow.com·
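A common answer to this kind of question is to point the SparkSession at the remote Hive metastore and enable Hive support; the sketch below assumes a Thrift metastore URI, and the host and port are placeholders.

```python
from pyspark.sql import SparkSession

# Sketch: point local Spark at the remote cluster's Hive metastore.
# thrift://metastore-host:9083 is a placeholder for the actual metastore address.
spark = (
    SparkSession.builder
    .appName("remote-hive-example")
    .config("hive.metastore.uris", "thrift://metastore-host:9083")
    .enableHiveSupport()
    .getOrCreate()
)

# Hive databases and tables should now be visible from the local session.
spark.sql("SHOW DATABASES").show()
```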
One-hot encoding in PySpark
To perform one-hot encoding in PySpark, we must convert the categorical column into a numeric column (0, 1, ...) using StringIndexer, and then convert the numeric column into one-hot encoded columns using OneHotEncoder.
·skytowner.com·
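A minimal sketch of that two-step StringIndexer → OneHotEncoder pipeline; the toy DataFrame and column names are made up for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder

spark = SparkSession.builder.appName("one-hot-example").getOrCreate()

# Toy data: the "color" column and its values are illustrative only.
df = spark.createDataFrame(
    [("red",), ("green",), ("blue",), ("green",)], ["color"]
)

# Step 1: map each category to a numeric index (0, 1, ...).
indexer = StringIndexer(inputCol="color", outputCol="color_index")
# Step 2: turn the numeric index into a (sparse) one-hot vector column.
encoder = OneHotEncoder(inputCols=["color_index"], outputCols=["color_onehot"])

pipeline = Pipeline(stages=[indexer, encoder])
encoded = pipeline.fit(df).transform(df)
encoded.show(truncate=False)
```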
Getting started with MongoDB, PySpark, and Jupyter Notebook | MongoDB Blog
Learn how to leverage MongoDB data in your Jupyter notebooks via the MongoDB Spark Connector and PySpark. We will load financial security data from MongoDB, calculate a moving average, and then update the data in MongoDB with the new data.
·mongodb.com·
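A rough sketch of the read → moving average → write-back flow the post describes, assuming the MongoDB Spark Connector 10.x; the connection URI, database/collection names, and the symbol/date/price columns are all placeholders, not the blog's actual dataset.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Assumes the MongoDB Spark Connector 10.x is on the classpath
# (e.g. started with --packages org.mongodb.spark:mongo-spark-connector_2.12:10.2.1).
spark = (
    SparkSession.builder
    .appName("mongodb-moving-average")
    .config("spark.mongodb.read.connection.uri", "mongodb://localhost:27017")
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
    .getOrCreate()
)

# Load a collection of security price documents into a DataFrame.
prices = (
    spark.read.format("mongodb")
    .option("database", "finance")
    .option("collection", "securities")
    .load()
)

# Compute a 5-row moving average of price per symbol, ordered by date.
window = Window.partitionBy("symbol").orderBy("date").rowsBetween(-4, 0)
with_ma = prices.withColumn("moving_avg", F.avg("price").over(window))

# Write the enriched data back to MongoDB.
(
    with_ma.write.format("mongodb")
    .option("database", "finance")
    .option("collection", "securities_ma")
    .mode("append")
    .save()
)
```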