Tutorials

Faster PySpark Unit Tests
TL;DR: A PySpark unit test setup for pytest that uses efficient default settings (such as spark.sql.shuffle.partitions) and utilizes all CPU cores via pytest-xdist is available…
·medium.com·
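A minimal sketch of the kind of setup the article describes, assuming a session-scoped pytest fixture with reduced shuffle partitions; the fixture name and the exact config values are illustrative, not the article's.

```python
# conftest.py — a sketch of a fast PySpark test fixture, not the article's exact setup.
import pytest
from pyspark.sql import SparkSession

@pytest.fixture(scope="session")
def spark():
    # Fewer shuffle partitions and no UI make each SparkSession cheap to create;
    # pytest-xdist (run with `pytest -n auto`) then spreads test files over all CPU cores.
    session = (
        SparkSession.builder
        .master("local[1]")  # one core per xdist worker
        .config("spark.sql.shuffle.partitions", "1")
        .config("spark.ui.enabled", "false")
        .config("spark.sql.session.timeZone", "UTC")
        .appName("pyspark-unit-tests")
        .getOrCreate()
    )
    yield session
    session.stop()
```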
How to connect to remote hive server from spark
I'm running Spark locally and want to access Hive tables located in a remote Hadoop cluster. I'm able to access the Hive tables by launching beeline under SPARK_HOME [ml@master spa...
·stackoverflow.com·
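A common answer to this kind of question is to point the SparkSession at the remote Hive metastore and enable Hive support; the sketch below assumes a Thrift metastore URI, and the host and port are placeholders.

```python
from pyspark.sql import SparkSession

# Sketch: point local Spark at the remote cluster's Hive metastore.
# thrift://metastore-host:9083 is a placeholder for the actual metastore address.
spark = (
    SparkSession.builder
    .appName("remote-hive-example")
    .config("hive.metastore.uris", "thrift://metastore-host:9083")
    .enableHiveSupport()
    .getOrCreate()
)

# Hive databases and tables should now be visible from the local session.
spark.sql("SHOW DATABASES").show()
```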
One-hot encoding in PySpark
To perform one-hot encoding in PySpark, we must convert the categorical column into a numeric column (0, 1, ...) using StringIndexer, and then convert the numeric column into one-hot encoded columns using OneHotEncoder.
·skytowner.com·
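A minimal sketch of that two-step StringIndexer → OneHotEncoder pipeline; the toy DataFrame and column names are made up for illustration.

```python
from pyspark.sql import SparkSession
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, OneHotEncoder

spark = SparkSession.builder.appName("one-hot-example").getOrCreate()

# Toy data: the "color" column and its values are illustrative only.
df = spark.createDataFrame(
    [("red",), ("green",), ("blue",), ("green",)], ["color"]
)

# Step 1: map each category to a numeric index (0, 1, ...).
indexer = StringIndexer(inputCol="color", outputCol="color_index")
# Step 2: turn the numeric index into a (sparse) one-hot vector column.
encoder = OneHotEncoder(inputCols=["color_index"], outputCols=["color_onehot"])

pipeline = Pipeline(stages=[indexer, encoder])
encoded = pipeline.fit(df).transform(df)
encoded.show(truncate=False)
```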
Getting started with MongoDB, PySpark, and Jupyter Notebook | MongoDB Blog
Learn how to leverage MongoDB data in your Jupyter notebooks via the MongoDB Spark Connector and PySpark. We will load financial security data from MongoDB, calculate a moving average, and then update the data in MongoDB with the new data.
·mongodb.com·
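A rough sketch of the read → moving average → write-back flow the post describes, assuming the MongoDB Spark Connector 10.x; the connection URI, database/collection names, and the symbol/date/price columns are all placeholders, not the blog's actual dataset.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F
from pyspark.sql.window import Window

# Assumes the MongoDB Spark Connector 10.x is on the classpath
# (e.g. started with --packages org.mongodb.spark:mongo-spark-connector_2.12:10.2.1).
spark = (
    SparkSession.builder
    .appName("mongodb-moving-average")
    .config("spark.mongodb.read.connection.uri", "mongodb://localhost:27017")
    .config("spark.mongodb.write.connection.uri", "mongodb://localhost:27017")
    .getOrCreate()
)

# Load a collection of security price documents into a DataFrame.
prices = (
    spark.read.format("mongodb")
    .option("database", "finance")
    .option("collection", "securities")
    .load()
)

# Compute a 5-row moving average of price per symbol, ordered by date.
window = Window.partitionBy("symbol").orderBy("date").rowsBetween(-4, 0)
with_ma = prices.withColumn("moving_avg", F.avg("price").over(window))

# Write the enriched data back to MongoDB.
(
    with_ma.write.format("mongodb")
    .option("database", "finance")
    .option("collection", "securities_ma")
    .mode("append")
    .save()
)
```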