Week 5: Batch Processing

5.1 Introduction

5.2 Installation

Follow these instructions to install Spark:

And follow this to run PySpark in Jupyter

5.3 Spark SQL and DataFrames

Script to prepare the dataset: download_data.sh

Note: Another way to infer the schema of the CSV files (apart from using pandas) is to set the inferSchema option to true when reading the files in Spark.
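As a minimal sketch of the note above, the snippet below reads a CSV file with schema inference turned on. The helper names `read_csv_with_inferred_schema` and `main`, the app name, and the path `data/trips.csv` are hypothetical and chosen only for illustration; the sketch also assumes the files have a header row.

```python
def read_csv_with_inferred_schema(spark, path):
    """Read a CSV file and let Spark infer the column types.

    `spark` is an existing SparkSession; `path` points at a CSV file
    that is assumed to have a header row.
    """
    # inferSchema=True makes Spark scan the data to guess column types
    # instead of reading every column as a string.
    return spark.read.csv(path, header=True, inferSchema=True)


def main():
    # Imported here so the helper above can be reused without starting Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("infer-schema-example")  # hypothetical app name
        .getOrCreate()
    )

    # Hypothetical path: replace with a file fetched by download_data.sh.
    df = read_csv_with_inferred_schema(spark, "data/trips.csv")
    df.printSchema()

    spark.stop()
```

Calling `main()` prints the inferred schema. Inference requires Spark to make an extra pass over the data, so for large files it may be preferable to define the schema explicitly.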

5.4 Spark Internals

5.5 (Optional) Resilient Distributed Datasets

5.6 Running Spark in the Cloud

Homework

Community notes

Did you take notes? You can share them here.