** PySpark Certification Training: https://www.edureka.co/pyspark-certification-training **
This Edureka tutorial on "PySpark RDD" gives you a detailed introduction to RDDs, which are considered the backbone of Apache Spark. You will learn about the various transformations and actions that can be performed on RDDs. This tutorial covers the following topics:
1. Need for RDDs
2. What are RDDs
3. PySpark RDD features
4. PySpark RDD Operations
5. Finding Page Rank - PySpark Demo
2. PYSPARK CERTIFICATION TRAINING www.edureka.co/pyspark-certification-training
Today’s Training Topics
❖ Need for RDDs
❖ What are RDDs?
❖ PySpark RDD Operations
❖ Features of RDD
❖ PySpark RDD Operations - Demo
❖ Finding Page Rank – PySpark RDD Demo
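Why RDD?
Stable Storage (HDFS) → JOB #1 → Stable Storage (HDFS) → JOB #2 → Stable Storage (HDFS)
❖ Iterative Process
❖ Reusing Data
❖ Sharing Data
When every job has to write to and read from stable storage (HDFS), all three of these are SLOW.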
What are RDDs?
RDD = Resilient Distributed Dataset
❖ Backbone of Apache Spark
❖ One of the First Fundamental Data Structures
❖ Operations: Transformations and Actions
Transformations and Actions
Transformations
map
flatMap
filter
distinct
reduceByKey
mapPartitions
Actions
collect
collectAsMap
reduce
countByKey
take
countByValue
Features of RDDs
❖ In-Memory Computation
❖ Lazy Evaluation
❖ Fault Tolerance
❖ Immutability
❖ Partitioning
❖ Persistence
❖ Coarse-Grained Operations
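Several of these features can be illustrated without a cluster. The sketch below is a toy model in plain Python, not Spark's implementation: the `ToyRDD` class and all of its names are hypothetical. It shows immutability (each transformation returns a new object), lazy evaluation (nothing runs until `collect`), recomputation from recorded lineage (the idea behind fault tolerance), and persistence. Partitioning is not modelled.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: immutable, lazy, rebuilt from lineage."""

    def __init__(self, source, lineage=()):
        self._source = tuple(source)    # base data, never modified
        self._lineage = tuple(lineage)  # recorded transformations, not results
        self._persisted = False
        self._cache = None

    def map(self, fn):
        # Coarse-grained: the function applies to every element, and a NEW object is returned.
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, predicate):
        return ToyRDD(self._source, self._lineage + (("filter", predicate),))

    def persist(self):
        # Ask for the next computed result to be kept in memory.
        self._persisted = True
        return self

    def collect(self):
        # Action: replay the lineage over the source data, unless a cached result exists.
        if self._cache is not None:
            return list(self._cache)
        data = iter(self._source)
        for kind, fn in self._lineage:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        result = list(data)
        if self._persisted:
            self._cache = tuple(result)
        return result


calls = []

def traced_square(x):
    calls.append(x)  # record every invocation so laziness is observable
    return x * x

base = ToyRDD(range(1, 6))
even_squares = base.map(traced_square).filter(lambda v: v % 2 == 0)

calls_before_action = len(calls)        # transformations alone run nothing
first_result = even_squares.collect()   # the action triggers the work
calls_after_first = len(calls)
second_result = even_squares.collect()  # recomputed from lineage
calls_after_second = len(calls)

even_squares.persist()
even_squares.collect()                  # computed once more, then cached
even_squares.collect()                  # served from the cache
calls_after_persist = len(calls)
```

Without `persist`, every action replays the whole lineage; after `persist`, later actions reuse the stored result, which is the trade-off that persistence exposes in Spark as well.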
How it works?
Page Rank of Site = Σ (Page Rank of Inbound Link / Number of Links on that Page)
The sum runs over every page that links to the site.
How it works?
Site        Iter - 0    Iter - 1    Iter - 2    Rank (1 = highest Iter - 2 score)
Netflix     1/4         1/12        1.5/12      4
Amazon      1/4         2.5/12      2/12        3
Wikipedia   1/4         4.5/12      4.5/12      1
Google      1/4         4/12        4/12        2
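The iteration values above can be reproduced with a few lines of plain Python. This is a sketch under stated assumptions: the slides do not show the link graph in text form, so the `links` dictionary below is an assumed graph, chosen because it yields exactly the fractions in the Iter - 1 and Iter - 2 columns when sites are taken in the order Netflix, Amazon, Wikipedia, Google. It uses the simplified rule (each page splits its rank equally among its outbound links) with no damping factor, and it uses plain Python rather than PySpark so it runs without a Spark installation.

```python
from fractions import Fraction

# ASSUMPTION: this link graph is not given in the slides; it is one graph
# that reproduces the slide's numbers. Each key links to the sites in its list.
links = {
    "Netflix": ["Amazon", "Wikipedia"],
    "Amazon": ["Google"],
    "Wikipedia": ["Netflix", "Amazon", "Google"],
    "Google": ["Wikipedia"],
}

def page_rank(links, iterations):
    """Return the rank of every site after each iteration (index 0 = initial)."""
    ranks = {site: Fraction(1, len(links)) for site in links}  # Iter - 0: equal ranks
    history = [ranks]
    for _ in range(iterations):
        new_ranks = {site: Fraction(0) for site in links}
        for site, outbound in links.items():
            # Page Rank of Inbound Link / Number of Links on that Page
            share = ranks[site] / len(outbound)
            for target in outbound:
                new_ranks[target] += share
        ranks = new_ranks
        history.append(ranks)
    return history

history = page_rank(links, 2)
final = history[-1]
ordering = sorted(final, key=final.get, reverse=True)  # highest score first

for site in links:
    # Print each score in twelfths to match the slide's notation.
    row = "  ".join(f"{float(step[site] * 12):g}/12" for step in history)
    print(f"{site:<10} {row}  rank {ordering.index(site) + 1}")
```

Exact fractions are used instead of floats so the results can be compared with the slide's values without rounding error.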