1 ZB = 1000 EB
All of Wikipedia (text and thumbnails) fits on a 128GB USB stick!
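To put the two figures above side by side, here is a minimal arithmetic sketch. It assumes decimal (SI) prefixes, so 1 GB = 10^9 bytes; the helper name `sticks_per_zettabyte` is made up for illustration.

```python
# Assumption: decimal (SI) prefixes, i.e. 1 GB = 10**9 bytes.
GB = 10**9
EB = 10**18
ZB = 10**21


def sticks_per_zettabyte(stick_gb=128):
    """Number of USB sticks of the given size needed to hold 1 ZB."""
    return ZB // (stick_gb * GB)


print(ZB // EB)                # 1000
print(sticks_per_zettabyte())  # 7812500000
```

Under these assumptions, one zettabyte is roughly 7.8 billion 128 GB sticks.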
Core modules
Same execution engine for all three
Spark SQL interfaces give Spark more information about the structure of both the data and the computation being performed than the basic Spark RDD API
Enables richer optimizations (often significantly faster than equivalent RDD code)
Underneath is an RDD
Extra:
To create a DataFrame from an existing RDD, use the toDF() function
Spark ML (the spark.ml package) is a uniform API for building and tuning ML pipelines
Built on top of DataFrames
Data exploration, visualization, and discovery
Deeply integrated with Spark and Hadoop
Pluggable interpreters
Multiple languages in one notebook: R, Python, Scala