
Apache Spark beyond Hadoop MapReduce


  1. Apache Spark: Beyond Hadoop MapReduce
     Presenter: Vishal
     www.edureka.co/r-for-analytics
     www.edureka.co/apache-spark-scala-training
  2. What will you learn today?
     • Strength of MapReduce
     • Limitations of MapReduce
     • How MapReduce limitations can be overcome
     • How Spark fits the bill
     • Other exciting features in Spark
  3. Strength of MapReduce
  4. Strength of MapReduce
     • Simple: independent of the programming language; jobs can be written in Java, C++ (via Hadoop Pipes) or Python (via Hadoop Streaming).
     • Scalable: it can process petabytes of data stored in HDFS on a single cluster.
     • Fault tolerant: MapReduce handles failures by re-running tasks against the replicated copies of the data.
     • Minimal data motion: the processing moves to the nodes that hold the data, which minimizes data transfer across the network.
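The map, shuffle and reduce phases behind this simple programming model can be sketched with the classic word-count example. This is a minimal single-process sketch in plain Python, assuming an in-memory list of text lines as input; it does not use Hadoop, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. A real MapReduce job runs the same three phases across a cluster and writes the intermediate pairs to disk.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle_phase(pairs):
    """Shuffle: group every value emitted for the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reducer: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(lines):
    # Run the mapper on each line independently, then shuffle and reduce.
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["Spark beyond MapReduce", "MapReduce is simple", "Spark is fast"]
    print(word_count(docs))
```

Because each mapper call only sees its own line and each reducer only sees one key's values, the phases can be spread over many machines, which is where the scalability and fault tolerance listed above come from.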
  5. Limitations of MapReduce
  6. Limitations of MR
     • Real time
     • Complex algorithms
     • Re-reading and parsing data
     • Minimal data motion
     • Graph processing
     • Iterative tasks
     • Random access
  7. Feature Comparison with Spark (source: Databricks)
     • Hadoop MapReduce: batch processing; stores data on disk; written in Java.
     • Spark: up to 100x faster than MapReduce for in-memory workloads; batch and real-time processing; stores data in memory; written in Scala.
  8. What are the MR limitations, and how does Spark overcome them?
  9. Overcoming MR limitations
     • Real time: by cutting down on the number of reads and writes to disk
  10. Spark Cuts Down Read/Write I/O to Disk
      Spark tries to keep data in the memory of its distributed workers, allowing significantly faster, lower-latency computations, whereas MapReduce writes intermediate results to disk and reads them back between jobs.
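To make the I/O difference concrete, here is a toy model of an iterative job: repeated gradient steps that estimate the mean of a dataset. It is plain Python, not Spark, and the hypothetical `CountingStorage` class merely stands in for HDFS by counting how often the dataset is read. The disk-based variant re-reads its input on every iteration, as a chain of MapReduce jobs would; the in-memory variant reads once and reuses the cached data, roughly what calling `cache()` on an RDD achieves in Spark.

```python
class CountingStorage:
    """Hypothetical stand-in for HDFS that counts full reads of the dataset."""

    def __init__(self, records):
        self._records = tuple(records)
        self.reads = 0

    def read(self):
        self.reads += 1
        return list(self._records)


def fit_mean(load, iterations, lr=0.5):
    """Iteratively move w toward the mean; `load` supplies the data each step."""
    w = 0.0
    for _ in range(iterations):
        data = load()
        grad = sum(w - x for x in data) / len(data)  # gradient of mean squared error / 2
        w -= lr * grad
    return w


def run_disk_based(storage, iterations):
    # MapReduce-style: every iteration is a new pass that re-reads the input.
    return fit_mean(storage.read, iterations)


def run_in_memory(storage, iterations):
    # Spark-style: read once, keep the data in memory, reuse it every iteration.
    cached = storage.read()
    return fit_mean(lambda: cached, iterations)


if __name__ == "__main__":
    disk, memory = CountingStorage([1.0, 2.0, 3.0]), CountingStorage([1.0, 2.0, 3.0])
    print(run_disk_based(disk, 10), disk.reads)
    print(run_in_memory(memory, 10), memory.reads)
```

Both variants compute the same estimate; only the number of reads differs (10 versus 1 for ten iterations). On a real cluster each avoided read is a full pass over data on disk, which is why iterative algorithms benefit most from keeping data in memory.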
  11. Overcoming MR limitations
      • Complex algorithms and graph processing: libraries for machine learning, streaming and graph processing
  12. Libraries for ML, Graph Programming …
      • MLlib: machine learning library
      • GraphX: graph programming
      • Spark SQL: Spark interface for RDBMS lovers
      • Spark Streaming: utility for continuous ingestion of data
  13. Overcoming MR limitations
      • Cyclic data flows
      • Random access
  14. Cyclic Data Flows
      • All jobs in Spark comprise a series of operators that run on a set of data.
      • All the operators in a job are used to construct a DAG (directed acyclic graph).
      • The DAG is optimized by rearranging and combining operators where possible.
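The idea of recording operators first and executing them later, with adjacent operators combined, can be sketched as a toy lazy pipeline. This is plain Python and only a hypothetical model (`LazyPipeline` is not a Spark class): it handles a linear chain of `map` and `filter` steps, the simplest possible DAG, and fuses them so that each element goes through every operator in a single pass when the `collect` action runs. Spark's actual scheduler works on a full DAG and splits it into stages at shuffle boundaries, pipelining narrow transformations within each stage.

```python
_DROP = object()  # sentinel marking an element removed by a filter


class LazyPipeline:
    """Toy model of lazily recorded, fused operators (not Spark's API)."""

    def __init__(self, source, ops=()):
        self._source = source
        self._ops = tuple(ops)  # recorded ("map" | "filter", fn) steps; nothing runs yet

    def map(self, fn):
        # Transformations only extend the plan and return a new pipeline.
        return LazyPipeline(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return LazyPipeline(self._source, self._ops + (("filter", pred),))

    def _apply_fused(self, item):
        # All recorded operators run back to back on one element,
        # so no intermediate collection is built between them.
        for kind, fn in self._ops:
            if kind == "map":
                item = fn(item)
            elif not fn(item):
                return _DROP
        return item

    def collect(self):
        # The "action": one pass over the source executes the whole plan.
        results = []
        for item in self._source:
            out = self._apply_fused(item)
            if out is not _DROP:
                results.append(out)
        return results


if __name__ == "__main__":
    p = LazyPipeline(range(10)).map(lambda x: x * 2).filter(lambda x: x % 3 == 0).map(lambda x: x + 1)
    print(p.collect())  # [1, 7, 13, 19]
```

Keeping transformations lazy is what gives an optimizer the whole chain to inspect before anything runs; the fusion here is the simplest instance of "combining operators where possible".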
  15. Spark's Features Make Its Architecture Better than MR's
  16. Other Spark Features in Demand
  17. Spark Features/Modules in Demand (source: Typesafe)
  18. New Features in 2015 (source: Databricks)
      • DataFrames: API similar to data frames in R and pandas; automatically optimized via Spark SQL; released in Spark 1.3.
      • SparkR: released in Spark 1.4; exposes DataFrames, RDDs and MLlib in R.
      • Machine Learning Pipelines: high-level API for featurization, evaluation and model tuning.
      • External Data Sources: platform API to plug data sources into Spark; pushes logic into the sources.
  19. Get Certified in Spark from Edureka
      Edureka's Spark and Scala course:
      • Learn large-scale data processing by mastering the concepts of Scala, RDDs, traits, OOP and Spark SQL
      • Online live courses: 24 hours
      • Assignments: 32 hours
      • Project: 20 hours
      • Lifetime access + 24x7 support
      Go to www.edureka.co/apache-spark-scala-training. Batch starts from 10th October (weekend batch).
  20. Thank You
      Questions/Queries/Feedback/Survey
      The recording and presentation will be made available to you within 24 hours.
