Scala & Spark
The following topics will be covered in our
Scala & Spark Online Training:
Copyright @ 2015 Learntek. All Rights Reserved. 2
What is Scala?
Scala is a modern multi-paradigm programming language designed to express
common programming patterns in a concise, elegant, and type-safe way. The
name is short for “Scalable Language”. Scala is a hybrid language that
smoothly integrates the features of object-oriented and functional
programming languages, and it is compiled to run on the Java Virtual
Machine. Scala was created by Martin Odersky and first released in 2003.
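As a minimal sketch of how the object-oriented and functional styles can combine, the example below defines a small class hierarchy and processes it with higher-order functions. The names `HybridDemo`, `Shape`, `Circle`, and `Rect` are hypothetical and chosen only for illustration; they do not come from the training material.

```scala
// Hypothetical illustration: none of these names come from the course.
object HybridDemo {
  // Object-oriented side: a trait with concrete subclasses.
  trait Shape { def area: Double }
  case class Circle(radius: Double) extends Shape {
    def area: Double = math.Pi * radius * radius
  }
  case class Rect(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  // Functional side: immutable data transformed with higher-order functions.
  def totalArea(shapes: List[Shape]): Double = shapes.map(_.area).sum

  def largerThan(shapes: List[Shape], minArea: Double): List[Shape] =
    shapes.filter(_.area > minArea)
}
```

Calling `HybridDemo.totalArea(List(HybridDemo.Rect(2.0, 3.0), HybridDemo.Circle(1.0)))` sums the areas without any mutable state or explicit loop.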
Why Scala?
• Scala is a type-safe JVM language that combines object-oriented and
functional programming features in a concise, logical, and powerful language.
• Scala aims to be a “better Java” by keeping its syntax close to Java's,
which minimizes the learning curve.
• Scala was designed to avoid the features of Java that many developers find
restrictive, overly tedious, or frustrating.
What is Spark?
• Spark is a fast cluster computing technology that is commonly run on
Hadoop clusters. It generalizes the MapReduce model so that it can
efficiently support more types of computation, such as interactive queries
and stream processing. Spark can use Hadoop in two ways: for storage (HDFS)
and for scheduling the processing (YARN). Because Spark also ships with its
own standalone cluster manager, it can use Hadoop for storage only.
Why Spark?
• Spark was started at UC Berkeley's AMPLab and later became an Apache
Software Foundation project. It was designed to speed up Hadoop-style
computation.
• The main feature of Spark is in-memory cluster computing, which greatly
increases the processing speed of an application.
• Spark is designed to cover a wide range of workloads, such as batch
applications, iterative algorithms, interactive queries, and streaming
applications, which reduces the management burden of maintaining separate tools.
Introduction to Scala
• Overview of Scala
• Installing Scala
• Scala Basics
• IDE for Scala
Scala Programming
• Variables & Methods
• Literals
• Reserved Words
• Operators
• Precedence Rules
• If Expression
• For Expression
• Exception handling with Try Expression
• Match Expression
• While Loops
• Do-While Loops
• Implicit Conversion
Functions in Scala
• Methods
• First class Function
• Higher Order Methods
• Function Literal
• Partially Applied Function
• Tail Recursion
• Closure
• Currying
• Control Abstraction
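Several of the topics listed above can be sketched in a few lines. The example below is an illustrative assumption rather than course material; `FunctionsDemo` and all of its members are hypothetical names.

```scala
import scala.annotation.tailrec

// Hypothetical illustration: none of these names come from the course.
object FunctionsDemo {
  // Function literal assigned to a value (functions are first-class).
  val double: Int => Int = x => x * 2

  // Higher-order method: takes a function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Curried method with two parameter lists.
  def add(a: Int)(b: Int): Int = a + b

  // Partially applied function: the first argument is fixed at 10.
  val addTen: Int => Int = add(10)

  // Closure: the returned function captures `factor`.
  def multiplier(factor: Int): Int => Int = x => x * factor

  // Tail recursion: @tailrec makes the compiler verify that the
  // recursive call is in tail position.
  def factorial(n: Int): BigInt = {
    @tailrec
    def loop(i: Int, acc: BigInt): BigInt =
      if (i <= 1) acc else loop(i - 1, acc * i)
    loop(n, BigInt(1))
  }

  // Control abstraction: a by-name parameter is evaluated only when used.
  def unless(condition: Boolean)(body: => Unit): Unit =
    if (!condition) body
}
```

With the by-name parameter, `FunctionsDemo.unless(x > 0) { ... }` reads like a built-in control structure even though it is an ordinary method.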
Traits & OOP in Scala
• Traits
• Classes & Objects
• Abstract Class
• Access Modifiers
• Functional Programming
• Scala Class Hierarchy
• Package and Imports
Case Class & Pattern Matching
• Pattern type
• Pattern Guard
• Sealed Class
• Option Type
• Extractor
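The topics above fit together naturally in one small example. This is a sketch under stated assumptions: `PatternDemo`, `Payment`, `Card`, `Cash`, and `Email` are hypothetical names invented for illustration.

```scala
// Hypothetical illustration: none of these names come from the course.
object PatternDemo {
  // Sealed trait: all subtypes must be declared in this file, so the
  // compiler can warn about non-exhaustive matches.
  sealed trait Payment
  case class Card(number: String, amount: Double) extends Payment
  case class Cash(amount: Double) extends Payment

  // Custom extractor: unapply splits "user@domain" into its two parts.
  object Email {
    def unapply(s: String): Option[(String, String)] =
      s.split("@") match {
        case Array(user, domain) => Some((user, domain))
        case _                   => None
      }
  }

  def describe(p: Payment): String = p match {
    case Cash(amount) if amount > 1000 => "large cash payment" // pattern guard
    case Cash(_)                       => "cash payment"
    case Card(number, _)               => s"card ending in ${number.takeRight(4)}"
  }

  // Option type: the result is Some(domain) or None instead of null.
  def domainOf(address: String): Option[String] = address match {
    case Email(_, domain) => Some(domain)
    case _                => None
  }
}
```

Because `Payment` is sealed, removing one of the `case` clauses in `describe` would typically produce a compiler warning about a non-exhaustive match.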
Scala Collection
• Immutable And Mutable collection
• Array
• Sets
• Lists
• Tuples
• Maps
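The collection types listed above can be compared side by side. The following sketch uses only the Scala standard library; `CollectionsDemo` and its member names are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical illustration: none of these names come from the course.
object CollectionsDemo {
  // Immutable List: "modifying" it returns a new list and leaves `base` unchanged.
  val base: List[Int] = List(1, 2, 3)
  val extended: List[Int] = 0 :: base

  // Immutable Set (duplicates are dropped) and Map.
  val langs: Set[String] = Set("Scala", "Java", "Scala")
  val ports: Map[String, Int] = Map("http" -> 80, "https" -> 443)

  // Tuple: a fixed-size group of values that may have different types.
  val pair: (String, Int) = ("spark", 2)

  // Array: fixed length, but elements can be updated in place.
  val arr: Array[Int] = Array(1, 2, 3)

  // Mutable collection: the buffer is updated in place.
  def squaresUpTo(n: Int): mutable.ArrayBuffer[Int] = {
    val buf = mutable.ArrayBuffer.empty[Int]
    for (i <- 1 to n) buf += i * i
    buf
  }
}
```

Immutable collections are the default in Scala; the mutable variants have to be imported explicitly, as `mutable.ArrayBuffer` is here.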
Introduction to Spark
• Problems with Traditional Large-Scale Systems
• Introducing Spark
• What is Spark?
Spark Basics
• Spark Installation
• Configure HDP 2.4 (or 2.5) on local machine
• Spark Shell
• Storage layers for Spark
• Overview of Spark architecture
• Initializing a SparkContext and building applications
IDEs for Spark Applications
• SBT and its overview
• IntelliJ IDEA
• Eclipse
• Resolving dependencies for Spark applications
RDDs
• RDD Basics
• RDD transformations and Actions
• Lazy evaluation
• Element wise transformations
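The split between lazy transformations and eager actions can be illustrated by analogy with a plain Scala collection view. Note the assumption: this sketch does not use Spark or the RDD API at all. It only mimics the idea that chaining `map` and `filter` records work without running it, and that a final forcing step (here `toList`) triggers the computation. `LazyDemo` and its members are hypothetical names.

```scala
// Analogy only: a Scala collection view standing in for an RDD.
// No Spark API is used here.
object LazyDemo {
  // Counts how many times the mapping function actually runs.
  var evaluations: Int = 0

  // "Transformations": building the view does no work yet.
  def pipeline(data: List[Int]) =
    data.view.map { x => evaluations += 1; x * 2 }.filter(_ > 2)

  // "Action": forcing the view triggers the computation.
  def run(data: List[Int]): List[Int] = pipeline(data).toList
}
```

Calling `LazyDemo.pipeline(...)` alone leaves `evaluations` untouched; the counter only changes once the result is forced.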
Pair RDDs
• Key-Value Pair RDD
• Creating Pair RDDs
• Transformations on Pair RDD
• Grouping, Joining, Sorting on Pair RDD
• Data Partitioning
• Determining a partition of Pair RDD
• Operations that Benefit from Partitioning
• Operations that affect partitioning
• PageRank Example
Advanced concepts in Spark
• Accumulators
• Broadcast variables
• Working on per-partition basis
Launching Spark on cluster
• Configure and launch Spark Cluster on AWS
• Configure and launch Spark Cluster on Microsoft Azure
Running Spark on Cluster
• Spark Runtime Architecture
• Driver
• Executor
• Cluster Manager
• Components of Execution: Job, Stage and Task
• Spark Web UI
• Driver and Executor logs
• spark-submit command
Caching and Persistence
• RDD Lineage
• Caching Overview
• Distributed Persistence
Spark Libraries
• Spark SQL
• Spark Streaming
• MLlib
• GraphX