http://www.learntek.org/product/scala-spark-training/
Scala is a modern multi-paradigm programming language designed to express common programming patterns in a concise, elegant, and type-safe way. The name comes from “Scalable Language”. It is a hybrid language that smoothly integrates features of object-oriented and functional programming, and it compiles to run on the Java Virtual Machine.
Spark is a cluster computing technology designed for fast computation on Hadoop clusters. It builds on the Hadoop MapReduce model and extends it to efficiently support more types of computation, such as interactive queries and stream processing. Spark can use Hadoop in two ways: for storage and for processing.
http://www.learntek.org/
Learntek is a global online training provider for Big Data Analytics, Hadoop, Machine Learning, Deep Learning, IoT, AI, Cloud Technology, DevOps, Digital Marketing and other IT and Management courses. We are dedicated to designing, developing and implementing training programs for students, corporate employees and business professionals.
2. Scala & Spark
The following topics will be covered in our Scala & Spark Online Training:
Copyright @ 2015 Learntek. All Rights Reserved. 2
3. What is Scala?
Scala is a modern multi-paradigm programming language designed to express
common programming patterns in a concise, elegant, and type-safe way. The
name comes from “Scalable Language”. It is a hybrid language that smoothly
integrates features of object-oriented and functional programming, and it
compiles to run on the Java Virtual Machine. Scala was created by
Martin Odersky and released in 2003.
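
As a rough illustration of how object-oriented and functional features combine, here is a minimal sketch. The names Shape, Circle, Rect and ShapeDemo are hypothetical and chosen only for this example.

```scala
// Hypothetical names (ShapeDemo, Shape, Circle, Rect) used for illustration only.
object ShapeDemo {
  // Object-oriented side: a trait with concrete implementations.
  trait Shape {
    def area: Double
  }
  case class Circle(radius: Double) extends Shape {
    def area: Double = math.Pi * radius * radius
  }
  case class Rect(width: Double, height: Double) extends Shape {
    def area: Double = width * height
  }

  // Functional side: immutable data transformed with higher-order functions.
  def totalArea(shapes: List[Shape]): Double =
    shapes.map(_.area).sum

  def largerThan(shapes: List[Shape], minArea: Double): List[Shape] =
    shapes.filter(_.area > minArea)
}
```

The trait and case classes model the data in an object-oriented way, while map, sum and filter process it without any mutable state.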
4. Why Scala?
• Scala is a type-safe JVM language that incorporates both object-oriented and
functional programming features into a concise, logical and powerful language.
• Scala offers a “better Java” alternative by keeping its syntax close to the Java
language syntax, which minimizes the learning difficulty.
• Scala was created specifically with the goal of being a better language, in contrast
with the features of Java that are restrictive, overly tedious, or frustrating.
5. What is Spark?
• Spark is a cluster computing technology designed for fast computation on
Hadoop clusters. It builds on the Hadoop MapReduce model and extends it to
efficiently support more types of computation, such as interactive queries
and stream processing. Spark can use Hadoop in two ways: for storage and
for processing. Because Spark has its own cluster management, it typically
uses Hadoop for storage only.
6. Why Spark?
• Spark, now an Apache Software Foundation project, was introduced to speed up
Hadoop's computational processing.
• The main feature of Spark is its in-memory cluster computing, which greatly
increases an application's processing speed.
• Spark is designed to cover a wide range of workloads, such as batch applications,
iterative algorithms, interactive queries and streaming applications, which reduces
the management burden of maintaining separate tools.
7. Introduction to Scala
• Overview of Scala
• Installing Scala
• Scala Basics
• IDE for Scala
8. Scala Programming
• Variables & Methods
• Literals
• Reserved Words
• Operators
• Precedence Rules
• If Expression
• For Expression
• Exception handling with Try Expression
• Match Expression
• While Loops
• Do-While Loops
• Implicit Conversion
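
Several of the topics above share one idea: in Scala, if, for, match and Try produce values rather than only controlling flow. The following is a minimal sketch of that idea; the object and method names are hypothetical.

```scala
import scala.util.{Try, Success, Failure}

// Hypothetical names used for illustration only.
object ExpressionDemo {
  // `if` is an expression: it yields a value directly.
  def sign(n: Int): String =
    if (n > 0) "positive" else if (n < 0) "negative" else "zero"

  // `for ... yield` builds a new collection; the guard keeps even numbers only.
  def evenSquares(limit: Int): Seq[Int] =
    for (i <- 1 to limit if i % 2 == 0) yield i * i

  // `match` also yields a value.
  def describe(x: Any): String = x match {
    case 0         => "zero"
    case s: String => s"string of length ${s.length}"
    case _         => "something else"
  }

  // `Try` captures an exception as a value instead of letting it propagate.
  def parseOrZero(text: String): Int =
    Try(text.toInt) match {
      case Success(value) => value
      case Failure(_)     => 0
    }
}
```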
9. Functions in Scala
• Methods
• First-class Functions
• Higher Order Methods
• Function Literal
• Partially Applied Function
• Tail Recursion
• Closure
• Currying
• Control Abstraction
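
To make a few of these topics concrete, here is a small sketch covering higher-order methods, function literals, currying, partial application, closures and tail recursion. All names are hypothetical and picked for the example.

```scala
import scala.annotation.tailrec

// Hypothetical names used for illustration only.
object FunctionDemo {
  // Higher-order method: takes a function as a parameter.
  def applyTwice(f: Int => Int, x: Int): Int = f(f(x))

  // Function literal stored in a value (functions are first-class).
  val increment: Int => Int = n => n + 1

  // Curried method: parameters are supplied in separate lists.
  def add(a: Int)(b: Int): Int = a + b

  // Partially applied function: the first argument is fixed, the second stays open.
  val addTen: Int => Int = add(10)(_)

  // Closure: the returned function captures `factor` from its enclosing scope.
  def multiplier(factor: Int): Int => Int = n => n * factor

  // Tail recursion: the recursive call is the last action, so the compiler
  // can turn it into a loop; @tailrec makes compilation fail otherwise.
  def factorial(n: Int): BigInt = {
    @tailrec
    def loop(i: Int, acc: BigInt): BigInt =
      if (i <= 1) acc else loop(i - 1, acc * i)
    loop(n, BigInt(1))
  }
}
```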
10. Traits & OOP in Scala
• Traits
• Classes & Objects
• Abstract Class
• Access Modifiers
• Functional Programming
• Scala Class Hierarchy
• Package and Imports
11. Case Class & Pattern Matching
• Pattern type
• Pattern Guard
• Sealed Class
• Option Type
• Extractor
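
The sketch below shows how these pieces might fit together. It uses a sealed trait rather than a sealed class, and Payment, Card, Cash and Email are hypothetical names invented for the example.

```scala
// Hypothetical names used for illustration only.
object PatternDemo {
  // A sealed trait lets the compiler check that a match covers every case.
  sealed trait Payment
  case class Card(number: String, amount: Double) extends Payment
  case class Cash(amount: Double) extends Payment

  def describe(p: Payment): String = p match {
    // Pattern guard: the case applies only when the condition holds.
    case Cash(amount) if amount > 1000 => "large cash payment"
    case Cash(_)                       => "cash payment"
    case Card(number, _)               => s"card ending in ${number.takeRight(4)}"
  }

  // Option type: a lookup that may find nothing returns Some or None.
  val prices: Map[String, Double] = Map("book" -> 12.5)
  def priceLabel(item: String): String = prices.get(item) match {
    case Some(price) => s"$item costs $price"
    case None        => s"$item is not listed"
  }

  // Extractor: an object with `unapply` can be used as a pattern.
  object Email {
    def unapply(s: String): Option[(String, String)] =
      s.split("@") match {
        case Array(user, domain) => Some((user, domain))
        case _                   => None
      }
  }
  def domainOf(address: String): Option[String] = address match {
    case Email(_, domain) => Some(domain)
    case _                => None
  }
}
```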
12. Scala Collection
• Immutable And Mutable collection
• Array
• Sets
• Lists
• Tuples
• Maps
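
A brief sketch of the collection types listed above, using only the Scala standard library. The object and value names are hypothetical.

```scala
import scala.collection.mutable

// Hypothetical names used for illustration only.
object CollectionDemo {
  // Immutable List: prepending returns a new list; the original is unchanged.
  val base: List[Int] = List(1, 2, 3)
  val extended: List[Int] = 0 :: base

  // Mutable collection: the buffer is updated in place.
  def collect(items: Seq[Int]): mutable.ArrayBuffer[Int] = {
    val buffer = mutable.ArrayBuffer.empty[Int]
    items.foreach(buffer += _)
    buffer
  }

  // Array: fixed size, but elements can be reassigned.
  def doubledFirst(values: Array[Int]): Array[Int] = {
    val copy = values.clone()
    copy(0) = copy(0) * 2
    copy
  }

  // Set drops duplicates, a tuple groups values of different types,
  // and a Map looks values up by key.
  val unique: Set[Int] = Set(1, 2, 2, 3)
  val pair: (String, Int) = ("spark", 2)
  val ports: Map[String, Int] = Map("http" -> 80, "https" -> 443)
}
```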
13. Introduction to Spark
• Problems with Traditional Large-Scale Systems
• Introducing Spark
• What is Spark?
14. Spark Basics
• Spark Installation
• Configure HDP 2.4 (or 2.5) on local machine
• Spark Shell
• Storage layers for Spark
• Overview of Spark architecture
• Initializing a SparkContext and building applications
15. IDEs for Spark Applications
• SBT and its overview
• IntelliJ IDEA
• Eclipse
• Resolving dependencies for Spark applications
16. RDDs
• RDD Basics
• RDD transformations and Actions
• Lazy evaluation
• Element-wise transformations
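
The idea of lazy evaluation can be sketched without a Spark installation. The example below does not use Spark or RDDs; it uses the `view` from the Scala standard library as a stand-in, on the assumption that the analogy is close enough to show the principle: a transformation only describes work, and nothing runs until an action asks for a result. The names LazyDemo, double, pending and run are hypothetical.

```scala
// Stand-in sketch: a standard-library view, NOT a Spark RDD.
object LazyDemo {
  // Counts how many times the mapping function has actually run.
  var calls: Int = 0

  def double(n: Int): Int = {
    calls += 1
    n * 2
  }

  // Analogue of a transformation: describes the work but does not run it.
  val pending = List(1, 2, 3).view.map(double)

  // Analogue of an action: forces the computation and returns a result.
  def run(): List[Int] = pending.toList
}
```

Reading `LazyDemo.calls` before `run()` gives 0, because building the view executes nothing; the mapping function runs only when `run()` materializes the list.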
17. Pair RDDs
• Key-Value Pair RDD
• Creating Pair RDDs
• Transformations on Pair RDD
• Grouping, Joining, Sorting on Pair RDD
• Data Partitioning
• Determining the partitioning of a Pair RDD
• Operations that benefit from partitioning
• Operations that affect partitioning
• Page Rank Example
18. Advanced concepts in Spark
• Accumulators
• Broadcast variables
• Working on per-partition basis
19. Launching Spark on cluster
• Configure and launch Spark Cluster on AWS
• Configure and launch Spark Cluster on Microsoft Azure
20. Running Spark on Cluster
• Spark Runtime Architecture
• Driver
• Executor
• Cluster Manager
• Components of Execution: Job, Stage and Task
• Spark Web UI
• Driver and Executor logs
• Spark-submit command
21. Caching and Persistence
• RDD Lineage
• Caching Overview
• Distributed Persistence