An introduction to Apache Spark
1. Apache Spark
● What is it?
● How does it work?
● Benefits
● Tuning
● Examples
www.semtech-solutions.co.nz info@semtech-solutions.co.nz
2. Spark – What is it?
● Open source
● Alternative to MapReduce for certain applications
● A low-latency cluster computing system
● For very large data sets
● Can be up to 100 times faster than MapReduce for
– Iterative algorithms
– Interactive data mining
● Used with Hadoop / HDFS
● Released under BSD License
3. Spark – How does it work?
● Uses in-memory cluster computing
● Memory access is faster than disk access
● Has APIs for
– Scala
– Java
– Python
● Can be accessed from Scala and Python shells
● Currently an Apache incubator project
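In-memory computing pays off most when the same data is read many times, as in an iterative algorithm. The sketch below is a rough illustration only: it assumes a later Apache release of Spark (package `org.apache.spark`, configured through `SparkConf`) on the classpath and a local master. The object name `IterativeSketch`, the helper `estimateMean` and the sample numbers are hypothetical and not part of Spark.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD

object IterativeSketch {
  // Hypothetical helper: a few steps of gradient descent that move `w`
  // towards the average of the points. Every iteration makes one full
  // pass over `points`, so keeping them in memory avoids repeated reads.
  def estimateMean(points: RDD[Double], iterations: Int, stepSize: Double): Double = {
    val n = points.count()
    var w = 0.0
    for (_ <- 1 to iterations) {
      val current = w // copy to a val so the closure captures a fixed value
      val gradient = points.map(x => current - x).reduce(_ + _) / n
      w -= stepSize * gradient
    }
    w
  }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("Iterative Sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Made-up sample data; a real job would load a large data set, e.g. from HDFS.
      val points = sc.parallelize(Seq(1.0, 2.0, 3.0, 6.0)).cache()
      println("Estimated mean: " + estimateMean(points, 20, 0.5))
    } finally {
      sc.stop()
    }
  }
}
```

The call to `cache()` only marks the data set for in-memory storage; the first action computes it and later iterations reuse the stored partitions.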
4. Spark – Benefits
● Scales to very large clusters
● Uses in-memory processing for increased speed
● High-level APIs
– Java, Scala, Python
● Low latency shell access
5. Spark – Tuning
● Bottlenecks can occur in the cluster via
– CPU, memory or network bandwidth
● Tune the data serialization method, e.g.
– Java ObjectOutputStream vs Kryo
● Memory Tuning
– Use primitive types
– Set JVM Flags
– Store objects in serialized form, e.g. RDD persistence with the MEMORY_ONLY_SER storage level
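Two of the tuning points above, the serializer choice and serialized RDD storage, can be sketched in code. This is a minimal example under stated assumptions: a later Apache release of Spark (package `org.apache.spark`) on the classpath and a local master. The object name `TuningSketch`, the helper `kryoConf` and the sample data are hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.storage.StorageLevel

object TuningSketch {
  // Builds a configuration that replaces the default Java serialization with Kryo.
  def kryoConf(): SparkConf =
    new SparkConf()
      .setAppName("Tuning Sketch")
      .setMaster("local[2]")
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(kryoConf())
    try {
      val numbers = sc.parallelize(1 to 100000)
      // MEMORY_ONLY_SER keeps each partition in memory as serialized bytes,
      // which usually takes less space but costs CPU time when reading.
      val stored = numbers.persist(StorageLevel.MEMORY_ONLY_SER)
      println("count = " + stored.count())
      println("sum = " + stored.map(_.toLong).reduce(_ + _))
    } finally {
      sc.stop()
    }
  }
}
```

Whether Kryo and serialized storage actually help depends on the data types and on where the bottleneck lies, so it is worth measuring before and after the change.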
6. Spark – Examples
Example from spark-project.org: a Spark job written in Scala.
It counts the lines in a system log that contain the letter "a" and the lines that contain the letter "b".
/*** SimpleJob.scala ***/
import spark.SparkContext
import SparkContext._

object SimpleJob {
  def main(args: Array[String]) {
    val logFile = "/var/log/syslog" // Should be some file on your system
    // Arguments: master ("local"), job name, Spark home, jars to ship to the workers
    val sc = new SparkContext("local", "Simple Job", "$YOUR_SPARK_HOME",
      List("target/scala-2.9.3/simple-project_2.9.3-1.0.jar"))
    // Read the file in 2 partitions and keep it in memory for both counts
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
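The job above imports from the original `spark` package. Later Apache releases moved the API to `org.apache.spark` and configure the context through `SparkConf`, so on a newer installation the same job might look roughly like the sketch below. The object name `SimpleJobUpdated` and the helper `countAsAndBs` are hypothetical, and the sketch assumes Spark is on the classpath and runs with a local master.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SimpleJobUpdated {
  // Returns (lines containing "a", lines containing "b") for the given text file.
  def countAsAndBs(sc: SparkContext, path: String): (Long, Long) = {
    // Read the file in at least 2 partitions and cache it for both counts.
    val logData = sc.textFile(path, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    (numAs, numBs)
  }

  def main(args: Array[String]): Unit = {
    val logFile = "/var/log/syslog" // Should be some file on your system
    // "local" runs Spark inside this JVM with a single worker thread.
    val conf = new SparkConf().setMaster("local").setAppName("Simple Job")
    val sc = new SparkContext(conf)
    try {
      val (numAs, numBs) = countAsAndBs(sc, logFile)
      println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the counting logic in its own function makes it possible to run it against a small test file before pointing the job at a real log.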
7. Contact Us
● Feel free to contact us at
– www.semtech-solutions.co.nz
– info@semtech-solutions.co.nz
● We offer IT project consultancy
● We are happy to hear about your problems
● You pay only for the hours that you need to solve your problems