INTERACTIVE
SPARK IN YOUR
BROWSER
Romain Rigaux romain@cloudera.com
Erick Tryzelaar erickt@cloudera.com
GOAL
OF HUE
WEB INTERFACE FOR ANALYZING DATA
WITH APACHE HADOOP
SIMPLIFY AND INTEGRATE
FREE AND OPEN SOURCE
—> WEB “EXCEL” FOR HADOOP
VIEW FROM
30K FEET
Hadoop Web Server
You, your colleagues and even that
friend that uses IE9 ;)
WHY SPARK?
SIMPLER (PYTHON, STREAMING,
INTERACTIVE…)
OPENS UP DATA TO SCIENCE
SPARK —> MR
Apache Spark
Spark
Streaming
MLlib
(machine learning)
GraphX
(graph)
Spark SQL
WHY
IN HUE?
MARRIED WITH FULL HADOOP ECOSYSTEM
(Hive Tables, HDFS, Job Browser…)
WHY
IN HUE?
Multi user, YARN, Impersonation/Security
Not yet-another-app-to-install
...
HISTORY
V1: OOZIE
THE GOOD
• It works
THE BAD
• Submit through Oozie
• Slow
HISTORY
V2: SPARK IGNITER
THE GOOD
• It works better
THE BAD
• Compile a Jar
• Batch
HISTORY
V3: NOTEBOOK
THE GOOD
• It works even better
• Scala / Python / R shells
• Jar / Py batches
• Notebook UI
• YARN
THE BAD
• Still new
GENERAL
ARCHITECTURE
[Diagram] Web part: Hue notebook with snippets —> Backend part: Livy —> Spark / Spark / Spark, running on YARN
WEB
ARCHITECTURE
[Diagram] Browser (AJAX) —> Hue Server
Common API: create_session(), execute(), …
Specific APIs:
• Livy (Scala) via REST: /sessions, /sessions/{sessionId}/statements
• HiveServer2 (Hive) via Thrift: OpenSession(), ExecuteStatement()
• Pig, Hive, …
LIVY SPARK SERVER
• REST Web server in Scala
• Interactive Spark Sessions and Batch Jobs
• Type Introspection for Visualization
• Running sessions in YARN or local mode
• Backends: Scala, Python, R
• Open Source:
https://github.com/cloudera/hue/tree/master/apps/spark/java
LIVY
SPARK SERVER
LIVY WEB SERVER
ARCHITECTURE
[Diagram] Livy Server (Scalatra, Session Manager, Session, Spark Client) —> YARN Master
YARN Node: Spark Interpreter + Spark Context
YARN Node: Spark Worker
YARN Node: Spark Worker
[Build slides: the same diagram repeated with the request flow numbered 1 to 7, from the Livy Server (Scalatra, Session Manager, Session, Spark Client) to the YARN Master and the YARN Nodes running the Spark Interpreter, Spark Context, and Spark Workers.]
SESSION CREATION
AND EXECUTION
% curl -XPOST localhost:8998/sessions \
  -d '{"kind": "spark"}'
{
"id": 0,
"kind": "spark",
"log": [...],
"state": "idle"
}
% curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}'
{
"id": 0,
"output": {
"data": { "text/plain": "res0: Int = 2" },
"execution_count": 0,
"status": "ok"
},
"state": "available"
}
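The two calls above can be scripted. A minimal sketch of a hypothetical helper that only builds the requests (the paths mirror the curl examples; actually sending them requires a running Livy server at localhost:8998):

```python
import json

# Hypothetical request builders mirroring the curl calls above.
# They construct (method, path, body) tuples; they do not send anything.
def create_session_request(kind="spark"):
    # POST /sessions with the session kind in the JSON body
    return ("POST", "/sessions", json.dumps({"kind": kind}))

def execute_statement_request(session_id, code):
    # POST /sessions/{sessionId}/statements with the code snippet
    return ("POST", "/sessions/%d/statements" % session_id,
            json.dumps({"code": code}))

method, path, body = execute_statement_request(0, "1+1")
```

Posting those bodies to a live server would return JSON documents like the ones shown above.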
LIVY INTERPRETERS
Scala, Python, R…
INTERPRETERS
• Pipe stdin/stdout to a running shell
• Execute the code / send to Spark workers
• Perform magic operations
• One interpreter per language
• “Swappable” with other kernels (Python, Spark…)
Interpreter
> println(1 + 1)
2
println(1 + 1)
2
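The pipe-to-a-shell idea can be demonstrated with any interpreter. A sketch using Python's own REPL as the child process (an illustration of the technique, not Livy's actual process management):

```python
import subprocess
import sys

# Spawn an interactive interpreter, pipe one line of code to its stdin,
# and read the evaluated result back from its stdout -- the same
# stdin/stdout piping an interpreter backend performs.
proc = subprocess.Popen(
    [sys.executable, "-i", "-q"],          # -i: interactive, -q: no banner
    stdin=subprocess.PIPE, stdout=subprocess.PIPE,
    stderr=subprocess.DEVNULL, text=True)
out, _ = proc.communicate("1+1\n")
print(out.strip())  # the child REPL echoes the evaluated value
```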
INTERPRETER FLOW
[Sequence] curl / Hue —> Livy Server —> Livy Session —> Interpreter
Request: 1+1
Interpreter evaluates: 2
Response: { “data”: { “application/json”: “2” } }
INTERPRETER FLOW CHART
[Flow] Receive lines —> Split lines —> for each line:
• Magic line? Yes —> Magic! / No —> Execute Line
• Success —> lines left? Yes —> next line / No —> send output to server
• Incomplete —> merge with next line
• Error —> send output to server
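The chart above can be sketched as a loop. A hypothetical pure-Python model in which `run_line` stands in for the real interpreter and reports success, incomplete, or error (all names here are illustrative):

```python
import re

# "%" then the magic name, separators, then the rest of the line
MAGIC = re.compile(r"^%(\w+)\W*(.*)")

def interpret(lines, run_line, run_magic):
    # Mirrors the flow chart: split lines, dispatch magics, execute the
    # rest, merge "incomplete" lines with the next one, stop on error.
    outputs, buffer = [], ""
    for line in lines:
        code = buffer + line
        m = MAGIC.match(code)
        if m:                                  # Magic line? -> Magic!
            outputs.append(run_magic(m.group(1), m.group(2)))
            buffer = ""
            continue
        status, out = run_line(code)           # Execute line
        if status == "incomplete":             # Merge with next line
            buffer = code + "\n"
        elif status == "error":                # Send output to server
            return outputs + [out]
        else:                                  # Success: next line, if any
            outputs.append(out)
            buffer = ""
    return outputs                             # Send output to server
```

For example, feeding it `["(1+", "1)"]` with a toy `run_line` that flags syntax errors as incomplete merges the two lines before evaluating them.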
LIVY INTERPRETERS
trait Interpreter {
def state: State
def execute(code: String): Future[JValue]
def close(): Unit
}
sealed trait State
case class NotStarted() extends State
case class Starting() extends State
case class Idle() extends State
case class Running() extends State
case class Busy() extends State
case class Error() extends State
case class ShuttingDown() extends State
case class Dead() extends State
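The states above form a simple session lifecycle. A hedged Python model of the transitions (the allowed edges are illustrative assumptions inferred from the state names, not Livy's exact rules):

```python
# Plausible lifecycle transitions between the session states listed
# above. The edge set is an illustrative assumption.
TRANSITIONS = {
    "not_started": {"starting"},
    "starting": {"idle", "error", "dead"},
    "idle": {"busy", "running", "shutting_down"},
    "running": {"idle", "error"},
    "busy": {"idle", "error"},
    "error": {"shutting_down", "dead"},
    "shutting_down": {"dead"},
    "dead": set(),
}

class Session:
    def __init__(self):
        self.state = "not_started"

    def transition(self, new_state):
        # Refuse any move not present in the lifecycle table
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go {self.state} -> {new_state}")
        self.state = new_state

s = Session()
s.transition("starting")
s.transition("idle")
```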
SPARK INTERPRETER
class SparkInterpeter extends Interpreter {
…
private var _state: State = NotStarted()
private val outputStream = new ByteArrayOutputStream()
private var sparkIMain: SparkIMain = _
def start() = {
...
_state = Starting()
sparkIMain = new SparkIMain(new Settings(), new JPrintWriter(outputStream, true))
sparkIMain.initializeSynchronous()
...
Interpreter
new SparkIMain(new Settings(), new JPrintWriter(outputStream, true))
SPARK INTERPRETER
private var sparkContext: SparkContext = _

def start() = {
  ...
  val sparkConf = new SparkConf(true)
  sparkContext = new SparkContext(sparkConf)
  sparkIMain.beQuietDuring {
    sparkIMain.bind("sc", "org.apache.spark.SparkContext",
      sparkContext, List("""@transient"""))
  }
  _state = Idle()
}
EXECUTING SPARK
private def executeLine(code: String): ExecuteResult = {
  code match {
    case MAGIC_REGEX(magic, rest) =>
      executeMagic(magic, rest)
    case _ =>
      scala.Console.withOut(outputStream) {
        sparkIMain.interpret(code) match {
          case Results.Success => ExecuteComplete(readStdout())
          case Results.Incomplete => ExecuteIncomplete(readStdout())
          case Results.Error => ExecuteError(readStdout())
        }
  ...
INTERPRETER MAGIC
private val MAGIC_REGEX = "^%(\\w+)\\W*(.*)".r

private def executeMagic(magic: String, rest: String): ExecuteResponse = {
  magic match {
    case "json" => executeJsonMagic(rest)
    case "table" => executeTableMagic(rest)
    case _ => ExecuteError(f"Unknown magic command $magic")
  }
}
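The same dispatch can be checked in Python: `re.match` with an equivalent pattern splits the magic name from its argument (the helper name is illustrative):

```python
import re

# Equivalent of MAGIC_REGEX: "%" then the magic name, separators, rest
MAGIC_REGEX = re.compile(r"^%(\w+)\W*(.*)")

def parse_magic(code):
    m = MAGIC_REGEX.match(code)
    if not m:
        return None                     # not a magic line
    return m.group(1), m.group(2)       # (magic, rest)

print(parse_magic("%table counts"))     # ('table', 'counts')
print(parse_magic("1 + 1"))             # None
```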
INTERPRETER MAGIC
private def executeJsonMagic(name: String): ExecuteResponse = {
  sparkIMain.valueOfTerm(name) match {
    case Some(value: RDD[_]) => ExecuteMagic(Extraction.decompose(Map(
      "application/json" -> value.asInstanceOf[RDD[_]].take(10))))
    case Some(value) => ExecuteMagic(Extraction.decompose(Map(
      "application/json" -> value)))
    case None => ExecuteError(f"Value $name does not exist")
  }
}
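The `%json` logic (take at most ten elements of a distributed value, otherwise serialize the value whole) can be mimicked over plain Python collections. In this illustration a list stands in for an RDD and a slice stands in for `take(10)`:

```python
import json

def json_magic(value):
    # Mirror executeJsonMagic: truncate "distributed" collections to
    # their first 10 elements, pass other values through unchanged.
    if isinstance(value, list):         # stand-in for the RDD case
        value = value[:10]              # stand-in for rdd.take(10)
    return {"application/json": value}

# A 15-element "dataset" comes back truncated to 10 elements
print(json.dumps(json_magic(list(range(15)))))
```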
TABLE MAGIC
val lines = sc.textFile("shakespeare.txt");
val counts = lines.
  flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) =>
    Map("word" -> w, "count" -> c)
  }

%table counts

"application/vnd.livy.table.v1+json": {
  "headers": [
    { "name": "count", "type": "BIGINT_TYPE" },
    { "name": "name", "type": "STRING_TYPE" }
  ],
  "data": [
    [ 23407, "the" ],
    [ 19540, "I" ],
    [ 18358, "and" ],
    ...
  ]
}
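The `%table` payload above can be reproduced from the word-count rows. A sketch that infers headers from the first row's keys and maps Python types to the type names shown (the type mapping and helper are illustrative):

```python
def table_magic(rows):
    # Build the application/vnd.livy.table.v1+json payload: one header
    # per column (typed from the first row) plus row-major data.
    type_names = {int: "BIGINT_TYPE", str: "STRING_TYPE"}
    columns = sorted(rows[0])           # stable column order
    headers = [{"name": c, "type": type_names[type(rows[0][c])]}
               for c in columns]
    data = [[row[c] for c in columns] for row in rows]
    return {"application/vnd.livy.table.v1+json":
            {"headers": headers, "data": data}}

counts = [{"word": "the", "count": 23407}, {"word": "I", "count": 19540}]
payload = table_magic(counts)
```

Type introspection like this is what lets the notebook UI render the result as a sortable table or chart instead of raw text.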
JSON MAGIC
val lines = sc.textFile("shakespeare.txt");
val counts = lines.
  flatMap(line => line.split(" ")).
  map(word => (word, 1)).
  reduceByKey(_ + _).
  sortBy(-_._2).
  map { case (w, c) =>
    Map("word" -> w, "count" -> c)
  }

%json counts

{
  "id": 0,
  "output": {
    "application/json": [
      { "count": 506610, "word": "" },
      { "count": 23407, "word": "the" },
      { "count": 19540, "word": "I" },
      ...
    ]
  ...
}
COMING SOON
• Stability and Scaling
• Security
• IPython/Jupyter backends and file format
DEMO
TIME
TWITTER
@gethue
USER GROUP
hue-user@
WEBSITE
http://gethue.com
LEARN
http://learn.gethue.com
THANKS!

Contenu connexe

Tendances

Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Lucidworks
 
5分で説明する Play! scala
5分で説明する Play! scala5分で説明する Play! scala
5分で説明する Play! scala
masahitojp
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
Alexei Gorobets
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
Erik Hatcher
 

Tendances (20)

Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, LucidworksVisualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
Visualize Solr Data with Banana: Presented by Andrew Thanalertvisuti, Lucidworks
 
Faster Data Analytics with Apache Spark using Apache Solr
Faster Data Analytics with Apache Spark using Apache SolrFaster Data Analytics with Apache Spark using Apache Solr
Faster Data Analytics with Apache Spark using Apache Solr
 
Solr as a Spark SQL Datasource
Solr as a Spark SQL DatasourceSolr as a Spark SQL Datasource
Solr as a Spark SQL Datasource
 
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
Solr and Spark for Real-Time Big Data Analytics: Presented by Tim Potter, Luc...
 
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
Faster Data Analytics with Apache Spark using Apache Solr - Kiran Chitturi, L...
 
5分で説明する Play! scala
5分で説明する Play! scala5分で説明する Play! scala
5分で説明する Play! scala
 
Ingesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScriptIngesting and Manipulating Data with JavaScript
Ingesting and Manipulating Data with JavaScript
 
2011/10/08_Playframework_GAE_to_Heroku
2011/10/08_Playframework_GAE_to_Heroku2011/10/08_Playframework_GAE_to_Heroku
2011/10/08_Playframework_GAE_to_Heroku
 
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, AirbnbAirbnb Search Architecture: Presented by Maxim Charkov, Airbnb
Airbnb Search Architecture: Presented by Maxim Charkov, Airbnb
 
Solr Indexing and Analysis Tricks
Solr Indexing and Analysis TricksSolr Indexing and Analysis Tricks
Solr Indexing and Analysis Tricks
 
MongoSF - mongodb @ foursquare
MongoSF - mongodb @ foursquareMongoSF - mongodb @ foursquare
MongoSF - mongodb @ foursquare
 
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, ClouderaParallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
Parallel SQL and Analytics with Solr: Presented by Yonik Seeley, Cloudera
 
Burn down the silos! Helping dev and ops gel on high availability websites
Burn down the silos! Helping dev and ops gel on high availability websitesBurn down the silos! Helping dev and ops gel on high availability websites
Burn down the silos! Helping dev and ops gel on high availability websites
 
Apache SolrCloud
Apache SolrCloudApache SolrCloud
Apache SolrCloud
 
Oak Lucene Indexes
Oak Lucene IndexesOak Lucene Indexes
Oak Lucene Indexes
 
Search@airbnb
Search@airbnbSearch@airbnb
Search@airbnb
 
Real-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @MoldcampReal-time search in Drupal with Elasticsearch @Moldcamp
Real-time search in Drupal with Elasticsearch @Moldcamp
 
Rapid Prototyping with Solr
Rapid Prototyping with SolrRapid Prototyping with Solr
Rapid Prototyping with Solr
 
Node.js and Parse
Node.js and ParseNode.js and Parse
Node.js and Parse
 
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
Rebuilding Solr 6 Examples - Layer by Layer: Presented by Alexandre Rafalovit...
 

En vedette

Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
gethue
 

En vedette (20)

Harness the power of Spark and Solr in Hue: Big Data Amsterdam v.2.0
Harness the power of Spark and Solr in Hue: Big Data Amsterdam v.2.0Harness the power of Spark and Solr in Hue: Big Data Amsterdam v.2.0
Harness the power of Spark and Solr in Hue: Big Data Amsterdam v.2.0
 
Building a REST Job Server for Interactive Spark as a Service
Building a REST Job Server for Interactive Spark as a ServiceBuilding a REST Job Server for Interactive Spark as a Service
Building a REST Job Server for Interactive Spark as a Service
 
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
Spark Summit Europe: Building a REST Job Server for interactive Spark as a se...
 
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In SparkYggdrasil: Faster Decision Trees Using Column Partitioning In Spark
Yggdrasil: Faster Decision Trees Using Column Partitioning In Spark
 
Interactive Analytics using Apache Spark
Interactive Analytics using Apache SparkInteractive Analytics using Apache Spark
Interactive Analytics using Apache Spark
 
Huawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark StreamingHuawei Advanced Data Science With Spark Streaming
Huawei Advanced Data Science With Spark Streaming
 
Building Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemMLBuilding Custom Machine Learning Algorithms With Apache SystemML
Building Custom Machine Learning Algorithms With Apache SystemML
 
Apache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault ToleranceApache Spark Streaming: Architecture and Fault Tolerance
Apache Spark Streaming: Architecture and Fault Tolerance
 
Huohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For SparkHuohua: A Distributed Time Series Analysis Framework For Spark
Huohua: A Distributed Time Series Analysis Framework For Spark
 
Hadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS DeveloperHadoop and Spark for the SAS Developer
Hadoop and Spark for the SAS Developer
 
Low Latency Execution For Apache Spark
Low Latency Execution For Apache SparkLow Latency Execution For Apache Spark
Low Latency Execution For Apache Spark
 
Building a REST Job Server for interactive Spark as a service by Romain Rigau...
Building a REST Job Server for interactive Spark as a service by Romain Rigau...Building a REST Job Server for interactive Spark as a service by Romain Rigau...
Building a REST Job Server for interactive Spark as a service by Romain Rigau...
 
Bolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarrayBolt: Building A Distributed ndarray
Bolt: Building A Distributed ndarray
 
Recent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced AnalyticsRecent Developments In SparkR For Advanced Analytics
Recent Developments In SparkR For Advanced Analytics
 
Scaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of ParametersScaling Machine Learning To Billions Of Parameters
Scaling Machine Learning To Billions Of Parameters
 
Spark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 FuriousSpark And Cassandra: 2 Fast, 2 Furious
Spark And Cassandra: 2 Fast, 2 Furious
 
Airstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At AirbnbAirstream: Spark Streaming At Airbnb
Airstream: Spark Streaming At Airbnb
 
Re-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance UnderstandabilityRe-Architecting Spark For Performance Understandability
Re-Architecting Spark For Performance Understandability
 
Livy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache SparkLivy: A REST Web Service For Apache Spark
Livy: A REST Web Service For Apache Spark
 
A Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons LearnedA Journey into Databricks' Pipelines: Journey and Lessons Learned
A Journey into Databricks' Pipelines: Journey and Lessons Learned
 

Similaire à Big Data Scala by the Bay: Interactive Spark in your Browser

JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing Up
David Padbury
 

Similaire à Big Data Scala by the Bay: Interactive Spark in your Browser (20)

Introduction to Scalding and Monoids
Introduction to Scalding and MonoidsIntroduction to Scalding and Monoids
Introduction to Scalding and Monoids
 
Introduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processingIntroduction to Apache Flink - Fast and reliable big data processing
Introduction to Apache Flink - Fast and reliable big data processing
 
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
Big Data Day LA 2016/ Hadoop/ Spark/ Kafka track - Iterative Spark Developmen...
 
Osd ctw spark
Osd ctw sparkOsd ctw spark
Osd ctw spark
 
Mist - Serverless proxy to Apache Spark
Mist - Serverless proxy to Apache SparkMist - Serverless proxy to Apache Spark
Mist - Serverless proxy to Apache Spark
 
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
Data Summer Conf 2018, “Mist – Serverless proxy for Apache Spark (RUS)” — Vad...
 
Hadoop trainingin bangalore
Hadoop trainingin bangaloreHadoop trainingin bangalore
Hadoop trainingin bangalore
 
Apache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster ComputingApache Spark, the Next Generation Cluster Computing
Apache Spark, the Next Generation Cluster Computing
 
Things about Functional JavaScript
Things about Functional JavaScriptThings about Functional JavaScript
Things about Functional JavaScript
 
Scala @ TechMeetup Edinburgh
Scala @ TechMeetup EdinburghScala @ TechMeetup Edinburgh
Scala @ TechMeetup Edinburgh
 
Apache Flink & Graph Processing
Apache Flink & Graph ProcessingApache Flink & Graph Processing
Apache Flink & Graph Processing
 
JavaScript Growing Up
JavaScript Growing UpJavaScript Growing Up
JavaScript Growing Up
 
Nodejs Explained with Examples
Nodejs Explained with ExamplesNodejs Explained with Examples
Nodejs Explained with Examples
 
Nodejsexplained 101116115055-phpapp02
Nodejsexplained 101116115055-phpapp02Nodejsexplained 101116115055-phpapp02
Nodejsexplained 101116115055-phpapp02
 
NoSQL and JavaScript: a Love Story
NoSQL and JavaScript: a Love StoryNoSQL and JavaScript: a Love Story
NoSQL and JavaScript: a Love Story
 
Exactly once with spark streaming
Exactly once with spark streamingExactly once with spark streaming
Exactly once with spark streaming
 
Advanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xinAdvanced spark training advanced spark internals and tuning reynold xin
Advanced spark training advanced spark internals and tuning reynold xin
 
Testing batch and streaming Spark applications
Testing batch and streaming Spark applicationsTesting batch and streaming Spark applications
Testing batch and streaming Spark applications
 
[QE 2018] Łukasz Gawron – Testing Batch and Streaming Spark Applications
[QE 2018] Łukasz Gawron – Testing Batch and Streaming Spark Applications[QE 2018] Łukasz Gawron – Testing Batch and Streaming Spark Applications
[QE 2018] Łukasz Gawron – Testing Batch and Streaming Spark Applications
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 

Plus de gethue

Plus de gethue (9)

Sqoop2 refactoring for generic data transfer - NYC Sqoop Meetup
Sqoop2 refactoring for generic data transfer - NYC Sqoop MeetupSqoop2 refactoring for generic data transfer - NYC Sqoop Meetup
Sqoop2 refactoring for generic data transfer - NYC Sqoop Meetup
 
Hadoop Israel - HBase Browser in Hue
Hadoop Israel - HBase Browser in HueHadoop Israel - HBase Browser in Hue
Hadoop Israel - HBase Browser in Hue
 
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop MeetupIntegrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
Integrate Hue with your Hadoop cluster - Yahoo! Hadoop Meetup
 
Hue: The Hadoop UI - Hadoop Singapore
Hue: The Hadoop UI - Hadoop SingaporeHue: The Hadoop UI - Hadoop Singapore
Hue: The Hadoop UI - Hadoop Singapore
 
SF Dev Meetup - Hue SDK
SF Dev Meetup - Hue SDKSF Dev Meetup - Hue SDK
SF Dev Meetup - Hue SDK
 
Hue: The Hadoop UI - Where we stand, Hue Meetup SF
Hue: The Hadoop UI - Where we stand, Hue Meetup SF Hue: The Hadoop UI - Where we stand, Hue Meetup SF
Hue: The Hadoop UI - Where we stand, Hue Meetup SF
 
HBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User GroupHBase + Hue - LA HBase User Group
HBase + Hue - LA HBase User Group
 
Hue: The Hadoop UI - HUG France
Hue: The Hadoop UI - HUG FranceHue: The Hadoop UI - HUG France
Hue: The Hadoop UI - HUG France
 
Hue: The Hadoop UI - Stockholm HUG
Hue: The Hadoop UI - Stockholm HUGHue: The Hadoop UI - Stockholm HUG
Hue: The Hadoop UI - Stockholm HUG
 

Dernier

怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
vexqp
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
vexqp
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
nirzagarg
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
q6pzkpark
 
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit RiyadhCytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Cytotec in Jeddah+966572737505) get unwanted pregnancy kit Riyadh
Abortion pills in Riyadh +966572737505 get cytotec
 
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Jual obat aborsi Bandung ( 085657271886 ) Cytote pil telat bulan penggugur ka...
Klinik kandungan
 
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Satna [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Purnia [ 7014168258 ] Call Me For Genuine Models We...
nirzagarg
 
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
Top profile Call Girls In Hapur [ 7014168258 ] Call Me For Genuine Models We ...
nirzagarg
 
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
+97470301568>>weed for sale in qatar ,weed for sale in dubai,weed for sale in...
Health
 
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In bhavnagar [ 7014168258 ] Call Me For Genuine Models...
gajnagarg
 

Dernier (20)

怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
怎样办理旧金山城市学院毕业证(CCSF毕业证书)成绩单学校原版复制
 
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
Top profile Call Girls In Tumkur [ 7014168258 ] Call Me For Genuine Models We...
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
怎样办理伦敦大学城市学院毕业证(CITY毕业证书)成绩单学校原版复制
 
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With OrangePredicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
Predicting HDB Resale Prices - Conducting Linear Regression Analysis With Orange
 
Data Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdfData Analyst Tasks to do the internship.pdf
Data Analyst Tasks to do the internship.pdf
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
Top profile Call Girls In Begusarai [ 7014168258 ] Call Me For Genuine Models...
 
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
一比一原版(曼大毕业证书)曼尼托巴大学毕业证成绩单留信学历认证一手价格
 
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptx
The-boAt-Story-Navigating-the-Waves-of-Innovation.pptxThe-boAt-Story-Navigating-the-Waves-of-Innovation.pptx

Big Data Scala by the Bay: Interactive Spark in your Browser

  • 1. INTERACTIVE SPARK IN YOUR BROWSER Romain Rigaux romain@cloudera.com Erick Tryzelaar erickt@cloudera.com
  • 2. GOAL OF HUE WEB INTERFACE FOR ANALYZING DATA WITH APACHE HADOOP SIMPLIFY AND INTEGRATE FREE AND OPEN SOURCE —> WEB “EXCEL” FOR HADOOP
  • 3. VIEW FROM 30K FEET Hadoop Web Server You, your colleagues and even that friend that uses IE9 ;)
  • 4. WHY SPARK? SIMPLER (PYTHON, STREAMING, INTERACTIVE…) OPENS UP DATA TO SCIENCE SPARK —> MR Apache Spark Spark Streaming MLlib (machine learning) GraphX (graph) Spark SQL
  • 5.
  • 6.
  • 7. WHY IN HUE? MARRIED WITH FULL HADOOP ECOSYSTEM (Hive Tables, HDFS, Job Browser…)
  • 8. WHY IN HUE? Multi user, YARN, Impersonation/Security Not yet-another-app-to-install ...
  • 9. HISTORY V1: OOZIE. THE GOOD: • It works. THE BAD: • Submit through Oozie • Slow
  • 10. HISTORY V2: SPARK IGNITER. THE GOOD: • It works better. THE BAD: • Compile a Jar • Batch
  • 11. HISTORY V3: NOTEBOOK. THE GOOD: • It works even better • Scala / Python / R shells • Jar / Py batches • Notebook UI • YARN. THE BAD: • Still new
  • 14. Notebook with snippets WEB ARCHITECTURE Server Spark Scala Common API Pig Hive Livy … HS2 Scala Hive Specific APIs AJAX create_session() execute() … REST Thrift OpenSession() ExecuteStatement() /session /sessions/{sessionId}/statements
  • 16. LIVY SPARK SERVER • REST Web server in Scala • Interactive Spark Sessions and Batch Jobs • Type Introspection for Visualization • Running sessions in YARN local • Backends: Scala, Python, R • Open Source: https://github.com/cloudera/hue/tree/master/apps/spark/java
  • 17. LIVY WEB SERVER ARCHITECTURE YARN Master Spark Client YARN Node Spark Interpreter Spark Context YARN Node Spark Worker YARN Node Spark Worker Livy Server Scalatra Session Manager Session
  • 18. LIVY WEB SERVER ARCHITECTURE Livy Server YARN Master Scalatra Spark Client Session Manager Session YARN Node Spark Interpreter Spark Context YARN Node Spark Worker YARN Node Spark Worker 1
  • 19. LIVY WEB SERVER ARCHITECTURE YARN Master Spark Client YARN Node Spark Interpreter Spark Context YARN Node Spark Worker YARN Node Spark Worker 1 2 Livy Server Scalatra Session Manager Session
  • 20. LIVY WEB SERVER ARCHITECTURE YARN Master Spark Client YARN Node Spark Interpreter Spark Context YARN Node Spark Worker YARN Node Spark Worker 1 2 3 Livy Server Scalatra Session Manager Session
  • 21. LIVY WEB SERVER ARCHITECTURE YARN Master Spark Client YARN Node Spark Interpreter Spark Context YARN Node Spark Worker YARN Node Spark Worker 1 2 3 4 Livy Server Scalatra Session Manager Session
  • 22. LIVY WEB SERVER ARCHITECTURE YARN Master Spark Client YARN Node Spark Interpreter Spark Context YARN Node Spark Worker YARN Node Spark Worker 1 2 3 4 5 Livy Server Scalatra Session Manager Session
  • 23. LIVY WEB SERVER ARCHITECTURE YARN Master Spark Client YARN Node Spark Interpreter Spark Context YARN Node Spark Worker YARN Node Spark Worker 1 2 3 4 5 6 Livy Server Scalatra Session Manager Session
  • 24. LIVY WEB SERVER ARCHITECTURE YARN Master Spark Client YARN Node Spark Interpreter Spark Context YARN Node Spark Worker YARN Node Spark Worker 1 7 2 3 4 5 6 Livy Server Scalatra Session Manager Session
  • 25. SESSION CREATION AND EXECUTION % curl -XPOST localhost:8998/sessions -d '{"kind": "spark"}' { "id": 0, "kind": "spark", "log": [...], "state": "idle" } % curl -XPOST localhost:8998/sessions/0/statements -d '{"code": "1+1"}' { "id": 0, "output": { "data": { "text/plain": "res0: Int = 2" }, "execution_count": 0, "status": "ok" }, "state": "available" }
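The two curl calls above can also be scripted. Below is a minimal Python sketch that only builds the URLs and JSON bodies from the example (the helper names `create_session_request` and `execute_statement_request` are made up for illustration; a real client would POST them with `urllib.request` or similar, and the host/port are the ones used in the curl example):

```python
import json

LIVY_URL = "http://localhost:8998"  # host/port as in the curl example

def create_session_request(kind="spark"):
    # POST /sessions with the session kind ("spark", "pyspark", "sparkR")
    return LIVY_URL + "/sessions", json.dumps({"kind": kind})

def execute_statement_request(session_id, code):
    # POST /sessions/{sessionId}/statements with the code to run
    url = "%s/sessions/%d/statements" % (LIVY_URL, session_id)
    return url, json.dumps({"code": code})

# Reconstructing the two requests from the curl example:
session_url, session_body = create_session_request("spark")
stmt_url, stmt_body = execute_statement_request(0, "1+1")
```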
  • 27. INTERPRETERS • Pipe stdin/stdout to a running shell • Execute the code / send to Spark workers • Perform magic operations • One interpreter by language • “Swappable” with other kernels (python, spark..) Interpreter > println(1 + 1) 2 println(1 + 1) 2
  • 28. INTERPRETER FLOW CURL Hue Livy Server Livy Session Interpreter 1+1 2 { “data”: { “application/json”: “2” } } 1+1 2
  • 29. INTERPRETER FLOW CHART Receive lines Split lines Send output to server Success Incomplete Merge with next line Error Execute LineMagic! Lines left? Magic line? No Yes NoYes
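The flow chart above can be sketched as a loop. This is a hedged Python approximation, where `execute` and `execute_magic` are stand-ins for the real interpreter calls; the three status strings mirror the Success / Incomplete / Error outcomes, and incomplete input is merged with the next line:

```python
import re

# Magic lines start with %, e.g. "%json counts"
MAGIC = re.compile(r"^%(\w+)\W*(.*)")

def run_lines(text, execute, execute_magic):
    """Sketch of the interpreter loop.

    `execute(code)` returns ("ok"|"incomplete"|"error", output);
    `execute_magic(name, rest)` handles magic commands. Both callbacks
    are assumptions standing in for the real interpreter."""
    outputs, pending = [], ""
    for line in text.split("\n"):
        m = MAGIC.match(line)
        if m:
            outputs.append(execute_magic(m.group(1), m.group(2)))
            continue
        code = pending + "\n" + line if pending else line
        status, out = execute(code)
        if status == "incomplete":
            pending = code          # merge with the next line and retry
        elif status == "error":
            outputs.append(out)
            break                   # stop executing the remaining lines
        else:
            pending = ""
            outputs.append(out)
    return outputs
```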
  • 30. LIVY INTERPRETERS trait Interpreter { def state: State def execute(code: String): Future[JValue] def close(): Unit } sealed trait State case class NotStarted() extends State case class Starting() extends State case class Idle() extends State case class Running() extends State case class Busy() extends State case class Error() extends State case class ShuttingDown() extends State case class Dead() extends State
  • 31. LIVY INTERPRETERS trait Interpreter { def state: State def execute(code: String): Future[JValue] def close(): Unit } sealed trait State case class NotStarted() extends State case class Starting() extends State case class Idle() extends State case class Running() extends State case class Busy() extends State case class Error() extends State case class ShuttingDown() extends State case class Dead() extends State
  • 32. SPARK INTERPRETER class SparkInterpreter extends Interpreter { … private var _state: State = NotStarted() private val outputStream = new ByteArrayOutputStream() private var sparkIMain: SparkIMain = _ def start() = { ... _state = Starting() sparkIMain = new SparkIMain(new Settings(), new JPrintWriter(outputStream, true)) sparkIMain.initializeSynchronous() ... Interpreter new SparkIMain(new Settings(), new JPrintWriter(outputStream, true))
  • 33. SPARK INTERPRETER private var sparkContext: SparkContext = _ def start() = { ... val sparkConf = new SparkConf(true) sparkContext = new SparkContext(sparkConf) sparkIMain.beQuietDuring { sparkIMain.bind("sc", "org.apache.spark.SparkContext", sparkContext, List("""@transient""")) } _state = Idle() } sparkIMain.bind("sc", "org.apache.spark.SparkContext", sparkContext, List("""@transient"""))
  • 34. EXECUTING SPARK private def executeLine(code: String): ExecuteResult = { code match { case MAGIC_REGEX(magic, rest) => executeMagic(magic, rest) case _ => scala.Console.withOut(outputStream) { sparkIMain.interpret(code) match { case Results.Success => ExecuteComplete(readStdout()) case Results.Incomplete => ExecuteIncomplete(readStdout()) case Results.Error => ExecuteError(readStdout()) } ... case MAGIC_REGEX(magic, rest) => case _ =>
  • 35. INTERPRETER MAGIC private val MAGIC_REGEX = "^%(\\w+)\\W*(.*)".r private def executeMagic(magic: String, rest: String): ExecuteResponse = { magic match { case "json" => executeJsonMagic(rest) case "table" => executeTableMagic(rest) case _ => ExecuteError(f"Unknown magic command $magic") } } case "json" => executeJsonMagic(rest) case "table" => executeTableMagic(rest) case _ => ExecuteError(f"Unknown magic command $magic")
  • 36. INTERPRETER MAGIC private def executeJsonMagic(name: String): ExecuteResponse = { sparkIMain.valueOfTerm(name) match { case Some(value: RDD[_]) => ExecuteMagic(Extraction.decompose(Map( "application/json" -> value.asInstanceOf[RDD[_]].take(10)))) case Some(value) => ExecuteMagic(Extraction.decompose(Map( "application/json" -> value))) case None => ExecuteError(f"Value $name does not exist") } } case Some(value: RDD[_]) => ExecuteMagic(Extraction.decompose(Map( "application/json" -> value.asInstanceOf[RDD[_]].take(10)))) case Some(value) => ExecuteMagic(Extraction.decompose(Map( "application/json" -> value)))
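The lookup-and-truncate logic of the JSON magic can be sketched in Python. In this sketch, a plain dict stands in for the interpreter's term table and a plain list for an RDD; `take(10)` becomes a slice (all names here are illustrative, not Livy's API):

```python
def execute_json_magic(scope, name, limit=10):
    """Sketch of the %json magic: look up `name` and render it as JSON.

    `scope` is a dict standing in for the interpreter's variable scope;
    list values are truncated to `limit` items, mirroring RDD.take(10)."""
    if name not in scope:
        return {"error": "Value %s does not exist" % name}
    value = scope[name]
    if isinstance(value, list):
        value = value[:limit]  # only show the first few elements
    return {"application/json": value}
```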
  • 37. TABLE MAGIC "application/vnd.livy.table.v1+json": { "headers": [ { "name": "count", "type": "BIGINT_TYPE" }, { "name": "name", "type": "STRING_TYPE" } ], "data": [ [ 23407, "the" ], [ 19540, "I" ], [ 18358, "and" ], ... ] } val lines = sc.textFile("shakespeare.txt"); val counts = lines. flatMap(line => line.split(" ")). map(word => (word, 1)). reduceByKey(_ + _). sortBy(-_._2). map { case (w, c) => Map("word" -> w, "count" -> c) } %table counts
  • 38. TABLE MAGIC "application/vnd.livy.table.v1+json": { "headers": [ { "name": "count", "type": "BIGINT_TYPE" }, { "name": "name", "type": "STRING_TYPE" } ], "data": [ [ 23407, "the" ], [ 19540, "I" ], [ 18358, "and" ], ... ] } val lines = sc.textFile("shakespeare.txt"); val counts = lines. flatMap(line => line.split(" ")). map(word => (word, 1)). reduceByKey(_ + _). sortBy(-_._2). map { case (w, c) => Map("word" -> w, "count" -> c) } %table counts
  • 39. JSON MAGIC val lines = sc.textFile("shakespeare.txt"); val counts = lines. flatMap(line => line.split(" ")). map(word => (word, 1)). reduceByKey(_ + _). sortBy(-_._2). map { case (w, c) => Map("word" -> w, "count" -> c) } %json counts { "id": 0, "output": { "application/json": [ { "count": 506610, "word": "" }, { "count": 23407, "word": "the" }, { "count": 19540, "word": "I" }, ... ] ... } %json counts
  • 40. JSON MAGIC val lines = sc.textFile("shakespeare.txt"); val counts = lines. flatMap(line => line.split(" ")). map(word => (word, 1)). reduceByKey(_ + _). sortBy(-_._2). map { case (w, c) => Map("word" -> w, "count" -> c) } %json counts { "id": 0, "output": { "application/json": [ { "count": 506610, "word": "" }, { "count": 23407, "word": "the" }, { "count": 19540, "word": "I" }, ... ] ... }
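Outside Spark, the same word-count pipeline fits in a few lines of plain Python. This sketch mirrors the flatMap / map / reduceByKey / sortBy chain from the slide; note that `split(" ")` keeps empty strings, which is why the deck's output shows a huge count for the empty word:

```python
from collections import Counter

def word_counts(text):
    # flatMap(split) + map(word -> 1) + reduceByKey(_ + _), all via Counter,
    # then sortBy(-count) and map to {"word", "count"} records
    counts = Counter(word for line in text.split("\n")
                     for word in line.split(" "))
    return [{"word": w, "count": c}
            for w, c in sorted(counts.items(), key=lambda kv: -kv[1])]
```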
  • 41. • Stability and Scaling • Security • iPython/Jupyter backends and file format COMING SOON

Speaker notes

  1. Why do we want to do this? Currently it’s difficult to visualize results from Spark. Spark has a great interactive tool called “spark-shell” that allows you to interact with large datasets on the command line. For example, here is a session where we are counting the words used by Shakespeare. Running this computation is easy, but spark-shell doesn’t provide any tools for visualizing the results.
  2. One option is to save the output to a file, then use a tool like Hue to import it into a Hive table and visualize it. We are obviously big fans of Hue, but there are still too many steps to go through to get to this point. If we want to change the script, say to filter out words like “the” and “and”, we need to go back to the shell, rerun our code snippet, save it to a file, then reimport it into the UI. It’s a slow process.
  3. Multiple languages. Inherits Hue’s sharing and export/import.
  4. Hello, I’m Erick Tryzelaar, and I’m going to talk about the Livy Spark Server, which is our backend for Hue’s Notebook application.
  5. Livy is a REST web server that allows a tool like Hue to interactively execute scala and spark commands, just like spark-shell. It goes beyond it by adding type introspection, which allows a frontend like Hue to render results in interactive visualizations. Furthermore it allows sessions to be run inside YARN to support horizontally scaling out to hundreds of active sessions. It also supports a Python and R backend. Finally, it’s fully open source, and currently being developed in Hue.
  6.–13. (The same note accompanies each step of the architecture build slides.) The Livy server is built upon Scalatra and Jetty. Creating a session is as simple as POSTing to a particular URL. Behind the scenes, Livy communicates with the YARN master to allocate nodes on which to launch the interactive sessions. This is all done asynchronously, as there is no telling when resources will be available to run the sessions. Once the nodes have been allocated, Livy starts an interpreter on one of them, which takes care of creating the Spark Context that actually runs the Spark operations. Once set up, the session signals to the Livy server that it is ready for commands, at which point the client can simply POST code to a URL on the Livy server.
  14. Let’s see it in action. On the left we see creating a “spark” session. You could also fill in “pyspark” and “sparkR” here if you want those sessions. On the right is us executing simple math in the session itself.
  15. We don’t have too much time to drill down into the code, but we did want to take this moment to at least dive into how the interpreters work.
  16. Livy’s interpreters are conceptually very simple devices. They take in one or more lines of code and execute them in a shell environment. These shells perform the computation and interact with the spark environment. They’re also abstract. As I mentioned earlier, Livy currently has 3 languages built into it: Scala, Python and R, with more to come.
  17. Here is the interpreter loop that Livy manages. First it splits up the lines and feeds them one at a time into the interpreter. If the line is a regular, non-magic line, it gets executed and the result can be in one of three states: Success, where we continue to execute the next line; Incomplete, where the input is not a complete statement, such as an “if” statement with an open bracket; or Error, which stops the execution of these lines. The other case is magic lines, which are special commands to the interpreter itself, for example asking the interpreter to convert a value into a JSON type.
  18.–19. Now for some code. As we saw earlier, the interpreter is a simple state machine that executes code and eventually produces JSON responses by way of a Future.
  20. In order to implement this interface, the Spark interpreter needs to first create the real interpreter, SparkIMain. It’s pretty simple to create: we just need to construct it with a buffer that acts as the interpreter’s standard output.
  21. Once the SparkIMain has been initialized, we need to create the Spark Context that communicates with all of the spark workers. Injecting this variable into the interpreter is quite simple with this “bind” method.
  22. Now that the session is up and running, we can execute code inside of it. I’ve skipped some of the other bookkeeping in order to show the actual heart of the execution here (ignore the magic case for the moment). Execution is also quite simple: we first temporarily replace standard out with our buffer, and then have the interpreter execute the code. There are three conditions for the response: first, the command executed successfully; second, the code is incomplete, because maybe it has an open parenthesis; finally, an error, if some exception occurred. Altogether quite simple, and it doesn’t require any changes to Spark.
  23. And now the magic. I mentioned earlier that Livy supports type introspection. The way it does this is through in-band magic commands, which start with a percent sign. The Spark interpreter currently supports two magic commands, “json” and “table”. The “json” magic will convert any type into a JSON value, and “table” will convert any type into a table-ish object that’s used for our visualizations.
  24. Here is our JSON magic. It takes advantage of json4s’s Extraction.decompose to try to convert values. We special-case RDDs, since they can’t be directly transformed into JSON; instead we just pull out the first 10 items so we can at least show something.
  25.–26. The table magic does something similar, but it’s a bit too large to compress into slides. We’ll see its results next.
  27.–28. Finally, here it is in action. Here we’re taking our Shakespeare code from earlier. If we run this snippet inside Livy, it returns an output mimetype of application/json, with the results inlined without encoding in the output.
  29. Fingers crossed for a lot of reasons, it’s master and the VM was broken till 4 AM. Next: learn more