SlideShare une entreprise Scribd logo
1  sur  39
Télécharger pour lire hors ligne
@DTAIEB55
Taking Jupyter Notebooks and
Apache Spark to the next level with
PixieDust
David Taieb
Distinguished Engineer
IBM Watson Data Platform, Developer Advocacy
@DTAIEB55
@DTAIEB55
•  More companies making bet-the-business data driven decisions
•  Good news: they are drowning in Data
•  Bad news: they are drowning in Data
•  Solving the Data problems of tomorrow cannot be done by data
scientists alone.
•  Developers are getting more involved with Data Science, moving
from stovepipe applications to “data pipelines” that integrate data
and analytics.
WHY ARE YOU HERE?
@DTAIEB55
Disclaimer: All characters and events depicted in this story are entirely
fictitious. Any similarity to actual use cases, events or persons is actually
intentional.
Let’s start with a story… we all know too well.
How do we blur the lines between
developers and data scientists?
@DTAIEB55
•  Hold a master degree in computer science
•  10 year experience, 6 years with the company
•  Languages of choice: Java, Node.js, HTML5/CSS3
•  Data: No SQL (Cloudant, Mongo), relational
•  No major experience with Big Data
T H E D E V E L O P E R
“The best line of code is the one I didn't have to write!”
MEET BEN
@DTAIEB55
•  Hold a PHD in data science
•  5 year experience, 2 years with the company
•  Experienced in Python and R
•  Expert in Machine Learning and Data visualization
•  Software engineering is not her thing
T H E D A T A S C I E N T I S T
“In God we trust. All others bring data.”
— W. Edwards Deming
MEET NATASHA
@DTAIEB55
SURPRISE MEETING
With the VP of Development
“We have an urgent need for our marketing department
to build an application that can provide real-time sentiment
analysis on Twitter data.”
https://unsplash.com/search/meeting?photo=3fPXt37X6UQ
@DTAIEB55
•  You only have 6 weeks to build the application
•  Target consumer is the business-focused user
•  Must be easy to use even for non technical people
•  It must scale out of the box
•  I want you to look at Apache Spark
KEY CONSTRAINTS
@DTAIEB55
— NATASHA
“What exactly is Apache Spark?”
SOME LEARNING TO DO…
@DTAIEB55
Great Question Natasha!
B e s t w a y t o a n s w e r i t i s t o a r r a n g e a t i c k e t t o t h e
S p a r k S u m m i t f o r y o u t o f i n d o u t
@DTAIEB55
Batch Job
(spark-submit)
Interactive
NotebookSpark
Application
(driver)
Master
(cluster manager)
Spark Cluster
Worker Node Worker Node
...
Notebook Server
Browser
Kernel
Master
(cluster manager)
Spark Cluster
Worker Node Worker Node
...
RDD Partitioning
Task packaging and
dispatching
Worker node scheduling
CONSUMING SPARK
@DTAIEB55
— NATASHA
“What exactly is a Notebook?”
— BEN
CAN WE COLLABORATE USING NOTEBOOKS?
@DTAIEB55
http://ia.media-imdb.com/images/M/MV5BMTk3OTM5Njg5M15BMl5BanBnXkFtZTYwMzA0ODI3._V1_SX640_SY720_.jpg
RYAN
GOSLING
MOVIE?
@DTAIEB55
WHAT IS A NOTEBOOK?
•  Web based UI for
running Apache
Spark console
commands
•  Easy, no install spark
accelerator
•  Best way to start
working with spark
•  Multiple flavors
•  Jupyter
•  Zeppelin
•  Local or cloud hosted
•  IBM Data Science
Experience
•  Databricks
Text
Annotations
Code
Data
Visualizations
Widgets
Output
@DTAIEB55
What is Jupyter?
"Open	source,	interac0ve	data	science	and	scien0fic	
compu0ng"	
– Formerly	IPython	
– Large,	open,	growing	community	and	ecosystem	
Very	popular	
– “~2	million	users	for	IPython”	[1]	
– $6m	in	funding	in	2015	[3]	
– 200	contributors	to	notebook	subproject	alone	[4]	
– 275,000	public	notebooks	on	GitHub	[2]	
	
Browser
Kernel
Code Output
https://www.bluetrack.com/uploads/items_images/kernel-of-corn-stress-balls1_thumb.jpg?r=1
@DTAIEB55
BIG DATA ANALYSIS
code
results
Kernel with Spark
support
Services:
Congitive, …
Libraries:
Statistics, Math,
Machine Learning,
Plotting,
Data
(flat files, relational database,
NoSQL database, …)
Worker
Worker
...
Worker
@DTAIEB55
— BEN
“But they seem complicated for
developers like me”
NOTEBOOKS ARE POWERFUL TOOLS FOR DATA SCIENTISTS
@DTAIEB55
•  Visualize data (e.g., Table, Charts, Map, etc)
•  Data Management with PixieApps
•  Download/export data (e.g., File, Cloudant, etc.)
•  Use Scala directly in a Python notebook
•  Install Spark packages into Python notebook
•  Spark job progress monitor
•  Extensible
Open Source Python helper library for Jupyter Notebooks
ENTER PIXIEDUST
https://github.com/ibm-cds-labs/pixiedust
@DTAIEB55
DEMO: PIXIEDUST DATA
VISUALIZATION
Easy Visualizations
https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/PixieDust%201%20-%20Easy%20Visualizations.ipynb
Working with External Data
https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/PixieDust%202%20-%20Working%20with%20External%20Data.ipynb
@DTAIEB55
— BEN
“But I am really more comfortable with Scala”
I AM OK TO USE PYTHON
@DTAIEB55
SCALA NOTEBOOKS
PixieDust also
works with Scala
Notebooks
Same PixieDust Scala
APIs as in Python
@DTAIEB55
— NATASHA
“Expressing everything in code is
nice but LOB users will not be
able to linearly run large number
of cells”
WHAT ABOUT THE LINE OF BUSINESS USER?
@DTAIEB55
Enter PixieApps
•  PixieApps are Python classes used to write UI for your analytics that runs directly in a Jupyter
Notebook
•  Easy to build: mostly HTML and CSS with some custom attributes (micro-format style)
•  Leverage PixieDust Display visualization for charting
•  With PixieApps you can:
•  Create different html views with routes to invoke them
•  Invoke Python Scripts from user interactions
•  Run in the notebook cell output or in a Dialog
•  and much more…
•  Use cases:
•  Dashboards
•  Data Browsers
•  Data Pipeline Management
@DTAIEB55
Demo: PeaceTech GroundTruth Global Dashboard
@DTAIEB55
Demo: PeaceTech GroundTruth Global Dashboard
@DTAIEB55
Demo: Data Browser for Cloudant/CouchDB
@DTAIEB55
— BEN
“Do I need to learn yet another framework?”
WHAT DOES IT TAKE TO BUILD A PIXIEAPP?
@DTAIEB55
PIXIEAPP HELLO WORLD
Import app package to start things off
Simple annotation to tell PixieDust it’s an app
Define the default route (no args).
Method will return the view’s html fragment
Html fragment for the view.
Allows Jinja2 template macros
Define a new route that triggers when option
clicked is set to true
set option clicked to true when button is
pushed, so that correct route is loaded next
Import app package to start things off
@DTAIEB55
PIXIEAPP HELLO WORLD WITH DATA
Placeholder div for displaying data
Display the output in the specified target
Entity binding: use the df passed by user
Allows binding of any entity created by the app
Specify Display options for visualization
Pass data to the app
@DTAIEB55
OK, I’M SOLD…
L E T ’ S A G R E E O N T H E A R C H I T E C T U R E
@DTAIEB55
•  I’ll work on data acquisition from Twitter
and enrichment with sentiment analysis
scores using Spark Streaming
•  I know Java very well, but I don’t have
time to learn Python.
•  However, I am willing to learn Scala if
that helps improve my productivity
S T A R T B R A I N S T O R M I N G
•  I’ll perform the data exploration and
analysis
•  I know Python and R, but I am not
familiar enough with Java or Scala
•  I like pandas and numpy. I’m ok to learn
Spark but expect the same level of apis
•  I need to work iteratively with the data
I’ll need to do some data exploration too. I’ll need APIs to access my data.
BEN and NATASHA
@DTAIEB55
•  Implement a Spark Streaming
connector to Twitter
•  Call Watson Tone Analyzer for each
tweets
•  Return a Spark DataFrame with the
tweets enriched with Tone scores
•  Code written in Scala, delivered as a
Jar
D I V I D I N G T H E T A S K S
BEN and NATASHA
•  Works in a Python Notebook
•  Using PixieDust PackageManager,
install the Scala library delivered by
Ben to load the twitter data with Tone
scores
•  Using PixieDust display() api, perform
the data exploration and analysis:
trending hashtags and sentiments
•  Produce visualizations to LOB Users
@DTAIEB55
http://www.ibm.com/watson/developercloud/tone-analyzer.html
WATSON TONE ANALYZER
•  Uses linguistic analysis to detect 3
types of tones
•  Emotion
•  Social Tendencies
•  Language Styles
•  Available as a cloud service on IBM
Bluemix
@DTAIEB55
DEMO
Twitter Sentiment Analysis Dashboard with PixieApp
Sentiment Analysis of Twitter Hashtags with Spark
https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/Twitter%20Sentiment%20with%20Watson%20and%20Pixiedust.ipynb
https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-spark-7ee6ca5c1585
@DTAIEB55
MEETING WITH THE VP
“SUCCESS!!”
https://unsplash.com/search/meeting?photo=3fPXt37X6UQ
@DTAIEB55
What’s next for PixieDust
•  Support Visualization for Streaming data
–  Start with Structured Streaming and IBM Streams
•  Ability to publish/embed PixieApps into Web Application (Nodejs to
begin with)
•  PixieDust visualization enhancements
–  Custom colors
–  Custom GeoJSON layers for maps
–  Sorting/filtering
–  More renderers: Brunel, ArcGIS, etc.
–  …
•  Ability to run Node.js code to load and visualize data
•  Support for Jupyter Labs and Jupyter Hub
@DTAIEB55
As always…
We look forward for your feedback
and pull requests on GitHub
https://github.com/ibm-cds-labs/pixiedust
@DTAIEB55
CONCLUSION
•  Solving the Data problems of tomorrow cannot be done by
data scientists alone.
•  Notebooks, considered by most to be the domain of data
scientists, can help break down traditional silos and help
team of all types who are working on data problems
Try it for yourself today:
•  IBM Data Science Experience
http://datascience.ibm.com/
•  Locally using PixieDust automated installer
https://ibm-cds-labs.github.io/pixiedust/install.html
@DTAIEB55
RESOURCES
•  https://github.com/ibm-cds-labs/pixiedust
•  https://ibm-cds-labs.github.io/pixiedust
•  https://medium.com/ibm-watson-data-lab/i-am-not-a-data-scientist-efe7ca6ceba2
•  https://spark.apache.org
•  https://www.ibm.com/us-en/marketplace/spark-as-a-service
•  http://datascience.ibm.com
•  https://www.ibm.com/watson/developercloud/tone-analyzer.html
•  https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-
spark-7ee6ca5c1585
•  https://ibm.biz/pixiedustvis
•  https://ibm.biz/pixiedustlab
@DTAIEB55
Questions

Contenu connexe

Tendances

Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Databricks
 
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...Databricks
 
Open Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetOpen Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetCarl W. Handlin
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIOJozo Kovac
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...Databricks
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at TwitterPrasad Wagle
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Databricks
 
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Databricks
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Spark Summit
 
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoMastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoSpark Summit
 
Saving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AISaving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AIDatabricks
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)Albert Wong
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaDatabricks
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Data Con LA
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High GearSpark Summit
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at TwitterPrasad Wagle
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseDatabricks
 
Bring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science WorkflowsBring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science WorkflowsDatabricks
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeSpark Summit
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)Jasjeet Thind
 

Tendances (20)

Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
Semantic Search: Fast Results from Large, Non-Native Language Corpora with Ro...
 
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
Monitoring Half a Million ML Models, IoT Streaming Data, and Automated Qualit...
 
Open Source DataViz with Apache Superset
Open Source DataViz with Apache SupersetOpen Source DataViz with Apache Superset
Open Source DataViz with Apache Superset
 
Realtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIORealtime streaming architecture in INFINARIO
Realtime streaming architecture in INFINARIO
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
Extracting Insights from Data at Twitter
Extracting Insights from Data at TwitterExtracting Insights from Data at Twitter
Extracting Insights from Data at Twitter
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
 
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
Debugging Big Data Analytics in Apache Spark with BigDebug with Muhammad Gulz...
 
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
Deduplication and Author-Disambiguation of Streaming Records via Supervised M...
 
Mastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott CordoMastering Your Customer Data on Apache Spark by Elliott Cordo
Mastering Your Customer Data on Apache Spark by Elliott Cordo
 
Saving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AISaving Energy in Homes with a Unified Approach to Data and AI
Saving Energy in Homes with a Unified Approach to Data and AI
 
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
2016 Tableau in the Cloud - A Netflix Original (AWS Re:invent)
 
Building Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks DeltaBuilding Robust Production Data Pipelines with Databricks Delta
Building Robust Production Data Pipelines with Databricks Delta
 
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
Big Data Day LA 2015 - Machine Learning on Largish Data by Szilard Pafka of E...
 
Shifting Data Science into High Gear
Shifting Data Science into High GearShifting Data Science into High Gear
Shifting Data Science into High Gear
 
Zeppelin at Twitter
Zeppelin at TwitterZeppelin at Twitter
Zeppelin at Twitter
 
Building the Artificially Intelligent Enterprise
Building the Artificially Intelligent EnterpriseBuilding the Artificially Intelligent Enterprise
Building the Artificially Intelligent Enterprise
 
Bring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science WorkflowsBring Satellite and Drone Imagery into your Data Science Workflows
Bring Satellite and Drone Imagery into your Data Science Workflows
 
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo LeeData Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
Data Science lifecycle with Apache Zeppelin and Spark by Moonsoo Lee
 
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)DevOps and Machine Learning (Geekwire Cloud Tech Summit)
DevOps and Machine Learning (Geekwire Cloud Tech Summit)
 

Similaire à Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with David Taieb

Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and PythonTravis Oliphant
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Rehgan Avon
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with SparkKrishna Sankar
 
Eye-catching science: free tools to create data visualizations and infographics
Eye-catching science: free tools to create data visualizations and infographicsEye-catching science: free tools to create data visualizations and infographics
Eye-catching science: free tools to create data visualizations and infographicsFuture Earth
 
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...Thiago de Faria
 
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...Codemotion
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascienceAdam Muise
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...Big Data Spain
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production Paolo Platter
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 KeynotePeter Wang
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkKenny Bastani
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data ScienceDataWorks Summit
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017Rittman Analytics
 
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...apidays
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 Dataiku
 
Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)stelligence
 
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014Tom LaGatta
 
Splunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Simplilearn
 

Similaire à Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with David Taieb (20)

Continuum Analytics and Python
Continuum Analytics and PythonContinuum Analytics and Python
Continuum Analytics and Python
 
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
Kelly O'Briant - DataOps in the Cloud: How To Supercharge Data Science with a...
 
Data Science with Spark
Data Science with SparkData Science with Spark
Data Science with Spark
 
Eye-catching science: free tools to create data visualizations and infographics
Eye-catching science: free tools to create data visualizations and infographicsEye-catching science: free tools to create data visualizations and infographics
Eye-catching science: free tools to create data visualizations and infographics
 
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
Codemotion Milan 2018 - AI with a devops mindset: experimentation, sharing an...
 
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
Thiago de Faria - AI with a devops mindset - experimentation, sharing and eas...
 
Paytm labs soyouwanttodatascience
Paytm labs soyouwanttodatasciencePaytm labs soyouwanttodatascience
Paytm labs soyouwanttodatascience
 
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
State of Play. Data Science on Hadoop in 2015 by SEAN OWEN at Big Data Spain ...
 
Bringing Deep Learning into production
Bringing Deep Learning into production Bringing Deep Learning into production
Bringing Deep Learning into production
 
PyData Texas 2015 Keynote
PyData Texas 2015 KeynotePyData Texas 2015 Keynote
PyData Texas 2015 Keynote
 
Big Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache SparkBig Graph Analytics on Neo4j with Apache Spark
Big Graph Analytics on Neo4j with Apache Spark
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
The Future of Data Science
The Future of Data ScienceThe Future of Data Science
The Future of Data Science
 
How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017How a Tweet Went Viral - BIWA Summit 2017
How a Tweet Went Viral - BIWA Summit 2017
 
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
apidays LIVE Australia 2021 - Tracing across your distributed process boundar...
 
The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016 The Rise of the DataOps - Dataiku - J On the Beach 2016
The Rise of the DataOps - Dataiku - J On the Beach 2016
 
Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)
 
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
 
Splunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data Science
 
Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...Data Science With Python | Python For Data Science | Python Data Science Cour...
Data Science With Python | Python For Data Science | Python Data Science Cour...
 

Plus de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 

Plus de Databricks (20)

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptx
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized Platform
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data Science
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML Monitoring
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI Integration
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorch
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on Kubernetes
 
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and Spark
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction Queries
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache Spark
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta Lake
 

Dernier

(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Serviceranjana rawat
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxolyaivanovalion
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Callshivangimorya083
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxolyaivanovalion
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxolyaivanovalion
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxolyaivanovalion
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxMohammedJunaid861692
 

Dernier (20)

꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
꧁❤ Aerocity Call Girls Service Aerocity Delhi ❤꧂ 9999965857 ☎️ Hard And Sexy ...
 
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
(PARI) Call Girls Wanowrie ( 7001035870 ) HI-Fi Pune Escorts Service
 
Smarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptxSmarteg dropshipping via API with DroFx.pptx
Smarteg dropshipping via API with DroFx.pptx
 
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
꧁❤ Greater Noida Call Girls Delhi ❤꧂ 9711199171 ☎️ Hard And Sexy Vip Call
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
Zuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptxZuja dropshipping via API with DroFx.pptx
Zuja dropshipping via API with DroFx.pptx
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
BigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptxBigBuy dropshipping via API with DroFx.pptx
BigBuy dropshipping via API with DroFx.pptx
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptxBPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
BPAC WITH UFSBI GENERAL PRESENTATION 18_05_2017-1.pptx
 

Taking Jupyter Notebooks and Apache Spark to the Next Level PixieDust with David Taieb

  • 1. @DTAIEB55 Taking Jupyter Notebooks and Apache Spark to the next level with PixieDust David Taieb Distinguished Engineer IBM Watson Data Platform, Developer Advocacy @DTAIEB55
  • 2. @DTAIEB55 •  More companies making bet-the-business data driven decisions •  Good news: they are drowning in Data •  Bad news: they are drowning in Data •  Solving the Data problems of tomorrow cannot be done by data scientists alone. •  Developers are getting more involved with Data Science, moving from stovepipe applications to “data pipelines” that integrate data and analytics. WHY ARE YOU HERE?
  • 3. @DTAIEB55 Disclaimer: All characters and events depicted in this story are entirely fictitious. Any similarity to actual use cases, events or persons is actually intentional. Let’s start with a story… we all know too well. How do we blur the lines between developers and data scientists?
  • 4. @DTAIEB55 •  Hold a master degree in computer science •  10 year experience, 6 years with the company •  Languages of choice: Java, Node.js, HTML5/CSS3 •  Data: No SQL (Cloudant, Mongo), relational •  No major experience with Big Data T H E D E V E L O P E R “The best line of code is the one I didn't have to write!” MEET BEN
  • 5. @DTAIEB55 •  Hold a PHD in data science •  5 year experience, 2 years with the company •  Experienced in Python and R •  Expert in Machine Learning and Data visualization •  Software engineering is not her thing T H E D A T A S C I E N T I S T “In God we trust. All others bring data.” — W. Edwards Deming MEET NATASHA
  • 6. @DTAIEB55 SURPRISE MEETING With the VP of Development “We have an urgent need for our marketing department to build an application that can provide real-time sentiment analysis on Twitter data.” https://unsplash.com/search/meeting?photo=3fPXt37X6UQ
  • 7. @DTAIEB55 •  You only have 6 weeks to build the application •  Target consumer is the business-focused user •  Must be easy to use even for non technical people •  It must scale out of the box •  I want you to look at Apache Spark KEY CONSTRAINTS
  • 8. @DTAIEB55 — NATASHA “What exactly is Apache Spark?” SOME LEARNING TO DO…
  • 9. @DTAIEB55 Great Question Natasha! B e s t w a y t o a n s w e r i t i s t o a r r a n g e a t i c k e t t o t h e S p a r k S u m m i t f o r y o u t o f i n d o u t
  • 10. @DTAIEB55 Batch Job (spark-submit) Interactive NotebookSpark Application (driver) Master (cluster manager) Spark Cluster Worker Node Worker Node ... Notebook Server Browser Kernel Master (cluster manager) Spark Cluster Worker Node Worker Node ... RDD Partitioning Task packaging and dispatching Worker node scheduling CONSUMING SPARK
  • 11. @DTAIEB55 — NATASHA “What exactly is a Notebook?” — BEN CAN WE COLLABORATE USING NOTEBOOKS?
  • 13. @DTAIEB55 WHAT IS A NOTEBOOK? •  Web based UI for running Apache Spark console commands •  Easy, no install spark accelerator •  Best way to start working with spark •  Multiple flavors •  Jupyter •  Zeppelin •  Local or cloud hosted •  IBM Data Science Experience •  Databricks Text Annotations Code Data Visualizations Widgets Output
  • 15. @DTAIEB55 BIG DATA ANALYSIS code results Kernel with Spark support Services: Congitive, … Libraries: Statistics, Math, Machine Learning, Plotting, Data (flat files, relational database, NoSQL database, …) Worker Worker ... Worker
  • 16. @DTAIEB55 — BEN “But they seem complicated for developers like me” NOTEBOOKS ARE POWERFUL TOOLS FOR DATA SCIENTISTS
  • 17. @DTAIEB55 •  Visualize data (e.g., Table, Charts, Map, etc) •  Data Management with PixieApps •  Download/export data (e.g., File, Cloudant, etc.) •  Use Scala directly in a Python notebook •  Install Spark packages into Python notebook •  Spark job progress monitor •  Extensible Open Source Python helper library for Jupyter Notebooks ENTER PIXIEDUST https://github.com/ibm-cds-labs/pixiedust
  • 18. @DTAIEB55 DEMO: PIXIEDUST DATA VISUALIZATION Easy Visualizations https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/PixieDust%201%20-%20Easy%20Visualizations.ipynb Working with External Data https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/PixieDust%202%20-%20Working%20with%20External%20Data.ipynb
  • 19. @DTAIEB55 — BEN “But I am really more comfortable with Scala” I AM OK TO USE PYTHON
  • 20. @DTAIEB55 SCALA NOTEBOOKS PixieDust also works with Scala Notebooks Same PixieDust Scala APIs as in Python
  • 21. @DTAIEB55 — NATASHA “Expressing everything in code is nice but LOB users will not be able to linearly run large number of cells” WHAT ABOUT THE LINE OF BUSINESS USER?
  • 22. @DTAIEB55 Enter PixieApps •  PixieApps are Python classes used to write UI for your analytics that runs directly in a Jupyter Notebook •  Easy to build: mostly HTML and CSS with some custom attributes (micro-format style) •  Leverage PixieDust Display visualization for charting •  With PixieApps you can: •  Create different html views with routes to invoke them •  Invoke Python Scripts from user interactions •  Run in the notebook cell output or in a Dialog •  and much more… •  Use cases: •  Dashboards •  Data Browsers •  Data Pipeline Management
  • 25. @DTAIEB55 Demo: Data Browser for Cloudant/CouchDB
  • 26. @DTAIEB55 — BEN “Do I need to learn yet another framework?” WHAT DOES IT TAKE TO BUILD A PIXIEAPP?
  • 27. @DTAIEB55 PIXIEAPP HELLO WORLD Import app package to start things off Simple annotation to tell PixieDust it’s an app Define the default route (no args). Method will return the view’s html fragment Html fragment for the view. Allows Jinja2 template macros Define a new route that triggers when option clicked is set to true set option clicked to true when button is pushed, so that correct route is loaded next Import app package to start things off
  • 28. @DTAIEB55 PIXIEAPP HELLO WORLD WITH DATA Placeholder div for displaying data Display the output in the specified target Entity binding: use the df passed by user Allows binding of any entity created by the app Specify Display options for visualization Pass data to the app
  • 29. @DTAIEB55 OK, I’M SOLD… L E T ’ S A G R E E O N T H E A R C H I T E C T U R E
  • 30. @DTAIEB55 •  I’ll work on data acquisition from Twitter and enrichment with sentiment analysis scores using Spark Streaming •  I know Java very well, but I don’t have time to learn Python. •  However, I am willing to learn Scala if that helps improve my productivity S T A R T B R A I N S T O R M I N G •  I’ll perform the data exploration and analysis •  I know Python and R, but I am not familiar enough with Java or Scala •  I like pandas and numpy. I’m ok to learn Spark but expect the same level of apis •  I need to work iteratively with the data I’ll need to do some data exploration too. I’ll need APIs to access my data. BEN and NATASHA
  • 31. @DTAIEB55 •  Implement a Spark Streaming connector to Twitter •  Call Watson Tone Analyzer for each tweets •  Return a Spark DataFrame with the tweets enriched with Tone scores •  Code written in Scala, delivered as a Jar D I V I D I N G T H E T A S K S BEN and NATASHA •  Works in a Python Notebook •  Using PixieDust PackageManager, install the Scala library delivered by Ben to load the twitter data with Tone scores •  Using PixieDust display() api, perform the data exploration and analysis: trending hashtags and sentiments •  Produce visualizations to LOB Users
  • 32. @DTAIEB55 http://www.ibm.com/watson/developercloud/tone-analyzer.html WATSON TONE ANALYZER •  Uses linguistic analysis to detect 3 types of tones •  Emotion •  Social Tendencies •  Language Styles •  Available as a cloud service on IBM Bluemix
  • 33. @DTAIEB55 DEMO Twitter Sentiment Analysis Dashboard with PixieApp Sentiment Analysis of Twitter Hashtags with Spark https://github.com/ibm-cds-labs/pixiedust/blob/master/notebook/Twitter%20Sentiment%20with%20Watson%20and%20Pixiedust.ipynb https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with-spark-7ee6ca5c1585
  • 34. @DTAIEB55 MEETING WITH THE VP “SUCCESS!!” https://unsplash.com/search/meeting?photo=3fPXt37X6UQ
  • 35. @DTAIEB55 What’s next for PixieDust •  Support Visualization for Streaming data –  Start with Structured Streaming and IBM Streams •  Ability to publish/embed PixieApps into Web Application (Nodejs to begin with) •  PixieDust visualization enhancements –  Custom colors –  Custom GeoJSON layers for maps –  Sorting/filtering –  More renderers: Brunel, ArcGIS, etc. –  … •  Ability to run Node.js code to load and visualize data •  Support for Jupyter Labs and Jupyter Hub
  • 36. @DTAIEB55 As always… We look forward for your feedback and pull requests on GitHub https://github.com/ibm-cds-labs/pixiedust
  • 37. @DTAIEB55 CONCLUSION •  Solving the Data problems of tomorrow cannot be done by data scientists alone. •  Notebooks, considered by most to be the domain of data scientists, can help break down traditional silos and help team of all types who are working on data problems Try it for yourself today: •  IBM Data Science Experience http://datascience.ibm.com/ •  Locally using PixieDust automated installer https://ibm-cds-labs.github.io/pixiedust/install.html
  • 38. @DTAIEB55 RESOURCES •  https://github.com/ibm-cds-labs/pixiedust •  https://ibm-cds-labs.github.io/pixiedust •  https://medium.com/ibm-watson-data-lab/i-am-not-a-data-scientist-efe7ca6ceba2 •  https://spark.apache.org •  https://www.ibm.com/us-en/marketplace/spark-as-a-service •  http://datascience.ibm.com •  https://www.ibm.com/watson/developercloud/tone-analyzer.html •  https://medium.com/ibm-watson-data-lab/real-time-sentiment-analysis-of-twitter-hashtags-with- spark-7ee6ca5c1585 •  https://ibm.biz/pixiedustvis •  https://ibm.biz/pixiedustlab