SlideShare une entreprise Scribd logo
1  sur  53
Télécharger pour lire hors ligne
Getting Insights from IoT data
with Apache Spark &
Apache Bahir
Luciano Resende
June 20th
, 2018
1
About me - Luciano Resende
2
Data Science Platform Architect – IBM – CODAIT
• Have been contributing to open source at ASF for over 10 years
• Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache
Toree, Apache Spark among other projects related to AI/ML platforms
lresende@apache.org
https://www.linkedin.com/in/lresende
@lresende1975
https://github.com/lresende
4
Open Source Leadership & Contributions
IBM generated open source innovation
• 137 Code Open (dWO) projects w/1000+ Github projects
• 4 graduates: Node-Red, OpenWhisk, SystemML,
Blockchain fabric to full open governance in the last year
• developer.ibm.com/code/open/code/
Community
• IBM focused on 18 strategic communities
• Drive open governance in “Centers of Gravity”
• IBM Leaders drive key technologies and assure freedom
of action
The IBM OS Way is now open sourced
• Training, Recognition, Tooling
• Organization, Consuming, Contributing
2018 / © 2018 IBM Corporation
5
IBM’s history of strong AI leadership
1997: Deep Blue
• Deep Blue became the first machine to beat a world chess
champion in tournament play
2011: Jeopardy!
• Watson beat two top
Jeopardy! champions
1968, 2001: A Space Odyssey
• IBM was a technical
advisor
• HAL is “the latest in
machine intelligence”
2018: Open Tech, AI & emerging
standards
• New IBM centers of gravity for AI
• OS projects increasing exponentially
• Emerging global standards in AI
Center for Open Source
Data and AI Technologies
CODAIT
codait.org
codait (French)
= coder/coded
https://m.interglot.com/fr/en/codait
CODAIT aims to make AI solutions
dramatically easier to create, deploy,
and manage in the enterprise
Relaunch of the Spark Technology
Center (STC) to reflect expanded
mission
6
Agenda
7
Internet of Things
IoT Use Cases
IoT Design Patterns
Apache Spark & Apache Bahir
Live Demo – Anomaly Detection
Summary
References
Q&A
Internet of
Things - IoT
8
What is THE INTERNET OF THINGS (IoT) ?
THE TERM IOT WAS FIRST TOUTED BY
KEVIN ASHTON, IN 1999. It’s more than
just machine to machine communications.
It’s about ecosystems of devices that form
relevant connections to people and other
devices to exchange data. It’s a useful term
to describe the world of connected and
wearable devices that’s emerging.
9
IoT - INTERACTIONS BETWEEN multiple ENTITIES
10
control
observe
inform
command
actuate
inform
PEOPLE
THINGS SOFTWARE
Some IoT EXAMPLES
11
SMART HOMES
TRANSPORT
WEARABLES INDUSTRY
DISPLAYS HEALTH
From thermostats
to smart switches
and remote
controls to security
systems
Self-driving cars,
drones for
delivering goods ,
etc
Smartwatches,
and other devices
enabling control
and providing
monitoring
capabilities
Robotics, sensors
for predicting
quality, failures,
etc
Not only VR, but
many new displays
enabling gesture
controls, haptic
interfaces, etc
Connected health,
in partnership with
wearables for
monitoring health
metrics, among
other examples
Some IoT PATTERNS
12
• Remote control
• Security analysis
• Edge analytics
• Historical data analysis
• Distributed Platforms
• Real-time decisions
IoT Applications
13
The Weather Company
The Weather Company data
- Surface observations
- precipitation
- radar
- satellite
- personal weather stations
- lightning sources
- data collected from planes
every day
- etc.
Home Automation & Security
- Multiple connected or
standalone devices
- Controlled by Voice
- Amazon Echo (Alexa)
- Google Home
- Apple HomePod (Siri)
15
TESLA connected cars
CONNECTED VEHICLES IS ONE
EXAMPLE OF THE IoT. It’s not
just about Google Maps in cars.
When Tesla finds a software
fault with their vehicle rather
than issuing an expensive and
damaging recall, they simply
updated the car’s operating
system over the air.
[hcp://www.wired.com/2014/02
/teslas- air-fix-best-example-
yet-internet-things/]
16
AMAZON Go
AMAZON GO – No lines, no
checkout, just grab and go
17
Industrial Internet of Things
18
- Smart factory
- Predictive and remote
maintenance
- Smart metering and
smart grid
- Industrial security
- Industrial heating,
ventilation and air
conditioning
- Asset tracking and
smart logistics
Reference: https://www.iiconsortium.org/
IoT
Design Patterns
19
LAMBDA Architecture
Lambda architecture is a data-
processing architecture designed to
handle massive quantities of data by
taking advantage of both batch-
and stream-processing methods. This
approach to architecture attempts to
balance latency, throughput, and fault-
tolerance by using batch processing to
provide comprehensive and accurate
views of batch data, while simultaneously
using real-time stream processing to
provide views of online data
20
Images: https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry
KAPA Architecture
21
Images: https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry
The Kappa architecture simplifies
the Lambda architecture by removing
the batch layer and replacing it with a
streaming layer.
REALTIME Data Processing best practice
22
Pub/Sub Component Data Processor Data Storage
One recommendation for processing
streaming of massive quantities of
data is to add a queue component to
front-end the processing of the data.
This enables a more fault-tolerant
solution, where in conjunction with
state management, the runtime
application can have failures and
subsequently continue processing
from the same data point.
Building an IoT
Application
23
MQTT – IoT Connectivity Protocol
24
• Constrained devices
• Low bandwidth connection
• Intermittent connections
MQTT – IoT Connectivity Protocol
25
Connect
+
Publish
+
Subscribe
~1990
IBM / Eurotech
2010
Published
2011
Eclipse M2M / Paho
2014
OASIS
Open spec
+ 40 client
implementations
Minimal
overhead
Tiny
Clients
(Java 170KB)
History
Header
2-4 bytes
(publish)
14 bytes
(connect)
V5
May 2018
MQTT – Quality of Service
26
MQTT
Broker
QoS0
QoS1
QoS2
At most once
At least once
Exactly once
. No connection failover
. Never duplicate
. Has connection failover
. Can duplicate
. Has connection failover
. Never duplicate
MQTT – World usage
Smart Home Automation
Messaging
Notable Mentions:
- IBM IoT Platform
- AWS IoT
- Microsoft IoT Hub
- Facebook Messenger
27
Apache Spark
28
Apache Spark Introduction
29
Spark Core
Spark
SQL
Spark
Streaming
Spark
ML
Spark
GraphX
executes SQL
statements
performs
streaming
analytics using
micro-batches
common
machine
learning and
statistical
algorithms
distributed
graph
processing
framework
general compute engine, handles
distributed task dispatching,
scheduling and basic I/O
functions
large variety of data sources
and formats can be supported,
both on-premise or cloud
BigInsights
(HDFS)
Cloudant
dashDB
SQL
DB
Apache Spark Evolution
30
Apache Spark – Spark SQL
31
Spark
SQL
Unified data access APIS: Query
structured data sets with SQL or
Dataset/DataFrame APIs
Fast, familiar query language across
all of your enterprise data
RDBMS
Data Sources
Structured
Streaming
Data Sources
Apache Spark – Spark SQL
32
You can run SQL statement with SparkSession.sql(…) interface:
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
spark.sql(“create table T1 (c1 int, c2 int) stored as parquet”)
val ds = spark.sql(“select * from T1”)
You can further transform the resultant dataset:
val ds1 = ds.groupBy(“c1”).agg(“c2”-> “sum”)
val ds2 = ds.orderBy(“c1”)
The result is a DataFrame / Dataset[Row]
ds.show() displays the rows
Apache Spark – Spark SQL
You can read from data sources using SparkSession.read.format(…)
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
// loading csv data to a Dataset of Bank type
val bankFromCSV = spark.read.csv(“hdfs://localhost:9000/data/bank.csv").as[Bank]
// loading JSON data to a Dataset of Bank type
val bankFromJSON = spark.read.json(“hdfs://localhost:9000/data/bank.json").as[Bank]
// select a column value from the Dataset
bankFromCSV.select(‘age’).show() will return all rows of column “age” from this dataset.
33
Apache Spark – Spark SQL
You can also configure a specific data source with specific options
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer)
// loading csv data to a Dataset of Bank type
val bankFromCSV = sparkSession.read
.option("header", ”true") // Use first line of all files as header
.option("inferSchema", ”true") // Automatically infer data types
.option("delimiter", " ")
.csv("/users/lresende/data.csv”)
.as[Bank]
bankFromCSV.select(‘age).show() // will return all rows of column “age” from this dataset.
34
Apache Spark – Spark SQL Structured Streaming
Unified programming model for streaming, interactive and batch queries
35Image source: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html
Considers the data stream as an
unbounded table
Apache Spark – Spark SQL Structured Streaming
SQL regular APIs
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
val input = spark.read
.schema(schema)
.format(”csv")
.load(”input-path")
val result = input
.select(”age”)
.where(”age > 18”)
result.write
.format(”json”)
. save(” dest-path”)
36
Structured Streaming APIs
val spark = SparkSession.builder()
.appName(“Demo”)
.getOrCreate()
val input = spark.readStream
.schema(schema)
.format(”csv")
.load(”input-path")
val result = input
.select(”age”)
.where(”age > 18”)
result.write
.format(”json”)
. startStream(” dest-path”)
Apache Spark – Spark Streaming
37
Spark
Streaming
Micro-batch event processing for
near-real time analytics
e.g. Internet of Things (IoT) devices,
Twitter feeds, Kafka (event hub), etc.
No multi-threading or parallel process
programming required
Apache Spark – Spark Streaming
Also known as discretized stream or DStream
Abstracts a continuous stream of data
Based on micro-batching
Based on RDDs
38
Apache Spark – Spark Streaming
val sparkConf = new SparkConf()
.setAppName("MQTTWordCount")
val ssc = new StreamingContext(sparkConf, Seconds(2))
val lines = MQTTUtils.createStream(ssc, brokerUrl, topic, StorageLevel.MEMORY_ONLY_SER_2)
val words = lines.flatMap(x => x.split(" "))
val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _)
wordCounts.print()
ssc.start()
ssc.awaitTermination()
39
Apache Bahir
40
Apache Bahir Project
Apache Bahir provides
extensions to multiple
distributed analytic platforms,
extending their reach with a
diversity of streaming
connectors and SQL data
sources.
Currently, Bahir provides
extensions for Apache
Spark and Apache Flink.
Bahir extensions for Apache Spark
MQTT – Enables reading data from MQTT Servers using Spark Streaming or Structured streaming.
• http://bahir.apache.org/docs/spark/current/spark-sql-streaming-mqtt/
• http://bahir.apache.org/docs/spark/current/spark-streaming-mqtt/
Couch DB/Cloudant– Enables reading data from CouchDB/Cloudant using Spark SQL and Spark
Streaming.
Twitter – Enables reading social data from twitter using Spark Streaming.
• http://bahir.apache.org/docs/spark/current/spark-streaming-twitter/
Akka – Enables reading data from Akka Actors using Spark Streaming or Structured Streaming.
• http://bahir.apache.org/docs/spark/current/spark-streaming-akka/
ZeroMQ – Enables reading data from ZeroMQ using Spark Streaming.
• http://bahir.apache.org/docs/spark/current/spark-streaming-zeromq/
42
Bahir extensions for Apache Spark
Google Cloud Pub/Sub – Add spark streaming connector to Google Cloud Pub/Sub
43
Apache Spark extensions in Bahir
Adding Bahir extensions into your application
- Using SBT
libraryDependencies += "org.apache.bahir" %% "spark-streaming-mqtt" % "2.2.0”
- Using Maven
<dependency>
<groupId>org.apache.bahir</groupId>
<artifactId>spark-streaming-mqtt_2.11 </artifactId>
<version>2.2.0</version>
</dependency>
44
Apache Spark extensions in Bahir
Submitting applications with Bahir extensions to Spark
- Spark-shell
bin/spark-shell --packages org.apache.bahir:spark-streaming_mqtt_2.11:2.2.0 …..
- Spark-submit
bin/spark-submit --packages org.apache.bahir:spark-streaming_mqtt_2.11:2.2.0 …..
45
Live Demo
46
IoT Analytics – Anomaly Detection
The demo environment
https://github.com/lresende/bahir-iot-demo
47
Node.js Web app
Simulates Elevator IoT devices
Elevator simulator
Metrics:
• Weight
• Speed
• Power
• Temperature
• System
MQTT
Mosquitto
MQTT
JavaScript Client
MQTT
Streaming Connector
IoT Analytics – Anomaly Detection
The Moving Z-Score model scores
anomalies in a univariate sequential dataset,
often a time series.
The moving Z-score is a very simple model
for measuring the anomalousness of each
point in a sequential dataset like a time
series. Given a window size w, the moving
Z-score is the number of standard
deviations each observation is away from
the mean, where the mean and standard
deviation are computed only over the
previous w observations.
48
Reference: https://turi.com/learn/userguide/anomaly_detection/moving_zscore.html
Reference: https://www.isixsigma.com/tools-templates/statistical-analysis/improved-forecasting-moving-averages-and-z-scores/
Z-Score
Moving Z-Score
Moving mean and moving standard deviation
Summary
49
Summary – Take away points
IoT Design Patterns
- Lambda and Kappa Architecture
- Real Time data processing best practice
Apache Spark
- IoT Analytics Runtime with support for ”Continuous Applications”
Apache Bahir
- Bring access to IoT data via supported connectors (e.g. MQTT)
IoT Applications
- Using Spark and Bahir to start processing IoT data in near real time using Spark Streaming
- Anomaly detection using Moving Z-Scores model
50
Join the Apache
Bahir community
51
References
Apache Bahir
http://bahir.apache.org
Documentation for Apache Spark extensions
http://bahir.apache.org/docs/spark/current/documentation/
Source Repositories
https://github.com/apache/bahir
https://github.com/apache/bahir-website
Demo Repository
https://github.com/lresende/bahir-iot-demo
52Image source: http://az616578.vo.msecnd.net/files/2016/03/21/6359412499310138501557867529_thank-you-1400x800-c-default.gif
53May 17, 2018 / © 2018 IBM Corporation
CALL FOR CODE KEYNOTE
With Jeffrey Borek, 10:00 AM
54March 30 2018 / © 2018 IBM Corporation

Contenu connexe

Tendances

Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerArtem Ervits
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDataWorks Summit
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache SparkEdureka!
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Yahoo Developer Network
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark Hortonworks
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupDatabricks
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationCraig Chao
 
Hadoop first ETL on Apache Falcon
Hadoop first ETL on Apache FalconHadoop first ETL on Apache Falcon
Hadoop first ETL on Apache FalconDataWorks Summit
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark OverviewairisData
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Sparkfelixcss
 
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...DataWorks Summit
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5Yan Zhou
 

Tendances (20)

Securing Spark Applications
Securing Spark ApplicationsSecuring Spark Applications
Securing Spark Applications
 
LEGO: Data Driven Growth Hacking Powered by Big Data
LEGO: Data Driven Growth Hacking Powered by Big Data LEGO: Data Driven Growth Hacking Powered by Big Data
LEGO: Data Driven Growth Hacking Powered by Big Data
 
Apache Zeppelin Helium and Beyond
Apache Zeppelin Helium and BeyondApache Zeppelin Helium and Beyond
Apache Zeppelin Helium and Beyond
 
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow ManagerBreathing new life into Apache Oozie with Apache Ambari Workflow Manager
Breathing new life into Apache Oozie with Apache Ambari Workflow Manager
 
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming DataDruid: Sub-Second OLAP queries over Petabytes of Streaming Data
Druid: Sub-Second OLAP queries over Petabytes of Streaming Data
 
Securing Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise ContextSecuring Hadoop in an Enterprise Context
Securing Hadoop in an Enterprise Context
 
Spark SQL | Apache Spark
Spark SQL | Apache SparkSpark SQL | Apache Spark
Spark SQL | Apache Spark
 
Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010Hadoop Frameworks Panel__HadoopSummit2010
Hadoop Frameworks Panel__HadoopSummit2010
 
Luciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conferenceLuciano Resende's keynote at Apache big data conference
Luciano Resende's keynote at Apache big data conference
 
Apache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, ScaleApache Hive 2.0: SQL, Speed, Scale
Apache Hive 2.0: SQL, Speed, Scale
 
YARN Ready: Apache Spark
YARN Ready: Apache Spark YARN Ready: Apache Spark
YARN Ready: Apache Spark
 
Spark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark MeetupSpark SQL Deep Dive @ Melbourne Spark Meetup
Spark SQL Deep Dive @ Melbourne Spark Meetup
 
Build a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimizationBuild a deep learning pipeline on apache spark for ads optimization
Build a deep learning pipeline on apache spark for ads optimization
 
Hadoop first ETL on Apache Falcon
Hadoop first ETL on Apache FalconHadoop first ETL on Apache Falcon
Hadoop first ETL on Apache Falcon
 
Apache Spark Overview
Apache Spark OverviewApache Spark Overview
Apache Spark Overview
 
Sparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with SparkSparkly Notebook: Interactive Analysis and Visualization with Spark
Sparkly Notebook: Interactive Analysis and Visualization with Spark
 
Apache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and BeyondApache Zeppelin, Helium and Beyond
Apache Zeppelin, Helium and Beyond
 
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
Real-Time Ingesting and Transforming Sensor Data and Social Data with NiFi an...
 
HiveWarehouseConnector
HiveWarehouseConnectorHiveWarehouseConnector
HiveWarehouseConnector
 
Spark meetup v2.0.5
Spark meetup v2.0.5Spark meetup v2.0.5
Spark meetup v2.0.5
 

Similaire à Getting insights from IoT data with Apache Spark and Apache Bahir

OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OW2
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware
 
OCCIware@OW2con 2016
OCCIware@OW2con 2016OCCIware@OW2con 2016
OCCIware@OW2con 2016Marc Dutoo
 
Role of cloud and analytics in IoT
Role of cloud and analytics in IoTRole of cloud and analytics in IoT
Role of cloud and analytics in IoTSelvaraj Kesavan
 
FIWARE Global Summit - FIWARE Overview
FIWARE Global Summit - FIWARE OverviewFIWARE Global Summit - FIWARE Overview
FIWARE Global Summit - FIWARE OverviewFIWARE
 
Cloud Computing - Benefits and Challenges
Cloud Computing - Benefits and ChallengesCloud Computing - Benefits and Challenges
Cloud Computing - Benefits and ChallengesThoughtWorks Studios
 
Open Source Edge Computing Platforms - Overview
Open Source Edge Computing Platforms - OverviewOpen Source Edge Computing Platforms - Overview
Open Source Edge Computing Platforms - OverviewKrishna-Kumar
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshIanFurlong4
 
Cloud Computing Networks
Cloud Computing NetworksCloud Computing Networks
Cloud Computing Networksjayapal385
 
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware
 
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Marc Dutoo
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3Robert Grossman
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015Krishna-Kumar
 
CHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in csCHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in csTSha7
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...Timothy Spann
 
Excellent slides on the new z13s announced on 16th Feb 2016
Excellent slides on the new z13s announced on 16th Feb 2016Excellent slides on the new z13s announced on 16th Feb 2016
Excellent slides on the new z13s announced on 16th Feb 2016Luigi Tommaseo
 

Similaire à Getting insights from IoT data with Apache Spark and Apache Bahir (20)

OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...
 
OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...OCCIware: extensible and standard-based XaaS platform to manage everything in...
OCCIware: extensible and standard-based XaaS platform to manage everything in...
 
OCCIware@OW2con 2016
OCCIware@OW2con 2016OCCIware@OW2con 2016
OCCIware@OW2con 2016
 
Cloud Computing
Cloud  ComputingCloud  Computing
Cloud Computing
 
Role of cloud and analytics in IoT
Role of cloud and analytics in IoTRole of cloud and analytics in IoT
Role of cloud and analytics in IoT
 
FIWARE Global Summit - FIWARE Overview
FIWARE Global Summit - FIWARE OverviewFIWARE Global Summit - FIWARE Overview
FIWARE Global Summit - FIWARE Overview
 
Computer project
Computer projectComputer project
Computer project
 
cloud computing models
cloud computing modelscloud computing models
cloud computing models
 
Cloud Computing - Benefits and Challenges
Cloud Computing - Benefits and ChallengesCloud Computing - Benefits and Challenges
Cloud Computing - Benefits and Challenges
 
Open Source Edge Computing Platforms - Overview
Open Source Edge Computing Platforms - OverviewOpen Source Edge Computing Platforms - Overview
Open Source Edge Computing Platforms - Overview
 
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMeshThe Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
The Enterprise Guide to Building a Data Mesh - Introducing SpecMesh
 
The Enterprise Cloud
The Enterprise CloudThe Enterprise Cloud
The Enterprise Cloud
 
Cloud Computing Networks
Cloud Computing NetworksCloud Computing Networks
Cloud Computing Networks
 
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, SmileOCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
OCCIware presentation at EclipseDay in Lyon, November 2017, by Marc Dutoo, Smile
 
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
Model and pilot all cloud layers with OCCIware - Eclipse Day Lyon 2017
 
OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3OCC Overview OMG Clouds Meeting 07-13-09 v3
OCC Overview OMG Clouds Meeting 07-13-09 v3
 
PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015PaaS Emerging Technologies - October 2015
PaaS Emerging Technologies - October 2015
 
CHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in csCHAPTER 2 cloud computing technology in cs
CHAPTER 2 cloud computing technology in cs
 
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
2024 February 28 - NYC - Meetup Unlocking Financial Data with Real-Time Pipel...
 
Excellent slides on the new z13s announced on 16th Feb 2016
Excellent slides on the new z13s announced on 16th Feb 2016Excellent slides on the new z13s announced on 16th Feb 2016
Excellent slides on the new z13s announced on 16th Feb 2016
 

Plus de Luciano Resende

A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfLuciano Resende
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsLuciano Resende
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Luciano Resende
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...Luciano Resende
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksLuciano Resende
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayLuciano Resende
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsLuciano Resende
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewLuciano Resende
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeLuciano Resende
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examplesLuciano Resende
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsLuciano Resende
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkLuciano Resende
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017Luciano Resende
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine LearningLuciano Resende
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayLuciano Resende
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gatewayLuciano Resende
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceLuciano Resende
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overviewLuciano Resende
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitionsLuciano Resende
 

Plus de Luciano Resende (20)

A Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdfA Jupyter kernel for Scala and Apache Spark.pdf
A Jupyter kernel for Scala and Apache Spark.pdf
 
Using Elyra for COVID-19 Analytics
Using Elyra for COVID-19 AnalyticsUsing Elyra for COVID-19 Analytics
Using Elyra for COVID-19 Analytics
 
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
Elyra - a set of AI-centric extensions to JupyterLab Notebooks.
 
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
From Data to AI - Silicon Valley Open Source projects come to you - Madrid me...
 
Ai pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooksAi pipelines powered by jupyter notebooks
Ai pipelines powered by jupyter notebooks
 
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise GatewayStrata - Scaling Jupyter with Jupyter Enterprise Gateway
Strata - Scaling Jupyter with Jupyter Enterprise Gateway
 
Scaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloadsScaling notebooks for Deep Learning workloads
Scaling notebooks for Deep Learning workloads
 
Jupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway OverviewJupyter Enterprise Gateway Overview
Jupyter Enterprise Gateway Overview
 
Inteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for CodeInteligencia artificial, open source e IBM Call for Code
Inteligencia artificial, open source e IBM Call for Code
 
Open Source AI - News and examples
Open Source AI - News and examplesOpen Source AI - News and examples
Open Source AI - News and examples
 
Building analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernelsBuilding analytical microservices powered by jupyter kernels
Building analytical microservices powered by jupyter kernels
 
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache SparkAn Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
An Enterprise Analytics Platform with Jupyter Notebooks and Apache Spark
 
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
The Analytic Platform behind IBM’s Watson Data Platform - Big Data Spain 2017
 
What's new in Apache SystemML - Declarative Machine Learning
What's new in Apache SystemML  - Declarative Machine LearningWhat's new in Apache SystemML  - Declarative Machine Learning
What's new in Apache SystemML - Declarative Machine Learning
 
Big analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel GatewayBig analytics meetup - Extended Jupyter Kernel Gateway
Big analytics meetup - Extended Jupyter Kernel Gateway
 
Jupyter con meetup extended jupyter kernel gateway
Jupyter con meetup   extended jupyter kernel gatewayJupyter con meetup   extended jupyter kernel gateway
Jupyter con meetup extended jupyter kernel gateway
 
How mentoring can help you start contributing to open source
How mentoring can help you start contributing to open sourceHow mentoring can help you start contributing to open source
How mentoring can help you start contributing to open source
 
Asf icfoss-mentoring
Asf icfoss-mentoringAsf icfoss-mentoring
Asf icfoss-mentoring
 
Open Source tools overview
Open Source tools overviewOpen Source tools overview
Open Source tools overview
 
Data access layer and schema definitions
Data access layer and schema definitionsData access layer and schema definitions
Data access layer and schema definitions
 

Dernier

9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSINGmarianagonzalez07
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanMYRABACSAFRA2
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectBoston Institute of Analytics
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhijennyeacort
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfgstagge
 

Dernier (20)

9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
2006_GasProcessing_HB (1).pdf HYDROCARBON PROCESSING
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Identifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population MeanIdentifying Appropriate Test Statistics Involving Population Mean
Identifying Appropriate Test Statistics Involving Population Mean
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Heart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis ProjectHeart Disease Classification Report: A Data Analysis Project
Heart Disease Classification Report: A Data Analysis Project
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝DelhiRS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
RS 9000 Call In girls Dwarka Mor (DELHI)⇛9711147426🔝Delhi
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
RadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdfRadioAdProWritingCinderellabyButleri.pdf
RadioAdProWritingCinderellabyButleri.pdf
 

Getting insights from IoT data with Apache Spark and Apache Bahir

  • 1. Getting Insights from IoT data with Apache Spark & Apache Bahir Luciano Resende June 20th , 2018 1
  • 2. About me - Luciano Resende 2 Data Science Platform Architect – IBM – CODAIT • Have been contributing to open source at ASF for over 10 years • Currently contributing to : Jupyter Notebook ecosystem, Apache Bahir, Apache Toree, Apache Spark among other projects related to AI/ML platforms lresende@apache.org https://www.linkedin.com/in/lresende @lresende1975 https://github.com/lresende
  • 3. 4 Open Source Leadership & Contributions IBM generated open source innovation • 137 Code Open (dWO) projects w/1000+ Github projects • 4 graduates: Node-Red, OpenWhisk, SystemML, Blockchain fabric to full open governance in the last year • developer.ibm.com/code/open/code/ Community • IBM focused on 18 strategic communities • Drive open governance in “Centers of Gravity” • IBM Leaders drive key technologies and assure freedom of action The IBM OS Way is now open sourced • Training, Recognition, Tooling • Organization, Consuming, Contributing 2018 / © 2018 IBM Corporation
  • 4. 5 IBM’s history of strong AI leadership 1997: Deep Blue • Deep Blue became the first machine to beat a world chess champion in tournament play 2011: Jeopardy! • Watson beat two top Jeopardy! champions 1968, 2001: A Space Odyssey • IBM was a technical advisor • HAL is “the latest in machine intelligence” 2018: Open Tech, AI & emerging standards • New IBM centers of gravity for AI • OS projects increasing exponentially • Emerging global standards in AI
  • 5. Center for Open Source Data and AI Technologies CODAIT codait.org codait (French) = coder/coded https://m.interglot.com/fr/en/codait CODAIT aims to make AI solutions dramatically easier to create, deploy, and manage in the enterprise Relaunch of the Spark Technology Center (STC) to reflect expanded mission 6
  • 6. Agenda 7 Internet of Things IoT Use Cases IoT Design Patterns Apache Spark & Apache Bahir Live Demo – Anomaly Detection Summary References Q&A
  • 8. What is THE INTERNET OF THINGS (IoT) ? THE TERM IOT WAS FIRST TOUTED BY KEVIN ASHTON, IN 1999. It’s more than just machine to machine communications. It’s about ecosystems of devices that form relevant connections to people and other devices to exchange data. It’s a useful term to describe the world of connected and wearable devices that’s emerging. 9
  • 9. IoT - INTERACTIONS BETWEEN multiple ENTITIES 10 control observe inform command actuate inform PEOPLE THINGS SOFTWARE
  • 10. Some IoT EXAMPLES 11 SMART HOMES TRANSPORT WEARABLES INDUSTRY DISPLAYS HEALTH From thermostats to smart switches and remote controls to security systems Self-driving cars, drones for delivering goods , etc Smartwatches, and other devices enabling control and providing monitoring capabilities Robotics, sensors for predicting quality, failures, etc Not only VR, but many new displays enabling gesture controls, haptic interfaces, etc Connected health, in partnership with wearables for monitoring health metrics, among other examples
  • 11. Some IoT PATTERNS 12 • Remote control • Security analysis • Edge analytics • Historical data analysis • Distributed Platforms • Real-time decisions
  • 13. The Weather Company The Weather Company data - Surface observations - precipitation - radar - satellite - personal weather stations - lightning sources - data collected from planes every day - etc.
  • 14. Home Automation & Security - Multiple connected or standalone devices - Controlled by Voice - Amazon Echo (Alexa) - Google Home - Apple HomePod (Siri) 15
  • 15. TESLA connected cars CONNECTED VEHICLES IS ONE EXAMPLE OF THE IoT. It’s not just about Google Maps in cars. When Tesla finds a software fault with their vehicle rather than issuing an expensive and damaging recall, they simply updated the car’s operating system over the air. [hcp://www.wired.com/2014/02 /teslas- air-fix-best-example- yet-internet-things/] 16
  • 16. AMAZON Go AMAZON GO – No lines, no checkout, just grab and go 17
  • 17. Industrial Internet of Things 18 - Smart factory - Predictive and remote maintenance - Smart metering and smart grid - Industrial security - Industrial heating, ventilation and air conditioning - Asset tracking and smart logistics Reference: https://www.iiconsortium.org/
  • 19. LAMBDA Architecture Lambda architecture is a data- processing architecture designed to handle massive quantities of data by taking advantage of both batch- and stream-processing methods. This approach to architecture attempts to balance latency, throughput, and fault- tolerance by using batch processing to provide comprehensive and accurate views of batch data, while simultaneously using real-time stream processing to provide views of online data 20 Images: https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry
  • 20. KAPA Architecture 21 Images: https://www.oreilly.com/ideas/applying-the-kappa-architecture-in-the-telco-industry The Kappa architecture simplifies the Lambda architecture by removing the batch layer and replacing it with a streaming layer.
  • 21. REALTIME Data Processing best practice 22 Pub/Sub Component Data Processor Data Storage One recommendation for processing streaming of massive quantities of data is to add a queue component to front-end the processing of the data. This enables a more fault-tolerant solution, where in conjunction with state management, the runtime application can have failures and subsequently continue processing from the same data point.
  • 23. MQTT – IoT Connectivity Protocol 24 • Constrained devices • Low bandwidth connection • Intermittent connections
  • 24. MQTT – IoT Connectivity Protocol 25 Connect + Publish + Subscribe ~1990 IBM / Eurotech 2010 Published 2011 Eclipse M2M / Paho 2014 OASIS Open spec + 40 client implementations Minimal overhead Tiny Clients (Java 170KB) History Header 2-4 bytes (publish) 14 bytes (connect) V5 May 2018
  • 25. MQTT – Quality of Service 26 MQTT Broker QoS0 QoS1 QoS2 At most once At least once Exactly once . No connection failover . Never duplicate . Has connection failover . Can duplicate . Has connection failover . Never duplicate
  • 26. MQTT – World usage Smart Home Automation Messaging Notable Mentions: - IBM IoT Platform - AWS IoT - Microsoft IoT Hub - Facebook Messenger 27
  • 28. Apache Spark Introduction 29 Spark Core Spark SQL Spark Streaming Spark ML Spark GraphX executes SQL statements performs streaming analytics using micro-batches common machine learning and statistical algorithms distributed graph processing framework general compute engine, handles distributed task dispatching, scheduling and basic I/O functions large variety of data sources and formats can be supported, both on-premise or cloud BigInsights (HDFS) Cloudant dashDB SQL DB
  • 30. Apache Spark – Spark SQL 31 Spark SQL Unified data access APIS: Query structured data sets with SQL or Dataset/DataFrame APIs Fast, familiar query language across all of your enterprise data RDBMS Data Sources Structured Streaming Data Sources
  • 31. Apache Spark – Spark SQL 32 You can run SQL statement with SparkSession.sql(…) interface: val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() spark.sql(“create table T1 (c1 int, c2 int) stored as parquet”) val ds = spark.sql(“select * from T1”) You can further transform the resultant dataset: val ds1 = ds.groupBy(“c1”).agg(“c2”-> “sum”) val ds2 = ds.orderBy(“c1”) The result is a DataFrame / Dataset[Row] ds.show() displays the rows
  • 32. Apache Spark – Spark SQL You can read from data sources using SparkSession.read.format(…) val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer) // loading csv data to a Dataset of Bank type val bankFromCSV = spark.read.csv(“hdfs://localhost:9000/data/bank.csv").as[Bank] // loading JSON data to a Dataset of Bank type val bankFromJSON = spark.read.json(“hdfs://localhost:9000/data/bank.json").as[Bank] // select a column value from the Dataset bankFromCSV.select(‘age’).show() will return all rows of column “age” from this dataset. 33
  • 33. Apache Spark – Spark SQL You can also configure a specific data source with specific options val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() case class Bank(age: Integer, job: String, marital: String, education: String, balance: Integer) // loading csv data to a Dataset of Bank type val bankFromCSV = sparkSession.read .option("header", ”true") // Use first line of all files as header .option("inferSchema", ”true") // Automatically infer data types .option("delimiter", " ") .csv("/users/lresende/data.csv”) .as[Bank] bankFromCSV.select(‘age).show() // will return all rows of column “age” from this dataset. 34
  • 34. Apache Spark – Spark SQL Structured Streaming Unified programming model for streaming, interactive and batch queries 35Image source: https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html Considers the data stream as an unbounded table
  • 35. Apache Spark – Spark SQL Structured Streaming SQL regular APIs val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() val input = spark.read .schema(schema) .format(”csv") .load(”input-path") val result = input .select(”age”) .where(”age > 18”) result.write .format(”json”) . save(” dest-path”) 36 Structured Streaming APIs val spark = SparkSession.builder() .appName(“Demo”) .getOrCreate() val input = spark.readStream .schema(schema) .format(”csv") .load(”input-path") val result = input .select(”age”) .where(”age > 18”) result.write .format(”json”) . startStream(” dest-path”)
  • 36. Apache Spark – Spark Streaming 37 Spark Streaming Micro-batch event processing for near-real time analytics e.g. Internet of Things (IoT) devices, Twitter feeds, Kafka (event hub), etc. No multi-threading or parallel process programming required
  • 37. Apache Spark – Spark Streaming Also known as discretized stream or DStream Abstracts a continuous stream of data Based on micro-batching Based on RDDs 38
  • 38. Apache Spark – Spark Streaming val sparkConf = new SparkConf() .setAppName("MQTTWordCount") val ssc = new StreamingContext(sparkConf, Seconds(2)) val lines = MQTTUtils.createStream(ssc, brokerUrl, topic, StorageLevel.MEMORY_ONLY_SER_2) val words = lines.flatMap(x => x.split(" ")) val wordCounts = words.map(x => (x, 1)).reduceByKey(_ + _) wordCounts.print() ssc.start() ssc.awaitTermination() 39
  • 40. Apache Bahir Project Apache Bahir provides extensions to multiple distributed analytic platforms, extending their reach with a diversity of streaming connectors and SQL data sources. Currently, Bahir provides extensions for Apache Spark and Apache Flink.
  • 41. Bahir extensions for Apache Spark MQTT – Enables reading data from MQTT Servers using Spark Streaming or Structured streaming. • http://bahir.apache.org/docs/spark/current/spark-sql-streaming-mqtt/ • http://bahir.apache.org/docs/spark/current/spark-streaming-mqtt/ Couch DB/Cloudant– Enables reading data from CouchDB/Cloudant using Spark SQL and Spark Streaming. Twitter – Enables reading social data from twitter using Spark Streaming. • http://bahir.apache.org/docs/spark/current/spark-streaming-twitter/ Akka – Enables reading data from Akka Actors using Spark Streaming or Structured Streaming. • http://bahir.apache.org/docs/spark/current/spark-streaming-akka/ ZeroMQ – Enables reading data from ZeroMQ using Spark Streaming. • http://bahir.apache.org/docs/spark/current/spark-streaming-zeromq/ 42
  • 42. Bahir extensions for Apache Spark Google Cloud Pub/Sub – Add spark streaming connector to Google Cloud Pub/Sub 43
  • 43. Apache Spark extensions in Bahir Adding Bahir extensions into your application - Using SBT libraryDependencies += "org.apache.bahir" %% "spark-streaming-mqtt" % "2.2.0” - Using Maven <dependency> <groupId>org.apache.bahir</groupId> <artifactId>spark-streaming-mqtt_2.11 </artifactId> <version>2.2.0</version> </dependency> 44
  • 44. Apache Spark extensions in Bahir Submitting applications with Bahir extensions to Spark - Spark-shell bin/spark-shell --packages org.apache.bahir:spark-streaming_mqtt_2.11:2.2.0 ….. - Spark-submit bin/spark-submit --packages org.apache.bahir:spark-streaming_mqtt_2.11:2.2.0 ….. 45
  • 46. IoT Analytics – Anomaly Detection The demo environment https://github.com/lresende/bahir-iot-demo 47 Node.js Web app Simulates Elevator IoT devices Elevator simulator Metrics: • Weight • Speed • Power • Temperature • System MQTT Mosquitto MQTT JavaScript Client MQTT Streaming Connector
  • 47. IoT Analytics – Anomaly Detection The Moving Z-Score model scores anomalies in a univariate sequential dataset, often a time series. The moving Z-score is a very simple model for measuring the anomalousness of each point in a sequential dataset like a time series. Given a window size w, the moving Z-score is the number of standard deviations each observation is away from the mean, where the mean and standard deviation are computed only over the previous w observations. 48 Reference: https://turi.com/learn/userguide/anomaly_detection/moving_zscore.html Reference: https://www.isixsigma.com/tools-templates/statistical-analysis/improved-forecasting-moving-averages-and-z-scores/ Z-Score Moving Z-Score Moving mean and moving standard deviation
  • 49. Summary – Take away points IoT Design Patterns - Lambda and Kappa Architecture - Real Time data processing best practice Apache Spark - IoT Analytics Runtime with support for ”Continuous Applications” Apache Bahir - Bring access to IoT data via supported connectors (e.g. MQTT) IoT Applications - Using Spark and Bahir to start processing IoT data in near real time using Spark Streaming - Anomaly detection using Moving Z-Scores model 50
  • 50. Join the Apache Bahir community 51
  • 51. References Apache Bahir http://bahir.apache.org Documentation for Apache Spark extensions http://bahir.apache.org/docs/spark/current/documentation/ Source Repositories https://github.com/apache/bahir https://github.com/apache/bahir-website Demo Repository https://github.com/lresende/bahir-iot-demo 52Image source: http://az616578.vo.msecnd.net/files/2016/03/21/6359412499310138501557867529_thank-you-1400x800-c-default.gif
  • 52. 53May 17, 2018 / © 2018 IBM Corporation CALL FOR CODE KEYNOTE With Jeffrey Borek, 10:00 AM
  • 53. 54March 30 2018 / © 2018 IBM Corporation