SlideShare une entreprise Scribd logo
1  sur  54
Télécharger pour lire hors ligne
Analytics in the age of
the Internet of Things
Ludwine Probst @nivdul
Thank you!
?
me
Data Engineer
@nivdul
nivdul.wordpress.com
Women in Tech
Duchess France
@duchessfr
duchess-france.org
Paris chapter Leader
Internet of Things (IoT)
more and more
aircraft
use case: sensor data from a cross-country flight
data points: several terabytes every hour per sensor
data analysis: batch mode or real time analysis
applications:
• flight performance (optimize plane fuel consumption,
• reduce maintenance costs…)
• detect anomalies
• prevent accidents
insurance
use case: data from a connected car key
applications:
• monitoring
• real time vehicle location
• drive safety
• driving score
Why should I care?
Because it can affect & change our business, our everyday life?
Collecting
Time series
112578291481000 -5.13
112578334541000 -5.05
112578339541000 -5.15
112578451484000 -5.48
112578491615000 -5.33
Some protocols…
• DDS – Device-to-Device communication – real-time
• MQTT – Device-to-Server – collect telemetry data
• XMPP – Device-to-Server – Instant Messaging scenarios
• AMQP – Server-to-Server – connecting devices to backend
…
Challenges
limited CPU
&
memory resources
low energy communication network
Storing
• flat file:
limited utility
• relational database:
limited design
rigidity
• NoSQL database:
scalability
faster & more flexible
Storing TS
IoT data pipeline
streaming
Storm
Now, the example!
http://www.cis.fordham.edu/wisdm/index.php
http://www.cis.fordham.edu/wisdm/includes/files/sensorKDD-2010.pdf
WISDM Lab’s study
The example
Goal: identify the physical activity that a user is performing
inspired by WISDM Lab’s study http://www.cis.fordham.edu/wisdm/index.php
The situation
The labeled data comes from an accelerometer (37 users)
Possible activities are:
walking, jogging, sitting, standing, downstairs and upstairs.
This is a classification problem here!
Some algorithms to use: Decision tree, Random Forest, Multinomial
logistic regression...
How can I predict the user’s
activity?
1. analyzing part:
collect & clean data from a csv file
store it in Cassandra
define & extract features using Spark
build the predictive model using MLlib
2. predicting part:
collect data in real-time (REST)
use the model to predict result
MLlib
https://github.com/nivdul/actitracker-cassandra-spark
Collect
&
store the data
The accelerometer
A sensor (in a smartphone)
compute acceleration over X,Y,Z
collect data every 50ms
Each acceleration contains:
• a timestamp (eg, 1428773040488)
• acceleration along the X axis (unit is m/s²)
• acceleration along the Y axis (unit is m/s²)
• acceleration along the Z axis (unit is m/s²)
Accelerometer Android app
REST Api collecting data coming from a phone application
Accelerometer Data Model
CREATE KEYSPACE actitracker WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 };
CREATE TABLE users (user_id int,
activity text,
timestamp bigint,
acc_x double,
acc_y double,
acc_z double,
PRIMARY KEY ((user_id,activity),timestamp));
COPY users FROM '/path_to_your_data/data.csv' WITH HEADER = true;
Accelerometer Data Model: logical view
8 walking 112578291481000 -5.13 8.15 1.31
8 walking 112578334541000 -5.05 8.16 1.31
8 walking 112578339541000 -5.15 8.16 1.36
8 walking 112578451484000 -5.48 8.17 1.31
8 walking 112578491615000 -5.33 8.16 1.18
activityuser_id
timestamp
acc_x acc_z
acc_y
graph from the Cityzen Data widget
Analyzing
https://github.com/nivdul/actitracker-cassandra-spark
is a large-scale in-memory data processing framework
• big data analytics in memory/disk
• complements Hadoop
• faster and more flexible
• Resilient Distributed Datasets (RDD)
interactive shell (scala & python)
Lambda
(Java 8)
Spark ecosystem
MLlib
• regression
• classification
• clustering
• optimization
• collaborative filtering
• feature extraction (TF-IDF, Word2Vec…)
is Apache Spark's scalable machine learning library
spark-cassandra-connector
Exposes Cassandra tables as Spark RDD
Identify features
repetitive static
VS
walking, jogging, up/down stairs standing, sitting
graph from the Cityzen Data widget
The activities : jogging
mean_x = 3.3
mean_y = -6.9
mean_z = 0.8
Y-axis: peaks spaced out
about 0.25 seconds
graph from the Cityzen Data widget
The activities : walking
mean_x = 1
mean_y = 10
mean_z = -0.3
Y-axis: peaks spaced
about 0.5 seconds
graph from the Cityzen Data widget
The activities : up/downstairs
Y-axis: peaks spaced about 0.75 seconds
graph from the Cityzen Data widget
up down
The activities : standing
graph from the Cityzen Data widget
standing
static activity: no peaks
sitting
The features
• Average acceleration (for each axis)
• Variance (for each axis)
• Average absolute difference (for each axis)
• Average resultant acceleration
• Average time between peaks (max) (for Y-axis)
Goal: compute these features for all the users (37) and activities (6) over few
seconds window
Clean
&
prepare the data
retrieve the data from Cassandra
// define Spark context
SparkConf sparkConf = new SparkConf()
.setAppName("User's physical activity recognition")
.set("spark.cassandra.connection.host", "127.0.0.1")
.setMaster("local[*]");
JavaSparkContext sc = new JavaSparkContext(sparkConf);
// retrieve data from Cassandra and create an CassandraRDD
CassandraJavaRDD<CassandraRow> cassandraRowsRDD =
javaFunctions(sc).cassandraTable("actitracker", "users");
Compute the features
MLlib
Feature: mean
import org.apache.spark.mllib.stat.MultivariateStatisticalSummary;
import org.apache.spark.mllib.stat.Statistics;
private MultivariateStatisticalSummary summary;
public ExtractFeature(JavaRDD<Vector> data) {
this.summary = Statistics.colStats(data.rdd());
}
// return a Vector (mean_acc_x, mean_acc_y, mean_acc_z)
public Vector computeAvgAcc() {
return this.summary.mean();
}
Feature: avg time between peaks
// define the maximum using the max function from MLlib
double max = this.summary.max().toArray()[1];
// keep the timestamp of data point for which the value is greater than 0.9 * max
// and sort it !
// Here: data = RDD (ts, acc_y)
JavaRDD<Long> peaks = data.filter(record -> record[1] > 0.9 * max)
.map(record -> record[0])
.sortBy(time -> time, true, 1);
Feature: avg time between peaks
// retrieve the first and last element of the RDD (sorted)
Long firstElement = peaks.first();
Long lastElement = peaks.sortBy(time -> time, false, 1).first();
// compute the delta between each timestamp
JavaRDD<Long> firstRDD = peaks.filter(record -> record > firstElement);
JavaRDD<Long> secondRDD = peaks.filter(record -> record < lastElement);
JavaRDD<Vector> product = firstRDD.zip(secondRDD)
.map(pair -> pair._1() - pair._2())
// and keep it if the delta is != 0
.filter(value -> value > 0)
.map(line -> Vectors.dense(line));
// compute the mean of the delta
return Statistics.colStats(product.rdd()).mean().toArray()[0];
Choose algorithms
Random Forests
Decision Trees
Multiclass Logistic Regression
MLlib
Goal: identify the physical activity that a user is performing
Decision Trees
// Split data into 2 sets : training (60%) and test (40%)
JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.6, 0.4});
JavaRDD<LabeledPoint> trainingData = splits[0].cache();
JavaRDD<LabeledPoint> testData = splits[1];
Decision Trees// Decision Tree parameters
Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>();
int numClasses = 4;
String impurity = "gini";
int maxDepth = 9;
int maxBins = 32;
// create model
final DecisionTreeModel model = DecisionTree.trainClassifier(
trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins);
// Evaluate model on training instances and compute training error
JavaPairRDD<Double, Double> predictionAndLabel =
testData.mapToPair(p -> new Tuple2<>(model.predict(p.features()), p.label()));
Double testErrDT = 1.0 * predictionAndLabel.filter(pl -> !pl._1().equals(pl._2())).count() / testData.count();
// Save model
model.save(sc, "actitracker");
Results
Predictions
http://www.commitstrip.com/en/2014/04/08/the-demo-effect-dear-old-murphy/?setLocale=1
Accelerometer Android app
REST Api collecting data coming from a phone application
An example: https://github.com/MiraLak/accelerometer-rest-to-cassandra
Predictions!
// load the model saved before
DecisionTreeModel model = DecisionTreeModel.load(sc.sc(), "actitracker");
// connection between Spark and Cassandra using the spark-cassandra-connector
CassandraJavaRDD<CassandraRow> cassandraRowsRDD = javaFunctions(sc).cassandraTable("accelerations",
"acceleration");
// retrieve data from Cassandra and create an CassandraRDD
JavaRDD<CassandraRow> data = cassandraRowsRDD.select("timestamp", "acc_x", "acc_y", "acc_z")
.where("user_id=?", "TEST_USER")
.withDescOrder()
.limit(250);
Vector feature = computeFeature(sc);
double prediction = model.predict(feature);
How can I use my
computations?
possible applications:
• adapt the music over your speed
• detects lack of activity
• smarter pacemakers
• smarter oxygen therapy
Conclusion
• http://cassandra.apache.org/
• http://planetcassandra.org/getting-started-with-time-series-data-modeling/
• https://github.com/datastax/spark-cassandra-connector
• https://github.com/MiraLak/AccelerometerAndroidApp
• https://github.com/MiraLak/accelerometer-rest-to-cassandra
• https://spark.apache.org/docs/1.3.0/
• https://github.com/nivdul/actitracker-cassandra-spark
• http://www.duchess-france.org/analyze-accelerometer-data-with-apache-spark-and-mllib/
http://www.cis.fordham.edu/wisdm/index.php
Some references
Thank you!

Contenu connexe

Similaire à Analytics with Spark

Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandFrançois Garillot
 
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQLYousun Jeong
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...InfluxData
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
Hadoop cluster performance profiler
Hadoop cluster performance profilerHadoop cluster performance profiler
Hadoop cluster performance profilerIhor Bobak
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsDatabricks
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...Chester Chen
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's comingDatabricks
 
Kapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing EngineKapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing EnginePrashant Vats
 
Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Mark Smith
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayQAware GmbH
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent MonitoringIntelie
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdfsash236
 
Reactive programming every day
Reactive programming every dayReactive programming every day
Reactive programming every dayVadym Khondar
 
Fabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkFabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkShashank Gautam
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Keshav Murthy
 
Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013BertrandDrouvot
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionChetan Khatri
 

Similaire à Analytics with Spark (20)

Mobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in SwitzerlandMobility insights at Swisscom - Understanding collective mobility in Switzerland
Mobility insights at Swisscom - Understanding collective mobility in Switzerland
 
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed KafsiSpark Summit EU talk by Francois Garillot and Mohamed Kafsi
Spark Summit EU talk by Francois Garillot and Mohamed Kafsi
 
Spark streaming , Spark SQL
Spark streaming , Spark SQLSpark streaming , Spark SQL
Spark streaming , Spark SQL
 
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
Samantha Wang [InfluxData] | Best Practices on How to Transform Your Data Usi...
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
Hadoop cluster performance profiler
Hadoop cluster performance profilerHadoop cluster performance profiler
Hadoop cluster performance profiler
 
SparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDsSparkSQL: A Compiler from Queries to RDDs
SparkSQL: A Compiler from Queries to RDDs
 
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
SF Big Analytics 20191112: How to performance-tune Spark applications in larg...
 
Spark what's new what's coming
Spark what's new what's comingSpark what's new what's coming
Spark what's new what's coming
 
Apache spark core
Apache spark coreApache spark core
Apache spark core
 
Kapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing EngineKapacitor - Real Time Data Processing Engine
Kapacitor - Real Time Data Processing Engine
 
Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016Tulsa techfest Spark Core Aug 5th 2016
Tulsa techfest Spark Core Aug 5th 2016
 
Dataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice WayDataservices: Processing (Big) Data the Microservice Way
Dataservices: Processing (Big) Data the Microservice Way
 
Intelligent Monitoring
Intelligent MonitoringIntelligent Monitoring
Intelligent Monitoring
 
112 portfpres.pdf
112 portfpres.pdf112 portfpres.pdf
112 portfpres.pdf
 
Reactive programming every day
Reactive programming every dayReactive programming every day
Reactive programming every day
 
Fabric - Realtime stream processing framework
Fabric - Realtime stream processing frameworkFabric - Realtime stream processing framework
Fabric - Realtime stream processing framework
 
Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data. Accelerating analytics on the Sensor and IoT Data.
Accelerating analytics on the Sensor and IoT Data.
 
Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013Example R usage for oracle DBA UKOUG 2013
Example R usage for oracle DBA UKOUG 2013
 
No more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in productionNo more struggles with Apache Spark workloads in production
No more struggles with Apache Spark workloads in production
 

Dernier

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...ZurliaSoop
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...only4webmaster01
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...gajnagarg
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...amitlee9823
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 

Dernier (20)

Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night StandCall Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Doddaballapur Road ☎ 7737669865 🥵 Book Your One night Stand
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
Just Call Vip call girls kakinada Escorts ☎️9352988975 Two shot with one girl...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 

Analytics with Spark

  • 1. Analytics in the age of the Internet of Things Ludwine Probst @nivdul
  • 4. Women in Tech Duchess France @duchessfr duchess-france.org Paris chapter Leader
  • 6.
  • 8. aircraft use case: sensor data from a cross-country flight data points: several terabytes every hour per sensor data analysis: batch mode or real time analysis applications: • flight performance (optimize plane fuel consumption, • reduce maintenance costs…) • detect anomalies • prevent accidents
  • 9. insurance use case: data from a connected car key applications: • monitoring • real time vehicle location • drive safety • driving score
  • 10. Why should I care? Because it can affect & change our business, our everyday life?
  • 12. Time series 112578291481000 -5.13 112578334541000 -5.05 112578339541000 -5.15 112578451484000 -5.48 112578491615000 -5.33
  • 13. Some protocols… • DDS – Device-to-Device communication – real-time • MQTT – Device-to-Server – collect telemetry data • XMPP – Device-to-Server – Instant Messaging scenarios • AMQP – Server-to-Server – connecting devices to backend …
  • 14. Challenges limited CPU & memory resources low energy communication network
  • 16. • flat file: limited utility • relational database: limited design rigidity • NoSQL database: scalability faster & more flexible Storing TS
  • 20. The example Goal: identify the physical activity that a user is performing inspired by WISDM Lab’s study http://www.cis.fordham.edu/wisdm/index.php
  • 21. The situation The labeled data comes from an accelerometer (37 users) Possible activities are: walking, jogging, sitting, standing, downstairs and upstairs. This is a classification problem here! Some algorithms to use: Decision tree, Random Forest, Multinomial logistic regression...
  • 22. How can I predict the user’s activity? 1. analyzing part: collect & clean data from a csv file store it in Cassandra define & extract features using Spark build the predictive model using MLlib 2. predicting part: collect data in real-time (REST) use the model to predict result MLlib https://github.com/nivdul/actitracker-cassandra-spark
  • 24. The accelerometer A sensor (in a smartphone) compute acceleration over X,Y,Z collect data every 50ms Each acceleration contains: • a timestamp (eg, 1428773040488) • acceleration along the X axis (unit is m/s²) • acceleration along the Y axis (unit is m/s²) • acceleration along the Z axis (unit is m/s²)
  • 25. Accelerometer Android app REST Api collecting data coming from a phone application
  • 26. Accelerometer Data Model CREATE KEYSPACE actitracker WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 }; CREATE TABLE users (user_id int, activity text, timestamp bigint, acc_x double, acc_y double, acc_z double, PRIMARY KEY ((user_id,activity),timestamp)); COPY users FROM '/path_to_your_data/data.csv' WITH HEADER = true;
  • 27. Accelerometer Data Model: logical view 8 walking 112578291481000 -5.13 8.15 1.31 8 walking 112578334541000 -5.05 8.16 1.31 8 walking 112578339541000 -5.15 8.16 1.36 8 walking 112578451484000 -5.48 8.17 1.31 8 walking 112578491615000 -5.33 8.16 1.18 activityuser_id timestamp acc_x acc_z acc_y graph from the Cityzen Data widget
  • 29. is a large-scale in-memory data processing framework • big data analytics in memory/disk • complements Hadoop • faster and more flexible • Resilient Distributed Datasets (RDD) interactive shell (scala & python) Lambda (Java 8) Spark ecosystem
  • 30. MLlib • regression • classification • clustering • optimization • collaborative filtering • feature extraction (TF-IDF, Word2Vec…) is Apache Spark's scalable machine learning library
  • 32. Identify features repetitive static VS walking, jogging, up/down stairs standing, sitting graph from the Cityzen Data widget
  • 33. The activities : jogging mean_x = 3.3 mean_y = -6.9 mean_z = 0.8 Y-axis: peaks spaced out about 0.25 seconds graph from the Cityzen Data widget
  • 34. The activities : walking mean_x = 1 mean_y = 10 mean_z = -0.3 Y-axis: peaks spaced about 0.5 seconds graph from the Cityzen Data widget
  • 35. The activities : up/downstairs Y-axis: peaks spaced about 0.75 seconds graph from the Cityzen Data widget up down
  • 36. The activities : standing graph from the Cityzen Data widget standing static activity: no peaks sitting
  • 37. The features • Average acceleration (for each axis) • Variance (for each axis) • Average absolute difference (for each axis) • Average resultant acceleration • Average time between peaks (max) (for Y-axis) Goal: compute these features for all the users (37) and activities (6) over few seconds window
  • 39.
  • 40. retrieve the data from Cassandra // define Spark context SparkConf sparkConf = new SparkConf() .setAppName("User's physical activity recognition") .set("spark.cassandra.connection.host", "127.0.0.1") .setMaster("local[*]"); JavaSparkContext sc = new JavaSparkContext(sparkConf); // retrieve data from Cassandra and create an CassandraRDD CassandraJavaRDD<CassandraRow> cassandraRowsRDD = javaFunctions(sc).cassandraTable("actitracker", "users");
  • 42. Feature: mean import org.apache.spark.mllib.stat.MultivariateStatisticalSummary; import org.apache.spark.mllib.stat.Statistics; private MultivariateStatisticalSummary summary; public ExtractFeature(JavaRDD<Vector> data) { this.summary = Statistics.colStats(data.rdd()); } // return a Vector (mean_acc_x, mean_acc_y, mean_acc_z) public Vector computeAvgAcc() { return this.summary.mean(); }
  • 43. Feature: avg time between peaks // define the maximum using the max function from MLlib double max = this.summary.max().toArray()[1]; // keep the timestamp of data point for which the value is greater than 0.9 * max // and sort it ! // Here: data = RDD (ts, acc_y) JavaRDD<Long> peaks = data.filter(record -> record[1] > 0.9 * max) .map(record -> record[0]) .sortBy(time -> time, true, 1);
  • 44. Feature: avg time between peaks // retrieve the first and last element of the RDD (sorted) Long firstElement = peaks.first(); Long lastElement = peaks.sortBy(time -> time, false, 1).first(); // compute the delta between each timestamp JavaRDD<Long> firstRDD = peaks.filter(record -> record > firstElement); JavaRDD<Long> secondRDD = peaks.filter(record -> record < lastElement); JavaRDD<Vector> product = firstRDD.zip(secondRDD) .map(pair -> pair._1() - pair._2()) // and keep it if the delta is != 0 .filter(value -> value > 0) .map(line -> Vectors.dense(line)); // compute the mean of the delta return Statistics.colStats(product.rdd()).mean().toArray()[0];
  • 45. Choose algorithms Random Forests Decision Trees Multiclass Logistic Regression MLlib Goal: identify the physical activity that a user is performing
  • 46. Decision Trees // Split data into 2 sets : training (60%) and test (40%) JavaRDD<LabeledPoint>[] splits = data.randomSplit(new double[]{0.6, 0.4}); JavaRDD<LabeledPoint> trainingData = splits[0].cache(); JavaRDD<LabeledPoint> testData = splits[1];
  • 47. Decision Trees// Decision Tree parameters Map<Integer, Integer> categoricalFeaturesInfo = new HashMap<>(); int numClasses = 4; String impurity = "gini"; int maxDepth = 9; int maxBins = 32; // create model final DecisionTreeModel model = DecisionTree.trainClassifier( trainingData, numClasses, categoricalFeaturesInfo, impurity, maxDepth, maxBins); // Evaluate model on training instances and compute training error JavaPairRDD<Double, Double> predictionAndLabel = testData.mapToPair(p -> new Tuple2<>(model.predict(p.features()), p.label())); Double testErrDT = 1.0 * predictionAndLabel.filter(pl -> !pl._1().equals(pl._2())).count() / testData.count(); // Save model model.save(sc, "actitracker");
  • 50. Accelerometer Android app REST Api collecting data coming from a phone application An example: https://github.com/MiraLak/accelerometer-rest-to-cassandra
  • 51. Predictions! // load the model saved before DecisionTreeModel model = DecisionTreeModel.load(sc.sc(), "actitracker"); // connection between Spark and Cassandra using the spark-cassandra-connector CassandraJavaRDD<CassandraRow> cassandraRowsRDD = javaFunctions(sc).cassandraTable("accelerations", "acceleration"); // retrieve data from Cassandra and create an CassandraRDD JavaRDD<CassandraRow> data = cassandraRowsRDD.select("timestamp", "acc_x", "acc_y", "acc_z") .where("user_id=?", "TEST_USER") .withDescOrder() .limit(250); Vector feature = computeFeature(sc); double prediction = model.predict(feature);
  • 52. How can I use my computations? possible applications: • adapt the music over your speed • detects lack of activity • smarter pacemakers • smarter oxygen therapy
  • 54. • http://cassandra.apache.org/ • http://planetcassandra.org/getting-started-with-time-series-data-modeling/ • https://github.com/datastax/spark-cassandra-connector • https://github.com/MiraLak/AccelerometerAndroidApp • https://github.com/MiraLak/accelerometer-rest-to-cassandra • https://spark.apache.org/docs/1.3.0/ • https://github.com/nivdul/actitracker-cassandra-spark • http://www.duchess-france.org/analyze-accelerometer-data-with-apache-spark-and-mllib/ http://www.cis.fordham.edu/wisdm/index.php Some references Thank you!