SlideShare une entreprise Scribd logo
1  sur  34
COURBOSPARK:
DECISION TREE FOR
TIME-SERIES ON SPARK
Christophe Salperwyck – EDF R&D
Simon Maby – OCTO Technology - @simonmaby
Xdata project: www.xdata.fr, grants from
"Investissement d'Avenir" program, 'Big Data' call
| 2
AGENDA
1. PROBLEM DESCRIPTION
2. IMPLEMENTATION
• Courbotree: presentation of the algorithm
• From mllib to courbospark
3. PERFORMANCES
• Configuration (cluster description, spark config…)
4. FEEDBACK ON SPARK/MLLIB
| 3
FRENCH METERS DATA
| 4
• 1 measure every 10 min
• 35 million customers
• Time-series: 144 points x 365 days
 Annual data volume: 1800 billion records, 120 TB
of raw data
BIG DATA!
| 5
LOAD CURVES CLASSIFICATION
Contract type Region … Equipment type Load Curve
9KVA 75 … Elec
6KVA 22 … Gas
… … … … …
12KVA 34 … Elec
| 6
WHY A DECISION TREE?
• Easy to understand
• Ability to explore the model
• Ability to choose the
expressivity of the model
| 7
Goal: find the most different curves depending on an explanatory
feature
How to split? we can either:
• Minimize curves dispersion (intra inertia)
or
• Maximize differences between average curves (inter inertia)
SPLIT CRITERIA: INERTIA
| 8
MAXIMIZE DIFFERENCES BETWEEN AVERAGE
CURVES (feature: Equipment Type)
Electrical
Gas
Hour
PinW
ArgMax(d)
mean
| 9
EXISTING DISTRIBUTED DECISION TREE
Scalable Distributed Decision Trees in Spark MLLib
Manish Amde (Origami Logic), Hirakendu Das (Yahoo! Inc.), Evan Sparks (UC Berkeley), Ameet
Talwalkar (UC Berkeley). Spark Summit 2014. http://spark-summit.org/wp-
content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf
A MapReduce Implementation of C4.5 Decision Tree Algorithm
Wei Dai, Wei Ji. International Journal of Database Theory and Application. Vol. 7, No. 1, 2014, pages 49-
60. http://www.chinacloud.cn/upload/2014-03/14031920373451.pdf
PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce
Biswanath Panda, Joshua S. Herbach, Sugato Basu, Roberto J. Bayardo. VLDB 2009.
http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36296.pdf
Distributed Decision Tree Learning for Mining Big Data Streams
Arinto Murdopo, Master thesis, Yahoo ! Labs Barcelona, July 2013.
http://people.ac.upc.edu/leandro/emdc/arinto-emdc-thesis.pdf
| 10
MLLIB DECISION TREE PARALLELIZATION
| 11
Step 1:
compute average
curves
[0:10[ [10:20[ [0:10[ [10:20[ [0:10[ [10:20[
Host 1 Host 2 Host 3
[0:10[ [10:20[
Host 1
Step 2:
collect and find
the best split
HORIZONTAL STRATEGY
| 12
To build the tree:
• Criteria: entropy, Gini, variance
• Data structure: LabelPoint
FROM MLLIB TO COURBOSPARK
| 13
To build the tree:
• Criteria: entropy, Gini, variance, inertia (to compare time-series)
• Data structure: LabelPoint, TimeSeries
• Finding split point for nominal features
For data visualization of the tree:
• Quantile on the nodes and leaves
• Lost of inertia
• Number of curves per nodes, leaves
FROM MLLIB TO COURBOSPARK
| 14
DEALING WITH NOMINAL FEATURES
Current implementation for regression:
 order the categories by their mean on the target
A BC D
Partitions tested: {A}/{CBD}, {AC}/{BD}, {ACB}/{C}
| 15
NOMINAL VALUES: TYPE OF CONTRACT
4 CATEGORIES {A, B, C, D}
A B
C D?
| 16
DEALING WITH NOMINAL FEATURES
Hard to order curves…
Solution 1:
Compare curves 2 by 2  {A}/{BCD}, {AB}/{CD}, {ABC}/{D},
{AC}/{BD}…
Problem:
Combinatory problem depending on n the number of
different categories. Complexity is O(2n)
| 17
DEALING WITH NOMINAL FEATURES
Solution 2:
Agglomerative Hierarchical Clustering. Bottom up approach.
Complexity is O(n3) - we don’t expect n > 100
| 18
HOW TO
Algorithm parameters
Configure spark context
Load the data file
Learn the model
| 19
LOOKING FOR THE TEST CONFIGURATION
For a constant global capacity on 12 nodes:
•120 cores + 120 GB RAM
#Executors RAM per exec. Cores per exec. Performance on
100Gb data
12 10 GB 10 22 minutes
24 5 GB 5 17 minutes
60 2 GB 2 12 minutes
120 1 GB 1 15 minutes
| 20
SCALABILITY TO #CONTAINERS
| 21
SCALABILITY TO #CONTAINERS
| 22
SCALABILITY TO #CONTAINERS
| 23
SCALABILITY TO #LINES
| 24
FRAMEWORK STABILITY
Tested on:
• 10GB, 100GB, 200GB, 300GB,
400GB, 500GB, 1TB
• Categorical and continuous
variables
• Bin sizes from 100 to 1000
| 25
SCALABILITY TO #COLUMNS
| 26
SCALABILITY TO #CATEGORIES
| 27
| 28
REAL LIFE DATASET
0
50
100
150
200
250
300
350
400
0 200 400 600 800 1000 1200 1400
Timeinminutes
Data in GB
• 9 executors with 20 GB and 8 cores
• 10 to 1000 millions load curves (10 numerical and 10 categorical features)
| 29
• spark.default.parallelism
• spark.executor.memory
• spark.storage.memoryfraction
• spark.akka.framesize
TUNING
| 30
Developers view
• Flawless transition from local to cluster mode
• Debug mode with an IDE
• Good performances need knowledge
FEEDBACKS
| 31
HEY SCALA <3
| 32
Data Scientists view
• The API is not very data oriented
• …but now we have SparkSQL and Dataframes!
• IPython + pySpark
• Feature engineering VS model engineering
FEEDBACKS
| 33
OPS view
• Better than mapReduce
• Performances are predictable for tested code
• YARNed
• Lots of releases, MlLib code is evolving quickly
FEEDBACKS
| 34
FUTURE WORKS
• Unbalanced trees
• Improve performance
• Other criteria for time-series comparison
• Missing values in explanatory features

Contenu connexe

Tendances

Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache SparkIndicThreads
 
Creating fishnet using_arc_gis
Creating fishnet using_arc_gisCreating fishnet using_arc_gis
Creating fishnet using_arc_gisAshok Peddi
 
Processing Geospatial at Scale at LocationTech
Processing Geospatial at Scale at LocationTechProcessing Geospatial at Scale at LocationTech
Processing Geospatial at Scale at LocationTechRob Emanuele
 
DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会Masashi Shibata
 
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S..."Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...Edge AI and Vision Alliance
 
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsRob Emanuele
 
Kaggle boschコンペ振り返り
Kaggle boschコンペ振り返りKaggle boschコンペ振り返り
Kaggle boschコンペ振り返りKeisuke Hosaka
 
Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechRob Emanuele
 
Neo4j spatial-nosql-frankfurt
Neo4j spatial-nosql-frankfurtNeo4j spatial-nosql-frankfurt
Neo4j spatial-nosql-frankfurtPeter Neubauer
 
Geo Package and OWS Context at FOSS4G PDX
Geo Package and OWS Context at FOSS4G PDXGeo Package and OWS Context at FOSS4G PDX
Geo Package and OWS Context at FOSS4G PDXLuis Bermudez
 
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2Keisuke Hosaka
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DCCCRinc
 
Advanced Cartographic Map Rendering In GeoServer
Advanced Cartographic Map Rendering In GeoServerAdvanced Cartographic Map Rendering In GeoServer
Advanced Cartographic Map Rendering In GeoServerGeoSolutions
 
GeoMesa: Scalable Geospatial Analytics
GeoMesa:  Scalable Geospatial AnalyticsGeoMesa:  Scalable Geospatial Analytics
GeoMesa: Scalable Geospatial AnalyticsVisionGEOMATIQUE2014
 

Tendances (14)

Scrap Your MapReduce - Apache Spark
 Scrap Your MapReduce - Apache Spark Scrap Your MapReduce - Apache Spark
Scrap Your MapReduce - Apache Spark
 
Creating fishnet using_arc_gis
Creating fishnet using_arc_gisCreating fishnet using_arc_gis
Creating fishnet using_arc_gis
 
Processing Geospatial at Scale at LocationTech
Processing Geospatial at Scale at LocationTechProcessing Geospatial at Scale at LocationTech
Processing Geospatial at Scale at LocationTech
 
DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会DARTS: Differentiable Architecture Search at 社内論文読み会
DARTS: Differentiable Architecture Search at 社内論文読み会
 
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S..."Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
"Moving CNNs from Academic Theory to Embedded Reality," a Presentation from S...
 
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projectsEnabling Access to Big Geospatial Data with LocationTech and Apache projects
Enabling Access to Big Geospatial Data with LocationTech and Apache projects
 
Kaggle boschコンペ振り返り
Kaggle boschコンペ振り返りKaggle boschコンペ振り返り
Kaggle boschコンペ振り返り
 
Processing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtechProcessing Geospatial Data At Scale @locationtech
Processing Geospatial Data At Scale @locationtech
 
Neo4j spatial-nosql-frankfurt
Neo4j spatial-nosql-frankfurtNeo4j spatial-nosql-frankfurt
Neo4j spatial-nosql-frankfurt
 
Geo Package and OWS Context at FOSS4G PDX
Geo Package and OWS Context at FOSS4G PDXGeo Package and OWS Context at FOSS4G PDX
Geo Package and OWS Context at FOSS4G PDX
 
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
Kaggle bosch presentation material for Kaggle Tokyo Meetup #2
 
GeoMesa LocationTech DC
GeoMesa LocationTech DCGeoMesa LocationTech DC
GeoMesa LocationTech DC
 
Advanced Cartographic Map Rendering In GeoServer
Advanced Cartographic Map Rendering In GeoServerAdvanced Cartographic Map Rendering In GeoServer
Advanced Cartographic Map Rendering In GeoServer
 
GeoMesa: Scalable Geospatial Analytics
GeoMesa:  Scalable Geospatial AnalyticsGeoMesa:  Scalable Geospatial Analytics
GeoMesa: Scalable Geospatial Analytics
 

En vedette

Petit-déjeuner OCTO - Management 3.0, au-delà du buzz word
Petit-déjeuner OCTO - Management 3.0, au-delà du buzz wordPetit-déjeuner OCTO - Management 3.0, au-delà du buzz word
Petit-déjeuner OCTO - Management 3.0, au-delà du buzz wordOCTO Technology
 
Petit-déjeuner OCTO - Changez de Mindset : pensez produit !
Petit-déjeuner OCTO - Changez de Mindset : pensez produit !Petit-déjeuner OCTO - Changez de Mindset : pensez produit !
Petit-déjeuner OCTO - Changez de Mindset : pensez produit !OCTO Technology
 
Petit-déjeuner "Cultiver l'art du code de qualité... Afin de livrer plus vite...
Petit-déjeuner "Cultiver l'art du code de qualité... Afin de livrer plus vite...Petit-déjeuner "Cultiver l'art du code de qualité... Afin de livrer plus vite...
Petit-déjeuner "Cultiver l'art du code de qualité... Afin de livrer plus vite...OCTO Technology
 
Petit-déjeuner OCTO - L'Infra au service de ses projets
Petit-déjeuner OCTO - L'Infra au service de ses projetsPetit-déjeuner OCTO - L'Infra au service de ses projets
Petit-déjeuner OCTO - L'Infra au service de ses projetsOCTO Technology
 
Hackathon, 3 jours chez les bricoleurs
Hackathon, 3 jours chez les bricoleursHackathon, 3 jours chez les bricoleurs
Hackathon, 3 jours chez les bricoleursOCTO Technology
 
Petit-déjeuner "Secteur Public : Retour d'expérience sur la refonte en agile ...
Petit-déjeuner "Secteur Public : Retour d'expérience sur la refonte en agile ...Petit-déjeuner "Secteur Public : Retour d'expérience sur la refonte en agile ...
Petit-déjeuner "Secteur Public : Retour d'expérience sur la refonte en agile ...OCTO Technology
 
#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon
#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon
#PortraitDeCDO - Laurent Assouad - Aéroport de LyonOCTO Technology
 
Petit-déjeuner "Psychanalyse du Chatbot"
Petit-déjeuner "Psychanalyse du Chatbot"Petit-déjeuner "Psychanalyse du Chatbot"
Petit-déjeuner "Psychanalyse du Chatbot"OCTO Technology
 
Solution de transfert mobile - Formats d'échange
Solution de transfert mobile - Formats d'échangeSolution de transfert mobile - Formats d'échange
Solution de transfert mobile - Formats d'échangeOCTO Technology
 
Ludovic cinquin octo - devoxx fr 2015 - les idées reçues de l'informatiqu...
Ludovic cinquin   octo - devoxx fr 2015 - les idées reçues de l'informatiqu...Ludovic cinquin   octo - devoxx fr 2015 - les idées reçues de l'informatiqu...
Ludovic cinquin octo - devoxx fr 2015 - les idées reçues de l'informatiqu...OCTO Technology
 
Petit-déjeuner OCTO : Culture Hacking
Petit-déjeuner OCTO : Culture HackingPetit-déjeuner OCTO : Culture Hacking
Petit-déjeuner OCTO : Culture HackingOCTO Technology
 
VERS UNE BANQUE MOBILE ? CHAPITRE 1 : Concevoir un produit bancaire remarquable
VERS UNE BANQUE MOBILE ? CHAPITRE 1 : Concevoir un produit bancaire remarquableVERS UNE BANQUE MOBILE ? CHAPITRE 1 : Concevoir un produit bancaire remarquable
VERS UNE BANQUE MOBILE ? CHAPITRE 1 : Concevoir un produit bancaire remarquableOCTO Technology
 
La Banque de demain : Chapitre 4
La Banque de demain : Chapitre 4 La Banque de demain : Chapitre 4
La Banque de demain : Chapitre 4 OCTO Technology
 
La banque de demain : quelles évolutions pour le modèle bancaire ?
La banque de demain : quelles évolutions pour le modèle bancaire ?La banque de demain : quelles évolutions pour le modèle bancaire ?
La banque de demain : quelles évolutions pour le modèle bancaire ?OCTO Technology
 
La Banque de demain, chapitre 3. L'open-banking : l'enjeu clé pour l'innovati...
La Banque de demain, chapitre 3. L'open-banking : l'enjeu clé pour l'innovati...La Banque de demain, chapitre 3. L'open-banking : l'enjeu clé pour l'innovati...
La Banque de demain, chapitre 3. L'open-banking : l'enjeu clé pour l'innovati...OCTO Technology
 
Petit-déjeuner OCTO Management 3.0 - Le Book
Petit-déjeuner OCTO Management 3.0 - Le BookPetit-déjeuner OCTO Management 3.0 - Le Book
Petit-déjeuner OCTO Management 3.0 - Le BookOCTO Technology
 
Petit-déjeuner OCTO du 03/06/2014 - La Révolution digitale
Petit-déjeuner OCTO du 03/06/2014 - La Révolution digitalePetit-déjeuner OCTO du 03/06/2014 - La Révolution digitale
Petit-déjeuner OCTO du 03/06/2014 - La Révolution digitaleOCTO Technology
 
#PortraitDeCDO - Juliette De Maupeou - Total
#PortraitDeCDO - Juliette De Maupeou - Total#PortraitDeCDO - Juliette De Maupeou - Total
#PortraitDeCDO - Juliette De Maupeou - TotalOCTO Technology
 
Petit-déjeuner OCTO - Objets connectés : We Are Able !
Petit-déjeuner OCTO - Objets connectés : We Are Able !Petit-déjeuner OCTO - Objets connectés : We Are Able !
Petit-déjeuner OCTO - Objets connectés : We Are Able !OCTO Technology
 
Engineering Data Scientist
Engineering Data ScientistEngineering Data Scientist
Engineering Data ScientistVincent HOLLEY
 

En vedette (20)

Petit-déjeuner OCTO - Management 3.0, au-delà du buzz word
Petit-déjeuner OCTO - Management 3.0, au-delà du buzz wordPetit-déjeuner OCTO - Management 3.0, au-delà du buzz word
Petit-déjeuner OCTO - Management 3.0, au-delà du buzz word
 
Petit-déjeuner OCTO - Changez de Mindset : pensez produit !
Petit-déjeuner OCTO - Changez de Mindset : pensez produit !Petit-déjeuner OCTO - Changez de Mindset : pensez produit !
Petit-déjeuner OCTO - Changez de Mindset : pensez produit !
 
Petit-déjeuner "Cultiver l'art du code de qualité... Afin de livrer plus vite...
Petit-déjeuner "Cultiver l'art du code de qualité... Afin de livrer plus vite...Petit-déjeuner "Cultiver l'art du code de qualité... Afin de livrer plus vite...
Petit-déjeuner "Cultiver l'art du code de qualité... Afin de livrer plus vite...
 
Petit-déjeuner OCTO - L'Infra au service de ses projets
Petit-déjeuner OCTO - L'Infra au service de ses projetsPetit-déjeuner OCTO - L'Infra au service de ses projets
Petit-déjeuner OCTO - L'Infra au service de ses projets
 
Hackathon, 3 jours chez les bricoleurs
Hackathon, 3 jours chez les bricoleursHackathon, 3 jours chez les bricoleurs
Hackathon, 3 jours chez les bricoleurs
 
Petit-déjeuner "Secteur Public : Retour d'expérience sur la refonte en agile ...
Petit-déjeuner "Secteur Public : Retour d'expérience sur la refonte en agile ...Petit-déjeuner "Secteur Public : Retour d'expérience sur la refonte en agile ...
Petit-déjeuner "Secteur Public : Retour d'expérience sur la refonte en agile ...
 
#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon
#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon
#PortraitDeCDO - Laurent Assouad - Aéroport de Lyon
 
Petit-déjeuner "Psychanalyse du Chatbot"
Petit-déjeuner "Psychanalyse du Chatbot"Petit-déjeuner "Psychanalyse du Chatbot"
Petit-déjeuner "Psychanalyse du Chatbot"
 
Solution de transfert mobile - Formats d'échange
Solution de transfert mobile - Formats d'échangeSolution de transfert mobile - Formats d'échange
Solution de transfert mobile - Formats d'échange
 
Ludovic cinquin octo - devoxx fr 2015 - les idées reçues de l'informatiqu...
Ludovic cinquin   octo - devoxx fr 2015 - les idées reçues de l'informatiqu...Ludovic cinquin   octo - devoxx fr 2015 - les idées reçues de l'informatiqu...
Ludovic cinquin octo - devoxx fr 2015 - les idées reçues de l'informatiqu...
 
Petit-déjeuner OCTO : Culture Hacking
Petit-déjeuner OCTO : Culture HackingPetit-déjeuner OCTO : Culture Hacking
Petit-déjeuner OCTO : Culture Hacking
 
VERS UNE BANQUE MOBILE ? CHAPITRE 1 : Concevoir un produit bancaire remarquable
VERS UNE BANQUE MOBILE ? CHAPITRE 1 : Concevoir un produit bancaire remarquableVERS UNE BANQUE MOBILE ? CHAPITRE 1 : Concevoir un produit bancaire remarquable
VERS UNE BANQUE MOBILE ? CHAPITRE 1 : Concevoir un produit bancaire remarquable
 
La Banque de demain : Chapitre 4
La Banque de demain : Chapitre 4 La Banque de demain : Chapitre 4
La Banque de demain : Chapitre 4
 
La banque de demain : quelles évolutions pour le modèle bancaire ?
La banque de demain : quelles évolutions pour le modèle bancaire ?La banque de demain : quelles évolutions pour le modèle bancaire ?
La banque de demain : quelles évolutions pour le modèle bancaire ?
 
La Banque de demain, chapitre 3. L'open-banking : l'enjeu clé pour l'innovati...
La Banque de demain, chapitre 3. L'open-banking : l'enjeu clé pour l'innovati...La Banque de demain, chapitre 3. L'open-banking : l'enjeu clé pour l'innovati...
La Banque de demain, chapitre 3. L'open-banking : l'enjeu clé pour l'innovati...
 
Petit-déjeuner OCTO Management 3.0 - Le Book
Petit-déjeuner OCTO Management 3.0 - Le BookPetit-déjeuner OCTO Management 3.0 - Le Book
Petit-déjeuner OCTO Management 3.0 - Le Book
 
Petit-déjeuner OCTO du 03/06/2014 - La Révolution digitale
Petit-déjeuner OCTO du 03/06/2014 - La Révolution digitalePetit-déjeuner OCTO du 03/06/2014 - La Révolution digitale
Petit-déjeuner OCTO du 03/06/2014 - La Révolution digitale
 
#PortraitDeCDO - Juliette De Maupeou - Total
#PortraitDeCDO - Juliette De Maupeou - Total#PortraitDeCDO - Juliette De Maupeou - Total
#PortraitDeCDO - Juliette De Maupeou - Total
 
Petit-déjeuner OCTO - Objets connectés : We Are Able !
Petit-déjeuner OCTO - Objets connectés : We Are Able !Petit-déjeuner OCTO - Objets connectés : We Are Able !
Petit-déjeuner OCTO - Objets connectés : We Are Able !
 
Engineering Data Scientist
Engineering Data ScientistEngineering Data Scientist
Engineering Data Scientist
 

Similaire à COURBOSPARK: DECISION TREE FOR TIME-SERIES ON SPARK

CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkDataWorks Summit
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facilityinside-BigData.com
 
Purpose-built NoSQL Database for IoT by Basavaraj Soppannavar
Purpose-built NoSQL Database for IoT by Basavaraj SoppannavarPurpose-built NoSQL Database for IoT by Basavaraj Soppannavar
Purpose-built NoSQL Database for IoT by Basavaraj SoppannavarData Con LA
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...DataStax
 
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...Edwin Poot
 
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Community
 
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Community
 
The SKA Project - The World's Largest Streaming Data Processor
The SKA Project - The World's Largest Streaming Data ProcessorThe SKA Project - The World's Largest Streaming Data Processor
The SKA Project - The World's Largest Streaming Data Processorinside-BigData.com
 
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...Amazon Web Services
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDBMongoDB
 
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...Romeo Kienzler
 
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...Amazon Web Services
 
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News! ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News! Embarcadero Technologies
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumbergerinside-BigData.com
 
StorPool Presents at Cloud Field Day 9
StorPool Presents at Cloud Field Day 9StorPool Presents at Cloud Field Day 9
StorPool Presents at Cloud Field Day 9StorPool Storage
 

Similaire à COURBOSPARK: DECISION TREE FOR TIME-SERIES ON SPARK (20)

CourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on SparkCourboSpark: Decision Tree for Time-series on Spark
CourboSpark: Decision Tree for Time-series on Spark
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Benefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a ServiceBenefits of Hadoop as Platform as a Service
Benefits of Hadoop as Platform as a Service
 
Purpose-built NoSQL Database for IoT by Basavaraj Soppannavar
Purpose-built NoSQL Database for IoT by Basavaraj SoppannavarPurpose-built NoSQL Database for IoT by Basavaraj Soppannavar
Purpose-built NoSQL Database for IoT by Basavaraj Soppannavar
 
Srikanta Mishra
Srikanta MishraSrikanta Mishra
Srikanta Mishra
 
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
Webinar: Dyn + DataStax - helping companies deliver exceptional end-user expe...
 
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
Analyzing petabytes of smartmeter data using Cloud Bigtable, Cloud Dataflow, ...
 
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic Cloud
 
Ceph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic CloudCeph Day Berlin: Scaling an Academic Cloud
Ceph Day Berlin: Scaling an Academic Cloud
 
The SKA Project - The World's Largest Streaming Data Processor
The SKA Project - The World's Largest Streaming Data ProcessorThe SKA Project - The World's Largest Streaming Data Processor
The SKA Project - The World's Largest Streaming Data Processor
 
The state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the CloudThe state of SQL-on-Hadoop in the Cloud
The state of SQL-on-Hadoop in the Cloud
 
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
AWS re:Invent 2016: Large-Scale, Cloud-Based Analysis of Cancer Genomes: Less...
 
Agility and Scalability with MongoDB
Agility and Scalability with MongoDBAgility and Scalability with MongoDB
Agility and Scalability with MongoDB
 
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
The datascientists workplace of the future, IBM developerDays 2014, Vienna by...
 
CONDOR @ NGCLE@e-Novia 15.11.2017
CONDOR @ NGCLE@e-Novia 15.11.2017CONDOR @ NGCLE@e-Novia 15.11.2017
CONDOR @ NGCLE@e-Novia 15.11.2017
 
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...
AWS Public Sector Symposium 2014 Canberra | Big Data in the Cloud: Accelerati...
 
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News! ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
ER/Studio and DB PowerStudio Launch Webinar: Big Data, Big Models, Big News!
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 
StorPool Presents at Cloud Field Day 9
StorPool Presents at Cloud Field Day 9StorPool Presents at Cloud Field Day 9
StorPool Presents at Cloud Field Day 9
 
Welcome to the Datasphere – the next level of storage
Welcome to the Datasphere – the next level of storageWelcome to the Datasphere – the next level of storage
Welcome to the Datasphere – the next level of storage
 

Plus de OCTO Technology

Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloud
Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloudLe Comptoir OCTO - MLOps : Les patterns MLOps dans le cloud
Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloudOCTO Technology
 
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...OCTO Technology
 
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...OCTO Technology
 
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...OCTO Technology
 
OCTO Talks - Les IA s'invitent au chevet des développeurs
OCTO Talks - Les IA s'invitent au chevet des développeursOCTO Talks - Les IA s'invitent au chevet des développeurs
OCTO Talks - Les IA s'invitent au chevet des développeursOCTO Technology
 
OCTO Talks - Lancement du livre Culture Test
OCTO Talks - Lancement du livre Culture TestOCTO Talks - Lancement du livre Culture Test
OCTO Talks - Lancement du livre Culture TestOCTO Technology
 
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...OCTO Technology
 
OCTO Talks - State of the art Architecture dans les frontend web
OCTO Talks - State of the art Architecture dans les frontend webOCTO Talks - State of the art Architecture dans les frontend web
OCTO Talks - State of the art Architecture dans les frontend webOCTO Technology
 
Comptoir OCTO ALD Automotive/Leaseplan
Comptoir OCTO ALD Automotive/LeaseplanComptoir OCTO ALD Automotive/Leaseplan
Comptoir OCTO ALD Automotive/LeaseplanOCTO Technology
 
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ? Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ? OCTO Technology
 
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...OCTO Technology
 
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...
Le Comptoir OCTO -  Affinez vos forecasts avec la planification distribuée et...Le Comptoir OCTO -  Affinez vos forecasts avec la planification distribuée et...
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...OCTO Technology
 
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conceptionLe Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conceptionOCTO Technology
 
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...OCTO Technology
 
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone :  les solutions E...Le Comptoir OCTO - L'avenir de la gestion du bilan carbone :  les solutions E...
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...OCTO Technology
 
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...OCTO Technology
 
RefCard Tests sur tous les fronts
RefCard Tests sur tous les frontsRefCard Tests sur tous les fronts
RefCard Tests sur tous les frontsOCTO Technology
 
RefCard RESTful API Design
RefCard RESTful API DesignRefCard RESTful API Design
RefCard RESTful API DesignOCTO Technology
 
RefCard API Architecture Strategy
RefCard API Architecture StrategyRefCard API Architecture Strategy
RefCard API Architecture StrategyOCTO Technology
 

Plus de OCTO Technology (20)

Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloud
Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloudLe Comptoir OCTO - MLOps : Les patterns MLOps dans le cloud
Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloud
 
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...
 
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...
 
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
 
OCTO Talks - Les IA s'invitent au chevet des développeurs
OCTO Talks - Les IA s'invitent au chevet des développeursOCTO Talks - Les IA s'invitent au chevet des développeurs
OCTO Talks - Les IA s'invitent au chevet des développeurs
 
OCTO Talks - Lancement du livre Culture Test
OCTO Talks - Lancement du livre Culture TestOCTO Talks - Lancement du livre Culture Test
OCTO Talks - Lancement du livre Culture Test
 
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
 
OCTO Talks - State of the art Architecture dans les frontend web
OCTO Talks - State of the art Architecture dans les frontend webOCTO Talks - State of the art Architecture dans les frontend web
OCTO Talks - State of the art Architecture dans les frontend web
 
Refcard GraphQL
Refcard GraphQLRefcard GraphQL
Refcard GraphQL
 
Comptoir OCTO ALD Automotive/Leaseplan
Comptoir OCTO ALD Automotive/LeaseplanComptoir OCTO ALD Automotive/Leaseplan
Comptoir OCTO ALD Automotive/Leaseplan
 
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ? Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
 
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
 
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...
Le Comptoir OCTO -  Affinez vos forecasts avec la planification distribuée et...Le Comptoir OCTO -  Affinez vos forecasts avec la planification distribuée et...
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...
 
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conceptionLe Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
 
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
 
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone :  les solutions E...Le Comptoir OCTO - L'avenir de la gestion du bilan carbone :  les solutions E...
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...
 
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
 
RefCard Tests sur tous les fronts
RefCard Tests sur tous les frontsRefCard Tests sur tous les fronts
RefCard Tests sur tous les fronts
 
RefCard RESTful API Design
RefCard RESTful API DesignRefCard RESTful API Design
RefCard RESTful API Design
 
RefCard API Architecture Strategy
RefCard API Architecture StrategyRefCard API Architecture Strategy
RefCard API Architecture Strategy
 

Dernier

NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxdolaknnilon
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degreeyuu sss
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.natarajan8993
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfchwongval
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Sapana Sha
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxUnduhUnggah1
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 

Dernier (20)

NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
IMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptxIMA MSN - Medical Students Network (2).pptx
IMA MSN - Medical Students Network (2).pptx
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
毕业文凭制作#回国入职#diploma#degree澳洲中央昆士兰大学毕业证成绩单pdf电子版制作修改#毕业文凭制作#回国入职#diploma#degree
 
RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.RABBIT: A CLI tool for identifying bots based on their GitHub events.
RABBIT: A CLI tool for identifying bots based on their GitHub events.
 
Multiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdfMultiple time frame trading analysis -brianshannon.pdf
Multiple time frame trading analysis -brianshannon.pdf
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
Saket, (-DELHI )+91-9654467111-(=)CHEAP Call Girls in Escorts Service Saket C...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
MK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docxMK KOMUNIKASI DATA (TI)komdat komdat.docx
MK KOMUNIKASI DATA (TI)komdat komdat.docx
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 

COURBOSPARK: DECISION TREE FOR TIME-SERIES ON SPARK

  • 1. COURBOSPARK: DECISION TREE FOR TIME-SERIES ON SPARK Christophe Salperwyck – EDF R&D Simon Maby – OCTO Technology - @simonmaby Xdata project: www.xdata.fr, grants from "Investissement d'Avenir" program, 'Big Data' call
  • 2. | 2 AGENDA 1. PROBLEM DESCRIPTION 2. IMPLEMENTATION • Courbotree: presentation of the algorithm • From mllib to courbospark 3. PERFORMANCES • Configuration (cluster description, spark config…) 4. FEEDBACK ON SPARK/MLLIB
  • 4. | 4 • 1 measure every 10 min • 35 million customers • Time-series: 144 points x 365 days  Annual data volume: 1800 billion records, 120 TB of raw data BIG DATA!
  • 5. | 5 LOAD CURVES CLASSIFICATION Contract type Region … Equipment type Load Curve 9KVA 75 … Elec 6KVA 22 … Gas … … … … … 12KVA 34 … Elec
  • 6. | 6 WHY A DECISION TREE? • Easy to understand • Ability to explore the model • Ability to choose the expressivity of the model
  • 7. | 7 Goal: find the most different curves depending on an explanatory feature How to split? we can either: • Minimize curves dispersion (intra inertia) or • Maximize differences between average curves (inter inertia) SPLIT CRITERIA: INERTIA
  • 8. | 8 MAXIMIZE DIFFERENCES BETWEEN AVERAGE CURVES (feature: Equipment Type) Electrical Gas Hour PinW ArgMax(d) mean
  • 9. | 9 EXISTING DISTRIBUTED DECISION TREE Scalable Distributed Decision Trees in Spark MLLib Manish Amde (Origami Logic), Hirakendu Das (Yahoo! Inc.), Evan Sparks (UC Berkeley), Ameet Talwalkar (UC Berkeley). Spark Summit 2014. http://spark-summit.org/wp- content/uploads/2014/07/Scalable-Distributed-Decision-Trees-in-Spark-Made-Das-Sparks-Talwalkar.pdf A MapReduce Implementation of C4.5 Decision Tree Algorithm Wei Dai, Wei Ji. International Journal of Database Theory and Application. Vol. 7, No. 1, 2014, pages 49- 60. http://www.chinacloud.cn/upload/2014-03/14031920373451.pdf PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce Biswanath Panda, Joshua S. Herbach, Sugato Basu, Roberto J. Bayardo. VLDB 2009. http://static.googleusercontent.com/media/research.google.com/en//pubs/archive/36296.pdf Distributed Decision Tree Learning for Mining Big Data Streams Arinto Murdopo, Master thesis, Yahoo ! Labs Barcelona, July 2013. http://people.ac.upc.edu/leandro/emdc/arinto-emdc-thesis.pdf
  • 10. | 10 MLLIB DECISION TREE PARALLELIZATION
  • 11. | 11 Step 1: compute average curves [0:10[ [10:20[ [0:10[ [10:20[ [0:10[ [10:20[ Host 1 Host 2 Host 3 [0:10[ [10:20[ Host 1 Step 2: collect and find the best split HORIZONTAL STRATEGY
  • 12. | 12 To build the tree: • Criteria: entropy, Gini, variance • Data structure: LabelPoint FROM MLLIB TO COURBOSPARK
  • 13. | 13 To build the tree: • Criteria: entropy, Gini, variance, inertia (to compare time-series) • Data structure: LabelPoint, TimeSeries • Finding split point for nominal features For data visualization of the tree: • Quantile on the nodes and leaves • Lost of inertia • Number of curves per nodes, leaves FROM MLLIB TO COURBOSPARK
  • 14. | 14 DEALING WITH NOMINAL FEATURES Current implementation for regression:  order the categories by their mean on the target A BC D Partitions tested: {A}/{CBD}, {AC}/{BD}, {ACB}/{C}
  • 15. | 15 NOMINAL VALUES: TYPE OF CONTRACT 4 CATEGORIES {A, B, C, D} A B C D?
  • 16. | 16 DEALING WITH NOMINAL FEATURES Hard to order curves… Solution 1: Compare curves 2 by 2  {A}/{BCD}, {AB}/{CD}, {ABC}/{D}, {AC}/{BD}… Problem: Combinatory problem depending on n the number of different categories. Complexity is O(2n)
  • 17. | 17 DEALING WITH NOMINAL FEATURES Solution 2: Agglomerative Hierarchical Clustering. Bottom up approach. Complexity is O(n3) - we don’t expect n > 100
  • 18. | 18 HOW TO Algorithm parameters Configure spark context Load the data file Learn the model
  • 19. | 19 LOOKING FOR THE TEST CONFIGURATION For a constant global capacity on 12 nodes: •120 cores + 120 GB RAM #Executors RAM per exec. Cores per exec. Performance on 100Gb data 12 10 GB 10 22 minutes 24 5 GB 5 17 minutes 60 2 GB 2 12 minutes 120 1 GB 1 15 minutes
  • 20. | 20 SCALABILITY TO #CONTAINERS
  • 21. | 21 SCALABILITY TO #CONTAINERS
  • 22. | 22 SCALABILITY TO #CONTAINERS
  • 24. | 24 FRAMEWORK STABILITY Tested on: • 10GB, 100GB, 200GB, 300GB, 400GB, 500GB, 1TB • Categorical and continuous variables • Bin sizes from 100 to 1000
  • 26. | 26 SCALABILITY TO #CATEGORIES
  • 27. | 27
  • 28. | 28 REAL LIFE DATASET 0 50 100 150 200 250 300 350 400 0 200 400 600 800 1000 1200 1400 Timeinminutes Data in GB • 9 executors with 20 GB and 8 cores • 10 to 1000 millions load curves (10 numerical and 10 categorical features)
  • 29. | 29 • spark.default.parallelism • spark.executor.memory • spark.storage.memoryfraction • spark.akka.framesize TUNING
  • 30. | 30 Developers view • Flawless transition from local to cluster mode • Debug mode with an IDE • Good performances need knowledge FEEDBACKS
  • 32. | 32 Data Scientists view • The API is not very data oriented • …but now we have SparkSQL and Dataframes! • IPython + pySpark • Feature engineering VS model engineering FEEDBACKS
  • 33. | 33 OPS view • Better than mapReduce • Performances are predictable for tested code • YARNed • Lots of releases, MlLib code is evolving quickly FEEDBACKS
  • 34. | 34 FUTURE WORKS • Unbalanced trees • Improve performance • Other criteria for time-series comparison • Missing values in explanatory features