SlideShare une entreprise Scribd logo
1  sur  38
Télécharger pour lire hors ligne
1
2
OVHcloud Data Processing :
Le nouveau service pour valoriser vos
données à l'aide du cloud
BELLLION Bastien
2020-06-25
3
Bastien BELLION
• Data Convergence team
• bastien.bellion@ovhcloud.com
• @BastienBellion
Data Convergence :
• Analytics Data Platform
• Data Processing
4
Content
• Big Data
• Cluster Computing
• Apache Spark
• Exemple 1: Lyrics Wordcount
• OVHcloud Data Processing platform
• Exemple 2: New-York Taxis
• Roadmap
5
Why do we need to process data?
• Invent new products
• Make better decisions
• Prevent danger and fraud
• Forecasting events
6
Flood of data
• 90 percent of data in the world has been
generated in last 2 years.
• Autonomouscars: 20TB /hour
• 500 million tweets per day.
• 75 billion IoT devices by 2025
• Data exhaust
7
Where is the data stored?
8
Cluster computing
Cluster
9
Selecting the right tool
• Open source
• Biggest open source
community in big data
• Fast
• Java/Scala/Python/R
• Cluster
Big data tools – Google trend graph
10
Who is using Spark ?
https://databricks.com/fr/session/netflixs-recommendation-ml-
pipeline-using-apache-spark
https://labs.criteo.com/2018/01/spark-out-of-memory/
11
Spark
12
Spark
Spark Cluster
Job 1
Job 2
Job 3
spark-submit --class org.apache.spark.examples.SparkPi
 --driver-memory 8G
 --master local[2]
 --executor-memory 8G
 --executor-cores 2
 --num-executors 3
 spark-examples.jar
13
Example 1 : Lyrics Wordcount
https://github.com/walkerkq/musiclyrics
Billboard hot 100 between 1965-2015
5100 song with for each:
• Rank of the year
• Song name
• Artist
• Year
• Lyrics
• Source
Full written guide : https://docs.ovh.com/gb/en/data-processing/wordcount-spark/
14
Example 1 : Lyrics Wordcount
1/3 of dataset, 1700 songs
1/3 of dataset, 1700 songs
1/3 of dataset, 1700 songs
Lyrics.csv
15
Example 1 : Lyrics Wordcount
Volume of processed data Time elapsed one computer Time elapsed spark cluster
7.59 Mb 0,5 s 60 s
759 Mb 5,1 s 100 s
7.59 Gb 48 s 186 s
76 Gb 490 s 358 s
760 Gb 4 820 s / 80 minutes 2 250 s / 37 minutes
Work computer
4 cpu cores, 8 threads
16 Gb of RAM
OVHcloud instances
10 workers
2 cpu cores per workers
8 Gb of RAM per workers
16
Manual Spark cluster problems
• Take time and knowledge to setup
• Cluster Management
Spark version 2.2.1
125 Cores , 500GB RAM
GRA
Job 1
Job 2
Job 3
Spark 2.2.1
100GB RAM, 25 Cores
GRA
Spark 2.2.1
150GB RAM, 37 Cores
GRA
Spark 2.2.1
100GB RAM, 25 Cores
GRA
17
Manual Spark cluster problems
• Take time and knowledge to setup
• Cluster Management
• Spark version bounded
Spark version 2.2.1
125 Cores , 500GB RAM
GRA
Job 1
Job 2
Job 3
Spark 2.2.1
100GB RAM, 25 Cores
GRA
Spark 2.2.1
150GB RAM, 37 Cores
GRA
Spark 2.2.1
100GB RAM, 25 Cores
GRA
Job 4
Spark 3.0.0
50GB RAM, 10 Cores
GRA
18
Manual Spark cluster problems
• Take time and knowledge to setup
• Cluster Management
• Spark version bounded
• Limited ressources
Spark version 2.2.1
125 Cores , 500GB RAM
GRA
Job 1
Job 2
Job 3
Spark 2.2.1
100GB RAM, 25 Cores
GRA
Spark 2.2.1
150GB RAM, 37 Cores
GRA
Spark 2.2.1
100GB RAM, 25 Cores
GRA
Job 4
Spark 2.2.1
200GB RAM, 50 Cores
GRA
19
Manual Spark cluster problems
• Take time and knowledge to setup
• Cluster Management
• Spark version bounded
• Limited ressources
• Region bounded
Spark version 2.2.1
125 Cores , 500GB RAM
GRA
Job 1
Job 2
Job 3
Spark 2.2.1
100GB RAM, 25 Cores
GRA
Spark 2.2.1
150GB RAM, 37 Cores
GRA
Spark 2.2.1
100GB RAM, 25 Cores
GRA
Job 4
Spark 2.2.1
50GB RAM, 10 Cores
BHS
20
Manual Spark cluster problems
Job 1
Job 2
Job 3
Spark 2.4.5
4 nodes, 100GB RAM
GRA
Spark 2.4.3
5 nodes, 200GB RAM
BHS
Spark 3
2 nodes, 50GB RAM
GRA
• No Cluster Management
• Start a cluster in seconds
• Dedicated resources
• No Unused resources
• Any Spark version
• Any region
Job 4
Spark 3.0.0
200GB RAM, 50 Cores
BHS
21
OVHcloud Data Processing
A single cluster per job To focus on what matters
Be in the cloud To scale up and down quickly
Bin different regions Data policy compliance and be close to
data
Distributed Computing
Technology
To distribute processing task to several
computers automatically
22
OVHcloud Data Processing
A single cluster per job To focus on what matters
Be in the cloud To scale up and down quickly
Bin different regions Data policy compliance and be close to
data
Distributed Computing
Technology
To distribute processing task to several
computers automatically
OVHcloud Data Processing
23
OVHcloud Data Processing
24
OVHcloud Data Processing
25
OVHcloud Data Processing
26
OVHcloud Data Processing
27
OVHcloud Data Processing
28
OVHcloud Data Processing
Supported languages :
• Java 8
• Scala 2.12
• Python v2.7+ and v3.4+
Available regions :
• Europe
• North America (Soon)
Engine:
• Apache Spark 2.4.3
• Apache Spark 3.0.0 (Soon)
Job size :
• Up to 60 GB of RAM per Executor
• Up to 16 Cores of CPU per Executor
• Up to 10 Executors per job
29
1 Via Control Panel 2 Via API OVHcloud 3 Via CLI (Spark-Submit)
$ ./ovh-spark-submit 
--project-id yourProjectId 
--upload ./spark-examples.jar 
--class org.apache.spark.examples.SparkPi 
--driver-cores 1 
--driver-memory 4G 
--executor-cores 1 
--executor-memory 4G 
--num-executors 1 
swift://odp/spark-examples.jar 1000
• Auto upload your application code
• Open source
https://github.com/ovh/data-processing-spark-submit
30
Example 2 : New-YorkTaxis
• Date
• Number of people per ride
• Price
• Tips
• Pickup and dropoff zone
• 10 years worth of taxis ride
• +100 Go of data
31
Example 2 : New-YorkTaxis
First step: make the data accessible
Public Cloud
Object Storage
32
Example 2 : New-YorkTaxis
Second step: submit your job
Data Processing
Start a new job
33
Example 2 : New-YorkTaxis
Second step: submit your job
34
Example 2 : New-YorkTaxis
Second step: submit your job
35
Example 2 : New-YorkTaxis
Second step: submit your job
36
Example 2 : New-YorkTaxis
.
.
. ..
..
.
.
Bill
I submit 1job, with
• 1 driver ( 4 vCore / 15GB RAM)
• 5 executors( 5 x 8vCore / 30GB RAM)
I alsoused OVHcloud Object Storage
Job lasted15 minutes
Payment: (44 vCores + 165 GB RAM ) for15 minutes
= 1,33€
+ classicpriceof Object Storage(storageanddatatransfer)
37
Roadmap
Development Lab GA Soon
ETA = July 2020Launched (April)
Free ! Full working:
• Product itself
• Control panel/API/CLI
• Documentation
Added:
• Bug fix
• Billing
S2 2020
Backlog:
• More Regions
• More flavors/prices
• More Engines
• More Spark versions
38
Thank You !
https://gitter.im/ovh/data-processing
https://www.ovh.com/blog/ovhcloud-techs-talk-fr-s01e09/

Contenu connexe

Tendances

Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
DataStax Academy
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
DataStax
 

Tendances (20)

How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)How to size up an Apache Cassandra cluster (Training)
How to size up an Apache Cassandra cluster (Training)
 
Cassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large NodesCassandra TK 2014 - Large Nodes
Cassandra TK 2014 - Large Nodes
 
Webinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in ProductionWebinar: Diagnosing Apache Cassandra Problems in Production
Webinar: Diagnosing Apache Cassandra Problems in Production
 
1 sysadmin vs 250 clusters de stockage
1 sysadmin vs 250 clusters de stockage1 sysadmin vs 250 clusters de stockage
1 sysadmin vs 250 clusters de stockage
 
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
iland Internet Solutions: Leveraging Cassandra for real-time multi-datacenter...
 
Cassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For OperatorsCassandra Summit 2015: Real World DTCS For Operators
Cassandra Summit 2015: Real World DTCS For Operators
 
Cassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stackCassandra Adoption on Cisco UCS & Open stack
Cassandra Adoption on Cisco UCS & Open stack
 
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User StoreAzure + DataStax Enterprise (DSE) Powers Office365 Per User Store
Azure + DataStax Enterprise (DSE) Powers Office365 Per User Store
 
Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2Oracle to Cassandra Core Concepts Guide Pt. 2
Oracle to Cassandra Core Concepts Guide Pt. 2
 
mParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from CassandramParticle's Journey to Scylla from Cassandra
mParticle's Journey to Scylla from Cassandra
 
Wide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data ModelingWide Column Store NoSQL vs SQL Data Modeling
Wide Column Store NoSQL vs SQL Data Modeling
 
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
Lessons Learned on Java Tuning for Our Cassandra Clusters (Carlos Monroy, Kne...
 
Scylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla OperatorScylla on Kubernetes: Introducing the Scylla Operator
Scylla on Kubernetes: Introducing the Scylla Operator
 
Understanding DSE Search by Matt Stump
Understanding DSE Search by Matt StumpUnderstanding DSE Search by Matt Stump
Understanding DSE Search by Matt Stump
 
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
Lessons Learned From Running 1800 Clusters (Brooke Jensen, Instaclustr) | Cas...
 
Building Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStaxBuilding Scalable, Real Time Applications for Financial Services with DataStax
Building Scalable, Real Time Applications for Financial Services with DataStax
 
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...
Cold Storage That Isn't Glacial (Joshua Hollander, Protectwise) | Cassandra S...
 
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
Troubleshooting Cassandra (J.B. Langston, DataStax) | C* Summit 2016
 
Cassandra Day London 2015: Securing Cassandra and DataStax Enterprise
Cassandra Day London 2015: Securing Cassandra and DataStax EnterpriseCassandra Day London 2015: Securing Cassandra and DataStax Enterprise
Cassandra Day London 2015: Securing Cassandra and DataStax Enterprise
 
How to achieve no compromise performance and availability
How to achieve no compromise performance and availabilityHow to achieve no compromise performance and availability
How to achieve no compromise performance and availability
 

Similaire à OVHcloud Tech Talks S01E09 - OVHcloud Data Processing : Le nouveau service pour valoriser vos données à l'aide du cloud

The state of Spark in the cloud
The state of Spark in the cloudThe state of Spark in the cloud
The state of Spark in the cloud
Nicolas Poggi
 
KSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success StoryKSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success Story
Kristofferson A
 

Similaire à OVHcloud Tech Talks S01E09 - OVHcloud Data Processing : Le nouveau service pour valoriser vos données à l'aide du cloud (20)

Spark Uber Development Kit
Spark Uber Development KitSpark Uber Development Kit
Spark Uber Development Kit
 
The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)The state of Hive and Spark in the Cloud (July 2017)
The state of Hive and Spark in the Cloud (July 2017)
 
The state of Spark in the cloud
The state of Spark in the cloudThe state of Spark in the cloud
The state of Spark in the cloud
 
Using Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.comUsing Riak for Events storage and analysis at Booking.com
Using Riak for Events storage and analysis at Booking.com
 
Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)Using BigBench to compare Hive and Spark (Long version)
Using BigBench to compare Hive and Spark (Long version)
 
High performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User GroupHigh performace network of Cloud Native Taiwan User Group
High performace network of Cloud Native Taiwan User Group
 
Machine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkMLMachine Learning With H2O vs SparkML
Machine Learning With H2O vs SparkML
 
Hybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGsHybrid Transactional/Analytics Processing with Spark and IMDGs
Hybrid Transactional/Analytics Processing with Spark and IMDGs
 
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-PremiseTackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
Tackling Network Bottlenecks with Hardware Accelerations: Cloud vs. On-Premise
 
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)Oracle Cloud Infrastructure Data Science 概要資料(20200406)
Oracle Cloud Infrastructure Data Science 概要資料(20200406)
 
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with SchlumbergerGet Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
Get Your Head in the Cloud - Lessons in GPU Computing with Schlumberger
 
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second(BDT318) How Netflix Handles Up To 8 Million Events Per Second
(BDT318) How Netflix Handles Up To 8 Million Events Per Second
 
There's no magic... until you talk about databases
 There's no magic... until you talk about databases There's no magic... until you talk about databases
There's no magic... until you talk about databases
 
Webinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case StudyWebinar - DreamObjects/Ceph Case Study
Webinar - DreamObjects/Ceph Case Study
 
Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)Introduction to Apache Geode (Cork, Ireland)
Introduction to Apache Geode (Cork, Ireland)
 
Databricks clusters in autopilot mode
Databricks clusters in autopilot modeDatabricks clusters in autopilot mode
Databricks clusters in autopilot mode
 
Cloud: From Unmanned Data Center to Algorithmic Economy using Openstack
Cloud: From Unmanned Data Center to Algorithmic Economy using OpenstackCloud: From Unmanned Data Center to Algorithmic Economy using Openstack
Cloud: From Unmanned Data Center to Algorithmic Economy using Openstack
 
KSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success StoryKSCOPE 2013: Exadata Consolidation Success Story
KSCOPE 2013: Exadata Consolidation Success Story
 
Leveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark PipelinesLeveraging Databricks for Spark Pipelines
Leveraging Databricks for Spark Pipelines
 
Leveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelinesLeveraging Databricks for Spark pipelines
Leveraging Databricks for Spark pipelines
 

Plus de OVHcloud

Plus de OVHcloud (20)

OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups OVHcloud Startup Program : Découvrir l'écosystème au service des startups
OVHcloud Startup Program : Découvrir l'écosystème au service des startups
 
Fine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP modelsFine tune and deploy Hugging Face NLP models
Fine tune and deploy Hugging Face NLP models
 
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
OVHcloud Tech Talks S01E08 - GAIA-X pour les techs : OVHcloud & Scaleway vous...
 
Webinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud DatabasesWebinar - Enterprise Cloud Databases
Webinar - Enterprise Cloud Databases
 
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
OVHcloud Tech Talks S01E07 – Introduction à l’intelligence artificielle pour ...
 
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
OVHcloud Tech Talks Fr S01E06 – BeeGFS, un filesystem orienté performance, ma...
 
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
OVHcloud Tech Talks Fr S01E05 – L’opérateur Harbor, une nécessité pour certai...
 
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilitéOVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
OVHcloud Tech-Talk S01E04 - La télémétrie au service de l'agilité
 
OVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML servingOVHcloud TechTalks - ML serving
OVHcloud TechTalks - ML serving
 
Logs @ OVHcloud
Logs @ OVHcloudLogs @ OVHcloud
Logs @ OVHcloud
 
Les APIs OpenStack
Les APIs OpenStackLes APIs OpenStack
Les APIs OpenStack
 
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
Migrer 3 millions de sites sans maitriser leur code source ? Impossible mais ...
 
Industrialize Machine Learning
Industrialize Machine Learning Industrialize Machine Learning
Industrialize Machine Learning
 
OVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud DatabasesOVHcloud – Enterprise Cloud Databases
OVHcloud – Enterprise Cloud Databases
 
OVHcloud Hosted Private Cloud Platform Network use cases with VMware NSX
OVHcloud Hosted Private Cloud Platform Network use cases with VMware NSXOVHcloud Hosted Private Cloud Platform Network use cases with VMware NSX
OVHcloud Hosted Private Cloud Platform Network use cases with VMware NSX
 
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
Pilotage et gestion proactive de vos machines virtuelles dans le Hosted Priva...
 
Online passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattackOnline passwords – understanding "credential stuffing" cyberattack
Online passwords – understanding "credential stuffing" cyberattack
 
OVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sectorOVHcloud and Microsoft for the public sector
OVHcloud and Microsoft for the public sector
 
The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?The new AMD EPYC solutions from OVHcloud: what benefits?
The new AMD EPYC solutions from OVHcloud: what benefits?
 
Maximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructureMaximising the security of your cloud infrastructure
Maximising the security of your cloud infrastructure
 

Dernier

( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
nilamkumrai
 
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
nirzagarg
 
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 

Dernier (20)

VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
VVIP Pune Call Girls Sinhagad WhatSapp Number 8005736733 With Elite Staff And...
 
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
VIP Model Call Girls NIBM ( Pune ) Call ON 8005736733 Starting From 5K to 25K...
 
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
Call Girls Sangvi Call Me 7737669865 Budget Friendly No Advance BookingCall G...
 
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
20240509 QFM015 Engineering Leadership Reading List April 2024.pdf
 
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort ServiceBusty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
Busty Desi⚡Call Girls in Vasundhara Ghaziabad >༒8448380779 Escort Service
 
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
( Pune ) VIP Baner Call Girls 🎗️ 9352988975 Sizzling | Escorts | Girls Are Re...
 
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
All Time Service Available Call Girls Mg Road 👌 ⏭️ 6378878445
 
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Green Park Escort Service Delhi N.C.R.
 
Trump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts SweatshirtTrump Diapers Over Dems t shirts Sweatshirt
Trump Diapers Over Dems t shirts Sweatshirt
 
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
Yerawada ] Independent Escorts in Pune - Book 8005736733 Call Girls Available...
 
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men  🔝mehsana🔝   Escorts...
➥🔝 7737669865 🔝▻ mehsana Call-girls in Women Seeking Men 🔝mehsana🔝 Escorts...
 
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service AvailableCall Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
Call Girls Ludhiana Just Call 98765-12871 Top Class Call Girl Service Available
 
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
20240510 QFM016 Irresponsible AI Reading List April 2024.pdf
 
Al Barsha Night Partner +0567686026 Call Girls Dubai
Al Barsha Night Partner +0567686026 Call Girls  DubaiAl Barsha Night Partner +0567686026 Call Girls  Dubai
Al Barsha Night Partner +0567686026 Call Girls Dubai
 
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
2nd Solid Symposium: Solid Pods vs Personal Knowledge Graphs
 
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
Wagholi & High Class Call Girls Pune Neha 8005736733 | 100% Gennuine High Cla...
 
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Himatnagar 7001035870 Whatsapp Number, 24/07 Booking
 
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...Top Rated  Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
Top Rated Pune Call Girls Daund ⟟ 6297143586 ⟟ Call Me For Genuine Sex Servi...
 
APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53APNIC Updates presented by Paul Wilson at ARIN 53
APNIC Updates presented by Paul Wilson at ARIN 53
 
Real Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirtReal Men Wear Diapers T Shirts sweatshirt
Real Men Wear Diapers T Shirts sweatshirt
 

OVHcloud Tech Talks S01E09 - OVHcloud Data Processing : Le nouveau service pour valoriser vos données à l'aide du cloud

  • 1. 1
  • 2. 2 OVHcloud Data Processing : Le nouveau service pour valoriser vos données à l'aide du cloud BELLLION Bastien 2020-06-25
  • 3. 3 Bastien BELLION • Data Convergence team • bastien.bellion@ovhcloud.com • @BastienBellion Data Convergence : • Analytics Data Platform • Data Processing
  • 4. 4 Content • Big Data • Cluster Computing • Apache Spark • Exemple 1: Lyrics Wordcount • OVHcloud Data Processing platform • Exemple 2: New-York Taxis • Roadmap
  • 5. 5 Why do we need to process data? • Invent new products • Make better decisions • Prevent danger and fraud • Forecasting events
  • 6. 6 Flood of data • 90 percent of data in the world has been generated in last 2 years. • Autonomouscars: 20TB /hour • 500 million tweets per day. • 75 billion IoT devices by 2025 • Data exhaust
  • 7. 7 Where is the data stored?
  • 9. 9 Selecting the right tool • Open source • Biggest open source community in big data • Fast • Java/Scala/Python/R • Cluster Big data tools – Google trend graph
  • 10. 10 Who is using Spark ? https://databricks.com/fr/session/netflixs-recommendation-ml- pipeline-using-apache-spark https://labs.criteo.com/2018/01/spark-out-of-memory/
  • 12. 12 Spark Spark Cluster Job 1 Job 2 Job 3 spark-submit --class org.apache.spark.examples.SparkPi --driver-memory 8G --master local[2] --executor-memory 8G --executor-cores 2 --num-executors 3 spark-examples.jar
  • 13. 13 Example 1 : Lyrics Wordcount https://github.com/walkerkq/musiclyrics Billboard hot 100 between 1965-2015 5100 song with for each: • Rank of the year • Song name • Artist • Year • Lyrics • Source Full written guide : https://docs.ovh.com/gb/en/data-processing/wordcount-spark/
  • 14. 14 Example 1 : Lyrics Wordcount 1/3 of dataset, 1700 songs 1/3 of dataset, 1700 songs 1/3 of dataset, 1700 songs Lyrics.csv
  • 15. 15 Example 1 : Lyrics Wordcount Volume of processed data Time elapsed one computer Time elapsed spark cluster 7.59 Mb 0,5 s 60 s 759 Mb 5,1 s 100 s 7.59 Gb 48 s 186 s 76 Gb 490 s 358 s 760 Gb 4 820 s / 80 minutes 2 250 s / 37 minutes Work computer 4 cpu cores, 8 threads 16 Gb of RAM OVHcloud instances 10 workers 2 cpu cores per workers 8 Gb of RAM per workers
  • 16. 16 Manual Spark cluster problems • Take time and knowledge to setup • Cluster Management Spark version 2.2.1 125 Cores , 500GB RAM GRA Job 1 Job 2 Job 3 Spark 2.2.1 100GB RAM, 25 Cores GRA Spark 2.2.1 150GB RAM, 37 Cores GRA Spark 2.2.1 100GB RAM, 25 Cores GRA
  • 17. 17 Manual Spark cluster problems • Take time and knowledge to setup • Cluster Management • Spark version bounded Spark version 2.2.1 125 Cores , 500GB RAM GRA Job 1 Job 2 Job 3 Spark 2.2.1 100GB RAM, 25 Cores GRA Spark 2.2.1 150GB RAM, 37 Cores GRA Spark 2.2.1 100GB RAM, 25 Cores GRA Job 4 Spark 3.0.0 50GB RAM, 10 Cores GRA
  • 18. 18 Manual Spark cluster problems • Take time and knowledge to setup • Cluster Management • Spark version bounded • Limited ressources Spark version 2.2.1 125 Cores , 500GB RAM GRA Job 1 Job 2 Job 3 Spark 2.2.1 100GB RAM, 25 Cores GRA Spark 2.2.1 150GB RAM, 37 Cores GRA Spark 2.2.1 100GB RAM, 25 Cores GRA Job 4 Spark 2.2.1 200GB RAM, 50 Cores GRA
  • 19. 19 Manual Spark cluster problems • Take time and knowledge to setup • Cluster Management • Spark version bounded • Limited ressources • Region bounded Spark version 2.2.1 125 Cores , 500GB RAM GRA Job 1 Job 2 Job 3 Spark 2.2.1 100GB RAM, 25 Cores GRA Spark 2.2.1 150GB RAM, 37 Cores GRA Spark 2.2.1 100GB RAM, 25 Cores GRA Job 4 Spark 2.2.1 50GB RAM, 10 Cores BHS
  • 20. 20 Manual Spark cluster problems Job 1 Job 2 Job 3 Spark 2.4.5 4 nodes, 100GB RAM GRA Spark 2.4.3 5 nodes, 200GB RAM BHS Spark 3 2 nodes, 50GB RAM GRA • No Cluster Management • Start a cluster in seconds • Dedicated resources • No Unused resources • Any Spark version • Any region Job 4 Spark 3.0.0 200GB RAM, 50 Cores BHS
  • 21. 21 OVHcloud Data Processing A single cluster per job To focus on what matters Be in the cloud To scale up and down quickly Bin different regions Data policy compliance and be close to data Distributed Computing Technology To distribute processing task to several computers automatically
  • 22. 22 OVHcloud Data Processing A single cluster per job To focus on what matters Be in the cloud To scale up and down quickly Bin different regions Data policy compliance and be close to data Distributed Computing Technology To distribute processing task to several computers automatically OVHcloud Data Processing
  • 28. 28 OVHcloud Data Processing Supported languages : • Java 8 • Scala 2.12 • Python v2.7+ and v3.4+ Available regions : • Europe • North America (Soon) Engine: • Apache Spark 2.4.3 • Apache Spark 3.0.0 (Soon) Job size : • Up to 60 GB of RAM per Executor • Up to 16 Cores of CPU per Executor • Up to 10 Executors per job
  • 29. 29 1 Via Control Panel 2 Via API OVHcloud 3 Via CLI (Spark-Submit) $ ./ovh-spark-submit --project-id yourProjectId --upload ./spark-examples.jar --class org.apache.spark.examples.SparkPi --driver-cores 1 --driver-memory 4G --executor-cores 1 --executor-memory 4G --num-executors 1 swift://odp/spark-examples.jar 1000 • Auto upload your application code • Open source https://github.com/ovh/data-processing-spark-submit
  • 30. 30 Example 2 : New-YorkTaxis • Date • Number of people per ride • Price • Tips • Pickup and dropoff zone • 10 years worth of taxis ride • +100 Go of data
  • 31. 31 Example 2 : New-YorkTaxis First step: make the data accessible Public Cloud Object Storage
  • 32. 32 Example 2 : New-YorkTaxis Second step: submit your job Data Processing Start a new job
  • 33. 33 Example 2 : New-YorkTaxis Second step: submit your job
  • 34. 34 Example 2 : New-YorkTaxis Second step: submit your job
  • 35. 35 Example 2 : New-YorkTaxis Second step: submit your job
  • 36. 36 Example 2 : New-YorkTaxis . . . .. .. . . Bill I submit 1job, with • 1 driver ( 4 vCore / 15GB RAM) • 5 executors( 5 x 8vCore / 30GB RAM) I alsoused OVHcloud Object Storage Job lasted15 minutes Payment: (44 vCores + 165 GB RAM ) for15 minutes = 1,33€ + classicpriceof Object Storage(storageanddatatransfer)
  • 37. 37 Roadmap Development Lab GA Soon ETA = July 2020Launched (April) Free ! Full working: • Product itself • Control panel/API/CLI • Documentation Added: • Bug fix • Billing S2 2020 Backlog: • More Regions • More flavors/prices • More Engines • More Spark versions