© OCTO 2013
Big Data and Machine Learning
Mathieu DESPRIEE
mde@octo.com
twitter : @mdeocto
What a buzzword!!!
(Figures: Google Trends on “big data”; Gartner hype cycle, 2012.)
Origins of Big Data
• Web (Google, Amazon, Facebook, Twitter, …): web giants implement Big Data solutions for their own needs.
• IT vendors (IBM, Teradata, VMware, EMC, …): vendors are followers in this movement. They try to take hold of this very promising business.
• Management (McKinsey, BCG, Gartner, …): consulting firms predicted a big economic change, and Big Data is part of it.
Web giants gave substance to a concept anticipated by Gartner. This software evolution did not come from traditional software vendors, which is quite unusual.
Is there a clear definition?
Super data warehouse? Low-cost storage? NoSQL? Cloud? Internet intelligence? Real-time analysis? Unstructured data? Open data? Big databases?
There is no clear definition of Big Data. It is at once a business ambition and a set of technological opportunities.
Data deluge!
Volume, Velocity, Variety
(Figure: the three Vs. Volume: MB, GB, TB, PB. Velocity: day, hour, second, real time. Variety: file, structured, social networks, API, text, video, web, audio.)
Data and Innovation
(Figure: “Data we traditionally manipulate (customers, product catalog…)”; “Innovation is here!”)
New usages, new services, new IT systems
Big Data: proposed definition
Big Data aims at gaining an economic advantage from the quantitative analysis of internal and external data.
Some real use cases studied with OCTO
Telecom
• Analyze customer behavior (calls to the service center, opinions about the brand on social networks…) to identify a risk of churn
• Analyze in real time the huge amount of quality metrics coming from the network infrastructure, to proactively inform the call center about the network quality of service
Insurance
• Crawl the web (especially forums) to identify correlations between damages and the centers of interest of communities (health, household insurance, car insurance…)
• Improve data-mining models and risk models
e-Commerce
• Analyze web logs and customer reviews to improve product recommendation
• Analyze data from the call center (calls, emails) to improve customer loyalty
Machine Learning
Machine Learning: a definition
“Machine Learning” is not new. A first definition was given in 1959:
“Field of study that gives computers the ability to learn without being explicitly programmed.” (Arthur Samuel, 1959)
We prefer this more recent and more precise definition:
“A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” (Tom Mitchell, 1998)
Example with a spam classifier
Applying this definition to a spam classifier:
• Task T: the classifier puts incoming emails into ‘spam’ or not
• Experience E: I tag some of my emails as ‘spam’ or not
• Performance measure P: the ratio of emails correctly classified automatically
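To make the T / E / P framing concrete, here is a minimal sketch in Python. It assumes a multinomial naive Bayes classifier with add-one smoothing. The slides do not name this technique; it was chosen here only for brevity. The function names and the toy emails are hypothetical.

```python
import math
from collections import Counter

def train_spam_filter(tagged_emails):
    """Experience E: (text, is_spam) pairs tagged by the user.
    Counts words per class for a multinomial naive Bayes model."""
    word_counts = {True: Counter(), False: Counter()}
    doc_counts = Counter()
    for text, spam in tagged_emails:
        word_counts[spam].update(text.lower().split())
        doc_counts[spam] += 1
    return word_counts, doc_counts

def is_spam(model, text):
    """Task T: put an incoming email into 'spam' (True) or not (False)."""
    word_counts, doc_counts = model
    vocab_size = len(set(word_counts[True]) | set(word_counts[False]))
    total_docs = doc_counts[True] + doc_counts[False]
    scores = {}
    for label in (True, False):
        total_words = sum(word_counts[label].values())
        # Log prior plus the log likelihood of each word, with add-one smoothing
        score = math.log((doc_counts[label] + 1) / (total_docs + 2))
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1) / (total_words + vocab_size))
        scores[label] = score
    return scores[True] > scores[False]

def accuracy(model, tagged_emails):
    """Performance measure P: ratio of emails correctly classified."""
    correct = sum(is_spam(model, text) == spam for text, spam in tagged_emails)
    return correct / len(tagged_emails)

# Hypothetical toy data
model = train_spam_filter([
    ("win money now", True),
    ("cheap pills win", True),
    ("meeting at noon", False),
    ("project report attached", False),
])
test_emails = [("win cheap money", True), ("report for the meeting", False)]
print(accuracy(model, test_emails))
```

Tagging more emails (more E) should improve P on new emails. With four training emails, this is only an illustration.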
What’s new with Big Data in Machine Learning?
A Machine Learning approach works only if 3 conditions are fulfilled:
1. Some “pattern” exists in the data
2. You have a lot of data. A LOT. (millions of samples)
3. There is no analytical model to describe it (= it is a probabilistic problem)
Machine Learning algorithms that address such problems have existed for many years. In the past, the performance of ML models was often limited by the lack of available data.
A Big Data approach now allows us to collect and manipulate much more data, and Machine Learning is a fundamental tool to leverage this huge amount of information.
Machine Learning example: classification
Let’s imagine we want to predict whether a customer of a telecom operator will churn (leave for a competitor).
We will build a classifier, starting with a learning set. For each customer, we collect a finite number of data items, called attributes:
• Customer offer / plan
• Customer data (region, age, sex, …)
• Amounts of the last 12 bills
• Number of calls to the call center over the last 6 months
• Amount of local calls over the last 12 months
• Amount of international calls over the last 12 months
• Amount of downloaded data
• etc.
For each customer in the training set, we also know whether the customer churned or not: this is the tag.
Logistic regression classifier
(Figure: the learning samples, one row per customer, form a matrix X with n attributes per customer: Bill 1, Bill 2, …, Bill k, calls to the call center, average data, sex (M/F), …, Xn. The parameters of the model, θ = (θ1, θ2, …, θm), are what learning computes. The learning output Y is the tag, “churn” (1) or not (0). The figure shows Y = fn(X θ), where fn() is a sigmoid function that maps the result into (0, 1), which is then thresholded to get a binary output.)
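A minimal sketch of how θ could be computed. It assumes batch gradient descent on the log-loss; the slides do not say which optimizer is used. The two attributes (calls to the call center over 6 months, downloaded data in GB) and the six customers are hypothetical. A real learning set would hold millions of customers and many more attributes.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function: maps any real z into (0, 1)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Batch gradient descent on the log-loss.
    X: one list of n attributes per customer; y: tags (1 = churn, 0 = no churn).
    Returns theta with n + 1 values; theta[0] is the intercept."""
    n, m = len(X[0]), len(X)
    theta = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for xi, yi in zip(X, y):
            xb = [1.0] + list(xi)  # prepend 1 for the intercept
            error = sigmoid(sum(t * v for t, v in zip(theta, xb))) - yi
            for j in range(n + 1):
                grad[j] += error * xb[j]
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta

def predict_churn(theta, x, threshold=0.5):
    # Threshold the sigmoid output to get a binary answer
    p = sigmoid(sum(t * v for t, v in zip(theta, [1.0] + list(x))))
    return 1 if p >= threshold else 0

# Hypothetical attributes: [calls to call center (6 months), downloaded data (GB)]
X = [[0, 0.5], [1, 0.4], [0, 0.6], [5, 0.3], [4, 0.2], [6, 0.4]]
y = [0, 0, 0, 1, 1, 1]
theta = train_logistic_regression(X, y)
print([predict_churn(theta, x) for x in X])
```

On data this small, the fitted θ describes these six rows and nothing more. Whether it generalizes is exactly what the test set is for.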
Machine Learning example: classification
The θ vector is computed during the training phase. Once θ is computed, our classification model is ready.
We then test this model against other values of X (the test set) and check whether it is good at predicting the output value y. This is the robustness of the model: its capacity to generalize its predictions.
The challenge is to get a reasonable error rate without “overfitting” the model to the training sample (an overfitted model fits the training data but fails to predict new cases).
In general, 80% of the whole data set is used for training and 20% for testing.*
* It is often 60%/20%/20%, to carry out a model validation step.
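A minimal sketch of the 80/20 and 60/20/20 splits. It assumes the samples can be shuffled freely, with no time ordering to preserve. `split_dataset` is a hypothetical helper, not a library function.

```python
import random

def split_dataset(samples, ratios=(0.8, 0.2), seed=42):
    """Shuffle the samples, then cut them into consecutive parts with the given
    ratios: (0.8, 0.2) for train/test, (0.6, 0.2, 0.2) for train/validation/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # avoid any ordering bias in the source data
    parts, start = [], 0
    for i, ratio in enumerate(ratios):
        # The last part takes whatever remains, so rounding never drops a sample
        if i == len(ratios) - 1:
            end = len(shuffled)
        else:
            end = start + round(ratio * len(shuffled))
        parts.append(shuffled[start:end])
        start = end
    return parts

customers = list(range(100))  # hypothetical customer ids
train, test = split_dataset(customers)
train_60, validation, test_20 = split_dataset(customers, (0.6, 0.2, 0.2))
print(len(train), len(test), len(train_60), len(validation), len(test_20))  # 80 20 60 20 20
```

The validation part is used to tune the model (choosing parameters, detecting overfitting). The test part is kept aside for a final, unbiased error rate.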
Different strategies in categorization
Supervised learning
• Data is tagged: for the training phase, we know whether each customer is a churner or not
• Positives (churners) are abundant enough in the sample to identify the typical churner
• For some use cases, tagging the training set may require the help of an expert: expertise is needed before machine learning
• The challenge is the generalization of the model
Unsupervised learning
• We don’t know the output values (the Y vector). We know neither the number of tags nor their nature
• Some of the attributes are not homogeneous across all the samples in X
• The algorithm groups the inputs xi by similarity (creating clusters)
• Expertise is needed after machine learning, to interpret the results and name the discovered categories
• The challenge is understanding the output classification
Example of supervised algorithm: Support Vector Machine
• Draw a line (a hyperplane) that divides the points in space into 2 classes
• Find the line with the best margin (the largest distance from the points to the line)
• Try to minimize the error (points on the wrong side)
If the distribution is fundamentally not linearly separable, algorithms exist to transform the data into a higher dimension where it becomes linearly separable.
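A minimal sketch of a linear SVM. It assumes stochastic sub-gradient descent on the L2-regularized hinge loss, a Pegasos-style solver and only one of several ways to train an SVM. It covers the linearly separable case only; the higher-dimension transformation is not shown. The names, the parameters (`lam`, `epochs`) and the points are hypothetical.

```python
import random

def train_linear_svm(points, labels, lam=0.1, epochs=200, seed=0):
    """Linear SVM trained by stochastic sub-gradient descent on the
    L2-regularized hinge loss (Pegasos-style). Labels must be +1 or -1.
    A constant 1.0 feature is appended so the last weight acts as a bias."""
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(points, labels)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wj * xj for wj, xj in zip(w, x))
            # Regularizer gradient: shrink w toward 0, which widens the margin
            w = [(1.0 - eta * lam) * wj for wj in w]
            if margin < 1.0:
                # Point inside the margin or on the wrong side: hinge loss is active
                w = [wj + eta * y * xj for wj, xj in zip(w, x)]
    return w

def predict(w, x):
    # Which side of the hyperplane w . (x, 1) = 0 the point falls on
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1

points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]  # hypothetical
labels = [1, 1, 1, -1, -1, -1]
w = train_linear_svm(points, labels)
print([predict(w, p) for p in points])
```

The weight `lam` trades margin width against training errors: a larger value favors a wider margin and tolerates more points on the wrong side.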
Example of unsupervised algorithm: K-Means clustering
• Choose k points randomly in space (the seeds)
• Until convergence:
  • Assign each input point to the nearest seed, to form clusters
  • Compute the center of gravity of each cluster, and use these points as the new seeds
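The loop above translates almost line by line into Python. One assumption: the initial seeds are drawn from the input points, a common variant, rather than anywhere in space. The points are hypothetical.

```python
import math
import random

def kmeans(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    # Choose k seeds at random (assumption: drawn among the input points)
    seeds = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign each input point to its nearest seed to form clusters
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, seeds[i]))
            clusters[nearest].append(p)
        # The center of gravity of each cluster becomes the new seed
        new_seeds = [
            tuple(sum(coords) / len(c) for coords in zip(*c)) if c else seeds[i]
            for i, c in enumerate(clusters)
        ]
        if new_seeds == seeds:  # convergence: the seeds no longer move
            break
        seeds = new_seeds
    return seeds, clusters

points = [(1, 1), (1.5, 2), (1, 2), (8, 8), (9, 8.5), (8.5, 9)]  # hypothetical
centers, clusters = kmeans(points, k=2)
print(sorted(centers))
```

K-means can converge to a poor local optimum on less clear-cut data. In practice it is often run several times with different seeds, keeping the best result.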
Other algorithms
Dimensionality reduction
Example: product recommendation engine
• N customers x P products
• (ci, pj) = 1 if customer i bought product j
• A very big and sparse matrix
• Each customer is a point in a space with a large number of dimensions
• Idea: find a way to group products and reduce the number of dimensions of this space
(Figure: a 10M customers x 1M products matrix, columns P1, P2, …, Pn, filled mostly with 0s and a few 1s.)
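One way to reduce the dimensions of such a matrix is a truncated singular value decomposition. The slides do not name a method, so treat this as an assumption. The sketch below approximates the top-k factors by subspace iteration and only ever touches the 1s of the sparse matrix, so each customer ends up with k coordinates instead of P. The customers and products are hypothetical; a production engine would rely on an optimized library.

```python
import math
import random

def latent_factors(purchases, n_products, k=2, iters=50, seed=0):
    """purchases: dict customer -> set of product indices bought, i.e. a sparse
    N x P matrix A of 0/1 values. Subspace iteration on A^T A approximates the
    top-k right-singular vectors of A (k vectors of length P)."""
    rng = random.Random(seed)
    V = [[rng.random() for _ in range(n_products)] for _ in range(k)]
    for _ in range(iters):
        Z = []
        for v in V:
            # u = A v: one value per customer; only bought products contribute
            u = {c: sum(v[p] for p in prods) for c, prods in purchases.items()}
            # z = A^T u
            z = [0.0] * n_products
            for c, prods in purchases.items():
                for p in prods:
                    z[p] += u[c]
            Z.append(z)
        # Gram-Schmidt: keep the k vectors orthonormal so they do not all collapse
        # onto the dominant direction
        V = []
        for z in Z:
            for q in V:
                dot = sum(a * b for a, b in zip(z, q))
                z = [a - dot * b for a, b in zip(z, q)]
            norm = math.sqrt(sum(a * a for a in z)) or 1.0
            V.append([a / norm for a in z])
    # Each customer becomes a point with k coordinates instead of n_products
    coords = {c: [sum(v[p] for p in prods) for v in V] for c, prods in purchases.items()}
    return V, coords

purchases = {  # hypothetical: two groups of customers buying two groups of products
    "alice": {0, 1, 2}, "bob": {0, 1, 2}, "carol": {0, 1, 2},
    "dave": {3, 4}, "erin": {3, 4},
}
factors, coords = latent_factors(purchases, n_products=5, k=2)
for customer, xy in sorted(coords.items()):
    print(customer, [round(v, 3) for v in xy])
```

In this toy case, each factor picks out one group of products bought together. Customers of the first group get coordinates near (1.732, 0) and customers of the second group near (0, ±1.414).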
Quantity prediction
Linear regression: the oldest and best-known algorithm
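For a single attribute, linear regression has a closed-form least-squares solution. Here is a minimal sketch, with hypothetical weekly sales figures standing in for the quantity to predict.

```python
def fit_line(xs, ys):
    """Least-squares fit of y ≈ a * x + b for a single attribute x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the line passes through the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

weeks = [1, 2, 3, 4]         # hypothetical
units_sold = [3, 5, 7, 9]    # hypothetical quantities
a, b = fit_line(weeks, units_sold)
print(a, b, a * 5 + b)  # 2.0 1.0 11.0 (slope, intercept, prediction for week 5)
```

With several attributes, the same idea generalizes to solving the normal equations or running gradient descent, as in the classification example.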
Many algorithms to choose from, depending on the situation!
TECHNOLOGY
1956: $50k for a 5 MB IBM hard drive… today: €20 for an 8 GB microSD!
Exponential growth of capacities
CPU, memory, network bandwidth, storage… all of them have grown exponentially, in the manner of Moore’s law.
Source: http://strata.oreilly.com/2011/08/building-data-startups.html
The old strategy: scale-up
(Figure: cost per GB of HDD and RAM, 1965 to 2015, on a log scale from 0.01 to 1,000,000 $/GB, with annotations at 100k $/GB and 0.10 $/GB.)
The old way: if you have too much data, just wait a few months for costs to decrease, then scale up your infrastructure.
Source: http://www.mkomo.com/cost-per-gigabyte
But…
(Figure: hard disk throughput in MB/s, 1990 to 2010, from 0.7 MB/s (IBM DTTA 35010) through the Seagate Barracuda ATA IV to 64 MB/s (Seagate Barracuda 7200.10). Storage capacity: x 100,000. Throughput: x 91.)
We can store 100,000 times more data, but it takes 1,000 times longer to read it!
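The “1,000 times longer” follows from the two growth factors in the chart. Reading a full disk takes capacity / throughput, so the read time grows by the ratio of the two factors. Here is a quick check with the figures above:

```python
capacity_factor = 100_000      # storage capacity growth, 1990 to 2010 (from the chart)
throughput_factor = 64 / 0.7   # 0.7 MB/s -> 64 MB/s, about x91
# Time to read a full disk = capacity / throughput,
# so it grows by capacity_factor / throughput_factor
read_time_factor = capacity_factor / throughput_factor
print(round(throughput_factor), round(read_time_factor))  # 91 1094
```

The ratio is about 1,100, which the slide rounds to 1,000.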
Limitations of traditional architectures
“Traditional” architectures (RDBMS, application server, ETL, ESB) hit their limits in four kinds of applications:
• Storage-oriented applications (IO bound): over 10 TB, “classical” architectures require huge software and hardware adaptations. Answer: distributed storage
• Transaction-oriented applications (TPS): over 1,000 transactions/second, “classical” architectures require huge software and hardware adaptations. Answer: XTP (extreme transaction processing)
• Computation-oriented applications (CPU bound): over 10 threads per CPU core, sequential programming reaches its limits (IO). Answer: parallel processing
• Event-flow-oriented applications (streaming): over 1,000 events/second, “classical” architectures require huge software and hardware adaptations. Answer: event stream processing
The figure also shows “share nothing”.
New architectures
Big Data = explosion of volumes:
• data to store online
• processing to parallelize
• number of transactions per second to handle
• number of messages per second to process
+ new constraints:
• new types of data (unstructured, semi-structured…)
• distribution of storage and processing
• cost reduction
• need for elasticity
= new technologies:
• horizontal scalability and clustering
• data partitioning / sharding
• parallel processing
• in-memory processing
Some emerging solutions
(Figure: solutions positioned, with “in memory” and “distributed” labels, around the four application types (event-flow oriented / streaming, transaction oriented / TPS, storage oriented / IO bound, computation oriented / CPU bound), beyond “traditional” architectures (RDBMS, application server, ETL, ESB). Solutions shown: Cassandra, MongoDB, CouchDB, HDFS, HBase, Voldemort, Redis, SQLFire, GigaSpaces, Teradata, Exadata, Exalytics, HANA, EMC, MapR, MapReduce, Hama, Igraph, grid computing, GPU, ActivePivot, Quartet, Esper, RabbitMQ, ZeroMQ, Sqoop, ultra-low latency.)
Emerging families
(Figure: these families positioned across event-flow-oriented, transaction-oriented, storage-oriented and computation-oriented applications.)
• NoSQL / NewSQL: NoSQL = distributed non-relational stores; NewSQL = SQL-compliant distributed stores
• Streaming: CEP (Complex Event Processing), ESP (Event Stream Processing)
• Grid / GPU: grid computing on CPU or on GPU
• In-memory analytics: these solutions distribute the data across the memory of several nodes to obtain a low processing time
• Hadoop: the Hadoop ecosystem offers distributed storage, but also distributed computing using MapReduce
Hadoop: a reference in the Big Data landscape
Open source
• Apache Hadoop
Main distributions
• Cloudera CDH
• Hortonworks
• MapR
• DataStax (Brisk)
Commercial
• Greenplum (EMC)
• IBM InfoSphere BigInsights (CDH)
• Oracle Big Data Appliance (CDH)
• NetApp Analytics (CDH)
• …
Cloud
• Amazon EMR (MapR)
• VirtualScale (CDH)
Hadoop Distributed File System (HDFS)
Key principles:
• File storage larger than a single disk
• Data distributed across several nodes
• Data replication to ensure fail-over, with “rack awareness”
• Use of commodity disks instead of a SAN
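A minimal sketch of two of these principles: cutting a file into fixed-size blocks, and placing 3 replicas of each block with rack awareness. It mimics the documented default HDFS placement policy: the first replica goes on the writer's node, and the other two go on two different nodes of one remote rack. It is an illustration, not HDFS code. The 128 MB block size is the default of recent Hadoop versions (older versions used 64 MB), and the cluster topology is hypothetical.

```python
import random

BLOCK_SIZE = 128 * 1024 * 1024  # assumption: 128 MB (Hadoop 2.x default; 64 MB in Hadoop 1.x)

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Sizes of the blocks a file of file_size bytes is cut into; the last may be smaller."""
    return [min(block_size, file_size - offset) for offset in range(0, file_size, block_size)]

def place_replicas(writer_node, topology, rng):
    """Rack-aware placement of 3 replicas of one block, in the spirit of the HDFS
    default policy. topology: dict rack name -> list of node names.
    Assumes at least one other rack with two or more nodes."""
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    # Replica 1: the node that writes the data (the first write stays local)
    first = writer_node
    # Replica 2: a node on another rack, so losing a whole rack loses no data
    remote_racks = [r for r, nodes in topology.items()
                    if r != rack_of[first] and len(nodes) >= 2]
    remote_rack = rng.choice(remote_racks)
    second = rng.choice(topology[remote_rack])
    # Replica 3: another node on the same remote rack (limits cross-rack traffic)
    third = rng.choice([n for n in topology[remote_rack] if n != second])
    return [first, second, third]

topology = {  # hypothetical cluster
    "rack1": ["node1", "node2"],
    "rack2": ["node3", "node4"],
    "rack3": ["node5", "node6"],
}
rng = random.Random(7)
blocks = split_into_blocks(300 * 1024 * 1024)  # a 300 MB file -> 128 + 128 + 44 MB
placement = {i: place_replicas("node1", topology, rng) for i in range(len(blocks))}
print(placement)
```

Because replicas span two racks, the data survives the loss of any single node or of a whole rack.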
Hadoop distributed processing: MapReduce
Key principles:
• Parallelize and distribute processing
• Process smaller (unit) volumes of data more quickly
• Co-locate processing and data
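The classic word count illustrates the model. This single-process Python sketch only simulates the three phases (map, shuffle, reduce). It is not the Hadoop API: in Hadoop, map tasks run in parallel on the nodes that store each block (co-location), and the framework performs the shuffle across the network.

```python
from collections import defaultdict

def map_phase(split):
    # Map: emit (word, 1) for every word of one input split
    for word in split.lower().split():
        yield word, 1

def shuffle_phase(pairs):
    # Shuffle: group all intermediate values by key
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce: aggregate all the values of one key
    return key, sum(values)

def word_count(splits):
    # In Hadoop, each split would be mapped on the node that stores it;
    # here the map tasks simply run one after the other
    intermediate = [pair for split in splits for pair in map_phase(split)]
    return dict(reduce_phase(k, v) for k, v in shuffle_phase(intermediate).items())

print(word_count(["big data big", "data deluge"]))  # {'big': 2, 'data': 2, 'deluge': 1}
```

Map and reduce are pure functions of their inputs. That is what allows the framework to rerun a failed task on another node.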
Overview of Hadoop architecture
(Figure: layered architecture with distributed storage, distributed processing, querying, advanced processing, orchestration, integration with the information system, and monitoring and management.)
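Available tools in a typical distribution (CDH)
• Distributed storage: HDFS, HBase
• Distributed processing: MapReduce, YARN (v2)
• Querying: Pig, Cascading, Hive, Impala
• Orchestration: Oozie, Azkaban
• Advanced processing: Mahout, Hama, Giraph
• Integration with the information system: Sqoop, Flume, Scribe, Chukwa
• Monitoring and management: CLI, web console, Hue, Cloudera Manager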
Hadoop: a blooming ecosystem!
(Figure: the Hadoop ecosystem. Storage: distributed FS (GlusterFS, HDFS, S3, Ceph, OpenStack Swift, Isilon, MapR), local FS, and NoSQL datastores (Cassandra, Ring, DynamoDB). Processing, for OLAP, OLTP and machine learning: HBase, Impala, Hawq, Stinger, Hive, Pig, MapReduce / Tez, Spark, Streaming, Cascading, Scalding, Mahout, Giraph, Hama, SciKit, R, Python, …)
Lots of announcements and new tools appear every day… Maturity varies greatly from one tool to another.
Maturity of tools
The maturity of solutions in the Hadoop ecosystem is very heterogeneous.
• Ex: HDFS and MapReduce are fully production-ready; Yahoo manages a petabyte-scale HDFS cluster
• But some surrounding tools are still immature, especially admin and debugging tools
• Ex: Impala (real-time querying with SQL-compliant queries) is not yet production-ready
• Ex: the adaptation of machine learning libraries to distributed computation with MapReduce is ongoing. Apache Mahout has MapReduce-compatible algorithms; MapReduce libraries for R are quite young
Hadoop is a rich and fairly new technology, difficult to master.
Get trained, and bring experts into your project!
WRAP-UP
Big Data aims at gaining an economic advantage from the quantitative analysis of internal and external data.
(Figure: “Data we traditionally manipulate (customers, product catalog…)”; “Innovation is here!”)
Machine Learning + Big Data
For many years, we have used Machine Learning algorithms to find patterns in data. Big Data technologies now allow us to manipulate much more data, and to get more value from Machine Learning techniques.
(Figure labels: linear regression, neural network.)
Hadoop
Hadoop is a reference in the Big Data technology landscape, but with a very effervescent ecosystem.
It is hard to follow all the trends and evolutions without a dedicated R&D team.
Don’t do this alone: get trained, and bring experts into your project.


Big Data & Machine Learning - TDC2013 Sao Paulo

La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...OCTO Technology
 
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...OCTO Technology
 
OCTO Talks - Les IA s'invitent au chevet des développeurs
OCTO Talks - Les IA s'invitent au chevet des développeursOCTO Talks - Les IA s'invitent au chevet des développeurs
OCTO Talks - Les IA s'invitent au chevet des développeursOCTO Technology
 
OCTO Talks - Lancement du livre Culture Test
OCTO Talks - Lancement du livre Culture TestOCTO Talks - Lancement du livre Culture Test
OCTO Talks - Lancement du livre Culture TestOCTO Technology
 
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...OCTO Technology
 
OCTO Talks - State of the art Architecture dans les frontend web
OCTO Talks - State of the art Architecture dans les frontend webOCTO Talks - State of the art Architecture dans les frontend web
OCTO Talks - State of the art Architecture dans les frontend webOCTO Technology
 
Comptoir OCTO ALD Automotive/Leaseplan
Comptoir OCTO ALD Automotive/LeaseplanComptoir OCTO ALD Automotive/Leaseplan
Comptoir OCTO ALD Automotive/LeaseplanOCTO Technology
 
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ? Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ? OCTO Technology
 
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...OCTO Technology
 
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...
Le Comptoir OCTO -  Affinez vos forecasts avec la planification distribuée et...Le Comptoir OCTO -  Affinez vos forecasts avec la planification distribuée et...
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...OCTO Technology
 
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conceptionLe Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conceptionOCTO Technology
 
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...OCTO Technology
 
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone :  les solutions E...Le Comptoir OCTO - L'avenir de la gestion du bilan carbone :  les solutions E...
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...OCTO Technology
 
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...OCTO Technology
 
RefCard Tests sur tous les fronts
RefCard Tests sur tous les frontsRefCard Tests sur tous les fronts
RefCard Tests sur tous les frontsOCTO Technology
 
RefCard RESTful API Design
RefCard RESTful API DesignRefCard RESTful API Design
RefCard RESTful API DesignOCTO Technology
 
RefCard API Architecture Strategy
RefCard API Architecture StrategyRefCard API Architecture Strategy
RefCard API Architecture StrategyOCTO Technology
 

Plus de OCTO Technology (20)

Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloud
Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloudLe Comptoir OCTO - MLOps : Les patterns MLOps dans le cloud
Le Comptoir OCTO - MLOps : Les patterns MLOps dans le cloud
 
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...
La Grosse Conf 2024 - Philippe Stepniewski -Atelier - Live coding d'une base ...
 
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...
La Grosse Conf 2024 - Philippe Prados - Atelier - RAG : au-delà de la démonst...
 
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
Le Comptoir OCTO - Maîtriser le RAG : connecter les modèles d’IA génératives ...
 
OCTO Talks - Les IA s'invitent au chevet des développeurs
OCTO Talks - Les IA s'invitent au chevet des développeursOCTO Talks - Les IA s'invitent au chevet des développeurs
OCTO Talks - Les IA s'invitent au chevet des développeurs
 
OCTO Talks - Lancement du livre Culture Test
OCTO Talks - Lancement du livre Culture TestOCTO Talks - Lancement du livre Culture Test
OCTO Talks - Lancement du livre Culture Test
 
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
Le Comptoir OCTO - Green AI, comment éviter que votre votre potion magique d’...
 
OCTO Talks - State of the art Architecture dans les frontend web
OCTO Talks - State of the art Architecture dans les frontend webOCTO Talks - State of the art Architecture dans les frontend web
OCTO Talks - State of the art Architecture dans les frontend web
 
Refcard GraphQL
Refcard GraphQLRefcard GraphQL
Refcard GraphQL
 
Comptoir OCTO ALD Automotive/Leaseplan
Comptoir OCTO ALD Automotive/LeaseplanComptoir OCTO ALD Automotive/Leaseplan
Comptoir OCTO ALD Automotive/Leaseplan
 
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ? Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
Le Comptoir OCTO - Comment optimiser les stocks en linéaire par la Data ?
 
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
Le Comptoir OCTO - Retour sur 5 ans de mise en oeuvre : Comment le RGPD a réi...
 
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...
Le Comptoir OCTO -  Affinez vos forecasts avec la planification distribuée et...Le Comptoir OCTO -  Affinez vos forecasts avec la planification distribuée et...
Le Comptoir OCTO - Affinez vos forecasts avec la planification distribuée et...
 
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conceptionLe Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
Le Comptoir OCTO - La formation au cœur de la stratégie d’éco-conception
 
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
Le Comptoir OCTO - Une vision de plateforme sans leadership tech n’est qu’hal...
 
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone :  les solutions E...Le Comptoir OCTO - L'avenir de la gestion du bilan carbone :  les solutions E...
Le Comptoir OCTO - L'avenir de la gestion du bilan carbone : les solutions E...
 
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
Le Comptoir OCTO - Continuous discovery et continuous delivery pour construir...
 
RefCard Tests sur tous les fronts
RefCard Tests sur tous les frontsRefCard Tests sur tous les fronts
RefCard Tests sur tous les fronts
 
RefCard RESTful API Design
RefCard RESTful API DesignRefCard RESTful API Design
RefCard RESTful API Design
 
RefCard API Architecture Strategy
RefCard API Architecture StrategyRefCard API Architecture Strategy
RefCard API Architecture Strategy
 

Dernier

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationSlibray Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...Fwdays
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfSeasiaInfotech2
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Manik S Magar
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Enterprise Knowledge
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxhariprasad279825
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embeddingZilliz
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr BaganFwdays
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfRankYa
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsSergiu Bodiu
 

Dernier (20)

Connect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck PresentationConnect Wave/ connectwave Pitch Deck Presentation
Connect Wave/ connectwave Pitch Deck Presentation
 
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks..."LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
"LLMs for Python Engineers: Advanced Data Analysis and Semantic Kernel",Oleks...
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
The Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdfThe Future of Software Development - Devin AI Innovative Approach.pdf
The Future of Software Development - Devin AI Innovative Approach.pdf
 
Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!Anypoint Exchange: It’s Not Just a Repo!
Anypoint Exchange: It’s Not Just a Repo!
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024Designing IA for AI - Information Architecture Conference 2024
Designing IA for AI - Information Architecture Conference 2024
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Artificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptxArtificial intelligence in cctv survelliance.pptx
Artificial intelligence in cctv survelliance.pptx
 
Training state-of-the-art general text embedding
Training state-of-the-art general text embeddingTraining state-of-the-art general text embedding
Training state-of-the-art general text embedding
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan"ML in Production",Oleksandr Bagan
"ML in Production",Oleksandr Bagan
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
Search Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdfSearch Engine Optimization SEO PDF for 2024.pdf
Search Engine Optimization SEO PDF for 2024.pdf
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
DevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platformsDevEX - reference for building teams, processes, and platforms
DevEX - reference for building teams, processes, and platforms
 

Big Data & Machine Learning - TDC2013 Sao Paulo

  • 1. Big Data and Machine Learning. Mathieu DESPRIEE, mde@octo.com, Twitter: @mdeocto. © OCTO 2013
  • 2. 2
  • 3. What a buzzword! Google Trends on “big data”; Gartner hype cycle, 2012.
  • 4. Origins of Big Data. Web (Google, Amazon, Facebook, Twitter, …): the web giants implemented Big Data solutions for their own needs. IT vendors (IBM, Teradata, VMware, EMC, …): vendors are followers in this movement; they are trying to get a foothold in this very promising business. Management consulting (McKinsey, BCG, Gartner, …): consulting firms predicted a major economic change, and Big Data is part of it. The web giants made real a concept that Gartner had anticipated. This software evolution did not come from traditional software vendors, which is quite unusual.
  • 6. Is there a clear definition? Super data warehouse? Low-cost storage? NoSQL? Cloud? Internet intelligence? Real-time analysis? Unstructured data? Open data? Big databases? There is no clear definition of Big Data: it is at once a business ambition and a set of technological opportunities.
  • 9. [Figure: the three Vs of Big Data. Volume: file, MB, GB, TB, PB. Velocity: day, hour, second, real time. Variety: structured, text, web, social networks, API, audio, video.]
  • 10. Data and Innovation. Data we traditionally manipulate (customers, product catalog, …). Innovation is here!
  • 12. Big Data: proposed definition. Big Data aims at gaining an economic advantage from the quantitative analysis of internal and external data.
  • 13. Some real use cases studied with OCTO. Telecom: • Analyze customer behavior (calls to the service center, opinions about the brand on social networks, …) to identify a risk of churn. • Analyze, in real time, the huge volume of quality metrics from the network infrastructure to proactively inform the call center about the network's quality of service. Insurance: • Crawl the web (especially forums) to identify correlations between damages and centers of interest in communities (health, household insurance, car insurance, …). • Improve data-mining models and risk models. E-commerce: • Analyze web logs and customer reviews to improve product recommendations. • Analyze call-center data (calls, emails) to improve customer loyalty.
  • 14. 14
  • 16. Machine Learning: a definition. “Machine Learning” is not new. A first definition was given in 1959: “Field of study that gives computers the ability to learn without being explicitly programmed” (Arthur Samuel, 1959). We prefer this more recent and more precise definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E” (Tom Mitchell, 1998).
  • 17. Example with a spam classifier. “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.” Experience E: I tag some of my emails as ‘spam’ or not. Task T: the classifier puts incoming emails into ‘spam’ or not. Performance measure P: the ratio of emails correctly classified automatically.
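The E/T/P mapping can be made concrete with a toy spam filter. The sketch below is a minimal multinomial naive Bayes classifier with add-one smoothing, written for illustration only: the slide does not say which algorithm its classifier uses, and the function names, the "ham" label for non-spam and the tiny training set are hypothetical. The tagged emails play the role of experience E, `classify` performs task T, and `accuracy` computes performance measure P.

```python
import math
from collections import Counter

def train_naive_bayes(emails, tags):
    """Experience E: learn word counts per class from tagged emails.
    Tags are assumed to be 'spam' or 'ham' (not spam)."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    class_counts = Counter(tags)
    for text, tag in zip(emails, tags):
        word_counts[tag].update(text.lower().split())
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    return word_counts, class_counts, vocabulary

def classify(model, text):
    """Task T: put an incoming email into 'spam' or 'ham'."""
    word_counts, class_counts, vocabulary = model
    total_emails = sum(class_counts.values())
    best_tag, best_score = None, -math.inf
    for tag in ("spam", "ham"):
        # log prior + sum of log likelihoods, with add-one (Laplace) smoothing
        score = math.log(class_counts[tag] / total_emails)
        total_words = sum(word_counts[tag].values())
        for word in text.lower().split():
            score += math.log((word_counts[tag][word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_tag, best_score = tag, score
    return best_tag

def accuracy(model, emails, tags):
    """Performance P: ratio of emails classified correctly."""
    correct = sum(classify(model, e) == t for e, t in zip(emails, tags))
    return correct / len(emails)
```

For example, after `model = train_naive_bayes(["win money now", "cheap money offer", "meeting at noon", "lunch with the team"], ["spam", "spam", "ham", "ham"])`, the call `classify(model, "win cheap money")` returns `"spam"`. Tagging more emails (more experience E) is what should raise P.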
  • 18. What's new with Big Data in Machine Learning? A Machine Learning approach works only if 3 conditions are fulfilled: (1) some “pattern” exists in the data; (2) you have a lot of data, A LOT (millions of samples); (3) there is no analytical model to describe it (i.e. it is a probabilistic problem). Machine Learning algorithms to address such problems have existed for many years, but in the past the performance of ML models was often limited by the lack of available data. A Big Data approach now allows us to collect and manipulate much more data, and Machine Learning is a fundamental tool to leverage this huge amount of information.
  • 19. Machine Learning example: classification. Let's imagine we want to predict whether a customer of a telecom operator will churn (leave for a competitor). We will build a classifier, starting with a learning set. For each customer, we collect a finite number of data items, called attributes: customer offer/plan; customer data (region, age, sex, …); amounts of the last 12 bills; number of calls to the call center in the last 6 months; amount of local calls over the last 12 months; amount of international calls over the last 12 months; amount of downloaded data; etc. For each customer in the training set, we also know whether the customer churned or not: this is the tag.
  • 20. Logistic regression classifier. [Figure: the learning samples (customers) form a matrix X with n attributes per customer (Bill 1, Bill 2, …, Bill k, calls to call center, average data, sex, …); the vector Y holds the tag of each learning sample, “churn” or not (1 or 0); θ = (θ1, θ2, …, θm) holds the parameters of the model to compute, i.e. the learning output.] A sigmoid function fn(z) = 1 / (1 + e^(-z)) maps Xθ to a value between 0 and 1, which is thresholded to give a binary output.
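To make the roles of X, Y and θ concrete, here is a minimal pure-Python sketch of logistic regression. The slides do not say how θ is computed; batch gradient descent on the log-loss is assumed here as one common method. The two features (calls to the call center, average monthly data) and all values are hypothetical toy data, not taken from the slide.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(theta, x):
    """Estimated churn probability; theta[0] is the bias (intercept) term."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)

def train_logistic_regression(X, y, learning_rate=0.1, epochs=2000):
    """Compute the theta vector by batch gradient descent on the log-loss."""
    theta = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for xi, yi in zip(X, y):
            error = predict_proba(theta, xi) - yi   # prediction minus tag
            grad[0] += error
            for j, value in enumerate(xi, start=1):
                grad[j] += error * value
        theta = [t - learning_rate * g / n for t, g in zip(theta, grad)]
    return theta

def predict(theta, x, threshold=0.5):
    """Binary output: 1 = churner, 0 = not a churner."""
    return 1 if predict_proba(theta, x) >= threshold else 0

# Hypothetical learning set: [calls to call center, avg monthly data in GB]
X = [[0, 2.0], [1, 3.5], [0, 1.0], [5, 2.5], [6, 3.0], [4, 1.5]]
y = [0, 0, 0, 1, 1, 1]   # tags: 1 = churned
theta = train_logistic_regression(X, y)
```

On this toy set the learned θ puts most of its weight on the call-center attribute, so a customer with 7 calls is predicted as a churner. Real attributes would usually be scaled first so that one learning rate suits all of them.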
  • 21. Machine Learning example: classification. The θ vector is computed during the training phase; once it is computed, our classification model is ready. We then test this model against other values of X (the test set) and check whether it is good at predicting the output value y. We talk about the robustness of the model, i.e. its capacity to generalize the prediction. The challenge is to get a reasonable error ratio without “overfitting” the algorithm to the training sample (an overfitted model predicts nothing useful on new data). In general, 80% of the whole data set is used for training and 20% for testing.* (* It is often 60%/20%/20%, to add a model validation step.)
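The 60%/20%/20% (or 80%/20%) split can be sketched as below. This is an illustrative helper, not a specific library's API: the names `split_dataset` and `error_ratio` and the fixed seed are assumptions. Shuffling first matters, since data sets are often sorted (for example by date or by customer id).

```python
import random

def split_dataset(samples, train_ratio=0.6, validation_ratio=0.2, seed=42):
    """Shuffle, then split into training, validation and test sets.
    train_ratio=0.8 with validation_ratio=0.0 gives a plain 80%/20% split."""
    rng = random.Random(seed)          # fixed seed for a reproducible split
    shuffled = list(samples)           # copy, so the caller's list is untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_ratio)
    n_validation = round(len(shuffled) * validation_ratio)
    train = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_validation]
    test = shuffled[n_train + n_validation:]
    return train, validation, test

def error_ratio(predictions, labels):
    """Share of samples whose predicted tag differs from the true tag."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)
```

The model is fitted on the training set only; the validation set is used to compare model variants, and the test set gives a final estimate of the error ratio on data the model has never seen.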
  • 22. Different strategies in categorization. Supervised learning: the data is tagged, so for the training phase we know whether each customer is a churner or not. Positives (churners) are abundant enough in the sample to identify the typical churner. For some use cases, tagging may require the help of an expert to prepare the training set: expertise is needed before machine learning. The challenge is the generalization of the model. Unsupervised learning: we don't know the output values (the Y vector), neither the number of tags nor their nature. Some of the attributes are not homogeneous across all the samples in X. The algorithm groups the inputs xᵢ by similarity (creating clusters). Expertise is needed after machine learning, to interpret the results and name the discovered categories. The challenge is understanding the output classification.
  • 23. Example of a supervised algorithm: Support Vector Machine (SVM). Draw a line (a hyperplane) that divides the points in space into 2 classes. Find the line with the best margin (the largest distance from the points to the line), while trying to minimize the error (points on the wrong side). If the distribution is fundamentally not linearly separable, algorithms exist to transform the data into a higher dimension where it becomes linearly separable.
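A linear SVM can be sketched in a few lines by minimizing the soft-margin objective directly. The slide does not specify a solver; plain batch subgradient descent on the hinge loss is assumed here for readability (dedicated SVM libraries use more efficient solvers), and the parameter values are illustrative. The `reg` term favors a wide margin, while the hinge term penalizes points inside the margin or on the wrong side. The kernel transformation mentioned above is not covered.

```python
def train_linear_svm(points, labels, reg=0.01, learning_rate=0.1, epochs=1000):
    """Soft-margin linear SVM trained by batch subgradient descent.

    Minimizes  reg/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b)))
    over the weight vector w and the bias b. Labels must be +1 or -1.
    """
    dim = len(points[0])
    w = [0.0] * dim
    b = 0.0
    n = len(points)
    for _ in range(epochs):
        grad_w = [reg * wi for wi in w]    # gradient of the regularization term
        grad_b = 0.0
        for x, y in zip(points, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:                 # inside the margin or misclassified
                for j in range(dim):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - learning_rate * g for wi, g in zip(w, grad_w)]
        b -= learning_rate * grad_b
    return w, b

def svm_predict(w, b, x):
    """Side of the hyperplane w.x + b = 0 on which the point falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1

# Hypothetical 2-D points for two classes
points = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(points, labels)
```

A larger `reg` tolerates more margin violations in exchange for a wider margin; a smaller one tries harder to classify every training point correctly.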
  • 24. Example of an unsupervised algorithm: K-means clustering. Choose k points randomly in space (the seeds). Until convergence: assign each input point to the nearest seed to form clusters; compute the center of gravity of each cluster and use these points as the new seeds.
  • 25. Other algorithms. Dimensionality reduction. Example: a product recommendation engine. Build an N customers × P products matrix where (cᵢ, pⱼ) = 1 if customer i bought product j. This matrix is very big and sparse: each customer is a point in a space with a very large number of dimensions. Idea: find a way to group products and so reduce the number of dimensions of this space. [Figure: sparse 0/1 matrix, 10M customers (rows) × products P1 … Pn (columns).] Quantity prediction. Linear regression: the oldest and best-known algorithm.
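The slide does not name a dimensionality-reduction method. One common choice for a customer × product matrix is a truncated singular value decomposition (SVD). The sketch below computes only its first component, by power iteration on AᵀA, and stores the sparse matrix as one set of bought products per customer. The function name and the toy baskets are hypothetical. Products with large weights in the returned vector form a group, and each customer gets a single score along that group.

```python
import math

def leading_product_factor(baskets, n_products, iterations=50):
    """Rank-1 truncated SVD of the sparse customer x product matrix A,
    computed by power iteration on A^T A.

    baskets[i] is the set of product indices bought by customer i,
    i.e. the columns where row i of A holds a 1.
    Returns (product_weights, customer_scores).
    """
    v = [1.0 / math.sqrt(n_products)] * n_products
    for _ in range(iterations):
        # u = A v : one score per customer, summing the weights of bought products
        u = [sum(v[j] for j in basket) for basket in baskets]
        # w = A^T u : spread the customer scores back onto the products
        w = [0.0] * n_products
        for basket, score in zip(baskets, u):
            for j in basket:
                w[j] += score
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    customer_scores = [sum(v[j] for j in basket) for basket in baskets]
    return v, customer_scores

# Hypothetical baskets: three customers buy products 0 and 1, one buys 2 and 3
baskets = [{0, 1}, {0, 1}, {0, 1}, {2, 3}]
product_weights, customer_scores = leading_product_factor(baskets, 4)
```

Here the first component groups products 0 and 1, the pair bought together most often. Further components (obtained for instance by deflation) would capture other product groups; they are left out of this sketch.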
  • 26. Many algorithms are available: which one to use depends on the situation!
  • 27. TECHNOLOGY
  • 28. 1956: $50k for a 5 MB IBM hard drive… Today: €20 for an 8 GB microSD card!
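As a rough order of magnitude, the cost per megabyte can be compared directly. This ignores the dollar/euro exchange rate and inflation, and takes 1 GB = 1000 MB:

```latex
\frac{50\,000\ \text{USD}}{5\ \text{MB}} = 10\,000\ \text{USD/MB},
\qquad
\frac{20\ \text{EUR}}{8\,000\ \text{MB}} = 0.0025\ \text{EUR/MB},
\qquad
\frac{10\,000}{0.0025} = 4 \times 10^{6}
```

The cost of storing a megabyte has therefore dropped by a factor of several million.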
  • 29. Exponential growth of capacities. CPU, memory, network bandwidth, storage, …: all of them have grown exponentially, following Moore's law. Source: http://strata.oreilly.com/2011/08/building-data-startups.html
  • 30. The old strategy: scale up. [Figure: cost per GB of HDD and RAM, 1965 to 2015, log scale from 0.01 to 1,000,000 $/GB; from about 100k $/GB down to about 0.10 $/GB.] The old way: if you have too much data, just wait a few months for costs to decrease, then scale up your infrastructure. Source: http://www.mkomo.com/cost-per-gigabyte
  • 32. [Figure: disk throughput in MB/s, 1990 vs 2010: from 0.7 MB/s to 64 MB/s. Drives shown: IBM DTTA 35010, Seagate Barracuda ATA IV, Seagate Barracuda 7200.10. Storage capacity: ×100,000; throughput: ×91.] We can store 100,000 times more data, but it takes 1,000 times longer to read it all!
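The "1,000 times longer" follows from the two ratios on the slide. The time T to read a full disk is its capacity C divided by its throughput R, so:

```latex
\frac{T_{2010}}{T_{1990}}
  = \frac{C_{2010} / C_{1990}}{R_{2010} / R_{1990}}
  = \frac{100\,000}{64 / 0.7}
  \approx \frac{100\,000}{91.4}
  \approx 1.1 \times 10^{3}
```

This is why reading data in parallel from many disks, rather than from one bigger disk, becomes necessary.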
  • 33. Limitations of traditional architectures. Beyond certain thresholds, “traditional” architectures (RDBMS, application server, ETL, ESB) require huge software and hardware adaptations. Storage-oriented applications (IO bound): over 10 TB → distributed storage. Transaction-oriented applications (TPS): over 1,000 transactions per second → share-nothing architectures, XTP. Computation-oriented applications (CPU bound): over 10 threads per CPU core, sequential programming reaches its limits (IO) → parallel processing. Event-flow oriented applications (streaming): over 1,000 events per second → event stream processing.
  • 34. New architectures. Big Data = an explosion of volumes: data to store, online processing to parallelize, transactions per second to handle, messages per second to process. + New constraints: new types of data (unstructured, semi-structured, …), distribution of storage and processing, cost reduction, need for elasticity. = New technologies: horizontal scalability and clustering, data partitioning/sharding, parallel processing, in-memory processing.
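Data partitioning (sharding) can be illustrated with the simplest scheme, hash partitioning: each record goes to the shard given by a stable hash of its key modulo the number of shards. This is a minimal sketch with hypothetical names (`shard_for`, `partition`, `customer_id`), not the scheme of any particular product on the slide.

```python
import hashlib
from collections import defaultdict

def shard_for(key, n_shards):
    """Map a record key to a shard number in [0, n_shards).

    Uses MD5 rather than Python's built-in hash(), whose value for strings
    changes between interpreter runs; any stable hash would do.
    """
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards

def partition(records, key_field, n_shards):
    """Group records (dicts) by the shard of their key."""
    shards = defaultdict(list)
    for record in records:
        shards[shard_for(record[key_field], n_shards)].append(record)
    return shards

# Hypothetical customer records spread over 4 shards
records = [{"customer_id": f"C{i}"} for i in range(1000)]
shards = partition(records, "customer_id", 4)
```

A limitation of plain modulo hashing is that changing the number of shards moves most keys to a different shard. Distributed stores therefore often use consistent hashing or range partitioning instead.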
  • 35. Some emerging solutions. [Figure: solutions placed by application type (event-flow oriented/streaming, transaction-oriented/TPS, storage-oriented/IO bound, computation-oriented/CPU bound), by in-memory vs distributed approach, and against “traditional” architectures (RDBMS, application server, ETL, ESB). Solutions shown: Cassandra, MongoDB, CouchDB, HDFS, SQLFire, Teradata, HANA, grid computing, GigaSpaces, MapReduce, GPU, Voldemort, Exadata, HBase, Esper, Quartet ActivePivot, Sqoop, RabbitMQ, ZeroMQ, ultra-low latency, Hama, Igraph, MapR, EMC, Redis, Exalytics.]
  • 36. Emerging families, by application type (event-flow oriented/streaming, transaction-oriented/TPS, storage-oriented/IO bound, computation-oriented/CPU bound). NoSQL: distributed non-relational stores. NewSQL: SQL-compliant distributed stores. Streaming: CEP (Complex Event Processing) and ESP (Event Stream Processing). Grid/GPU: grid computing on CPUs or on GPUs. In-memory analytics: these solutions distribute the data in the memory of several nodes to achieve low processing times. Hadoop: the Hadoop ecosystem offers distributed storage, and also distributed computing using MapReduce.
  • 37. 37
  • 38. Hadoop: a reference in the Big Data landscape. Main distributions: • Apache Hadoop (open source) • Cloudera CDH • Hortonworks • MapR • DataStax (Brisk). Commercial: • Greenplum (EMC) • IBM InfoSphere BigInsights (CDH) • Oracle Big Data Appliance (CDH) • NetApp Analytics (CDH) • … Cloud: • Amazon EMR (MapR) • VirtualScale (CDH)
  • 39. Hadoop Distributed File System (HDFS), key principles: file storage larger than a single disk; data distributed across several nodes; data replication to ensure fail-over, with rack awareness; use of commodity disks instead of a SAN.
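The HDFS principles on slide 39 (files split across nodes, replication with rack awareness) can be sketched as follows. This is a simplified model, not HDFS code: the 128 MB block size is an assumed configuration value, the replication factor is fixed at 3, the topology and node names are hypothetical, and `place_replicas` only loosely follows the default placement described in the HDFS architecture documentation (one replica on the writer's node, the other two on different nodes of one remote rack).

```python
import random

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MB); in HDFS this is a configuration setting


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list:
    """Return the sizes of the blocks a file of `file_size` bytes would be split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    return [block_size] * full_blocks + ([remainder] if remainder else [])


def place_replicas(writer_node: str, topology: dict, rng=random) -> list:
    """Choose 3 distinct nodes for one block (simplified rack-aware rule).

    topology maps rack name -> list of node names. First replica on the writer's
    node, the other two on two different nodes of one randomly chosen remote rack.
    Assumes at least 2 racks and at least 2 nodes on every remote rack.
    """
    writer_rack = next(rack for rack, nodes in topology.items() if writer_node in nodes)
    remote_rack = rng.choice([rack for rack in topology if rack != writer_rack])
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer_node, second, third]


if __name__ == "__main__":
    racks = {"rack1": ["node1", "node2"], "rack2": ["node3", "node4"], "rack3": ["node5", "node6"]}
    blocks = split_into_blocks(300 * 1024 * 1024)
    print([size // (1024 * 1024) for size in blocks])  # [128, 128, 44]
    for block_id, _ in enumerate(blocks):
        print(block_id, place_replicas("node1", racks))
```

Spreading replicas over two racks means a block survives the loss of a whole rack, while keeping two of the three copies on the same remote rack limits cross-rack write traffic.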
  • 40. Hadoop distributed processing: MapReduce, key principles: parallelize and distribute processing; quicker processing of smaller unit data volumes; co-location of processing and data.
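The MapReduce principles on slide 40 are often illustrated with a word count. The sketch below simulates the map, shuffle and reduce phases in a single Python process; it is not Hadoop's Java API, and the function names are hypothetical. In a real cluster, each input split would be mapped in parallel on the node that stores it.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word of one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(mapped_pairs):
    """Group all values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the counts for one word."""
    return key, sum(values)


def word_count(splits):
    """Run the three phases over input splits (each split is a list of lines).

    Here the mappers run one after the other; in Hadoop each split would be
    processed on the node holding it (co-location of processing and data).
    """
    mapped = chain.from_iterable(map_phase(line) for split in splits for line in split)
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())


if __name__ == "__main__":
    print(word_count([["big data", "machine learning"], ["big clusters"]]))
    # {'big': 2, 'data': 1, 'machine': 1, 'learning': 1, 'clusters': 1}
```

Because each mapper only looks at its own lines and each reducer only at one key, both phases can be spread over as many nodes as there are splits and keys.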
  • 41. Overview of Hadoop architecture: distributed storage, distributed processing, querying, advanced processing, orchestration, integration with the information system, monitoring and management.
  • 42. Available tools in a typical distribution (CDH): HDFS, MapReduce, YARN (v2), Pig, Cascading, Hive, Oozie, Azkaban, Mahout, HAMA, Giraph, Sqoop, Flume, Scribe, Chukwa, CLI, Web Console, Hue, Cloudera Manager, HBase, Impala.
  • 43. Hadoop: a blooming ecosystem! Distributed storage (distributed file systems, local file systems, NoSQL datastores): HDFS, GlusterFS, S3, Ceph, Cassandra, Ring, DynamoDB, OpenStack Swift, Isilon, MapR. Processing, by use case (OLAP, OLTP, machine learning): HBase, Impala, Hawq, Stinger, Hive, Pig, MapReduce / Tez, Spark, Cascading, Scalding, Streaming (R, Python,…), Mahout, SciKit, Giraph, Hama. Lots of announcements and new tools appear every day… Maturity varies greatly from one tool to another.
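Slide 43 mentions writing MapReduce jobs in R or Python through Streaming. Hadoop Streaming runs any executable as mapper and reducer: the mapper reads input lines on stdin and writes tab-separated key/value lines to stdout, and the framework sorts them by key before feeding them to the reducer. Below is a minimal word-count sketch of that contract; the script name `wordcount.py` and its `map` / `reduce` arguments are hypothetical conventions for this example.

```python
import sys
from itertools import groupby


def mapper(lines):
    """Streaming mapper: read raw text lines, yield 'word<TAB>1' lines."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(lines):
    """Streaming reducer: input is sorted by key, so equal words are adjacent."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Hypothetical command line: `python wordcount.py map` or `python wordcount.py reduce`.
if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    stage = mapper if sys.argv[1] == "map" else reducer
    for out in stage(sys.stdin):
        print(out)
```

It can be tested locally without a cluster, e.g. `cat input.txt | python wordcount.py map | sort | python wordcount.py reduce`, where `sort` stands in for Hadoop's shuffle. On a cluster, the two commands would be passed to the streaming jar's `-mapper` and `-reducer` options (the jar's location depends on the distribution).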
  • 44. Maturity of tools: maturity is very heterogeneous across the Hadoop ecosystem. For example, HDFS and MapReduce are fully production-ready: Yahoo manages a petabyte-scale HDFS cluster. But some of the surrounding tools are still immature, especially administration and debugging tools. For example, Impala (real-time querying with SQL-compliant queries) is not production-ready. Likewise, adapting machine learning libraries to distributed computation with MapReduce is ongoing: Apache Mahout has MapReduce-compliant algorithms, and MapReduce libraries for R are still quite young.
  • 45. Hadoop is a rich and fairly new technology that is difficult to master. Get trained, and bring experts into your project!
  • 47. Big Data aims at getting an economic advantage from the quantitative analysis of internal and external data.
  • 48. Data we traditionally manipulate (customers, product catalog…). Innovation is here!
  • 49. Machine Learning + Big Data. For many years, we have used Machine Learning algorithms (for example linear regression and neural networks) to find patterns in data. Big Data technologies now allow us to manipulate much more data, and to get more value from Machine Learning techniques.
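Slide 49 cites linear regression. One reason it fits Big Data platforms well is that ordinary least squares with a single feature only needs a handful of sums (n, Σx, Σy, Σx², Σxy), which can be computed independently on each data partition (map) and then added together (reduce). A minimal sketch, assuming one input feature and points already split into partitions; `Stats`, `partition_stats` and `fit` are hypothetical names, and this is not Mahout's implementation.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Stats:
    """Sufficient statistics for simple linear regression y = intercept + slope * x."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def __add__(self, other):
        # Statistics from two partitions combine by plain addition (the reduce step).
        return Stats(self.n + other.n, self.sx + other.sx, self.sy + other.sy,
                     self.sxx + other.sxx, self.sxy + other.sxy)


def partition_stats(points):
    """Map step: compute the sums for one partition of (x, y) points."""
    return Stats(
        n=len(points),
        sx=sum(x for x, _ in points),
        sy=sum(y for _, y in points),
        sxx=sum(x * x for x, _ in points),
        sxy=sum(x * y for x, y in points),
    )


def fit(partitions):
    """Combine per-partition sums, then solve the least-squares normal equations."""
    s = reduce(lambda a, b: a + b, (partition_stats(p) for p in partitions), Stats())
    denom = s.n * s.sxx - s.sx ** 2
    if denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (s.n * s.sxy - s.sx * s.sy) / denom
    intercept = (s.sy - slope * s.sx) / s.n
    return intercept, slope


if __name__ == "__main__":
    # Points on y = 1 + 2x, split over two partitions.
    print(fit([[(0, 1), (1, 3)], [(2, 5), (3, 7)]]))  # (1.0, 2.0)
```

The result does not depend on how the points are split, so each partition can be processed on the node that stores it and only five numbers per partition travel over the network.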
  • 50. Hadoop: a reference in the Big Data technology landscape, but with a very effervescent ecosystem. It's hard to follow all the trends and evolutions without a dedicated R&D team. Don't do this alone: get trained, and bring experts into your project.