DATA SCIENCE –
how we do the magic
And how can the customer help.
Prof. Danko Nikolic, PhD
How do we become successful
together?
MAIN GOAL:
Explain how CSC does data
science and what we need
from the customer.
PRIMARY AUDIENCE:
Technical personnel on the customer side
How technically deep is this document (scale 1 – 10)?:
5
For effective data science, how essential is
collaboration with the customer (scale 1-10)?:
10
INTRO
Textbooks…
… use simplified data to explain
how you apply statistical
methods,
do not say much on how you
deal with data in real life.
… and are thus misleading.
Textbooks make you believe
that an appropriate model for
your data already exists.
“You just need to select the
right model and apply it.”
Unfortunately, data science
is not that simple.
Data scientists do not just
pick models.
Misconception: a data scientist applies a model.
Correction: a data scientist creates a model.
Each data set has its own
oddities, quirks, issues, …
Each phenomenon that we
want to model lives in its
own world.
The job of a data scientist is to understand
this world, and to tailor a model
accordingly.
Rarely will an off-the-shelf model be
outright optimal for a real-life problem.
What a customer buys: a unique model
optimized for the customer’s needs.
Skills and experience of a data scientist
translate into the ability to create
customized models.
It may take 10 or 20 years to stack up a skill
set to effectively build customized models.
At CSC, we offer that experience.
Creation of a model
requires one to master:
statistics, coding, optimization, storytelling,
visualization, experimental design,
Big Data technology, clustering, business models,
regression, handling databases, probability …
… scientific thinking, deep learning, intuition,
distributions, overfitting, information theory,
cross-correlation, fractal geometry,
computation, multivariate analysis, statistical
biases, no-free-lunch theorem, support-vector
machine, normalization,
regularization, matrix algebra, graph
theory, …
… Boltzmann machine, dropout, entropy, auto-associative
networks, reinforcement learning,
Lasso, Kohonen network, backpropagation, …
… natural language processing, scientific publishing,
Bayes theorem, genetic algorithms, swarm intelligence,
boosting, Markov process, softmax, power spectrum,
good regulator theorem, presentation skills, …
… + keeping up with 100s of new models and tools announced
every year.
Hence, a team of experienced
data scientists can often
navigate this world more
effectively and creatively
than an individual alone.
Experience + Team is what gets the
customer the best model in the end.
Examples of notable team
efforts:
US$ 1,000,000
These are all unique, newly created
models tailored for a particular purpose.
No existing off-the-shelf model could simply be
applied.
But what will a data scientist do?
How does one create a new model?
It is important to distinguish a model
architecture from a complete model.
Architecture: model specified but without training. Equations
and interactions between equations are defined, but
parameter values are not yet known.
Complete model: trained model. Parameter values are
known. Machine learning has been applied. The model
has been fully trained and tested, and is ready to be
deployed.
Example architecture:
A wiring diagram, defined data flow, topology, equations,…
but parameter values are not yet specified.
Example complete model:
W1,1 = 0.12
W1,2 = 0.03
W2,4 = -0.45
…
Optimal parameter values are found
through the machine learning (training) process.
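To make the distinction concrete, here is a minimal Python sketch. The architecture is the function `predict` with its parameters left open; training fills in the values. The function names, the toy data, the learning rate, and the use of plain gradient descent are illustrative assumptions, not the method used in the projects described in this deck.

```python
# Architecture: the form of the model is fixed, the parameter values are not.
def predict(x, w, b):
    return w * x + b


def train(xs, ys, lr=0.01, epochs=5000):
    """Find parameter values by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0  # untrained parameters
    n = len(xs)
    for _ in range(epochs):
        grad_w = sum(2 * (predict(x, w, b) - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (predict(x, w, b) - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated by y = 3x + 1 (hypothetical).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 4.0, 7.0, 10.0, 13.0]

# Complete model: architecture + trained parameter values.
w, b = train(xs, ys)
print(f"w = {w:.2f}, b = {b:.2f}")
```

The human wrote `predict`; the machine found `w` and `b`.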
Architecture: a human does the work.
+
Training: the machine does the work.
A data scientist works with a
tradeoff between the effort
invested in designing a
model’s architecture and the effort
invested in training the model.
IMPORTANT:
The more specialized the
architecture for a given
problem, the less training is
needed.
Advantages from a specialized
architecture:
- smaller datasets for training
- more resilient to over-fitting
- closer to the global optimum
- fewer resources
- cost effective
- better overall performance
The opposite is an eclectic architecture.
Eclectic architecture can be applied to many
different data but needs more training. As a result,
- larger amounts of data needed
- intensive computation
- easily over-fitted
- likely ending in local minima
- higher development costs
- weaker performance
A specialized architecture
contributes heavily to the
performance of a model.
Why does specialized architecture
enhance learning?
- The architecture already possesses
part of the needed knowledge, so less is
left to be learned.
- The learning space becomes smaller
(reduced dimensionality).
- During learning, a specialized architecture
raises the signal above the noise.
Big Data,
due to their mass,
allow working with more general
(eclectic)
architectures.
Relative contributions to a model’s knowledge:
- Highly specialized architecture + “small” data: this is the ratio we prefer.
- Eclectic architecture + Big Data: this tradeoff is often successful.
Example of a specialized architecture -
general linear model (GLM):
Regression based on GLM can already work well
with as few as 100 data points.
The architecture of GLM
already contains knowledge
about:
- Gaussian distributions,
- linear relationships,
- independent sampling,
- pairwise correlations,
- …
The specialization of
GLM is founded in
the discoveries by
generations of
statisticians.
Over the years, they
discovered a set of
properties that
tended to repeat in
real-life data sets.
The result is GLM.
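As a small illustration of how far 100 data points can go when the architecture already assumes a linear relationship and Gaussian noise, the sketch below fits a straight line by ordinary least squares. A straight-line fit is only the simplest special case of a GLM, and the ground-truth values, noise level, and random seed are assumptions chosen for the example.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.uniform(0.0, 10.0) for _ in range(100)]
# Hypothetical ground truth: y = 2x + 5 plus Gaussian noise.
ys = [2.0 * x + 5.0 + rng.gauss(0.0, 1.0) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Only two parameters have to be learned, which is why so few points suffice.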
Example of an eclectic architecture
– multi-layer perceptron
(aka artificial neural network):
it can learn a lot of different things.
A neural net can learn the same
linear relations as GLM + many other
relations that GLM cannot. This makes
neural nets more eclectic.
However, much larger data sets are
needed. The price for the generality
of the architecture is data size and
training time.
Big Data also profit from specialized architectures!
- Small architecture (general): Big Data
- Bigger architecture (more specific): (less) Big Data
More data cannot always replace architecture (curse of dimensionality).
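A rough way to see the curse of dimensionality is to count how many cells a grid has when every input variable is split into a fixed number of bins. The function name and the choice of 10 bins per variable are assumptions for this sketch.

```python
def cells_to_cover(dimensions, bins_per_axis=10):
    """Number of grid cells when every input variable is split into bins."""
    return bins_per_axis ** dimensions


for d in (1, 2, 5, 10):
    # One sample per cell is a very modest coverage requirement.
    print(f"{d:>2} variables -> {cells_to_cover(d):,} cells to cover")
```

The count grows exponentially with the number of variables, so data volume alone quickly stops keeping pace; an architecture that shrinks the learning space attacks the exponent instead.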
Example Big Data combined
with specialized architecture
– Convolutional NN:
Only local connectivity; the same
weights are repeated across all
neurons of one layer.
Convolutional layers in a neural
network contain specific
knowledge on how the visual
world is organized.
Addition of convolutional layers
improves learning.
Better NN architecture; more suitable
for processing images; the model
‘knows’ that local pixels are correlated
and that they contain information on
visual features.
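The two properties named above, local connectivity and shared weights, can be sketched in one dimension with plain Python. The kernel values and the toy signal are hypothetical, and, as in most deep learning libraries, the kernel is applied without flipping.

```python
def conv1d(signal, kernel):
    """Slide one shared kernel over the signal (no padding, stride 1)."""
    k = len(kernel)
    return [
        sum(kernel[j] * signal[i + j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


def dense_weight_count(n_inputs, n_outputs):
    """All-to-all layer: every output has its own weight for every input."""
    return n_inputs * n_outputs


# A hypothetical edge-detecting kernel applied to a toy signal with one step.
signal = [0, 0, 0, 1, 1, 1]
kernel = [-1, 1]
response = conv1d(signal, kernel)

print(response)  # non-zero only where neighbouring values differ
print(len(kernel))  # weights to learn in the convolutional layer
print(dense_weight_count(len(signal), len(response)))  # weights if all-to-all
```

Each output looks only at neighbouring inputs and reuses the same two weights, so far fewer parameters have to be learned than in an all-to-all layer of the same size.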
Consequences:
On image data, a deep neural network with convolutional layers will
typically perform more effectively than either an all-to-all
connected deep network or a “shallow”
network.
A customer can assist
data scientists in
developing an architecture
that is as specialized
as possible.
Can there exist an eclectic model that also
learns easily, like a specialized model?
No!
Because of the no-free-lunch theorem:
“Any two optimization algorithms are
equivalent when their performance is
averaged across all possible
problems.”
This is what machine learning
is not - even with Big Data.
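For reference, the no-free-lunch statement quoted above is usually formalized as follows (Wolpert and Macready's first theorem for optimization). The notation is theirs, not this deck's: $f$ is an objective function, $a_1$ and $a_2$ are any two algorithms, $m$ is the number of distinct points evaluated, and $d_m^y$ is the sequence of cost values observed.

```latex
\sum_{f} P\left(d_m^y \mid f, m, a_1\right) = \sum_{f} P\left(d_m^y \mid f, m, a_2\right)
```

Summed over all possible objective functions, every algorithm yields the same distribution of observed results, so an advantage on one class of problems is paid for on another.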
Any data science
problem will
require working
on an
appropriate
model
architecture.
[Figure: chart of model architectures. Axes: specialized architecture vs. eclectic architecture; fast learners (low training effort, often high performance) vs. slow learners (high training effort, lower performance).]
Various off-the-shelf models can be approximately sorted according to how specialized they are. Models shown in the figure: laws of physics, linear regression, deep learning, genetic algorithms, SVM, decision tree, random forest, naïve Bayes.
Labels in the figure:
- The slope of optimal model application.
- The black triangle of unreality, due to the no-free-lunch theorem.
- If you are in this corner, you may be using a wrong model for the given data.
Off-the-shelf models usually are
not end architectures. More often,
they are only components of
specialized models.
The more eclectic an off-the-shelf
model, the more room for adding
specializations there is.
A data scientist will often
combine off-the-shelf models
with other components to
build a model specialized for
the customer’s data.
Commonly used specialization tool: data wrangling.
Data wrangling extracts from the data what is important (the signal!), in a form
that is suitable for an off-the-shelf model. Example:
[Figure: data and the equations for data wrangling]
Neural net + specific wrangling steps together form a highly specialized model.
Here, data wrangling plays a role similar to that of convolution in deep neural nets.
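The combination can be sketched as a hand-written wrangling step feeding a generic learner. Everything named here is hypothetical: the `wrangle` function, the toy windows, and the nearest-centroid classifier that stands in for the neural net to keep the sketch short.

```python
def wrangle(raw_window):
    """Specialization step: keep only what is assumed to carry the signal.

    Assumption for this sketch: the classes differ in amplitude, not in mean,
    so the peak-to-peak range is the informative feature.
    """
    return max(raw_window) - min(raw_window)


class NearestCentroid:
    """Generic (eclectic) component: works on any single numeric feature."""

    def fit(self, features, labels):
        self.centroids = {}
        for label in set(labels):
            values = [f for f, lab in zip(features, labels) if lab == label]
            self.centroids[label] = sum(values) / len(values)
        return self

    def predict(self, feature):
        return min(self.centroids, key=lambda c: abs(self.centroids[c] - feature))


# Hypothetical training windows: "calm" ones barely move, "rough" ones swing.
train_windows = [[5.0, 5.1, 4.9], [2.0, 2.1, 2.0], [5.0, 8.0, 2.0], [1.0, 4.0, -2.0]]
train_labels = ["calm", "calm", "rough", "rough"]

model = NearestCentroid().fit([wrangle(w) for w in train_windows], train_labels)
print(model.predict(wrangle([9.0, 9.1, 9.0])))  # small swing around a new mean
print(model.predict(wrangle([0.0, 5.0, -5.0])))  # large swing
```

The knowledge that amplitude matters lives in `wrangle`, not in the learner, which is why four training examples are enough here.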
Less thought may be needed to apply a neural
net. This is because a neural net alone provides
an eclectic architecture.
+
Extensive thought
given to data
wrangling.
Remember: A data scientist CREATES a model.
Where does a data scientist operate?
[Figure: chart with axes from specialized to eclectic architecture and from low to high training effort.]
Labels in the figure:
- An inexperienced data scientist may spend a lot of time in this corner.
- A naïve ‘data scientist’ would hope to end up here.
How does a data scientist
do that?
Three main steps
for building a
specialized
architecture:
1. UNDERSTAND!
- Analyze data,
dependencies between
variables, distributions,
etc.
- Study the (physical)
system that generated
the data.
A data scientist will perform calculations with the goal of
understanding the data.
Various tools to help understanding:
descriptive statistics, distribution plots,
visualizations, scatter plots, time series,
cross-correlation, fractal dimension, …
A data scientist will talk to experts, ask questions, read
literature, go for a walk to think.
By doing so, a data scientist will seek insights necessary to
implement novel model architectures.
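As one example of such an exploratory calculation, the sketch below uses cross-correlation to find the delay between two series. The two toy series and the range of lags tested are assumptions made for illustration.

```python
def cross_correlation(xs, ys, lag):
    """Pearson correlation between xs[t] and ys[t + lag] for lag >= 0."""
    a = xs[: len(xs) - lag]
    b = ys[lag:]
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / (var_a * var_b) ** 0.5


# Hypothetical sensor pair: the second series repeats the first 3 steps later.
first = [0, 1, 0, 2, 5, 1, 0, 3, 1, 0, 4, 2, 0, 1, 0]
second = [0, 0, 0] + first[:-3]

best_lag = max(range(6), key=lambda lag: cross_correlation(first, second, lag))
print(best_lag)
```

An insight of this kind (one variable follows another with a fixed delay) is exactly what can later be built into the model architecture.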
2. Formally describe
Describe the insight by
drawing a graph, writing
equations, listing the
rules, …
3. Implement in
software (code)
Understand
Formalize
Code
Various software
tools are at a data
scientist’s disposal.
There is no simple recipe for which
parts of a model to work on
first:
it’s a creative process!
Therefore, iterations:
[Figure: iteration cycle through the steps Understand, Formalize, Code model, Test, Train, Evaluate, with the marker “Important help from the customer comes here.”]
Examples of successful specialized
models created by the Data
Science team:
Example I:
Predictive maintenance—fan operations
Vibration analysis
Goal: Detect healthy and unhealthy operations of a fan + classify the
source of disturbances. 3-axis vibration sensor mounted on the fan.
Data wrangling and insights: power spectrum to identify frequency
bands carrying signals.
Anomaly detection: An auto-associative neural network on full
power spectrum.
Disturbance classification: Logistic regression on selected frequency
bands.
Performance: 100% on new data sets.
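The wrangling step of this example can be sketched as follows: compute the power spectrum of a vibration signal and look for the frequency band that stands out. The synthetic signal, the 200 Hz sampling rate, and the 10 Hz and 50 Hz components are assumptions for illustration; the project's actual data and code are not shown in this deck.

```python
import cmath
import math


def power_spectrum(samples):
    """Power at each DFT bin up to the Nyquist limit (plain O(n^2) DFT)."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            s * cmath.exp(-2j * math.pi * k * t / n) for t, s in enumerate(samples)
        )
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum


# Assumptions: 200 Hz sampling for 1 second, a weak 10 Hz component, and a
# strong 50 Hz component standing in for a disturbance.
sample_rate = 200
samples = [
    0.3 * math.sin(2 * math.pi * 10 * t / sample_rate)
    + 1.0 * math.sin(2 * math.pi * 50 * t / sample_rate)
    for t in range(sample_rate)
]

spectrum = power_spectrum(samples)
bin_width_hz = sample_rate / len(samples)  # 1 Hz per bin in this setup
dominant_hz = max(range(len(spectrum)), key=spectrum.__getitem__) * bin_width_hz
print(dominant_hz)
```

Band powers selected this way could then serve as the inputs to a classifier such as the logistic regression mentioned above.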
Example II:
Mind reading
Brain signals
Goal: Reconstruct what the animal sees (stimulus) from the activity of neurons in the visual
cortex.
Data wrangling and insights: Spike sorting; convolution of neuronal spiking activity.
Stimulus identification: Support vector machine fed with convolved neural activity.
Stimulus reconstruction: An array of naïve Bayes classifiers.
Performance: Up to 90%, 10-fold cross-validation.
Reference: Nikolić, D.*, S. Häusler*, W. Singer and W. Maass
(2009) Distributed fading memory for stimulus properties in
the primary visual cortex. PLoS Biology 2009, 7: e1000260.
Example III:
Predictive maintenance—Coffee machines
Visits from a service
technician
Goal: Predict whether a coffee machine will be visited by a
technician within the next 3 months. Data: telemetric data
on machine usage.
Data wrangling and insights: cumulative variables, cross-
correlation, heat map.
Model: 4-layer artificial neural network on wrangled data.
Performance: 14.1% above chance, 10-fold cross-validation.
Best performance among 10 competitors.
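For readers unfamiliar with 10-fold cross-validation, the sketch below shows how the splits can be built: the data are divided into 10 folds, and each fold is held out once for testing while the model is trained on the other nine. The function name, the seed, and the data set size of 25 are hypothetical; the exact procedure used in the project is not described here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs; each sample is tested once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# Hypothetical data set of 25 machines.
splits = list(k_fold_indices(25))
tested = sorted(idx for _, test in splits for idx in test)
print(len(splits), tested == list(range(25)))
```

The reported performance is then the average over the held-out folds, so every score comes from data the model did not see during training.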
Example IV:
Train departure and arrival time
Goal: Compute new timetables in real-time depending on the current traffic situation.
Model specialization: Railway network implemented
as a graph; nodes and edges executed as neural nets.
Predictions: individual delays; departure, arrival and waiting times.
Performance: We could predict, with 68% accuracy, a 3-minute window in which a train will
arrive/depart, up to 48 hours into the future.
How exactly does
a customer help?
The customer does not only
deliver data.
WHAT WE NEED FROM THE
CUSTOMER IS:
Make us understand your
world!
You need to do everything in your
power to transfer model-relevant
knowledge to us.
(We’ll do the rest.)
Customer’s homework:
- Know your economics.
- Describe the process that created the data.
- Formulate hypotheses.
- Ensure access to relevant experts in your
company.
Your economics: Which model
could possibly make you money, or bring
other benefits?
Costs increase with
Data Science and
analytics effort.
As a result
savings and
profits rise,
but not
linearly.
Sweet spot:
Data Science costs are low,
benefits are large
Data Science can cost
you more than what it
saves.
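The shape of this tradeoff can be sketched numerically. The curve below is purely an assumption: benefits that saturate with effort, costs that grow linearly, and made-up numbers for both. It only illustrates why a sweet spot exists and why too much effort can cost more than it saves.

```python
import math


def net_benefit(effort, max_benefit=100.0, scale=10.0, cost_per_unit=2.0):
    """Saturating benefit minus linear cost (all numbers are made up)."""
    benefit = max_benefit * (1.0 - math.exp(-effort / scale))
    return benefit - cost_per_unit * effort


efforts = [e / 10 for e in range(0, 1001)]  # 0.0 to 100.0 units of effort
sweet_spot = max(efforts, key=net_benefit)

print(f"sweet spot at effort {sweet_spot:.1f}")
print(f"net benefit there: {net_benefit(sweet_spot):.1f}")
print(f"net benefit at effort 80: {net_benefit(80.0):.1f}")
```

Knowing the customer's real cost and benefit figures is what turns such a sketch into a decision about which model is worth building.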
The process that created data
Be it a single machine or an entire factory floor, a hospital ward or
a marketing campaign, the more we understand about the process,
the more specialization we can build into the model.
Where do you think the signal in the
data is? What is your hypothesis?
Good specialized architecture
extracts signal over noise.
Point us in the direction you think is
right. We’ll check whether there is a
signal.
The person
we may need
to talk to
CSC + Customer form a full team.
The difference between taking an off-
the-shelf-model and investing time and
expertise to create a specialized model
translates into a difference between
mediocre results
and excellent results.
At CSC, we are after excellent results.
CSC provides top Data Science expertise for
developing specialized model architectures
in industry.
Contacts:
Dr. Günter Koch, Senior Manager, gkoch@csc.com
Davor Andric, Principal Solution Architect, dandric@csc.com
Christian Kaupa, Director BD&A, ckaupa@csc.com
Prof. Dr. Danko Nikolic, Lead Data Scientist, dnikolic3@csc.com
How data science works and how can customers help

How data science works and how can customers help

  • 1. DATA SCIENCE – how we do the magic And how can the customer help. Prof. Danko Nikolic, PhD
  • 2. How we become successful together? Explain how CSC does data science and what we need from the customer. MAIN GOAL:
  • 4. How technically deep is this document (scale 1 – 10)?: 5 For effective data science, how essential is collaboration with the customer (scale 1-10)?: 10
  • 6. Textbooks… … use simplified data to explain how you apply statistical methods, do not say much on how you deal with data in real life. … and are thus, misleading.
  • 7. Textbooks make you believe that an appropriate model for your data already exists. “You just needs to select the right model and apply it.”
  • 8. Unfortunately, data science is not that simple. Data scientists do not just pick models.
  • 9. Correction: a data scientist creates a model. Misconception: a data scientist applies a model.
  • 10. Each data set has its own oddities, quirks, issues, … Each phenomenon that we want to model lives in its own world. The job of a data scientist is to understand this world, and to tailor a model accordingly.
  • 11. Rarely will an off-the-shelf model be outright optimal for a real-life problem.
  • 12. What a customer buys: a unique model optimized for the customer’s needs.
  • 13. Skills and experience of a data scientist translate into the ability to create customized models. It may take 10 or 20 years to stack up a skill set to effectively build customized models.
  • 14. At we offer that experience.
  • 15. Creation of a model requires one to master: statistics, coding, optimization, story telling, visualization, experimental design, Big Data technology, clustering, business models, regression, handling data bases, probability …
  • 16. … scientific thinking, deep learning, intuition, distributions, overfitting, information theory, cross-correlation, fractal geometry, computation, multivariate analysis, statistical biases, no-free-lunch theorem, support-vector machine, normalization, regularization, matrix algebra, graph theory, … Boltzmann machine, dropout, entropy, auto-associative networks, reinforcement learning, Lasso, Kohonen network, back propagation, … natural language processing, scientific publishing, Bayes theorem, genetic algorithms, swarm intelligence, boosting, Markov process, softmax, power spectrum, good regulator theorem, presentation skills, … + keeping up with 100s of new models and tools announced every year.
  • 17. Hence, a team of experienced data scientists can often navigate this world more effectively and creatively than an individual alone. Experience + Team is what gets the customer the best model at the end.
  • 18. Examples of notable team efforts:
  • 23. These are all unique, newly created models tailored for a particular purpose. No existing model off-the-shelf could be simply applied.
  • 24. But what will a data scientist do? How does one create a new model?
  • 25. It is important to distinguish a model architecture from a complete model. Architecture: the model is specified but not trained. Equations and interactions between equations are defined, but parameter values are not yet known. Complete model: a trained model. Parameter values are known. Machine learning has been applied. The model has been fully trained and tested, and is ready to be deployed.
  • 26. Example architecture: A wiring diagram, defined data flow, topology, equations,… but parameter values are not yet specified.
  • 27. Example complete model: W1,1 = 0.12 W1,2 = 0.03 W2,4 = -0.45 … …+ Optimal values of parameters are found through machine learning (training) process.
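The distinction between architecture and complete model can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: the architecture is a single linear equation, the data are synthetic, and training is plain gradient descent. The names `architecture`, `train` and `complete_model` are hypothetical.

```python
import random

def architecture(x, w, b):
    """Architecture: the equation y = w * x + b is fixed, parameter values are not."""
    return w * x + b

def train(xs, ys, lr=0.01, epochs=2000):
    """Training: find parameter values by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [architecture(x, w, b) - y for x, y in zip(xs, ys)]
        grad_w = sum(2 * e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(2 * e for e in errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Synthetic data (an assumption for this sketch): y = 0.12 * x - 0.45 plus small noise.
rng = random.Random(0)
xs = [rng.uniform(-5, 5) for _ in range(200)]
ys = [0.12 * x - 0.45 + rng.gauss(0, 0.01) for x in xs]

# Complete model: the same architecture with learned parameter values plugged in.
w_hat, b_hat = train(xs, ys)

def complete_model(x):
    return architecture(x, w_hat, b_hat)

print(f"w = {w_hat:.3f}, b = {b_hat:.3f}")
```

Before `train` runs, `architecture` cannot make a prediction on its own; after training, `complete_model` carries concrete parameter values and is ready to be applied.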
  • 29. IMPORTANT: A data scientist works with a tradeoff between the effort invested in designing a model’s architecture and the effort invested in training the model. The more specialized the architecture for a given problem, the less training is needed.
  • 30. Advantages of a specialized architecture: - smaller datasets for training - more resilient to over-fitting - closer to the global optimum - fewer resources - cost effective - better overall performance
  • 31. The opposite is an eclectic architecture. An eclectic architecture can be applied to many different kinds of data but needs more training. As a result: - larger amounts of data needed - intensive computation - easily over-fitted - likely ending in local minima - higher development costs - weaker performance
  • 32. Between architecture and training, a specialized architecture carries heavy weight in the performance of a model.
  • 33. Why does a specialized architecture enhance learning? - The architecture already contains part of the needed knowledge, so less is left to be learned. - The learning space becomes smaller (reduced dimensionality). - During learning, a specialized architecture raises the signal above the noise.
  • 34. Big Data, due to their mass, allow working with more general (eclectic) architectures.
  • 35. Relative contributions to a model’s knowledge: highly specialized architecture + “small” data (this is the ratio we prefer); eclectic architecture + Big Data (this tradeoff is often successful).
  • 36. Example of a specialized architecture - general linear model (GLM): Regression based on GLM can work well with as few as 100 data points. The architecture of GLM already contains knowledge about: - Gaussian distributions, - linear relationships, - independent sampling, - pairwise correlations, - …
  • 37. The specialization of GLM is founded in the discoveries by generations of statisticians. Over years, they discovered a set of properties that tended to repeat in real-life data sets. The result is GLM.
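As a minimal sketch of why such a specialized architecture gets by with little data, the example below fits ordinary least squares with one predictor (the simplest case of a general linear model) to 100 synthetic points. The data-generating line, the noise level and the function name `fit_line` are assumptions made for the illustration.

```python
import random

def fit_line(xs, ys):
    """Closed-form least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# 100 synthetic points (assumed): linear relation, independent Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 1.0) for x in xs]

slope, intercept = fit_line(xs, ys)
print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
```

Only two parameters have to be learned because linearity and Gaussian noise are built into the architecture; that is why the estimates land close to the true values with so few points.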
  • 38. Example of an eclectic architecture – multi-layer perceptron (aka artificial neural network): it can learn a lot of different things. A neural net can learn the same linear relations as GLM + many other relations that GLM cannot. This makes neural nets more eclectic. However, much larger data sets are needed. The price for the generality of architecture is data size and training time.
  • 39. Big Data also profit from specialized architectures! Small architecture (general) + Big Data versus bigger architecture (more specific) + (less) Big Data. More data cannot always replace architecture (curse of dimensionality).
  • 40. Example Big Data combined with specialized architecture – Convolutional NN: Only local connectivity; the same weights are repeated across all neurons of one layer. Convolutional layers in a neural network contain specific knowledge on how the visual world is organized. Addition of convolutional layers improves learning.
  • 41. Better NN architecture; more suitable for processing images; the model ‘knows’ that local pixels are correlated and that they contain information on visual features. Consequences: A deep neural network with convolutional layers will perform more effectively than either an all-to-all connected deep network or any other “shallow” network.
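The two properties named above, local connectivity and shared weights, can be illustrated with a one-dimensional toy example. This is a sketch, not the networks used in the slides: the signal, the two-tap kernel and the helper names are hypothetical, and the operation is the sliding dot product (cross-correlation) that neural-network libraries commonly call convolution.

```python
def conv1d(signal, kernel):
    """Each output sees only a local window, and all outputs share the same kernel weights."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]

def dense_param_count(n_in, n_out):
    """All-to-all layer: one independent weight per input-output pair."""
    return n_in * n_out

def conv_param_count(kernel_size):
    """Convolutional layer: the same kernel weights are reused at every position."""
    return kernel_size

signal = [0, 0, 1, 1, 1, 0, 0, 0]
edge_kernel = [-1, 1]          # responds to local changes between neighbours

out = conv1d(signal, edge_kernel)
print(out)                                       # [0, 1, 0, 0, -1, 0, 0]
print(dense_param_count(len(signal), len(out)))  # 56
print(conv_param_count(len(edge_kernel)))        # 2
```

The convolutional layer detects both edges with 2 weights, whereas an all-to-all layer of the same input and output size would have to learn 56 weights from data.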
  • 42. A customer can assist data scientists in developing as specialized an architecture as possible.
  • 43. Can there exist an eclectic model that also learns easily, like a specialized model? No! Because of the no-free-lunch theorem: “Any two optimization algorithms are equivalent when their performance is averaged across all possible problems.”
  • 44. This is what machine learning is not - even with Big Data. Any data science problem will require working on an appropriate model architecture.
  • 45. Various off-the-shelf models can be approximately sorted according to how specialized they are. [Figure: models placed between specialized architecture (fast learners: low training effort, often high performance) and eclectic architecture (slow learners: high training effort, lower performance). Models shown: laws of physics, linear regression, deep learning, genetic algorithms, SVM, decision tree, random forest, naïve Bayes. Labels: “The slope of optimal model application”; “the black triangle of unreality due to the no-free-lunch theorem”; “If you are in this corner, you may be using a wrong model for the given data.”]
  • 46. Off-the-shelf models usually are not end architectures. More often, they are only components of specialized models. The more eclectic an off-the-shelf model, the more room for adding specializations there is.
  • 47. A data scientist will often combine off-the-shelf models with other components to build a model specialized for the customer’s data.
  • 48. Commonly used specialization tool: data wrangling. Data wrangling extracts from the data what is important (the signal!) in a way that is suitable for an off-the-shelf model. Example: data -> equations for data wrangling -> neural net. The specific wrangling steps and the neural net together form a highly specialized model. Here, data wrangling plays a role similar to that of convolution in deep neural nets. Extensive thought is given to data wrangling. Less thought may be needed to apply a neural net, because a neural net alone provides an eclectic architecture.
  • 49. Remember: A data scientist CREATES a model.
  • 50. Where does a data scientist operate? [Figure: plane of specialized vs. eclectic architecture and low vs. high training effort, with two marked corners. Labels: “An inexperienced data scientist may spend a lot of time in this corner.”; “A naïve ‘data scientist’ would hope to end up here.”]
  • 51. How does a data scientist do that? Three main steps for building a specialized architecture:
  • 52. 1. UNDERSTAND! - Analyze data, dependencies between variables, distributions, etc. - Study the (physical) system that generated the data.
  • 53. A data scientist will perform calculations with the goal of understanding the data. Various tools help understanding: descriptive statistics, distribution plots, visualizations, scatter plots, time series, cross-correlation, fractal dimension, … A data scientist will also talk to experts, ask questions, read literature, go for a walk to think. By doing so, a data scientist will seek the insights necessary to implement novel model architectures.
  • 54. 2. Formally describe: Describe the insight by drawing a graph, writing equations, listing the rules, …
  • 57. Various software tools are at a data scientist’s disposal.
  • 58. There is no simple recipe for which parts of a model to work on first: it’s a creative process!
  • 59. Therefore, iterations: Understand, Formalize, Code model, Test, Train, Evaluate. Important help from the customer comes here.
  • 60. Examples of successful specialized models created by the Data Science team:
  • 61. Example I: Predictive maintenance—fan operations Vibration analysis
  • 62. Goal: Detect healthy and unhealthy operations of a fan + classify the source of disturbances. 3-axis vibration sensor mounted on the fan. Data wrangling and insights: power spectrum to identify frequency bands carrying signals. Anomaly detection: An auto-associative neural network on full power spectrum. Disturbance classification: Logistic regression on selected frequency bands. Performance: 100% on new data sets. Data Science tools:
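The wrangling step of this example, turning a raw vibration trace into power per frequency band, can be sketched as follows. Everything concrete here is an assumption for illustration: the sampling rate, the 50 Hz and 90 Hz components, the band edges and the function names. The anomaly detector and the logistic regression that consume such features are not shown.

```python
import cmath
import math

def power_spectrum(samples):
    """Power at each non-negative frequency bin, via a direct discrete Fourier transform."""
    n = len(samples)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        spectrum.append(abs(coeff) ** 2 / n)
    return spectrum

def band_power(spectrum, sample_rate, n_samples, f_low, f_high):
    """Sum the power of all bins whose frequency lies in [f_low, f_high)."""
    resolution = sample_rate / n_samples  # Hz per bin
    return sum(p for k, p in enumerate(spectrum) if f_low <= k * resolution < f_high)

# Hypothetical one-axis traces: a 50 Hz rotation component, plus a 90 Hz disturbance when faulty.
sample_rate = 256  # Hz (assumed)
n = 256
healthy = [math.sin(2 * math.pi * 50 * t / sample_rate) for t in range(n)]
faulty = [h + 0.5 * math.sin(2 * math.pi * 90 * t / sample_rate) for t, h in enumerate(healthy)]

bands = [(0, 32), (32, 64), (64, 96), (96, 129)]  # Hz, assumed band edges
spec_healthy = power_spectrum(healthy)
spec_faulty = power_spectrum(faulty)
features_healthy = [band_power(spec_healthy, sample_rate, n, lo, hi) for lo, hi in bands]
features_faulty = [band_power(spec_faulty, sample_rate, n, lo, hi) for lo, hi in bands]

print([round(f, 3) for f in features_healthy])
print([round(f, 3) for f in features_faulty])
```

In this toy setup the disturbance shows up only in the 64 to 96 Hz band, so a simple classifier on four band powers has far less to learn than one fed with 256 raw samples.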
  • 64. Goal: Reconstruct what the animal sees (stimulus) from the activity of neurons in the visual cortex. Data wrangling and insights: Spike sorting; convolution of neuronal spiking activity. Stimulus identification: Support vector machine fed with convolved neural activity. Stimulus reconstruction: An array of naïve Bayes classifiers. Performance: Up to 90%, 10-fold cross-validation. Data Science tools: Reference: Nikolić, D.*, S. Häusler*, W. Singer and W. Maass (2009) Distributed fading memory for stimulus properties in the primary visual cortex. PLoS Biology 7: e1000260.
  • 65. Example III: Predictive maintenance—Coffee machines Visits from a service technician
  • 66. Goal: Predict whether a coffee machine will be visited by a technician within the next 3 months. Data: telemetric data on machine usage. Data wrangling and insights: cumulative variables, cross-correlation, heat map. Model: 4-layer artificial neural network on wrangled data. Performance: 14.1% above chance, 10-fold cross-validation. Best performance among 10 competitors. Data Science tools:
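The performance figures in these examples are reported with 10-fold cross-validation. A minimal sketch of how the folds are formed is given below; the function name `k_fold_indices` and the sample count are hypothetical, and in practice the samples would usually be shuffled before splitting.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; each sample lands in exactly one test fold."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

folds = list(k_fold_indices(25, k=10))
for train_idx, test_idx in folds[:2]:
    print(len(train_idx), test_idx)
```

The model is trained on each `train_idx` and scored on the matching `test_idx`, and the ten scores are averaged, so every reported number comes from data the model did not see during training.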
  • 67. Example IV: Train departure and arrival time
  • 68. Goal: Compute new timetables in real-time depending on the current traffic situation. Model specialization: Railway network implemented as a graph; nodes and edges executed as neural nets. Predictions: individual delays; departure, arrival and waiting times. Performance: We could predict with 68% accuracy a 3-minute window in which a train will arrive/depart, up to 48 hours into the future. Data Science and Big Data tools:
  • 69. How exactly does a customer help?
  • 70. The customer does not only deliver data.
  • 71. WHAT WE NEED FROM THE CUSTOMER IS: Make us understand your world!
  • 72. You need to do everything in your power to transfer model-relevant knowledge to us. (We’ll do the rest.)
  • 73. Customer’s homework: - Know your economics. - Describe the process that created the data. - Formulate hypotheses. - Ensure access to relevant experts in your company.
  • 74. Your economics: Which model could possibly make you money, or bring other benefits? Costs increase with Data Science and analytics effort. As a result, savings and profits rise, but not linearly. Sweet spot: Data Science costs are low, benefits are large. Data Science can cost you more than it saves.
  • 75. The process that created the data: Be it a single machine or an entire factory floor, a hospital ward or a marketing campaign, the more we understand about the process, the more specialization we can insert into the model.
  • 76. Where do you think the signal in the data is? What is your hypothesis? A good specialized architecture extracts signal over noise. Point us in the direction you think is right. We’ll check whether there is a signal.
  • 77. The person we may need to talk to
  • 78. CSC + Customer form a full team.
  • 79. The difference between taking an off-the-shelf model and investing time and expertise to create a specialized model translates into a difference between mediocre results and excellent results.
  • 80. At CSC we are after excellent results.
  • 81. CSC provides top Data Science expertise for developing specialized model architectures in industry.
  • 82. Contacts: Dr. Günter Koch, Senior Manager, gkoch@csc.com; Davor Andric, Principal Solution Architect, dandric@csc.com; Christian Kaupa, Director BD&A, ckaupa@csc.com; Prof. Dr. Danko Nikolic, Lead Data Scientist, dnikolic3@csc.com