The document discusses how CSC creates specialized models for customers through data science. It explains that textbooks oversimplify real-world data modeling, and that data scientists create customized models rather than just applying existing ones. Specialized model architectures require less data and training than general ones. The customer can help data scientists develop specialized architectures by understanding their business needs, explaining the data generation process, formulating hypotheses, and providing domain experts for consultation. CSC provides data science expertise to develop specialized models that can achieve excellent results for customers.
4. How technically deep is this document (scale 1-10)? 5
For effective data science, how essential is collaboration with the customer (scale 1-10)? 10
6. Textbooks…
… use simplified data to explain how you apply statistical methods,
… do not say much on how you deal with data in real life,
… and are thus misleading.
7. Textbooks make you believe
that an appropriate model for
your data already exists.
“You just need to select the
right model and apply it.”
9. Misconception: a data scientist applies a model.
Correction: a data scientist creates a model.
10. Each data set has its own
oddities, quirks, issues, …
Each phenomenon that we
want to model lives in its
own world.
The job of a data scientist is to understand
this world, and to tailor a model
accordingly.
11. Rarely will an off-the-shelf model be
outright optimal for a real-life problem.
12. What a customer buys: a unique model
optimized for the customer’s needs.
13. Skills and experience of a data scientist
translate into the ability to create
customized models.
It may take 10 or 20 years to build up the skill
set needed to create customized models effectively.
15. Creation of a model
requires one to master:
statistics, coding, optimization, storytelling,
visualization, experimental design,
Big Data technology, clustering, business models,
regression, handling databases, probability …
16. … scientific thinking, deep learning, intuition,
distributions, overfitting, information theory,
cross-correlation, fractal geometry,
computation, multivariate analysis, statistical
biases, no-free-lunch theorem, support-vector
machine, normalization,
regularization, matrix algebra, graph
theory, …
… Boltzmann machine, dropout, entropy, auto-associative
networks, reinforcement learning,
Lasso, Kohonen network, backpropagation, …
… natural language processing, scientific publishing,
Bayes theorem, genetic algorithms, swarm intelligence,
boosting, Markov process, softmax, power spectrum,
good regulator theorem, presentation skills, …
… + keeping up with 100s of new models and tools announced
every year.
17. Hence, a team of experienced
data scientists can often
navigate this world more
effectively and creatively
than an individual alone.
Experience + Team is what gets the
customer the best model in the end.
23. These are all unique, newly created
models tailored for a particular purpose.
No existing off-the-shelf model could simply be
applied.
24. But what will a data scientist do?
How does one create a new model?
25. Important to distinguish model
architecture from a complete model.
Architecture: model specified but without training. Equations
and interactions between equations are defined, but
parameter values are not yet known.
Complete model: trained model. Parameter values are
known. Machine learning has been applied. The model
has been fully trained and tested, and is ready to be
deployed.
26. Example architecture:
A wiring diagram, defined data flow, topology, equations,…
but parameter values are not yet specified.
27. Example complete model:
W1,1 = 0.12
W1,2 = 0.03
W2,4 = -0.45
…
Optimal parameter values are found
through the machine learning (training) process.
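The distinction between architecture and complete model can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `LinearUnit`, the two-input linear equation, the gradient-descent settings, and the noise-free training data are all hypothetical; the two target weights are borrowed from the example values above.

```python
import numpy as np

class LinearUnit:
    """Architecture: y = w1*x1 + w2*x2. The equation is fixed, the weights are not."""

    def __init__(self):
        self.weights = None  # parameter values are unknown before training

    def predict(self, X):
        if self.weights is None:
            raise RuntimeError("architecture only: train the model first")
        return X @ self.weights

    def train(self, X, y, learning_rate=0.1, steps=500):
        # Plain gradient descent on the mean squared error.
        self.weights = np.zeros(X.shape[1])
        for _ in range(steps):
            gradient = 2.0 * X.T @ (X @ self.weights - y) / len(y)
            self.weights -= learning_rate * gradient
        return self

# Hypothetical, noise-free training data generated from two example weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
true_weights = np.array([0.12, 0.03])
y = X @ true_weights

model = LinearUnit()   # architecture: equations defined, parameters unknown
model.train(X, y)      # complete model: parameter values found by training
print(model.weights)
```

Before `train` is called the object is only an architecture and refuses to predict; afterwards it holds concrete parameter values and can be deployed.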
29. A data scientist works with a
tradeoff between the effort
invested in designing a
model’s architecture and the
effort invested in training it.
The more specialized the
architecture for a given
problem, the less training is
needed.
IMPORTANT: Architecture vs. Training
30. Advantages from a specialized
architecture:
- smaller datasets for training
- more resilient to over-fitting
- closer to global optimum
- fewer resources
- cost effective
- better overall performance
31. The opposite is an eclectic architecture.
Eclectic architecture can be applied to many
different data but needs more training. As a result,
- larger amounts of data needed
- intensive computation
- easily over-fitted
- likely ending in local minima
- higher development costs
- weaker performance
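The contrast between the two lists can be illustrated with a small experiment. This is a sketch under assumptions, not a benchmark: the data-generating process is assumed to be linear, a straight line stands in for the specialized architecture, and a degree-9 polynomial stands in for a more flexible (eclectic) one. All names and numbers are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def truth(x):
    return 2.0 * x + 1.0  # assumed linear data-generating process

# A small, noisy training set
x_train = np.linspace(-1.0, 1.0, 20)
y_train = truth(x_train) + rng.normal(scale=0.5, size=x_train.size)

# Dense noise-free grid for measuring how well each fit recovers the truth
x_grid = np.linspace(-1.0, 1.0, 200)

def recovery_error(degree):
    """Mean squared distance between the fitted curve and the true process."""
    coefficients = np.polyfit(x_train, y_train, deg=degree)
    predictions = np.polyval(coefficients, x_grid)
    return float(np.mean((predictions - truth(x_grid)) ** 2))

error_specialized = recovery_error(1)  # straight line: 2 parameters
error_flexible = recovery_error(9)     # degree-9 polynomial: 10 parameters
print(error_specialized, error_flexible)
```

With only 20 training points, the flexible fit spends its extra parameters on the noise, while the straight line has nothing left to learn but the slope and the intercept.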
33. Why does specialized architecture
enhance learning?
- The architecture already possesses
part of the needed knowledge; less is
left to be learned.
- The learning space becomes smaller
(reduced dimensionality).
- During learning, a specialized architecture
raises the signal above the noise.
34. Big Data,
due to their mass,
allow working with more general
(eclectic)
architectures.
35. Relative contributions to a model’s knowledge:
- Highly specialized architecture + “small” data: this is the ratio we prefer.
- Eclectic architecture + Big Data: this tradeoff is often successful.
36. Example of specialized architecture -
general linear model (GLM):
Regression based on GLM can already work well
with as few as 100 data points.
The architecture of GLM
already contains knowledge
about:
- Gaussian distributions,
- linear relationships,
- independent sampling,
- pairwise correlations,
- …
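As a minimal sketch of the claim about 100 data points, the following fits ordinary least squares, the simplest special case of the general linear model, to synthetic data. The data, the true coefficients, and the noise level are assumptions made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical data: 100 points from a linear process with Gaussian noise
n_points = 100
x = rng.uniform(0.0, 10.0, size=n_points)
true_intercept, true_slope = 1.0, 2.0
y = true_intercept + true_slope * x + rng.normal(scale=0.5, size=n_points)

# Design matrix with a column of ones for the intercept
design = np.column_stack([np.ones(n_points), x])

# Ordinary least squares: the maximum-likelihood fit under Gaussian noise
coefficients, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slope = coefficients
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
```

Because linearity and Gaussian noise are built into the architecture, only two numbers remain to be estimated, which is why so few data points suffice when those assumptions hold.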
37. The specialization of
GLM is founded in
the discoveries by
generations of
statisticians.
Over years, they
discovered a set of
properties that
tended to repeat in
real-life data sets.
The result is GLM.
38. Example of an eclectic architecture
– Multi-layer perceptron
(aka artificial neural network):
it can learn a lot of different things.
A neural net can learn the same
linear relations as GLM + many other
relations that GLM cannot. This makes
neural nets more eclectic.
However, much larger data sets are
needed. The price for the generality
of architecture is data size and
training time.
39. Big Data also profit from specialized architectures!
Small architecture (general): Big Data.
Bigger architecture (more specific): (less) Big Data.
More data cannot always replace architecture (curse of dimensionality).
40. Example Big Data combined
with specialized architecture
– Convolutional NN:
Only local connectivity; the same
weights are repeated across all
neurons of one layer.
Convolutional layers in a neural
network contain specific
knowledge on how the visual
world is organized.
Addition of convolutional layers
improves learning.
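Local connectivity and weight sharing can be sketched as follows. The example uses a one-dimensional signal instead of an image to stay short, and the helper `conv1d_valid`, the signal, and the kernel values are hypothetical; in a real convolutional layer the kernel weights are learned.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide one shared kernel over the signal (local connectivity, shared weights)."""
    width = len(kernel)
    return np.array([
        np.dot(signal[start:start + width], kernel)
        for start in range(len(signal) - width + 1)
    ])

signal = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0, 0.0, 1.0])
kernel = np.array([-1.0, 0.0, 1.0])  # hypothetical weights; in a CNN these are learned

outputs = conv1d_valid(signal, kernel)

# Parameter count: a fully connected layer needs one weight per input-output pair,
# while the convolutional layer reuses the same kernel at every position.
dense_weights = len(signal) * len(outputs)
conv_weights = len(kernel)
print(outputs, dense_weights, conv_weights)
```

Each output looks at only three neighbouring inputs and every output uses the same three weights, so the layer has far fewer parameters to learn than an all-to-all layer of the same size.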
41. Better NN architecture; more suitable
for processing images; the model
‘knows’ that local pixels are correlated
and that they contain information on
visual features.
Consequences:
A deep neural network with convolutional layers will
perform more effectively than either an all-to-all
connected deep network or any other “shallow”
network.
42. A customer can assist
data scientists in
developing as specialized
an architecture as possible.
43. Can there exist an eclectic model that also
learns easily, like a specialized model?
No!
Because of the no-free-lunch theorem:
“Any two optimization algorithms are
equivalent when their performance is
averaged across all possible
problems.”
44. This is what machine learning
is not - even with Big Data.
Any data science
problem will
require working
on an
appropriate
model
architecture.
45. Various off-the-shelf models can be approximately sorted according to how specialized they are.
[Figure: models placed in a plane spanned by architecture (specialized vs. eclectic) and training effort (low training effort, often high performance vs. high training effort, lower performance); fast learners vs. slow learners.]
Models shown: laws of physics, linear regression, deep learning, genetic algorithms, SVM, decision tree, random forest, naïve Bayes.
Annotations in the figure:
- The slope of optimal model application.
- The black triangle of unreality, due to the no-free-lunch theorem.
- If you are in this corner, you may be using a wrong model for the given data.
46. Off-the-shelf models usually are
not end architectures. More often,
they are only components of
specialized models.
The more eclectic an off-the-shelf
model, the more room for adding
specializations there is.
47. A data scientist will often
combine off-the-shelf models
with other components to
build a model specialized for
the customer’s data.
48. Commonly used specialization tool: data wrangling.
Data wrangling extracts from the data what is important (the signal!) and in a way
that is suitable for an off-the-shelf model. Example:
[Figure: data, equations for data wrangling, neural net.]
Neural net + specific wrangling steps -> together form a highly specialized model.
Here, data wrangling plays a role similar to that of convolution in deep neural nets.
Extensive thought is given to data wrangling.
Less thought may be needed to apply a neural
net. This is because a neural net alone provides
an eclectic architecture.
50. Where does a data scientist operate?
[Figure: plane spanned by architecture (specialized vs. eclectic) and training effort (low vs. high).]
- An inexperienced data scientist may spend a lot of time in this corner.
- A naïve ‘data scientist’ would hope to end up here.
51. How does a data scientist
do that?
Three main steps
for building a
specialized
architecture:
52. 1. UNDERSTAND!
- Analyze data,
dependencies between
variables, distributions,
etc.
- Study the (physical)
system that generated
the data.
53. A data scientist will perform calculations with the goal of
understanding the data.
Various tools to help understanding:
descriptive statistics, distribution plots,
visualizations, scatter plots, time series, cross-correlation,
fractal dimension, …
A data scientist will talk to experts, ask questions, read
literature, go for a walk to think.
By doing so, a data scientist will seek insights necessary to
implement novel model architectures.
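Two of the tools named above, descriptive statistics and cross-correlation, can be sketched as follows. The helper names `lagged_correlation` and `best_lag` and the synthetic series are assumptions for the illustration; the second series is simply the first one delayed by 5 samples.

```python
import numpy as np

def lagged_correlation(x, y, lag):
    """Pearson correlation between x[t] and y[t + lag]."""
    n = len(x)
    if lag >= 0:
        a, b = x[:n - lag], y[lag:]
    else:
        a, b = x[-lag:], y[:n + lag]
    return float(np.corrcoef(a, b)[0, 1])

def best_lag(x, y, max_lag):
    """Lag in [-max_lag, max_lag] with the highest correlation."""
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: lagged_correlation(x, y, lag))

# Hypothetical data: y repeats x with a delay of 5 samples
rng = np.random.default_rng(7)
x = rng.normal(size=200)
y = np.roll(x, 5)

summary = {"mean": float(np.mean(x)), "std": float(np.std(x)),
           "min": float(np.min(x)), "max": float(np.max(x))}
delay = best_lag(x, y, max_lag=20)
print(summary, delay)
```

An insight of this kind (one variable follows another with a fixed delay) is exactly what can later be built into a specialized architecture.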
62. Goal: Detect healthy and unhealthy operations of a fan + classify the
source of disturbances. 3-axis vibration sensor mounted on the fan.
Data wrangling and insights: power spectrum to identify frequency
bands carrying signals.
Anomaly detection: An auto-associative neural network on full
power spectrum.
Disturbance classification: Logistic regression on selected frequency
bands.
Performance: 100% on new data sets.
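The wrangling step of this case, a power spectrum used to find frequency bands that carry signal, can be sketched as follows. This is not the project's code: the sampling rate, the synthetic single-axis vibration signal, the band limits, and the helper `band_power` are all assumptions for the illustration.

```python
import numpy as np

sampling_rate = 1000  # Hz, assumed
duration = 1.0        # seconds, assumed
t = np.arange(int(sampling_rate * duration)) / sampling_rate

# Hypothetical vibration: a strong 50 Hz component, a weaker one at 120 Hz, plus noise
rng = np.random.default_rng(3)
vibration = (np.sin(2 * np.pi * 50 * t)
             + 0.3 * np.sin(2 * np.pi * 120 * t)
             + rng.normal(scale=0.5, size=t.size))

# Power spectrum of the real-valued signal
spectrum = np.fft.rfft(vibration)
frequencies = np.fft.rfftfreq(vibration.size, d=1.0 / sampling_rate)
power = np.abs(spectrum) ** 2 / vibration.size

def band_power(low_hz, high_hz):
    """Total power in the frequency band [low_hz, high_hz)."""
    mask = (frequencies >= low_hz) & (frequencies < high_hz)
    return float(power[mask].sum())

dominant_frequency = float(frequencies[np.argmax(power)])
features = {"40-60 Hz": band_power(40, 60), "200-220 Hz": band_power(200, 220)}
print(dominant_frequency, features)
```

Band powers like these could then serve as inputs to a classifier such as logistic regression, so that the classifier sees the signal-carrying bands rather than the raw waveform.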
64. Goal: Reconstruct what the animal sees (stimulus) from the activity of neurons in the visual
cortex.
Data wrangling and insights: Spike sorting; convolution of neuronal spiking activity.
Stimulus identification: Support vector machine fed with convolved neural activity.
Stimulus reconstruction: An array of naïve Bayes classifiers.
Performance: Up to 90%, 10-fold cross-validation.
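Convolving spiking activity turns discrete spike times into a continuous trace that a classifier can use. The sketch below is an illustration only: the exponential kernel shape, its 20 ms decay constant, the 1 ms bin width, and the spike times are assumptions, not details taken from this slide.

```python
import numpy as np

bin_ms = 1.0   # width of one time bin, assumed
tau_ms = 20.0  # decay constant of the kernel, assumed

# Hypothetical spike train: 1 marks a spike in that 1 ms bin
spikes = np.zeros(200)
spikes[[10, 60, 65]] = 1.0

# Causal exponential kernel: each spike leaves a decaying trace
kernel_time = np.arange(0.0, 5 * tau_ms, bin_ms)
kernel = np.exp(-kernel_time / tau_ms)

# Full convolution, truncated so the output aligns with the input bins
trace = np.convolve(spikes, kernel)[:spikes.size]
print(trace[10], trace[30], trace[65])
```

The trace is zero before the first spike, jumps at each spike, and decays in between, so two spikes in quick succession produce a higher value than an isolated one.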
Reference: Nikolić, D.*, S. Häusler*, W. Singer and W. Maass
(2009) Distributed fading memory for stimulus properties in
the primary visual cortex. PLoS Biology, 7: e1000260.
66. Goal: Predict whether a coffee machine will be visited by a
technician within the next 3 months. Data: telemetric data
on machine usage.
Data wrangling and insights: cumulative variables, cross-
correlation, heat map.
Model: 4-layer artificial neural network on wrangled data.
Performance: 14.1% above chance, 10-fold cross-validation.
Best performance among 10 competitors.
68. Goal: Compute new timetables in real-time depending on the current traffic situation.
Model specialization: Railway network implemented
as a graph; nodes and edges executed as neural nets.
Predictions: individual delays; departure, arrival and waiting times.
Performance: We could predict with 68% accuracy a 3-minute window in which a train will
arrive/depart, up to 48 hours into the future.
71. WHAT WE NEED FROM THE
CUSTOMER IS:
Make us understand your
world!
72. You need to do everything in your
power to transfer model-relevant
knowledge to us.
(We’ll do the rest.)
73. Customer’s homework:
- Know your economics.
- Describe the process that created the data.
- Formulate hypotheses.
- Ensure access to relevant experts in your
company.
74. Your economics: Which model
could possibly make you money, or bring
other benefits?
Costs increase with
Data Science and
analytics effort.
As a result
savings and
profits rise,
but not
linearly.
Sweet spot:
Data Science costs are low,
benefits are large
Data Science can cost
you more than what it
saves.
75. The process that created the data
Be it a single machine or an entire factory floor, a hospital ward or
a marketing campaign, the more we understand about the process,
the more specialization we can build into the model.
76. Where do you think the signal in the
data is? What is your hypothesis?
A good specialized architecture
extracts the signal from the noise.
Point us in the direction you think is
right. We’ll check whether there is a
signal.
79. The difference between taking an
off-the-shelf model and investing time and
expertise to create a specialized model
translates into the difference between
mediocre results
and excellent results.
81. CSC provides top Data Science expertise for
developing specialized model architectures
in industry.
82. Contacts:
Dr. Günter Koch
Senior Manager
gkoch@csc.com
Davor Andric
Principal Solution Architect
dandric@csc.com
Christian Kaupa
Director BD&A
ckaupa@csc.com
Prof. Dr. Danko Nikolic
Lead Data Scientist
dnikolic3@csc.com