4. What Is Data Science?
There is no “data science.” It’s a misnomer.
All science is empirical and involves data analysis.
Science implements a method.
So do statisticians.
5. What Is A Data Scientist?
Project manager
Qualified statistician
Domain/business expert
Experienced data architect
Software engineer
(It’s a team)
6. Data Scientists vs. Business Analysts
Claims that business analysts can be data scientists are dubious
Good practitioners of statistics understand data (from years of training)
Software understands nothing; it simply implements algorithms
10. The Field of Business Intelligence
Hindsight
• Regular reporting/operational BI
Oversight
• Dashboards, OLAP, BPM, etc.
Insight
• Data mining, statistical analysis
Foresight
• Predictive analytics
12. A Process Not An Activity
Data analytics is a multidisciplinary end-to-end process
Until recently it was a walled garden, but the walls have been torn down by…
• Data availability
• Scalable technology
• Open source tools
14. The CRITICAL Workload Issue
Previously, we viewed database workloads as an I/O optimization problem
With analytics, the workload is a highly variable mix of I/O and calculation
No databases were built for this, not even Big Data databases
16. Analytical Latencies
1 Data access
2 Data preparation
3 Model development
4 Execution
5 Implementation
6 Model Audit & Update
Speed = value (probably)
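The six stages above each contribute to the end-to-end latency, so it helps to know where the time actually goes. A minimal sketch, assuming each stage can be wrapped as a callable: the stage names follow the slide, while `time_stages` and `placeholder_stage` are hypothetical names invented for illustration.

```python
import time

# Stage names follow the slide; the stage bodies are hypothetical placeholders.
STAGES = [
    "data access",
    "data preparation",
    "model development",
    "execution",
    "implementation",
    "model audit & update",
]


def time_stages(stage_funcs):
    """Run each (name, func) pair in order and return {name: seconds}."""
    latencies = {}
    for name, func in stage_funcs:
        start = time.perf_counter()
        func()
        latencies[name] = time.perf_counter() - start
    return latencies


def placeholder_stage():
    """Stand-in for real work; replace with the actual stage."""
    sum(range(10_000))


if __name__ == "__main__":
    results = time_stages([(name, placeholder_stage) for name in STAGES])
    total = sum(results.values())
    for name, seconds in results.items():
        print(f"{name:22s} {seconds:.6f}s  ({seconds / total:.0%} of total)")
```

Measuring per stage rather than end to end shows which latency dominates before any effort is spent on speeding things up.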
17. The Open Source Dynamic
The R language
• Over 1 million users
Hadoop and its ecosystem
• Reduced latency for analytics
Machine learning algorithms
• Raw power
None of these are engineered for performance
18. Machine Learning Algorithms - 1
There are many:
• Neural network(s)
• Bayesian networks
• Decision trees/random forests
• Support vector machines
• K-means clustering
• Regression(s)
• Etc.
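To make one item on the list concrete, here is a minimal sketch of k-means clustering (Lloyd's algorithm) in plain Python. The function name `kmeans`, its parameters, and the choice to seed centroids from the first `k` points are assumptions made for illustration, not something the slides specify.

```python
import math


def kmeans(points, k, iterations=10):
    """Lloyd's algorithm on a list of equal-length numeric tuples.

    Assumption: centroids are seeded from the first k points to keep the
    sketch deterministic; production libraries use better seeding.
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, labels


if __name__ == "__main__":
    data = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1), (0.9, 1.1), (9.2, 8.9)]
    centers, assignment = kmeans(data, k=2)
    print(centers)
    print(assignment)
```

Every iteration recomputes a distance from every point to every centroid, which gives a feel for why these methods are hungry for compute on large data sets.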
19. Machine Learning Algorithms - 2
They are not newly invented
We did not previously use them much because we never had the computer power
Now that we have the power (at a price) we can employ them
20. Machine Learning Algorithms - 3
Machine learning algorithms can check all possibilities
We never had the computer power
Now that we have the power (at a price) we can employ them
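One narrow reading of "check all possibilities" is exhaustive search: score every candidate and keep the best. A minimal sketch under that assumption, using a one-level decision tree (a "stump") that tries every feature and every observed threshold. The function `best_stump` and the toy data are hypothetical and are not taken from the slides.

```python
def best_stump(rows, labels):
    """Exhaustively search for the best single-feature split.

    Predicts 1 when row[feature] > threshold, otherwise 0.
    Returns (errors, feature_index, threshold) for the best candidate.
    """
    best = None
    for feature in range(len(rows[0])):
        # Every distinct observed value is a candidate threshold.
        for threshold in sorted({row[feature] for row in rows}):
            errors = sum(
                (row[feature] > threshold) != bool(y)
                for row, y in zip(rows, labels)
            )
            candidate = (errors, feature, threshold)
            if best is None or candidate < best:
                best = candidate
    return best


if __name__ == "__main__":
    rows = [(1, 5), (2, 3), (3, 8), (4, 1)]
    labels = [0, 0, 1, 1]
    print(best_stump(rows, labels))
```

The number of candidates grows with features times distinct values, and it grows far faster once splits are nested into full trees, which is where the compute cost comes from.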
21. The Impact?
Machine learning and processing power (parallelism) will change the data analysis process
The analytics team needs to understand IT
23. Business Metamorphosis
The role of data analysis has not changed
Only the speed has changed
The process will evolve
It will be disruptive for incumbent vendors
24. The Data Analysis Budget
Data analysis is business R&D
The focus is on business process
The outcome of successful R&D is a changed process
Think of manufacturing for a useful analogy
27. It Isn’t Over Until the Fat Lady Sings
Hardware disruption
Software disruption
Business process disruption
All we know is:
• Analytical processing will get faster
• Analytic latencies will reduce
• Data will continue to grow
• Analytics will be a differentiator