A presentation delivered by Mohammed Barakat at the 2nd Jordanian Continuous Improvement Open Day in Amman on 3rd October 2015. The presentation is about Data Science.
2. About Me
Mohammed K. Barakat
• Industrial Engineer, The University of Jordan
• Business Excellence Manager, FINE Hygienic Paper Company
• Professional Engineer in Industrial Engineering (PE), (JCPQA-JEA)
• Project Management Professional (PMP), (PMI)
• Risk Management Professional (PMI-RMP), (PMI)
• Certified Six Sigma Black Belt (CSSBB), (ASQ)
• Certified Six Sigma Green Belt (CSSGB), (ASQ)
• Microsoft Certified Technology Specialist (MCTS), (Microsoft)
• Microsoft Certified Trainer (MCT), (Microsoft)
mohammedbarakat
MohdBarakat
MohdKBarakat
10/13/2015
3. Data Science: Career of the Future
http://www.wired.com/insights/2014/06/tell-kids-data-scientists-doctors/
…Did you hear that? Data scientists earning more than
doctors…
…But salary is not the only reason…
…data scientists will have a measurable impact on the
future of healthcare.
4. Why Data Science?
http://www.economist.com/node/15579717
…the quantity of information in the world is soaring
…150 exabytes (billion gigabytes) of data in 2005. This year,
it will create 1,200 exabytes…
…keeping up with this flood, and storing the bits that might
be useful, is difficult enough…
…Analyzing it, to spot patterns and extract useful
information, is harder…
…Even so, the data deluge is already starting to transform
business…
5. Why is “Data Scientist” a hugely important
profession in the next decade?
“I keep saying that the sexy job in the next
10 years will be statisticians,” said Hal
Varian, chief economist at Google. “And I’m
not kidding.”
https://www.youtube.com/watch?v=pi472Mi3VLw
6. Why is “Data Scientist” a hugely important
profession in the next decade?
• …ability to take the data
• …extract value from it
• …understand the process
• …visualize it
• …Not only at the professional level
• …communicate it
• …Ubiquitous data…but
• …Statisticians are just part of it
• …Scarcity in ability to understand data
and extract value from it
• …Managers need to access and
understand the data themselves
• …No army behind the scenes to
digest the information for you
7. What is Data Science?
“Data Science is the extraction of knowledge from
large volumes of data that are structured or
unstructured”
It often requires sorting through a great amount of
information and writing algorithms to extract insights
from this data.
8. What is Big Data?
"Big Data is high volume, high velocity, and/or high variety
information assets that require new forms of processing
to enable enhanced decision making, insight discovery
and process optimization."
The 3 Vs of Big Data:
Volume: amount of data
Velocity: speed of data in and out
Variety: range of data types and sources
10. The Data Scientist Toolbox
R Software
a software environment for statistical
computing and graphics
11. The Data Scientist Toolbox
RStudio
Open-source software that makes it easy for
anyone to analyze data with R
12. The Data Scientist Toolbox
You’ve got to do a lot of
coding!
13. The Data Scientist Toolbox
You’ve got to work out
a lot of statistics!
14. The Data Scientist Toolbox
GitHub.com: Share your results and code
RPubs.com: Publish your full report and build a personal brand
15. The Data Scientist Toolbox
RPubs.com
You’d be a Data Scientist…
…evidence-based results
…reproducible research
16. The Data Science process explained
STEP 1: Getting and Cleaning Data
Downloading files
Reading data
Raw vs. Tidy data
Merging data
Reshaping data
Summarizing data
Data ‘Housekeeping’
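A minimal sketch of the reading, tidying, merging, and summarizing steps listed above, written in Python with only the standard library (the deck itself recommends R). The plant names, cities, column names, and figures are made up for illustration, and the helper functions are hypothetical, not part of any library.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw data: one row per plant, one column per month ("wide" layout).
RAW_OUTPUT = """plant,jan,feb,mar
A,120,135,
B,98,,110
"""

# Hypothetical lookup table to merge in.
RAW_PLANTS = """plant,city
A,Amman
B,Irbid
"""

def read_rows(text):
    """Read CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))

def tidy(rows, id_col):
    """Reshape wide rows into tidy rows (one observation per row), dropping blank cells."""
    out = []
    for row in rows:
        for col, value in row.items():
            if col != id_col and value.strip():
                out.append({id_col: row[id_col], "month": col, "tons": float(value)})
    return out

def merge(left, right, key):
    """Inner-join two lists of dictionaries on a shared key."""
    lookup = {r[key]: r for r in right}
    return [{**l, **lookup[l[key]]} for l in left if l[key] in lookup]

def mean_by(rows, group_col, value_col):
    """Summarize: mean of value_col within each group."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[group_col]].append(r[value_col])
    return {g: sum(v) / len(v) for g, v in groups.items()}

observations = merge(tidy(read_rows(RAW_OUTPUT), "plant"), read_rows(RAW_PLANTS), "plant")
summary = mean_by(observations, "city", "tons")
print(summary)
```

Each tidy row holds exactly one measurement, which is what makes the later merge and per-group summary one-liners.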
17. The Data Science process explained
STEP 2: Exploratory Data Analysis
understand data properties
find patterns in data
communicate results
Exploratory graphs are made quickly
Many are made
The goal is personal understanding
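As a rough illustration of how quick an exploratory look can be, the sketch below computes a five-number summary and a throwaway text histogram in Python (standard library only). The values are made up, and both helper functions are hypothetical names introduced for this example.

```python
import statistics
from collections import Counter

# Hypothetical measurements, made up for illustration.
values = [12, 15, 15, 18, 21, 22, 22, 23, 35, 60]

def five_number_summary(data):
    """Return (min, lower quartile, median, upper quartile, max)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)
    return min(data), q1, q2, q3, max(data)

def text_histogram(data, bin_width):
    """Build a rough histogram as lines of text, one line per non-empty bin."""
    counts = Counter((x // bin_width) * bin_width for x in data)
    return [
        f"{start:>3}-{start + bin_width - 1:<3} {'#' * counts[start]}"
        for start in sorted(counts)
    ]

summary = five_number_summary(values)
histogram = text_histogram(values, bin_width=10)
print(summary)
print("\n".join(histogram))
```

Even this crude picture exposes the isolated value at 60, the kind of property worth understanding before any formal modelling.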
18. The Data Science process explained
STEP 3: Perform Statistical Inference
“Statistical inference is the process of drawing formal
conclusions from data”.
Some techniques and concepts:
Sampling
Randomization
Hypothesis Testing
Confidence Intervals (uncertainty)
Experimental Design
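Two of the ideas listed above, hypothesis testing by randomization and confidence intervals, can be sketched in a few lines of Python. This is one possible approach (a permutation test and a percentile bootstrap), not necessarily the method the presenter had in mind; the two groups of measurements are made up for illustration.

```python
import random
import statistics

# Hypothetical measurements for two groups, made up for illustration.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5, 5.3]
group_b = [4.2, 4.8, 4.5, 5.0, 4.4, 4.6, 4.9]

def permutation_p_value(a, b, n_shuffles=10_000, seed=0):
    """Two-sided permutation test for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    pooled = a + b  # new list, so the inputs are not modified
    extreme = 0
    for _ in range(n_shuffles):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = abs(statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):]))
        if diff >= observed:
            extreme += 1
    return extreme / n_shuffles

def bootstrap_ci(data, n_resamples=10_000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean."""
    rng = random.Random(seed)
    means = sorted(
        statistics.fmean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower = means[int((1 - level) / 2 * n_resamples)]
    upper = means[int((1 + level) / 2 * n_resamples) - 1]
    return lower, upper

p_value = permutation_p_value(group_a, group_b)
ci_low, ci_high = bootstrap_ci(group_a)
print(p_value, ci_low, ci_high)
```

The p-value estimates how often a random relabelling of the groups produces a gap at least as large as the observed one; the interval expresses the uncertainty around the mean of the first group.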
19. The Data Science process explained
STEP 4: Perform Regression Modelling
“a statistical process for estimating the
relationships among variables”
understand how the value of the dependent
variable changes when any one of the
independent variables is varied.
widely used for prediction (next step)
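A minimal sketch of simple linear regression by ordinary least squares, in plain Python. The weight and mileage figures are invented for illustration and do not come from the presenter's analysis; `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_new):
    """Use the fitted line to predict the dependent variable."""
    return intercept + slope * x_new

# Hypothetical data: vehicle weight (independent) and mileage (dependent).
weight = [1.0, 2.0, 3.0, 4.0, 5.0]
mileage = [30.0, 27.0, 22.0, 20.0, 16.0]

intercept, slope = fit_line(weight, mileage)
print(intercept, slope, predict(intercept, slope, 6.0))
```

The slope is the estimated change in the dependent variable for a one-unit change in the independent variable, and `predict` shows how the same fitted line is reused for prediction.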
20. The Data Science process explained
STEP 5: Perform Machine Learning
“is a computer's way of learning from examples
by using algorithms that take in data and
improve themselves to predict on new data”
Example:
The spam filter working in the background to
block your junk email.
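To make the spam-filter example concrete, here is a toy naive Bayes classifier in Python that learns from a handful of labelled messages. Real filters are far more elaborate; the training messages are made up, and naive Bayes is simply one common technique chosen for this sketch.

```python
import math
from collections import Counter

# Hypothetical labelled examples, made up for illustration.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]

def train(examples):
    """Learn from examples: count words per label and messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(text.split())
    return word_counts, label_counts

def classify(text, word_counts, label_counts):
    """Pick the label with the highest naive Bayes log-probability (add-one smoothing)."""
    vocabulary = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label, counts in word_counts.items():
        total_words = sum(counts.values())
        score = math.log(label_counts[label] / total_messages)
        for word in text.split():
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train(TRAINING)
print(classify("free prize inside", *model))
print(classify("agenda for the team lunch", *model))
```

Adding more labelled messages to `TRAINING` changes the counts and therefore the predictions, which is the "learning from examples" in the definition above.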
21. The Data Science process explained
STEP 6: Make your research Reproducible
“Make analytic data and code available so that
others may reproduce findings”
Why?!
To provide scientific evidence of your findings.
http://www.rpubs.com/mohammedkb/TransMPGAnalysis
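One small, practical piece of reproducibility is making sure that a rerun of the published code on the published data gives the same numbers. The sketch below, in Python, fixes the random seed and fingerprints the data; the data values, the seed, and both helper functions are assumptions made for illustration.

```python
import hashlib
import random

# Hypothetical data, made up for illustration.
DATA = [3.2, 4.1, 5.0, 4.4, 3.9, 4.7]

def data_fingerprint(values):
    """Hash the data so readers can confirm they are analyzing the same values."""
    return hashlib.sha256(repr(values).encode("utf-8")).hexdigest()

def bootstrap_mean(values, seed, n_resamples=1000):
    """Average of resampled means; the fixed seed makes every run identical."""
    rng = random.Random(seed)
    means = [
        sum(rng.choices(values, k=len(values))) / len(values)
        for _ in range(n_resamples)
    ]
    return sum(means) / n_resamples

first_run = bootstrap_mean(DATA, seed=2015)
second_run = bootstrap_mean(DATA, seed=2015)
print(data_fingerprint(DATA))
print(first_run == second_run)
```

Publishing the fingerprint and the seed alongside the code lets a reader check that they started from the same data and arrived at the same result.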
22. What it takes to be a good Data Scientist
Business skills
Communications skills
Analytical skills
Computer science
Statistics
Creativity
Scientific Mindset
Passion & Perseverance
23. What to do next?
Start learning about Data Science
Take a Massive Open Online Course (MOOC)
• Coursera/Data Science
• DataCamp