C3249C - Data Mining and
Predictive Analytics
Specialist Diploma in Business Analytics (SDBA)
Lesson 14 – Concepts Recapitulation and
Conclusions: The Penultimate Lesson
6th June 2019
Rudy Ridwen
school•of•inforcomm
republic•polytechnic
©2020 Republic Polytechnic
Data Mining
Methodologies
A process guide for analytics
projects
Analytics
Why “Analytics”?
https://www.freepik.com/free-vector/mechanical-brain_769574.htm#term=sketch&page=5&position=2
Data Never Sleeps
Source: Domo, Inc.
Data Never Sleeps
Source: Domo, Inc.
“By 2025, it’s
estimated that 463
exabytes of data will be
created each day
globally – that’s the
equivalent of
212,765,957 DVDs per
day!”
World Economic Forum,
2019
NB:
• a Gigabyte (GB) is 1,000
Megabytes (MB);
• a Terabyte (TB) is 1,000
Gigabytes;
• a Petabyte (PB) is 1,000
Terabytes;
• an Exabyte (EB) is 1,000
Petabytes
Data-Driven Innovation
“Data is a resource, much like water or
energy, and like any resource, data does
nothing on its own.
Rather, it is world-changing in how it is
employed in human decision making.”
Justin Hienz
Owner of Cogent Writing, LLC
Data is the New Oil
“Data is the new oil.”
Coined in 2006 by the British
mathematician Clive Humby.
This now famous phrase was
embraced by the World
Economic Forum in a 2011
report.
Data: the Basis of Everything
Cloud
Applications
(e.g. social media)
• Pervasive digitization exploded the amount of data being created and
collected.
• This provides the opportunity to use the data to gain insights
and to make better decisions.
Social
Needs
Environment
Studies
Public
Services
Company
Operations
From Data to Wisdom
Decision Making with Analytics
Analytics can overcome human limitations to
improve the speed, accuracy, consistency, and
transparency of decisions.
Analytics 1.0 – the era of “business intelligence”
• This was the era of the enterprise data warehouse,
used to capture information, and of business
intelligence software, used to query and report it.
Analytics 2.0 – the era of big data
• Analytics 2.0 employed a new generation of quantitative
analysts, called data scientists, who
possessed both computational and analytical skills.
Analytics 3.0 – the era of data-enriched offerings
• Analytics 3.0 creates products and services from
analyses of data. Since every digital activity leaves a
trail, it provides the ability to embed analytics and
optimization into every business decision made at the
operational front lines.
The Evolution of Analytics
Business Analytics:
• Mathematical and statistical process of
transforming data into insight for making
better decisions.
• The data-driven analytics insights are used as
a complement to the decision maker’s
experience and “gut-feel”.
Business Analytics - Defined
2018 Gartner Magic Quadrant for BI and Analytics
2019 Gartner Magic Quadrant for BI and Analytics
Spectrum of Business Analytics
Achieving Success with Business Analytics
Levels of analytics, in order of increasing competitive advantage:
• Basic Reporting: What happened?
• Ad Hoc Reporting: How many, how often, where?
• Dynamic Reporting: Where exactly are the problems?
• Reporting with Early Warning: What actions are needed?
• Basic Statistical Analysis: Why is this happening?
• Forecasting: What if these trends continue?
• Predictive Modeling: What will happen next?
• Decision Optimization: What is the best decision?
Stages: Reporting, Basic Analytics, Advanced Analytics
Data → Information → Intelligence
Decision Support → Decision Guidance
Planning from the Top Down
Analyze which analytics or
modeling can answer
the business questions.
Define mission-critical
business questions that
must be answered.
Identify the data you have
that can help to build
the model.
Planning from the Bottom Up
Determine what analytics
or modeling can be done
using the available data.
Suggest business problems
that can be solved using
analytics.
Identify the data that you
have.
Data Mining:
• Finding patterns or relationships among
elements of the data.
[unsupervised and supervised learning]
Predictive Analytics:
• Finding a pattern (from historical data) so that
an outcome or opportunity can be identified
before it occurs.
[supervised learning]
Business Analytics
Analytics Expertise Required for Success
Domain
Knowledge
Intimate knowledge
of the related industry is
critical to analytics
project success.
Data
Availability
Data always imposes
constraints on
analytics.
Analytical
Methods
and
Principles
Data Analytics
Skills
Deployment and use of BA:
• Financial analytics
• Human resource (HR) analytics
• Marketing analytics
• Health care analytics
• Supply chain analytics
• Analytics for government and non-profits
• Sports analytics
• Web and Social Media analytics
Business Analytics in Practice
Analytics
Frameworks
The Process of an Analytics Project
https://www.freepik.com/free-vector/vintage-aircraft-illustration_3043533.htm#term=sketch&page=11&position=12
Several data mining methodologies have been
developed, each with its own perspective.
The popular methodologies are:
• SEMMA (SAS)
• SAS Enterprise Miner
• Fayyad et al. (Computer science)
• WEKA
• CRISP-DM (IBM)
• SPSS Modeler
Methodologies for Data
Mining
SEMMA Methodology
Supported by the SAS Enterprise Miner environment
• SAMPLE: Input data, Sampling, Data partition
• EXPLORE: Distribution explorer, Multiplot, Insight, Association, Variable selection
• MODIFY: Transform variable, Filter outliers, Clustering, SOM / Kohonen
• MODEL: Regression, Tree, Neural Network, Ensemble
• ASSESS: Assessment, Score, Report
Fayyad’s KDD Methodology
KDD: knowledge discovery in databases
Data → Selection → Target data
Target data → Preprocessing & cleaning → Processed data
Processed data → Transformation & feature selection → Transformed data
Transformed data → Data Mining → Patterns
Patterns → Interpretation / Evaluation → Knowledge
Reproduced from: maastrichtuniversity.nl lecture notes
CRISP-DM Methodology
CRISP-DM: Cross-industry standard process for data mining
Business understanding
• Business objective
• Assess situation
• Data mining goals
• Project plan
Data understanding
• Collect data
• Describe data
• Explore data
• Verify data quality
Data Preparation
• Select data
• Clean data
• Construct data
• Integrate data
• Format data
Modeling
• Select modeling
techniques
• Design the test
• Build model
• Assess model
Evaluation
• Evaluate results
• Review process
• Determine next steps
Deployment
• Plan deployment
• Plan monitoring and
maintenance
• Final report
• Review project
CRISP-DM
What is CRISP-DM?
• Cross Industry Standard Process for Data
Mining (CRISP-DM) is a methodology
that describes the approach used in
tackling data mining problems.
[http://www.crisp-dm.org/]
• CRISP-DM allows data analytics
practitioners to follow a systematic
process in generating an analytics
solution that is:
1. Well-understood
2. Well-planned
3. Well-executed
4. Well-documented
General Data-Mining Process
The data-mining process comprises the following steps:
Data Preparation
• Data Sampling:
Extract a sample
of data that is
relevant to the
business problem
under
consideration.
• Data Preparation:
Manipulate the
data to put it in a
form suitable for
formal modeling.
Model Construction
• Apply the
appropriate data-
mining technique
(e.g. k-means,
classification trees)
to accomplish the
desired data-
mining task
(prediction,
classification,
clustering, etc.).
Model Assessment
• Evaluate models
by comparing
performance on
appropriate data
sets.
• Decide on the
champion model.
Analytics Framework in a
Nutshell
1. Frame a sharp question to be answered (i.e. the
business question)
2. Identify the data and prepare it
3. Create models to answer the question
4. Interpret and rationalise the results
5. Consolidate findings and tell a story (i.e. present
findings)
Data
Data Data Everywhere
https://www.freepik.com/free-vector/sketchy-robot_794262.htm
Data Understanding &
Quality
Before any analytics adventure, the analyst must
have a clear understanding of the data:
• What each field/variable means
• Where the data came from
• When the data was saved (i.e. data frequency and
latency)
• How the data was created or collected
Quality of Data is Critical
• No quality data, no quality results
e.g. duplicate data may cause incorrect or
misleading statistics
Data Preparation
Major Tasks in Data Preparation:
1. Data cleaning
2. Data integration
3. Data transformation
4. Data reduction
Expansion of tasks:
• Sampling: select a representative subset
from a large population of data
• Outlier data: investigate and apply
appropriate treatment to the data
• Missing data: investigate and have
strategies to handle this issue
• Normalisation or standardisation of data
Data Preparation
Preparing data for analytics work is very time
consuming.
At least 70% of the time in an analytics project will
be spent on data understanding, cleaning and
preparation.
Image Source: https://pixabay.com/en/pie-chart-pacman-portion-shape-27359/
Supervised
Learning
Make a Prediction
https://www.freepik.com/index.php?goto=74&idfoto=3043535
Supervised Learning
Predictive Analytics (PA):
• Finding a pattern (from historical data) so
that an outcome or opportunity can be
identified before it occurs.
• PA is a form of supervised learning, where a
target (i.e. the data we want to predict) is
required.
• A supervised learning algorithm analyses
the historical (i.e. training) data and
produces an inferred function, which can
be used for mapping new examples (i.e.
predictions).
Two Prediction Types
Decision Predictions
A predictive model uses input measurements
to make the best decision for each case.
Example predictions: primary, secondary, secondary, primary, tertiary
Estimate Predictions
A predictive model uses input measurements
to optimally estimate the target value.
Example predictions: 0.65, 0.33, 0.75, 0.28, 0.54
Predictive Modeling Overview
The data is split into Training Data and Testing Data.
• Training data creates the models (Model A, Model B, Model C, Model D).
• Test data tests the models.
• Model D is the champion model.
Data Partitioning
• This data partitioning distribution is a rule of thumb.
• Generally, the training dataset is bigger than the validation
dataset, and the test dataset is smaller than the modeling dataset.
Full dataset: 70% training, 15% validation, 15% test
• Dataset for modeling: training and validation
• Dataset to assess model: test
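As a rough sketch of this rule of thumb, the records can be shuffled and then cut at 70% and 85%. The function name `partition`, its default proportions and the fixed seed are assumptions made for this illustration, not part of the module's tools.

```python
import random

def partition(records, train=0.70, valid=0.15, seed=42):
    """Shuffle records and split them into training, validation and test sets."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)   # fixed seed so the split is repeatable
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * valid)
    training = shuffled[:n_train]
    validation = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]     # whatever remains (about 15%)
    return training, validation, test

# Hypothetical dataset of 100 record ids.
training, validation, test = partition(range(100))
print(len(training), len(validation), len(test))
```

Shuffling before cutting matters when the source data is sorted (for example by date or by outcome), otherwise the three partitions would not be comparable.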
The Curse of Dimensionality
[Figure panels: 1-D, 2-D, 3-D]
Model Complexity
Too flexible
Just right
Model Performance Assessment
and Selection
Select the simplest model
with the highest validation
assessment.
[Figure: models of complexity 1 to 5 are fitted on the Training Data (inputs, target) and scored on the Validation Data (inputs, target); the chart shows Validation Assessment against Model Complexity.]
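The selection rule on this slide can be sketched in a few lines: find the best validation score, then take the least complex model that reaches it. The scores below are hypothetical, and `pick_champion` is an illustrative name rather than a function from any tool used in the module.

```python
def pick_champion(results):
    """results: list of (complexity, validation_score); higher score is better."""
    best_score = max(score for _, score in results)
    # Among models tied on the best score, prefer the least complex one.
    return min(c for c, score in results if score == best_score)

# Hypothetical validation scores for models of complexity 1 to 5.
results = [(1, 0.71), (2, 0.78), (3, 0.84), (4, 0.84), (5, 0.80)]
champion = pick_champion(results)
print(champion)
```

Here models 3 and 4 tie on validation score, so the simpler model 3 is chosen.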
Accuracy:
Overall, how often is the classifier correct?
(TP+TN)/(TP+TN+FP+FN)
Misclassification Rate or Error Rate:
Overall, how often is the classifier wrong?
(FP+FN)/(TP+TN+FP+FN) (equivalent to 1 minus Accuracy)
Sensitivity, Recall, or True Positive Rate:
When it's actually YES, how often does it predict YES?
TP/(TP+FN)
Specificity:
When it's actually NO, how often does it predict NO?
TN/(TN+FP)
Precision:
When it predicts YES, how often is it correct?
TP/(TP+FP)
Prevalence:
How often does the YES condition actually occur in our sample?
(TP+FN)/(TP+TN+FP+FN)
Confusion
Matrix Rates
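The rates above translate directly into code. This is a minimal sketch; the counts passed in (40 true positives, 45 true negatives, 5 false positives, 10 false negatives) are hypothetical and the function name `confusion_rates` is chosen only for this example.

```python
def confusion_rates(tp, tn, fp, fn):
    """Compute the confusion matrix rates from the four cell counts."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,      # how often the classifier is correct
        "error_rate": (fp + fn) / total,    # equals 1 - accuracy
        "sensitivity": tp / (tp + fn),      # recall, true positive rate
        "specificity": tn / (tn + fp),      # true negative rate
        "precision": tp / (tp + fp),        # correctness of YES predictions
        "prevalence": (tp + fn) / total,    # share of actual YES cases
    }

# Hypothetical counts for 100 scored cases.
rates = confusion_rates(tp=40, tn=45, fp=5, fn=10)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```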
Supervised Learning
Determining the target’s datatype is
important, as it will affect the choice of
algorithms.
Target can be:
• Classification
• Binary
• Multiclass
• Regression
Model assessment is dependent on the type
of target at hand.
Assessment can be:
• Classification
• Binary – Confusion Matrix
• Multiclass – F1 score [1]
• Regression – RMSE [2]
[1] F1 Score is not covered in SDBA programme
[2] Root mean square error (RMSE) metric is
not covered in SDBA programme
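Although neither metric is covered in the programme, both are short formulas, sketched here for reference. F1 is the harmonic mean of precision and recall (for a multiclass target it is typically computed per class and then averaged), and RMSE is the square root of the mean squared difference between actual and predicted values. The input values are hypothetical.

```python
import math

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical inputs, for illustration only.
f1 = f1_score(precision=0.8, recall=0.5)
error = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(round(f1, 4), round(error, 4))
```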
Algorithms
Models are created from…
algorithms
https://www.freepik.com/index.php?goto=74&idfoto=2782996
Supervised Learning
Decision Trees Algorithm
• Decision Trees can be used to predict a
categorical or a continuous target (called
regression trees in the latter case)
• Unlike logistic regression and neural
networks, no equations are estimated in
decision trees
• A tree structure of rules over the input
variables is used to classify or predict the
cases according to the target variable
• The rules are of an IF-THEN form – for
example:
If Risk = Low, then predict on-time payment of a loan
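The IF-THEN form can be written directly as nested conditions, where each path from the root to a leaf is one rule. In this sketch only the `risk` rule comes from the example above; the `income` input, its 50,000 threshold and the "late" outcome are hypothetical additions used to show a second level of the tree.

```python
def predict_payment(risk, income):
    """Toy decision tree: each path from root to leaf is one IF-THEN rule."""
    if risk == "Low":
        return "on-time"            # rule from the slide
    # Hypothetical second split, added only to show a deeper rule.
    if income >= 50000:
        return "on-time"
    return "late"

print(predict_payment("Low", 20000))
print(predict_payment("High", 20000))
```

A tree-building algorithm would learn the split variables and thresholds from the training data rather than have them written by hand.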
Supervised Learning
Algorithm: Regression (Logistic Regression)
• Regression is the attempt to explain the
variation in a dependent variable using the
variation in independent variables.
• If the independent variables sufficiently explain
the variation in the dependent variable, the
model can be used for prediction.
• There are many important research topics for
which the dependent variable is "limited."
• For example: whether or not a person smokes, or a
fraud is committed. For these the outcome is not
continuous or distributed normally.
• Logistic regression is a type of regression
analysis where the dependent variable is a
dummy variable: coded 0 (did not smoke) or
1 (did smoke)
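To make the 0/1 coding concrete, a fitted logistic regression turns a linear score into a probability between 0 and 1 with the logistic (sigmoid) function, and a cut-off then gives the predicted class. The coefficients and the 0.5 cut-off below are hypothetical values chosen for illustration, not estimates from any dataset.

```python
import math

def predict_probability(x, intercept, slope):
    """Logistic model: P(y = 1 | x) = 1 / (1 + exp(-(intercept + slope * x)))."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * x)))

# Hypothetical coefficients, chosen only for illustration.
p = predict_probability(x=2.0, intercept=-1.0, slope=0.5)
label = 1 if p >= 0.5 else 0   # assumed cut-off of 0.5
print(p, label)
```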
Supervised Learning
Algorithm: Neural Networks
• Neural networks are exceptionally good at
performing pattern recognition tasks that are
very difficult to program using
conventional techniques.
• Programs that employ neural nets are
also capable of learning on their own and
adapting to changing conditions.
• Neural networks can be trained for pattern
recognition using the backpropagation
algorithm. The algorithm searches for
weight values that minimize the total
error of the network over the set of
training examples (i.e. training set).
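The weight search can be illustrated on the smallest possible network. The sketch below trains a single sigmoid neuron on the logical OR function by gradient descent on the squared error; it is a simplification, since with only one neuron the backward pass has a single layer, whereas full backpropagation applies the same chain rule through hidden layers. The learning rate, epoch count and training set are assumptions made for this example.

```python
import math

def forward(weights, bias, inputs):
    """Output of one sigmoid neuron."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Training set for the logical OR function: (inputs, target).
training_set = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 1)]

def total_error(weights, bias):
    """Sum of squared errors over the training set."""
    return sum(0.5 * (forward(weights, bias, x) - t) ** 2 for x, t in training_set)

weights, bias, rate = [0.0, 0.0], 0.0, 0.5
error_before = total_error(weights, bias)

for _ in range(2000):
    for inputs, target in training_set:
        output = forward(weights, bias, inputs)
        # Chain rule: dE/dz = (output - target) * sigmoid'(z)
        delta = (output - target) * output * (1.0 - output)
        weights = [w - rate * delta * x for w, x in zip(weights, inputs)]
        bias -= rate * delta

error_after = total_error(weights, bias)
predictions = [round(forward(weights, bias, x)) for x, _ in training_set]
print(error_before, error_after, predictions)
```

Each update nudges the weights in the direction that reduces the error on the current example, so the total error over the training set falls as training proceeds.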
Min-Max normalization
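Min/Max normalization to [0,1]
[Figure: original ranges (40 to 200, and 1 to 7) mapped onto the scale 0, 0.25, 0.5, 0.75, 1]
Min/Max normalization to [-1,1]
(where 0 is the central point)
[Figure: original range (1 to 7) mapped onto the scale -1, -0.5, 0, 0.5, 1]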
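Both rescalings follow the same formula: subtract the old minimum, divide by the old range, then stretch and shift to the new range. A minimal sketch, using the 40 to 200 and 1 to 7 ranges from the figure with evenly spaced sample values (the sample values themselves are made up for the example):

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Rescale values linearly so that min(values) maps to new_min and max(values) to new_max."""
    old_min, old_max = min(values), max(values)
    span = old_max - old_min   # assumes the values are not all identical
    return [new_min + (v - old_min) * (new_max - new_min) / span for v in values]

unit = min_max([40, 80, 120, 160, 200])            # to [0, 1]
centred = min_max([1, 2.5, 4, 5.5, 7], -1.0, 1.0)  # to [-1, 1]
print(unit)
print(centred)
```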
Choosing
Champion Model
• Models created using various
algorithms will invariably produce
different results.
• Model assessment is required to
determine which of the many
models created is the champion
model.
• An ROC chart can be used to
determine the champion. Other
model assessment measures
can also be used (e.g. Confusion
Matrix, RMSE).
• Training data includes both the input (i.e.
independent variables) and the desired results (i.e.
dependent variable or target).
• Predictive models are constructed using the training
data.
• Testing data includes both the input and known
target.
• A model’s results from the test data will ascertain its
predictive prowess.
• A good model will be able to generalise. It will give
correct results when new input data are given
without knowing the target.
Recap: Supervised Learning
Machine Learning Algorithms
Source: https://s3.amazonaws.com/MLMastery/MachineLearningAlgorithms.png?__s=yxwb9fsmnfj72ypjei1f
Unsupervised
Learning
Something is telling us…
Unsupervised
Learning
“Tell me what you see”
https://www.freepik.com/index.php?goto=74&idfoto=945899
• The model is not provided with the correct results (i.e.
target) during the training. In other words, there is no
target to aim for.
• The aim is to explore the data to find some intrinsic
structures in them.
• The model is the outcome of statistical or mathematical
computation only.
• Interpretation of the results from the unsupervised
learning is still done by humans.
• Unlike supervised learning, unsupervised learning has
no correct answers (i.e. no target to compare
against). Algorithms are left to their own devices to
discover and present the interesting structure in the
data for humans to interpret.
Unsupervised Learning
Unsupervised Learning
Algorithm: Association Analysis
• Association Rule:
Given a set of transactions, find rules that
will predict the occurrence of an item
based on the occurrences of other items
in the transaction. A collection of such
items is called an itemset.
• Rule Evaluation Metrics:
Support measures how often an itemset appears
in the transactions; Confidence measures how often
a rule holds when its antecedent appears.
• A commonly used algorithm for association
analysis is Apriori.
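Support and confidence are simple counts over the transactions. The sketch below computes both for the rule {diapers} → {beer}; the five baskets are hypothetical, and the code only evaluates a given rule (it does not implement the Apriori search for candidate itemsets).

```python
# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Of the transactions containing antecedent, the fraction that also contain consequent."""
    return support(antecedent | consequent) / support(antecedent)

rule_support = support({"diapers", "beer"})
rule_confidence = confidence({"diapers"}, {"beer"})
print(rule_support, rule_confidence)
```

Three of the five baskets contain both items, and three of the four baskets with diapers also contain beer, so the rule has support 3/5 and confidence 3/4.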
Unsupervised Learning
Algorithm: Cluster Analysis
• Cluster analysis is used to segment (i.e.
group) data objects without any
instructions or target.
• Data objects within a group are similar (or
related) to one another and different
from (or unrelated to) the data objects in
other groups.
• Cluster analysis constructs a partition of a
set of n records into a set of k clusters
• Each record belongs to exactly one
cluster
• The number of clusters k is given in
advance
• A commonly used algorithm for clustering
is k-means.
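The two alternating steps of k-means (assign each record to its nearest centroid, then move each centroid to the mean of its members) can be sketched on one-dimensional data. The points, the starting centroids and the fixed iteration count are assumptions chosen to keep the example small and repeatable; real tools usually pick starting centroids at random and stop when assignments no longer change.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D points; returns final centroids and cluster labels."""
    centroids = list(centroids)   # work on a copy of the starting centroids
    for _ in range(iterations):
        # Assignment step: each record joins its nearest centroid (exactly one cluster).
        labels = [min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        for j in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:           # keep the old centroid if a cluster ends up empty
                centroids[j] = sum(members) / len(members)
    return centroids, labels

# Hypothetical data with k = 2 given in advance.
points = [1.0, 1.5, 2.0, 8.0, 9.0, 10.0]
centroids, labels = k_means(points, centroids=[1.0, 2.0])
print(centroids, labels)
```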
Beyond the
module
demonstrations
Data Mining Tools
Summary and
Conclusion
Machine Learning
• Data Mining/Predictive Analytics is a subset of
Machine Learning.
• Machine learning is a field of computer science
that gives computers the ability to learn without
being explicitly programmed.[1]
[1] Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers"
Data Mining
• Data Mining is about automating the process of
searching for patterns in the data.
• Two types of Machine Learning:
• Supervised
• Unsupervised
• In supervised learning, a good model will be able
to generalise. It will give correct results when
new input data are given without knowing the
target.
• In unsupervised learning, interpretation of the
results is still done by humans.
Proof is in the Pudding
• A model is only as good as its test results
(i.e. from model assessment)
• A model must predict better than
the population’s base rate (i.e. its prior probability) to be useful.
• The best model is one that stands the test
after deployment in the real world.
The Analytics
Landscape
The Big Picture View
https://www.freepik.com/index.php?goto=74&idfoto=2783060
Analytics Use within 3 Years
Source: Operationalizing and Embedding Analytics for Action by Fern Halper. TDWI Research.
Transform with Predictive
Insights
Source: SAP (www.sap.com/predictive)
An Analytics Architecture
An Analytics Architecture
The Analytics Challenges
Source: Operationalizing and Embedding Analytics for Action by Fern Halper.
TDWI Research.
Conclusion and
Reflection
What is the future of
data analytics?
Why smart statistics are the key to fighting crime
by Anne Milgram at TED@BCG
https://www.youtube.com/watch?v=ZJNESMhIxQ0
What is the Cambridge Analytica scandal?
by The Guardian
https://www.youtube.com/watch?v=Q91nvbJSmS4
Real-World Predictive Analytics
in Action
The Analytics Challenges
Source: https://mashable.com/2017/04/27/man-tweets-pie-charts/
Thank You
rudy_ridwen@rp.edu.sg
@rudyridwen
@rudyridwen
@rudy.ridwen
C3249C - Data Mining and
Predictive Analytics
SpecialistDiplomainBusinessAnalytics(SDBA)
Lesson 14 – Concepts Recapitulation and
Conclusions: The Penultimate Lesson
6th June 2019
Rudy Ridwen
school•of•inforcomm
republic•polytechnic
©2020 Republic Polytechnic
2
Why smart statistics are the key to fighting crime
by Anne Milgram at TED@BCG
https://www.youtube.com/watch?v=ZJNESMhIxQ0
What is the Cambridge Analytica scandal?
by The Guardian
https://www.youtube.com/watch?v=Q91nvbJSmS4
Real-World Predictive Analytics
in Action

Contenu connexe

Tendances (20)

Cluster analysis
Cluster analysisCluster analysis
Cluster analysis
 
Support Vector Machines ( SVM )
Support Vector Machines ( SVM ) Support Vector Machines ( SVM )
Support Vector Machines ( SVM )
 
Data science applications and usecases
Data science applications and usecasesData science applications and usecases
Data science applications and usecases
 
Scaling and Normalization
Scaling and NormalizationScaling and Normalization
Scaling and Normalization
 
OLAP operations
OLAP operationsOLAP operations
OLAP operations
 
Knowledge Discovery and Data Mining
Knowledge Discovery and Data MiningKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Ppt on data science
Ppt on data science Ppt on data science
Ppt on data science
 
ML - Multiple Linear Regression
ML - Multiple Linear RegressionML - Multiple Linear Regression
ML - Multiple Linear Regression
 
Back propagation
Back propagationBack propagation
Back propagation
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
K Nearest Neighbors
K Nearest NeighborsK Nearest Neighbors
K Nearest Neighbors
 
Predictive Modelling
Predictive ModellingPredictive Modelling
Predictive Modelling
 
Machine Learning-Linear regression
Machine Learning-Linear regressionMachine Learning-Linear regression
Machine Learning-Linear regression
 
ID3 ALGORITHM
ID3 ALGORITHMID3 ALGORITHM
ID3 ALGORITHM
 
Statistics for data science
Statistics for data science Statistics for data science
Statistics for data science
 
Decision Trees
Decision TreesDecision Trees
Decision Trees
 
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
Linear Regression Analysis | Linear Regression in Python | Machine Learning A...
 
ML - Simple Linear Regression
ML - Simple Linear RegressionML - Simple Linear Regression
ML - Simple Linear Regression
 
Introduction to Hadoop
Introduction to HadoopIntroduction to Hadoop
Introduction to Hadoop
 

Similaire à Data Mining & Predictive Analytics - Lesson 14 - Concepts Recapitulation and Conclusions - The Penultimate Lesson - Mr Rudy Ridwen

Elementary Data Analysis with MS excel_Day-1
Elementary Data Analysis with MS excel_Day-1Elementary Data Analysis with MS excel_Day-1
Elementary Data Analysis with MS excel_Day-1Redwan Ferdous
 
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...Tao Xie
 
Assocham global conference audit data standards - 28.10.2020
Assocham global conference   audit data standards - 28.10.2020Assocham global conference   audit data standards - 28.10.2020
Assocham global conference audit data standards - 28.10.2020Vinod Kashyap
 
Machine learning will transform how we deliver projects
Machine learning will transform how we deliver projectsMachine learning will transform how we deliver projects
Machine learning will transform how we deliver projectsPMIUKChapter
 
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector WebinarBigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector WebinarBig Data Value Association
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsDATAVERSITY
 
Certified Big Data Science Analyst (CBDSA)
Certified Big Data Science Analyst (CBDSA)Certified Big Data Science Analyst (CBDSA)
Certified Big Data Science Analyst (CBDSA)GICTTraining
 
BIG DATA ANALYTICS-2.pptx
BIG DATA ANALYTICS-2.pptxBIG DATA ANALYTICS-2.pptx
BIG DATA ANALYTICS-2.pptxAliyaanRaahiL
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.ZaraaTitima1
 
How to make your data scientists happy
How to make your data scientists happy How to make your data scientists happy
How to make your data scientists happy Hussain Sultan
 
SC6 Workshop 1: Big Data Europe platform requirements and draft architecture:...
SC6 Workshop 1: Big Data Europe platform requirements and draft architecture:...SC6 Workshop 1: Big Data Europe platform requirements and draft architecture:...
SC6 Workshop 1: Big Data Europe platform requirements and draft architecture:...BigData_Europe
 
Building the Cognitive Era : Big Data Strategies
Building the Cognitive Era : Big Data StrategiesBuilding the Cognitive Era : Big Data Strategies
Building the Cognitive Era : Big Data StrategiesKevin Sigliano
 
Data Science Salon: Quit Wasting Time – Case Studies in Production Machine Le...
Data Science Salon: Quit Wasting Time – Case Studies in Production Machine Le...Data Science Salon: Quit Wasting Time – Case Studies in Production Machine Le...
Data Science Salon: Quit Wasting Time – Case Studies in Production Machine Le...Formulatedby
 
Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Big Data & Analytics (Conceptual and Practical Introduction)
Big Data & Analytics (Conceptual and Practical Introduction)Yaman Hajja, Ph.D.
 
Bitrock manufacturing
Bitrock manufacturing Bitrock manufacturing
Bitrock manufacturing cosma_r
 
Modern Business Intelligence - Design and Implementations
Modern Business Intelligence - Design and ImplementationsModern Business Intelligence - Design and Implementations
Modern Business Intelligence - Design and ImplementationsDavid J Rosenthal
 
Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?Emerging Trends in Data Architecture – What’s the Next Big Thing?
Emerging Trends in Data Architecture – What’s the Next Big Thing?DATAVERSITY
 
¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?¿En qué se parece el Gobierno del Dato a un parque de atracciones?
¿En qué se parece el Gobierno del Dato a un parque de atracciones?Denodo
 

Similaire à Data Mining & Predictive Analytics - Lesson 14 - Concepts Recapitulation and Conclusions - The Penultimate Lesson - Mr Rudy Ridwen (20)

Elementary Data Analysis with MS excel_Day-1
Elementary Data Analysis with MS excel_Day-1Elementary Data Analysis with MS excel_Day-1
Elementary Data Analysis with MS excel_Day-1
 
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
MSR 2022 Foundational Contribution Award Talk: Software Analytics: Reflection...
 
Assocham global conference audit data standards - 28.10.2020
Assocham global conference   audit data standards - 28.10.2020Assocham global conference   audit data standards - 28.10.2020
Assocham global conference audit data standards - 28.10.2020
 
Machine learning will transform how we deliver projects
Machine learning will transform how we deliver projectsMachine learning will transform how we deliver projects
Machine learning will transform how we deliver projects
 
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector WebinarBigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
BigDataPilotDemoDays - I-BiDaaS Application to the Financial Sector Webinar
 
Building a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business GoalsBuilding a Data Strategy – Practical Steps for Aligning with Business Goals
Building a Data Strategy – Practical Steps for Aligning with Business Goals
 
Certified Big Data Science Analyst (CBDSA)
Certified Big Data Science Analyst (CBDSA)Certified Big Data Science Analyst (CBDSA)
Certified Big Data Science Analyst (CBDSA)
 
BIG DATA ANALYTICS-2.pptx
BIG DATA ANALYTICS-2.pptxBIG DATA ANALYTICS-2.pptx
BIG DATA ANALYTICS-2.pptx
 
DATA BI: put key insights at the finger tip of decision makers.
DATA BI: put key insights at the finger tip of decision makers.DATA BI: put key insights at the finger tip of decision makers.
Data Mining & Predictive Analytics - Lesson 14 - Concepts Recapitulation and Conclusions - The Penultimate Lesson - Mr Rudy Ridwen

  • 1. C3249C - Data Mining and Predictive Analytics. Specialist Diploma in Business Analytics (SDBA). Lesson 14 – Concepts Recapitulation and Conclusions: The Penultimate Lesson. 6th June 2019. Rudy Ridwen, School of Infocomm, Republic Polytechnic
  • 2. ©2020 Republic Polytechnic. Data Mining Methodologies: A process guide for analytics projects
  • 3. Analytics: Why “Analytics”? Image: https://www.freepik.com/free-vector/mechanical-brain_769574.htm#term=sketch&page=5&position=2
  • 4. Data Never Sleeps. Source: Domo, Inc.
  • 5. Data Never Sleeps. Source: Domo, Inc. “By 2025, it’s estimated that 463 exabytes of data will be created each day globally – that’s the equivalent of 212,765,957 DVDs per day!” World Economic Forum, 2019. NB: • a Gigabyte (GB) is 1,000 Megabytes (MB); • a Terabyte (TB) is 1,000 Gigabytes; • a Petabyte (PB) is 1,000 Terabytes; • an Exabyte (EB) is 1,000 Petabytes
  • 6. Data-Driven Innovation. “Data is a resource, much like water or energy, and like any resource, data does nothing on its own. Rather, it is world-changing in how it is employed in human decision making.” Justin Hienz, Owner of Cogent Writing, LLC
  • 7. Data is the New Oil. “Data is the new oil.” Coined in 2006 by British mathematician Clive Humby. This now-famous phrase was embraced by the World Economic Forum in a 2011 report.
  • 8. Data: the Basis of Everything. Diagram labels: Cloud Applications (e.g. social media), Social Needs, Environment Studies, Public Services, Company Operations. • Pervasive digitization has exploded the amount of data being created and collected. • This provides the opportunity to use the data to gain insights and make better decisions.
  • 9. From Data to Wisdom (figure ©2018 Republic Polytechnic)
  • 10. Decision Making with Analytics. Analytics can overcome human limitations to improve the speed, accuracy, consistency, and transparency of decisions.
  • 11. The Evolution of Analytics. Analytics 1.0 – the era of “business intelligence” • This was the era of the enterprise data warehouse, used to capture information, and of business intelligence software, used to query and report it. Analytics 2.0 – the era of big data • Analytics 2.0 employed next-generation quantitative analysts, called data scientists, who possessed both computational and analytical skills. Analytics 3.0 – the era of data-enriched offerings • Analytics 3.0 creates products and services from analyses of data. Since every digital activity leaves a trail, this provides the ability to embed analytics and optimization into every business decision made at the operational front lines.
  • 12. Business Analytics - Defined. Business Analytics: • The mathematical and statistical process of transforming data into insight for making better decisions. • Data-driven analytical insights are used to complement the decision maker’s experience and “gut feel”.
  • 13. 2018 Gartner Magic Quadrant for BI and Analytics
  • 14. 2019 Gartner Magic Quadrant for BI and Analytics
  • 15. Spectrum of Business Analytics
  • 16. Achieving Success with Business Analytics. Chart axis: Competitive Advantage. Stages: Basic Reporting (What happened?); Ad Hoc Reporting (How many, how often, where?); Dynamic Reporting (Where exactly are the problems?); Reporting with Early Warning (What actions are needed?); Basic Statistical Analysis (Why is this happening?); Forecasting (What if these trends continue?); Predictive Modeling (What will happen next?); Decision Optimization (What is the best decision?). Chart labels: Data, Information, Intelligence; Reporting, Basic Analytics, Advanced Analytics; Decision Support, Decision Guidance.
  • 17. Planning from the Top Down. Define the mission-critical business questions that must be answered. Determine suitable analytics or modeling that can answer the business questions. Identify the data you have that can help to build the model.
  • 18. Planning from the Bottom Up. Identify the data that you have. Determine suitable analytics or modeling that can be done using the available data. Suggest a business problem that can be solved using analytics.
  • 19. Business Analytics. Data Mining: • Finding patterns or relationships among elements of the data. [unsupervised and supervised learning] Predictive Analytics: • Finding a pattern (from historical data) so that an opportunity or outcome can be identified before it occurs. [supervised learning]
  • 20. Analytics Expertise Required for Success. Domain Knowledge: intimate knowledge of the related industry is critical to analytics project success. Data Availability: the available data always constrains the analytics. Data Analytics Skills: analytical methods and principles.
  • 21. Business Analytics in Practice. Deployment and use of BA: • Financial analytics • Human resource (HR) analytics • Marketing analytics • Health care analytics • Supply chain analytics • Analytics for government and non-profits • Sports analytics • Web and social media analytics
  • 22. Analytics Frameworks: The Process of an Analytics Project. Image: https://www.freepik.com/free-vector/vintage-aircraft-illustration_3043533.htm#term=sketch&page=11&position=12
  • 23. Methodologies for Data Mining. Several data mining methodologies have been developed, each with its own perspective. The popular methodologies and their associated tools are: • SEMMA (SAS): SAS Enterprise Miner • Fayyad et al. (computer science): WEKA • CRISP-DM (IBM): SPSS Modeler
  • 24. SEMMA Methodology. Supported by the SAS Enterprise Miner environment. SAMPLE: input data, sampling, data partition. EXPLORE: distribution explorer, multiplot, insight, association, variable selection. MODIFY: transform variables, filter outliers, clustering, SOM / Kohonen. MODEL: regression, tree, neural network, ensemble. ASSESS: assessment, score, report.
  • 25. Fayyad’s KDD Methodology. KDD: knowledge discovery in databases. Data → (Selection) → Target data → (Preprocessing & cleaning) → Processed data → (Transformation & feature selection) → Transformed data → (Data mining) → Patterns → (Interpretation / Evaluation) → Knowledge. Reproduced from: maastrichtuniversity.nl lecture notes
  • 26. CRISP-DM Methodology. CRISP-DM: Cross-Industry Standard Process for Data Mining. Business understanding: • Business objective • Assess situation • Data mining goals • Project plan. Data understanding: • Collect data • Describe data • Explore data • Verify data quality. Data preparation: • Select data • Clean data • Construct data • Integrate data • Format data. Modeling: • Select modeling techniques • Design the test • Build model • Assess model. Evaluation: • Evaluate results • Review process • Determine next steps. Deployment: • Plan deployment • Plan monitoring and maintenance • Final report • Review project
  • 27. CRISP-DM. What is CRISP-DM? • The Cross-Industry Standard Process for Data Mining (CRISP-DM) is a methodology that describes the approach used in tackling data mining problems. [http://www.crisp-dm.org/] • CRISP-DM allows data analytics practitioners to follow a systematic process in generating an analytics solution that is: 1. Well-understood 2. Well-planned 3. Well-executed 4. Well-documented
  • 28. General Data-Mining Process. The data-mining process comprises the following steps. Data Preparation: • Data Sampling: extract a sample of data that is relevant to the business problem under consideration. • Data Preparation: manipulate the data to put it in a form suitable for formal modeling. Model Construction: • Apply the appropriate data-mining technique (e.g. k-means, classification trees) to accomplish the desired data-mining task (prediction, classification, clustering, etc.). Model Assessment: • Evaluate models by comparing performance on appropriate data sets. • Decide on the champion model.
  • 29. Analytics Framework in a Nutshell. 1. Frame a sharp question to be answered (i.e. the business question) 2. Identify the data and prepare it 3. Create models to answer the question 4. Interpret and rationalise the results 5. Consolidate findings and tell a story (i.e. present findings)
  • 30. Data Data Data Everywhere. Image: https://www.freepik.com/free-vector/sketchy-robot_794262.htm
  • 31. Data Understanding & Quality. Before any analytics adventure, the analyst must have a clear understanding of the data: • What each field/variable means • Where the data came from • When the data was saved (i.e. data frequency and latency) • How the data was created or collected. Quality of data is critical: • No quality data, no quality results, e.g. duplicate data may cause incorrect or misleading statistics
  • 32. Data Preparation. Major tasks in data preparation: 1. Data cleaning 2. Data integration 3. Data transformation 4. Data reduction. Expansion of tasks: • Sampling: select a representative subset from a large population of data • Outlier data: investigate and apply appropriate treatment to the data • Missing data: investigate and have strategies to handle this issue • Normalisation or standardisation of data
  • 33. Data Preparation. Preparing data for analytics work is very time consuming. At least 70% of the time in an analytics project will be spent on data understanding, cleaning and preparation. Image source: https://pixabay.com/en/pie-chart-pacman-portion-shape-27359/
  • 34. Supervised Learning: Make a Prediction. Image: https://www.freepik.com/index.php?goto=74&idfoto=3043535
  • 35. Supervised Learning. Predictive Analytics (PA): • Finding a pattern (from historical data) so that an opportunity or outcome can be identified before it occurs. • PA is a form of supervised learning, where a target (i.e. the data we want to predict) is required. • A supervised learning algorithm analyses the historical (i.e. training) data and produces an inferred function, which can be used for mapping new examples (i.e. predictions).
  • 36. Two Prediction Types. Decision predictions: a predictive model uses input measurements to make the best decision for each case (example outputs: primary, secondary, secondary, primary, tertiary). Estimate predictions: a predictive model uses input measurements to optimally estimate the target value (example outputs: 0.65, 0.33, 0.75, 0.28, 0.54).
  • 37. Predictive Modeling Overview. The data is split into training data and testing data. Training data creates the models (Model A, Model B, Model C, Model D); test data tests the models. In the diagram, Model D is the champion model.
  • 38. Data Partitioning. Diagram: the full dataset is split 70% / 15% / 15% into the dataset for modeling (training and validation) and the dataset to assess the model (test). • This data partitioning distribution is a rule of thumb. • Generally, the training dataset is bigger than the validation dataset, and the test dataset is smaller than the modeling dataset.
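The 70% / 15% / 15% rule of thumb on slide 38 can be sketched in a few lines. This is a minimal illustration only, assuming the records fit in memory and that a simple random shuffle is acceptable; the function name `partition`, the fixed seed and the toy records are hypothetical and not part of the module's tools.

```python
import random

def partition(records, train=0.70, validation=0.15, seed=42):
    """Shuffle records, then split into training, validation and test sets."""
    shuffled = list(records)
    # A fixed seed (an assumption for this sketch) makes the split repeatable.
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_valid = round(len(shuffled) * validation)
    training = shuffled[:n_train]
    validation_set = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]  # whatever remains (about 15%)
    return training, validation_set, test

# Toy data: 100 record identifiers.
training, validation_set, test = partition(range(100))
print(len(training), len(validation_set), len(test))
```

The training and validation sets together form the dataset for modeling; the test set is held back to assess the finished model.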
  • 39. The Curse of Dimensionality. Figure panels: 1-D, 2-D, 3-D.
  • 40. Model Complexity. Figure panels: Too flexible, Just right.
  • 41. Model Performance Assessment and Selection. Figure: models 1 to 5, ordered by model complexity, are built on the training data (inputs, target) and given a validation assessment on the validation data (inputs, target). Select the simplest model with the highest validation assessment.
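The selection rule on slide 41 (the simplest model with the highest validation assessment) can be written down directly. The sketch below assumes each candidate model has a complexity level and a validation score where higher is better; the scores are made-up illustrative numbers, and `select_champion` is a hypothetical helper name.

```python
def select_champion(assessments):
    """Return the lowest complexity level among the best-scoring models.

    assessments maps a complexity level (1 = simplest) to a validation
    score where higher is better.
    """
    best = max(assessments.values())
    return min(level for level, score in assessments.items() if score == best)

# Hypothetical validation scores for five models of increasing complexity.
validation_scores = {1: 0.71, 2: 0.78, 3: 0.84, 4: 0.84, 5: 0.80}
champion = select_champion(validation_scores)
print(champion)
```

Models 3 and 4 tie on the validation score here, so the rule prefers model 3 because it is simpler.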
  • 42. Confusion Matrix Rates. Accuracy: overall, how often is the classifier correct? (TP+TN)/(TP+TN+FP+FN). Misclassification Rate or Error Rate: overall, how often is the classifier wrong? (FP+FN)/(TP+TN+FP+FN), equivalent to 1 minus Accuracy. Sensitivity, Recall, or True Positive Rate: when it's actually YES, how often does it predict YES? TP/(TP+FN). Specificity: when it's actually NO, how often does it predict NO? TN/(TN+FP). Precision: when it predicts YES, how often is it correct? TP/(TP+FP). Prevalence: how often does the YES condition actually occur in our sample? (TP+FN)/(TP+TN+FP+FN).
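The rates on slide 42 follow mechanically from the four confusion-matrix counts. The following sketch computes them for one set of counts; the counts themselves are made up for illustration, and `confusion_rates` is a hypothetical helper name.

```python
def confusion_rates(tp, tn, fp, fn):
    """Compute the confusion-matrix rates listed on the slide."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        "error_rate": (fp + fn) / total,   # equals 1 - accuracy
        "sensitivity": tp / (tp + fn),     # recall, true positive rate
        "specificity": tn / (tn + fp),
        "precision": tp / (tp + fp),
        "prevalence": (tp + fn) / total,
    }

# Illustrative counts only: 165 cases in total.
rates = confusion_rates(tp=100, tn=50, fp=10, fn=5)
for name, value in rates.items():
    print(f"{name}: {value:.3f}")
```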
  • 43. Supervised Learning. Determining the target’s data type is important, as it affects the choice of algorithm. The target can be: • Classification (binary or multiclass) • Regression. Model assessment depends on the type of target at hand. Assessment can be: • Classification: binary – confusion matrix; multiclass – F1 score [1] • Regression – RMSE [2]. [1] The F1 score is not covered in the SDBA programme. [2] The root mean square error (RMSE) metric is not covered in the SDBA programme.
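Slide 43 names two metrics without defining them. For reference, a minimal sketch of the standard definitions: RMSE is the square root of the mean squared difference between actual and predicted values, and the F1 score is the harmonic mean of precision and recall. The function names and sample values are illustrative assumptions.

```python
import math

def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0 if both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(rmse([3.0, 5.0], [1.0, 5.0]))          # regression target
print(f1_score(precision=0.8, recall=0.5))   # classification target
```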
  • 44. Algorithms: Models are created from… algorithms. Image: https://www.freepik.com/index.php?goto=74&idfoto=2782996
  • 45. Supervised Learning. Algorithm: Decision Trees • Decision trees can be used to predict a categorical or a continuous target (called regression trees in the latter case) • Unlike logistic regression and neural networks, no equations are estimated in decision trees • A tree structure of rules over the input variables is used to classify or predict the cases according to the target variable • The rules are of an IF-THEN form, for example: IF Risk = Low, THEN predict on-time payment of a loan
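The IF-THEN form of decision tree rules on slide 45 maps naturally onto nested conditions. The sketch below hand-codes a tiny tree; only the first rule (Risk = Low) comes from the slide, while the income split, its threshold and the field names are hypothetical additions for illustration. A real tree would be learned from training data rather than written by hand.

```python
def predict_payment(applicant):
    """Classify a loan applicant by walking a small hand-written rule tree."""
    # Rule from the slide: IF Risk = Low THEN predict on-time payment.
    if applicant["risk"] == "Low":
        return "on-time"
    # Hypothetical second split, added only to show a deeper branch.
    if applicant["income"] >= 50_000:
        return "on-time"
    return "late"

print(predict_payment({"risk": "Low", "income": 20_000}))
print(predict_payment({"risk": "High", "income": 30_000}))
```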
  • 46. Supervised Learning. Algorithm: Regression (Logistic Regression) • Regression is the attempt to explain the variation in a dependent variable using the variation in independent variables. • If the independent variables sufficiently explain the variation in the dependent variable, the model can be used for prediction. • There are many important research topics for which the dependent variable is "limited", for example: whether or not a person smokes, or whether a fraud is committed. For these, the outcome is not continuous or normally distributed. • Logistic regression is a type of regression analysis where the dependent variable is a dummy variable: coded 0 (did not smoke) or 1 (did smoke)
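To make the 0/1 target on slide 46 concrete: a fitted logistic regression passes a weighted sum of the inputs through the logistic (sigmoid) function, giving a probability between 0 and 1 that the outcome is 1. The sketch below shows only this prediction step; the coefficients, intercept and input variables are made-up values, not estimates from any dataset.

```python
import math

def predict_probability(inputs, coefficients, intercept):
    """Probability that the target is 1, using the logistic function."""
    z = intercept + sum(b * x for b, x in zip(coefficients, inputs))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical model: inputs are (age in decades, parent smokes 0/1).
coefficients = [0.2, 1.5]
intercept = -2.0
p = predict_probability([3.0, 1], coefficients, intercept)
print(f"P(smokes) = {p:.3f}; predicted class = {1 if p >= 0.5 else 0}")
```

The 0.5 cut-off used to turn the probability into a 0/1 decision is a common default, not a requirement.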
  • 47. Supervised Learning. Algorithm: Neural Networks • Neural networks are exceptionally good at pattern-recognition tasks that are very difficult to program using conventional techniques. • Programs that employ neural nets are also capable of learning on their own and adapting to changing conditions. • Neural network pattern recognition can be achieved by using the backpropagation algorithm. The algorithm searches for weight values that minimize the total error of the network over the set of training examples (i.e. the training set).
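The idea of searching for weights that minimize the total error can be illustrated on the smallest possible network: a single sigmoid neuron trained by gradient descent. This is a simplified sketch, not full backpropagation; with one neuron there are no hidden layers to propagate the error through, but the weight update is the same gradient step that backpropagation applies layer by layer. The learning rate, epoch count and the OR training set are illustrative assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def total_error(examples, weights, bias):
    """Sum of squared errors of the neuron over the training set."""
    error = 0.0
    for inputs, target in examples:
        output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
        error += 0.5 * (output - target) ** 2
    return error

def train_neuron(examples, epochs=2000, learning_rate=0.5):
    """Adjust weights by gradient descent to reduce the total error."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            output = sigmoid(sum(w * x for w, x in zip(weights, inputs)) + bias)
            # Gradient of 0.5 * (output - target)^2 w.r.t. the weighted sum.
            delta = (output - target) * output * (1.0 - output)
            weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
            bias -= learning_rate * delta
    return weights, bias

# Training set: the logical OR function.
or_examples = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
error_before = total_error(or_examples, [0.0, 0.0], 0.0)
weights, bias = train_neuron(or_examples)
error_after = total_error(or_examples, weights, bias)
print(error_before, error_after)
```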
  • 48. Min-Max Normalization. Figure panels: min-max normalization to [0, 1]; min-max normalization to [-1, 1] (where 0 is the central point).
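Min-max normalization rescales each value linearly so that the smallest value maps to the lower bound of the new range and the largest to the upper bound. A minimal sketch covering both ranges named on slide 48 follows; the sample values are made up, and `min_max` is a hypothetical helper name.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Linearly rescale values so min -> new_min and max -> new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("all values are identical; the range is zero")
    scale = new_max - new_min
    return [new_min + (v - lo) * scale / (hi - lo) for v in values]

sample = [40, 120, 200]  # illustrative values only
print(min_max(sample))               # rescaled to [0, 1]
print(min_max(sample, -1.0, 1.0))    # rescaled to [-1, 1], midpoint at 0
```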
  • 49. Choosing the Champion Model • Models created using various algorithms will invariably produce different results. • Model assessment is required to determine which of the many models created is the champion model. • An ROC chart can be used to determine the champion. Other model assessment measures can also be used (e.g. confusion matrix, RMSE).
  • 50. Recap: Supervised Learning • Training data includes both the input (i.e. independent variables) and the desired results (i.e. dependent variable or target). • Predictive models are constructed using the training data. • Testing data includes both the input and the known target. • A model’s results on the test data will ascertain its predictive prowess. • A good model will be able to generalise: it will give correct results when new input data are given without knowing the target.
  • 51. Machine Learning Algorithms. Source: https://s3.amazonaws.com/MLMastery/MachineLearningAlgorithms.png?__s=yxwb9fsmnfj72ypjei1f
  • 53. ©2020 Republic Polytechnic Unsupervised Learning “Tell me what you see” 53 https://www.freepik.com/index.php?goto=74&idfoto=945899
  • 54. Unsupervised Learning • The model is not provided with the correct results (i.e. target) during training. In other words, there is no target to aim for. • The aim is to explore the data to find some intrinsic structure in it. • The model is purely the outcome of statistical or mathematical computation. • Interpretation of the results of unsupervised learning is still done by humans. • Unlike supervised learning, unsupervised learning has no correct answers (i.e. no target to compare against). Algorithms are left to their own devices to discover and present the interesting structure in the data for humans to interpret.
  • 55. Unsupervised Learning. Algorithm: Association Analysis • Association rule: given a set of transactions, find rules that predict the occurrence of an item based on the occurrences of other items in the transaction. A collection of items that occur together is called an itemset. • Rule evaluation metrics: support and confidence indicate how frequent an itemset is and how reliable a rule is. • A commonly used algorithm for association analysis is Apriori.
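Support and confidence can be computed directly from a list of transactions. The sketch below uses a small made-up set of market baskets; the items and the rule {diapers} -> {beer} are illustrative assumptions, and the code shows only the two metrics, not the Apriori search itself.

```python
def count_containing(itemset, transactions):
    """Number of transactions that contain every item in the itemset."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of all transactions that contain the itemset."""
    return count_containing(itemset, transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the transactions containing the antecedent, the share that also contain the consequent."""
    antecedent_count = count_containing(antecedent, transactions)
    if antecedent_count == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return count_containing(both, transactions) / antecedent_count

# Hypothetical market-basket transactions.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

rule_support = support({"diapers", "beer"}, transactions)
rule_confidence = confidence({"diapers"}, {"beer"}, transactions)
print(rule_support, rule_confidence)
```

Here the rule {diapers} -> {beer} appears in 3 of 5 transactions (support 0.6) and holds in 3 of the 4 transactions that contain diapers (confidence 0.75).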
  • 56. Unsupervised Learning. Algorithm: Cluster Analysis • Cluster analysis is used to segment (i.e. group) data objects without any instructions or target. • Data objects within a group are similar (or related) to one another and different from (or unrelated to) the data objects in other groups. • Cluster analysis constructs a partition of a set of n records into a set of k clusters. • Each record belongs to exactly one cluster. • The number of clusters k is given in advance. • A commonly used algorithm for clustering is k-means.
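The k-means procedure can be sketched in a few lines: assign every record to its nearest centroid, recompute each centroid as the mean of its members, and repeat. In this sketch the first k points are used as the starting centroids and the iteration count is fixed; both are simplifying assumptions (practical implementations usually start from random or specially chosen centroids and stop on convergence). The data points are made up.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)),
               key=lambda c: squared_distance(point, centroids[c]))

def k_means(points, k, iterations=10):
    # Assumption: use the first k points as the initial centroids.
    centroids = [points[i] for i in range(k)]
    assignments = []
    for _ in range(iterations):
        # Step 1: each record joins exactly one cluster, the nearest one.
        assignments = [nearest_centroid(p, centroids) for p in points]
        # Step 2: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(coord) / len(members)
                                     for coord in zip(*members))
    return centroids, assignments

# Hypothetical two-dimensional records forming two visible groups.
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 9), (8, 9.5)]

centroids, assignments = k_means(points, k=2)
print(assignments)
```

Note that k is supplied by the analyst, and that deciding what each resulting cluster means is left to human interpretation.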
  • 57. Data Mining Tools: Beyond the module demonstrations
  • 59. Machine Learning • Data Mining/Predictive Analytics is a subset of Machine Learning. • Machine learning is a field of computer science that gives computers the ability to learn without being explicitly programmed.[1] [1] Samuel, Arthur (1959). "Some Studies in Machine Learning Using the Game of Checkers"
  • 60. Data Mining • Data Mining is about automating the process of searching for patterns in the data. • Two types of Machine Learning: • Supervised • Unsupervised • In supervised learning, a good model is able to generalise: it gives correct results for new input data whose target is not known. • In unsupervised learning, interpretation of the results is still done by humans.
  • 61. Proof is in the Pudding • A model is only as good as its test results (i.e. from model assessment). • To be useful, a model must predict better than the population's baseline probability (i.e. better than predicting from the overall rate of the target alone). • The best model is one that stands the test after deployment to the real world.
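One way to read the requirement that a model must beat the population's probability is to compare it against a baseline that always predicts the most common class. The sketch below does this with made-up test data; treating the majority class as the baseline is an interpretation, not something the slide specifies.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Share of cases where the prediction matches the known target."""
    return sum(1 for a, p in zip(actual, predicted) if a == p) / len(actual)

def majority_baseline(actual):
    """Predict the most common class for every case, ignoring all inputs."""
    most_common_class = Counter(actual).most_common(1)[0][0]
    return [most_common_class] * len(actual)

# Hypothetical test set: 70% of cases have target 0.
actual = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1]
# Hypothetical predictions from a trained model.
model_predictions = [0, 0, 0, 0, 0, 0, 1, 1, 1, 0]

baseline_accuracy = accuracy(actual, majority_baseline(actual))
model_accuracy = accuracy(actual, model_predictions)
model_is_useful = model_accuracy > baseline_accuracy
print(baseline_accuracy, model_accuracy, model_is_useful)
```

A model scoring 0.7 here would add nothing, because the same result is available without any model at all.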
  • 62. The Analytics Landscape: The Big Picture View https://www.freepik.com/index.php?goto=74&idfoto=2783060
  • 63. Analytics Use within 3 Years. Source: Operationalizing and Embedding Analytics for Action by Fern Halper, TDWI Research.
  • 64. Transform with Predictive Insights. Source: SAP (www.sap.com/predictive)
  • 65. An Analytics Architecture
  • 66. An Analytics Architecture
  • 67. The Analytics Challenges. Source: Operationalizing and Embedding Analytics for Action by Fern Halper, TDWI Research.
  • 68. Conclusion and Reflection: What is the future of data analytics?
  • 69. Real-World Predictive Analytics in Action • "Why smart statistics are the key to fighting crime" by Anne Milgram at TED@BCG: https://www.youtube.com/watch?v=ZJNESMhIxQ0 • "What is the Cambridge Analytica scandal?" by The Guardian: https://www.youtube.com/watch?v=Q91nvbJSmS4
  • 70. The Analytics Challenges. Source: https://mashable.com/2017/04/27/man-tweets-pie-charts/