data & content design
Frieda Brioschi - frieda.brioschi@gmail.com
Emma Tracanella - emma.tracanella@gmail.com
DATA MINING AND DATA AGGREGATION BASICS
LESSON 6 - 2020
CLASSICAL DATA MINING
LESSON 5
CONTEXT
You don’t have to be a fancy statistician to do data mining, but you do
have to know something about what the data signifies and how the
business works.
Only when you understand the data and the problem that you need to
solve can data-mining processes help you to discover useful
information and put it to use.
NINE LAWS OF DATA MINING - 1
Pioneering data miner Thomas Khabaza developed his “Nine Laws of Data Mining”
to guide new data miners as they get down to work.
▸ 1 - “Business Goals Law” 

Business objectives are the origin of every data mining solution.
A data miner is someone who discovers useful information from data to support
specific business goals. Data mining isn’t defined by the tool you use.
▸ 2 - “Business Knowledge Law”

Business Knowledge is central to every step of the data mining process.
You don’t have to be a fancy statistician to do data mining, but you do have to
know something about what the data signifies and how the business works.
NINE LAWS OF DATA MINING - 2
▸ 3 - “Data Preparation Law”

Data preparation is more than half of every data mining process.
Pretty much every data miner will spend more time on data preparation than on
analysis.
▸ 4 - “No Free Lunch for the Data Miner”

The right model for a given application can only be discovered by experiment.
In data mining, models are selected through trial and error.
▸ 5 - “Patterns”

There are always patterns in the data.
As a data miner, you explore data in search of useful patterns. Understanding patterns
in the data enables you to influence what happens in the future.
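Law 4 says that the right model is found by experiment. A minimal sketch of that idea, assuming two deliberately simple candidate "models" (predict the training mean, or predict the training median) and a small made-up dataset: each candidate is scored on held-out data and the one with the lowest error is kept. The data, the candidate names and the choice of mean absolute error are illustrative assumptions, not part of the slides.

```python
from statistics import mean, median

# Hypothetical data: one extreme value in the training set
train = [10, 12, 11, 13, 100]
holdout = [11, 12, 14]

# Two candidate "models": each predicts a single constant value
candidates = {
    "mean": mean(train),
    "median": median(train),
}

def mean_absolute_error(prediction, actuals):
    """Average absolute distance between one prediction and the actual values."""
    return mean(abs(a - prediction) for a in actuals)

# Trial and error: score every candidate on data it has not seen
scores = {name: mean_absolute_error(pred, holdout)
          for name, pred in candidates.items()}
best = min(scores, key=scores.get)

print(scores)
print("selected:", best)
```

On this data the median wins because the mean is pulled up by the extreme value; on other data the result could be different, which is why the comparison has to be run rather than assumed.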
NINE LAWS OF DATA MINING - 3
▸ 6 - “Insight Law”

Data mining amplifies perception in the business domain.
Data mining methods enable you to understand your business better than you
could have done without them.
▸ 7 - “Prediction Law”

Prediction increases information locally by generalization.
Data mining helps us use what we know to make better predictions (or
estimates) of things we don’t know.
NINE LAWS OF DATA MINING - 4
▸ 8 - “Value Law”

The value of data mining results is not determined by the accuracy or stability
of predictive models.
Your model must produce good predictions, consistently, but its value comes from how
well the results serve the business goal, not from accuracy alone.
▸ 9 - “Law of Change”

All patterns are subject to change.
Any model that gives you great predictions today may be useless tomorrow.
PHASES OF THE DATA MINING PROCESS
The Cross-Industry Standard Process for
Data Mining (CRISP-DM) is the dominant
data-mining process framework. It’s an
open standard; anyone may use it.
BUSINESS UNDERSTANDING
Get a clear understanding of the problem you’re out to solve, how it impacts your
organization, and your goals for addressing it.
Tasks in this phase include:
▸ Identifying your business goals
▸ Assessing your situation
▸ Defining your data mining goals
▸ Producing your project plan
DATA UNDERSTANDING
Review the data that you have, document it, and identify data management and data
quality issues.
Tasks in this phase include:
▸ Gathering data
▸ Describing data
▸ Exploring data
▸ Verifying data quality
DATA PREPARATION
Get your data ready to use for modeling.
Tasks in this phase include:
▸ Selecting data
▸ Cleaning data
▸ Constructing data
▸ Integrating data
▸ Formatting data
MODELING
Use mathematical techniques to identify patterns within your data.
Tasks in this phase include:
▸ Selecting techniques
▸ Designing tests
▸ Building models
▸ Assessing models
EVALUATION
Review the patterns you have discovered and assess their potential for business
use.
Tasks in this phase include:
▸ Evaluating results
▸ Reviewing the process
▸ Determining the next steps
DEPLOYMENT
Put your discoveries to work in everyday business. 
Tasks in this phase include:
▸ Planning deployment (your methods for integrating data mining discoveries
into use)
▸ Reporting final results
▸ Reviewing final results
CLASSICAL DATA AGGREGATION
DATA AGGREGATION
Data aggregation is the process where raw data is gathered and expressed in a summary
form for statistical analysis.
For example, raw data can be aggregated over a given time period to provide statistics. After
the data is aggregated and written to a view or report, you can analyze the aggregated data
to gain insights about particular resources or resource groups.
There are two types of data aggregation:
▸ Time aggregation - All data points for a single resource over a specified time period.
▸ Spatial aggregation - All data points for a group of resources over a specified
geographical area.
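The two aggregation types can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the readings, the resource names ("server-a" and so on), the region labels and the use of the mean as the summary are all made up for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings: (resource, region, hour, value)
readings = [
    ("server-a", "north", 0, 10.0),
    ("server-a", "north", 1, 14.0),
    ("server-b", "north", 0, 20.0),
    ("server-b", "north", 1, 24.0),
    ("server-c", "south", 0, 30.0),
]

def time_aggregate(rows, resource, start, end):
    """Time aggregation: mean for ONE resource with start <= hour <= end."""
    values = [v for r, _, h, v in rows if r == resource and start <= h <= end]
    return mean(values)

def spatial_aggregate(rows, start, end):
    """Spatial aggregation: mean per region across ALL resources in that region."""
    groups = defaultdict(list)
    for _, region, h, v in rows:
        if start <= h <= end:
            groups[region].append(v)
    return {region: mean(vs) for region, vs in groups.items()}

time_result = time_aggregate(readings, "server-a", 0, 1)
spatial_result = spatial_aggregate(readings, 0, 1)

print(time_result)     # 12.0
print(spatial_result)  # {'north': 17.0, 'south': 30.0}
```

In both cases several raw data points collapse into one summary value; the difference is only in how the points are grouped before summarising.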
SUMMARY STATISTICS
When data is aggregated, groups of observations are replaced with summary statistics based on those observations.
Summary statistics are used to communicate the largest amount of information as simply as possible.
▸ Mean
▸ Count
▸ Maximum
▸ Median
▸ Minimum
▸ Mode
▸ Range
▸ Sum
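As a quick illustration, the statistics listed above can all be computed with Python's built-in functions and its standard `statistics` module. The list of observations is a made-up example.

```python
import statistics

# Hypothetical group of observations to be replaced by summary statistics
observations = [4, 8, 8, 5, 3, 8, 6]

summary = {
    "count": len(observations),
    "sum": sum(observations),
    "mean": statistics.mean(observations),
    "median": statistics.median(observations),
    "mode": statistics.mode(observations),
    "minimum": min(observations),
    "maximum": max(observations),
    "range": max(observations) - min(observations),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

Here the seven raw values are reduced to eight numbers, each answering a different question: how many, how much in total, what is typical, and how spread out the values are.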
TABLES
Tables are the format in which most numerical data are initially stored and analysed and
are likely to be the means you use to organise data collected during experiments and
dissertation research.
Tables are an effective way of presenting data:
• when you wish to show how a single category of information varies when
measured at different points (in time or space).
• when the dataset contains relatively few numbers.
• when the precise value is crucial to your argument and a graph would not convey
BAR CHARTS
Bar charts are one of the most commonly
used types of graph and are used to display
and compare the number, frequency or other
measure for different discrete categories or
groups.
The bars can be drawn either vertically or
horizontally depending upon the number of
categories and length or complexity of the
category labels.
HISTOGRAMS
Histograms are a special form of bar chart
where the data represent continuous rather
than discrete categories. Since a
continuous category may have a large
number of possible values the data are
often grouped to reduce the number of data
points.
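The grouping step behind a histogram can be sketched as follows. This is a minimal example assuming equal-width bins; the function name `bin_counts`, the height values and the bin settings are hypothetical choices made for the illustration.

```python
def bin_counts(values, low, width, n_bins):
    """Count how many values fall into each of n_bins equal-width bins.

    Bin i covers low + i*width up to (but not including) low + (i+1)*width.
    The last bin also includes its upper edge. Values outside the overall
    range are ignored.
    """
    upper = low + width * n_bins
    counts = [0] * n_bins
    for v in values:
        if v == upper:
            index = n_bins - 1  # keep the top edge in the last bin
        else:
            index = int((v - low) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

# Hypothetical continuous measurements (heights in cm)
heights_cm = [151.2, 158.9, 160.0, 163.4, 167.8, 169.9, 171.5, 178.3, 180.0]
counts = bin_counts(heights_cm, low=150, width=10, n_bins=3)

# Text version of the histogram: one row per bin
for i, count in enumerate(counts):
    start = 150 + i * 10
    print(f"{start}-{start + 10} cm: {'#' * count}")
```

Nine distinct values become three bars, which is the reduction in data points the slide describes. The bin width is a design choice: wider bins give a smoother picture but hide detail.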
PIE CHARTS
Pie charts are a visual way of displaying how
the total data are distributed between different
categories. Pie charts should only be used for
displaying nominal data. They are generally
best for showing information grouped into a
small number of categories and are a
graphical way of displaying data that might
otherwise be presented as a simple table.
Pie chart of populations of English native speakers
LINE GRAPHS
Line graphs are usually used to show time
series data – that is how one or more
variables vary over a continuous period of
time. Line graphs are particularly useful for
identifying patterns and trends in the data
such as seasonal effects, large changes and
turning points. As well as time series data,
line graphs can also be appropriate for
displaying data that are measured over other
continuous variables such as distance.
WHAT IS DATA SCIENCE
DEFINITION
Data Science is a blend of various tools, algorithms, and machine learning
principles with the goal of discovering hidden patterns in raw data and solving
analytically complicated problems.
APPLICATION OF DATA SCIENCE
EXPLAINING VS PREDICTING
By 2020 more than 80 % of the data
will be unstructured. This data is
generated from different sources like
financial logs, text files, multimedia
forms, sensors, and instruments.
https://databasetown.com/introduction-to-data-science-a-beginners-guide/#What_is_Data_Science
The Data Scientist can handle raw data using the latest technologies and
techniques, perform the necessary analysis, and present the acquired
knowledge to colleagues in an informative way.
The Data Analyst works with R, Python and SQL; the role combines technical
and analytical knowledge.
The Data Architect integrates, centralizes, protects and maintains data
sources.
The Statistician can be seen as the pioneer of the data science field. It is often
the statistician who extracts the information from the data and transforms it into
actionable insights.
The Database Administrator ensures that the database is accessible to every
stakeholder in the organization and takes the necessary security measures
to keep the stored data safe.
The Business Analyst is probably the least technical profile but has a deep
understanding of the various business processes that are in place. The
Business Analyst often acts as the intermediary between the business people
and the technicians.
The Data and Analytics Manager steers the direction of the data science
team. The role combines strong, specialized skills in a range of technologies
(SQL, R, SAS, …) with the people skills required to manage a team.
SOME EXAMPLES
THE NY TIMES
https://www.nytimes.com/interactive/2019/11/02/us/politics/trump-twitter-disinformation.html
38

More Related Content

What's hot

How we perceive information
How we perceive informationHow we perceive information
How we perceive informationFrieda Brioschi
 
Artificial Intelligence, Machine Learning & Tools (v. 2020 ITA)
Artificial Intelligence, Machine Learning & Tools (v. 2020 ITA)Artificial Intelligence, Machine Learning & Tools (v. 2020 ITA)
Artificial Intelligence, Machine Learning & Tools (v. 2020 ITA)Frieda Brioschi
 
How to collect and organize data
How to collect and organize dataHow to collect and organize data
How to collect and organize dataFrieda Brioschi
 
Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...
Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...
Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...Prof. Dr. Diego Kuonen
 
Visual communication of quantitative data
Visual communication of quantitative dataVisual communication of quantitative data
Visual communication of quantitative dataFrieda Brioschi
 
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -...
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -..."Data as the Fuel and Analytics as the Engine of the Digital Transformation -...
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -...Prof. Dr. Diego Kuonen
 
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Prof. Dr. Diego Kuonen
 
The Future Of Data Visualization
The Future Of Data VisualizationThe Future Of Data Visualization
The Future Of Data VisualizationFITC
 
Data Visualization
Data VisualizationData Visualization
Data VisualizationFreddy San
 
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...Prof. Dr. Diego Kuonen
 
Digital communication (v. 2021 ITA)
Digital communication (v. 2021 ITA)Digital communication (v. 2021 ITA)
Digital communication (v. 2021 ITA)Frieda Brioschi
 
Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...
Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...
Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...Prof. Dr. Diego Kuonen
 
The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...
The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...
The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...Prof. Dr. Diego Kuonen
 
9 Visualization In E Social Science
9 Visualization In E Social Science9 Visualization In E Social Science
9 Visualization In E Social ScienceWebometrics Class
 
Data science landscape in the insurance industry
Data science landscape in the insurance industryData science landscape in the insurance industry
Data science landscape in the insurance industryStefano Perfetti
 
What is data visualization
What is data visualizationWhat is data visualization
What is data visualizationintellect808
 
The data science revolution in insurance
The data science revolution in insuranceThe data science revolution in insurance
The data science revolution in insuranceStefano Perfetti
 

What's hot (20)

How we perceive information
How we perceive informationHow we perceive information
How we perceive information
 
Artificial Intelligence, Machine Learning & Tools (v. 2020 ITA)
Artificial Intelligence, Machine Learning & Tools (v. 2020 ITA)Artificial Intelligence, Machine Learning & Tools (v. 2020 ITA)
Artificial Intelligence, Machine Learning & Tools (v. 2020 ITA)
 
How to collect and organize data
How to collect and organize dataHow to collect and organize data
How to collect and organize data
 
Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...
Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...
Big Data as the Fuel and Visual Analytics as the Engine Mount of the Digital ...
 
Visual communication of quantitative data
Visual communication of quantitative dataVisual communication of quantitative data
Visual communication of quantitative data
 
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -...
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -..."Data as the Fuel and Analytics as the Engine of the Digital Transformation -...
"Data as the Fuel and Analytics as the Engine of the Digital Transformation -...
 
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
Data as the Fuel and Analytics as the Engine of the Digital Transformation: D...
 
The Future Of Data Visualization
The Future Of Data VisualizationThe Future Of Data Visualization
The Future Of Data Visualization
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
 
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...
Big Data, Data Science, Machine Intelligence and Learning: Demystification, C...
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
 
Digital communication (v. 2021 ITA)
Digital communication (v. 2021 ITA)Digital communication (v. 2021 ITA)
Digital communication (v. 2021 ITA)
 
Data visualization
Data visualizationData visualization
Data visualization
 
Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...
Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...
Glocalised Smart Statistics and Analytics of Things: Core Challenges and Key ...
 
The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...
The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...
The Power of Data Insights - Big Data as the Fuel and Analytics as the Engine...
 
Data Visualization
Data VisualizationData Visualization
Data Visualization
 
9 Visualization In E Social Science
9 Visualization In E Social Science9 Visualization In E Social Science
9 Visualization In E Social Science
 
Data science landscape in the insurance industry
Data science landscape in the insurance industryData science landscape in the insurance industry
Data science landscape in the insurance industry
 
What is data visualization
What is data visualizationWhat is data visualization
What is data visualization
 
The data science revolution in insurance
The data science revolution in insuranceThe data science revolution in insurance
The data science revolution in insurance
 

Similar to Data mining and data aggregation basics

TTG Int.LTD Data Mining Technique
TTG Int.LTD Data Mining TechniqueTTG Int.LTD Data Mining Technique
TTG Int.LTD Data Mining TechniqueMehmet Beyaz
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationDr. Abdul Ahad Abro
 
Data Science.pdf
Data Science.pdfData Science.pdf
Data Science.pdfWinduGata3
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introductionBasma Gamal
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?DIGITALSAI1
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification courseKumarNaik21
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabadVamsiNihal
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabadsaitejavella
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training HyderabadNithinsunil1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabadVamsiNihal
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)SayyedYusufali
 
data science training and placement
data science training and placementdata science training and placement
data science training and placementSaiprasadVella
 
online data science training
online data science trainingonline data science training
online data science trainingDIGITALSAI1
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabadVamsiNihal
 
data science online training in hyderabad
data science online training in hyderabaddata science online training in hyderabad
data science online training in hyderabadVamsiNihal
 
Best data science training in Hyderabad
Best data science training in HyderabadBest data science training in Hyderabad
Best data science training in HyderabadKumarNaik21
 

Similar to Data mining and data aggregation basics (20)

TTG Int.LTD Data Mining Technique
TTG Int.LTD Data Mining TechniqueTTG Int.LTD Data Mining Technique
TTG Int.LTD Data Mining Technique
 
Welcome to CS310!
Welcome to CS310!Welcome to CS310!
Welcome to CS310!
 
Data mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, ClassificationData mining , Knowledge Discovery Process, Classification
Data mining , Knowledge Discovery Process, Classification
 
What is business analytics
What is business analyticsWhat is business analytics
What is business analytics
 
Data Science.pdf
Data Science.pdfData Science.pdf
Data Science.pdf
 
Data mining introduction
Data mining introductionData mining introduction
Data mining introduction
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Which institute is best for data science?
Which institute is best for data science?Which institute is best for data science?
Which institute is best for data science?
 
Best Selenium certification course
Best Selenium certification courseBest Selenium certification course
Best Selenium certification course
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
Data science training institute in hyderabad
Data science training institute in hyderabadData science training institute in hyderabad
Data science training institute in hyderabad
 
Data science training in Hyderabad
Data science  training in HyderabadData science  training in Hyderabad
Data science training in Hyderabad
 
Data science training Hyderabad
Data science training HyderabadData science training Hyderabad
Data science training Hyderabad
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
Data science training in hyd ppt (1)
Data science training in hyd ppt (1)Data science training in hyd ppt (1)
Data science training in hyd ppt (1)
 
data science training and placement
data science training and placementdata science training and placement
data science training and placement
 
online data science training
online data science trainingonline data science training
online data science training
 
Data science online training in hyderabad
Data science online training in hyderabadData science online training in hyderabad
Data science online training in hyderabad
 
data science online training in hyderabad
data science online training in hyderabaddata science online training in hyderabad
data science online training in hyderabad
 
Best data science training in Hyderabad
Best data science training in HyderabadBest data science training in Hyderabad
Best data science training in Hyderabad
 

More from Frieda Brioschi

Storytelling with data (v. 2021 ITA)
Storytelling with data (v. 2021 ITA)Storytelling with data (v. 2021 ITA)
Storytelling with data (v. 2021 ITA)Frieda Brioschi
 
Visual communication of qualitative and quantitative data (v. 2021 ITA)
Visual communication of qualitative and quantitative data (v. 2021 ITA)Visual communication of qualitative and quantitative data (v. 2021 ITA)
Visual communication of qualitative and quantitative data (v. 2021 ITA)Frieda Brioschi
 
How we perceive information (v. 2021 ITA)
How we perceive information (v. 2021 ITA)How we perceive information (v. 2021 ITA)
How we perceive information (v. 2021 ITA)Frieda Brioschi
 
Data Lingo (v. ITA 2021)
Data Lingo (v. ITA 2021)Data Lingo (v. ITA 2021)
Data Lingo (v. ITA 2021)Frieda Brioschi
 
Information Classification (v. ITA 2021)
Information Classification (v. ITA 2021)Information Classification (v. ITA 2021)
Information Classification (v. ITA 2021)Frieda Brioschi
 
How to collect and organize data (v. ITA 2021)
How to collect and organize data (v. ITA 2021)How to collect and organize data (v. ITA 2021)
How to collect and organize data (v. ITA 2021)Frieda Brioschi
 
What are data and information, why they matter (v. ITA 2021)
What are data and information, why they matter (v. ITA 2021)What are data and information, why they matter (v. ITA 2021)
What are data and information, why they matter (v. ITA 2021)Frieda Brioschi
 
Digital communication (v. 2020 ITA)
Digital communication (v. 2020 ITA)Digital communication (v. 2020 ITA)
Digital communication (v. 2020 ITA)Frieda Brioschi
 
Storytelling with data (v. 2020 ITA)
Storytelling with data (v. 2020 ITA)Storytelling with data (v. 2020 ITA)
Storytelling with data (v. 2020 ITA)Frieda Brioschi
 
Data Lingo (v. ITA 2020)
Data Lingo (v. ITA 2020)Data Lingo (v. ITA 2020)
Data Lingo (v. ITA 2020)Frieda Brioschi
 
Information Classification (v. ITA 2020)
Information Classification (v. ITA 2020)Information Classification (v. ITA 2020)
Information Classification (v. ITA 2020)Frieda Brioschi
 
What are data and information, why they matter (v. ITA 2020)
What are data and information, why they matter (v. ITA 2020)What are data and information, why they matter (v. ITA 2020)
What are data and information, why they matter (v. ITA 2020)Frieda Brioschi
 
Information Classification
Information ClassificationInformation Classification
Information ClassificationFrieda Brioschi
 
What are data and information, why they matter
What are data and information, why they matterWhat are data and information, why they matter
What are data and information, why they matterFrieda Brioschi
 
Communication for beginners (v. 2019 ita)
Communication for beginners (v. 2019 ita)Communication for beginners (v. 2019 ita)
Communication for beginners (v. 2019 ita)Frieda Brioschi
 

More from Frieda Brioschi (17)

Storytelling with data (v. 2021 ITA)
Storytelling with data (v. 2021 ITA)Storytelling with data (v. 2021 ITA)
Storytelling with data (v. 2021 ITA)
 
Visual communication of qualitative and quantitative data (v. 2021 ITA)
Visual communication of qualitative and quantitative data (v. 2021 ITA)Visual communication of qualitative and quantitative data (v. 2021 ITA)
Visual communication of qualitative and quantitative data (v. 2021 ITA)
 
How we perceive information (v. 2021 ITA)
How we perceive information (v. 2021 ITA)How we perceive information (v. 2021 ITA)
How we perceive information (v. 2021 ITA)
 
Data Lingo (v. ITA 2021)
Data Lingo (v. ITA 2021)Data Lingo (v. ITA 2021)
Data Lingo (v. ITA 2021)
 
Information Classification (v. ITA 2021)
Information Classification (v. ITA 2021)Information Classification (v. ITA 2021)
Information Classification (v. ITA 2021)
 
How to collect and organize data (v. ITA 2021)
How to collect and organize data (v. ITA 2021)How to collect and organize data (v. ITA 2021)
How to collect and organize data (v. ITA 2021)
 
What are data and information, why they matter (v. ITA 2021)
What are data and information, why they matter (v. ITA 2021)What are data and information, why they matter (v. ITA 2021)
What are data and information, why they matter (v. ITA 2021)
 
Digital communication (v. 2020 ITA)
Digital communication (v. 2020 ITA)Digital communication (v. 2020 ITA)
Digital communication (v. 2020 ITA)
 
Storytelling with data (v. 2020 ITA)
Storytelling with data (v. 2020 ITA)Storytelling with data (v. 2020 ITA)
Storytelling with data (v. 2020 ITA)
 
Data Lingo (v. ITA 2020)
Data Lingo (v. ITA 2020)Data Lingo (v. ITA 2020)
Data Lingo (v. ITA 2020)
 
Information Classification (v. ITA 2020)
Information Classification (v. ITA 2020)Information Classification (v. ITA 2020)
Information Classification (v. ITA 2020)
 
What are data and information, why they matter (v. ITA 2020)
What are data and information, why they matter (v. ITA 2020)What are data and information, why they matter (v. ITA 2020)
What are data and information, why they matter (v. ITA 2020)
 
Storytelling with data
Storytelling with dataStorytelling with data
Storytelling with data
 
Data Lingo
Data LingoData Lingo
Data Lingo
 
Information Classification
Information ClassificationInformation Classification
Information Classification
 
What are data and information, why they matter
What are data and information, why they matterWhat are data and information, why they matter
What are data and information, why they matter
 
Communication for beginners (v. 2019 ita)
Communication for beginners (v. 2019 ita)Communication for beginners (v. 2019 ita)
Communication for beginners (v. 2019 ita)
 

Recently uploaded

Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYKayeClaireEstoconing
 
Keynote by Prof. Wurzer at Nordex about IP-design
Data mining and data aggregation basics

  • 1. data & content design Frieda Brioschi - frieda.brioschi@gmail.com Emma Tracanella - emma.tracanella@gmail.com DATA MINING AND DATA AGGREGATION BASICS LESSON 6 - 2020
  • 3. data & content design LESSON 5 CONTEXT
    You don’t have to be a fancy statistician to do data mining, but you do have to know something about what the data signifies and how the business works. Only when you understand the data and the problem that you need to solve can data-mining processes help you to discover useful information and put it to use.
  • 4. NINE LAWS OF DATA MINING - 1
    Pioneering data miner Thomas Khabaza developed his “Nine Laws of Data Mining” to guide new data miners as they get down to work.
    ▸ 1. “Business Goals Law”: Business objectives are the origin of every data mining solution. A data miner is someone who discovers useful information from data to support specific business goals. Data mining isn’t defined by the tool you use.
    ▸ 2. “Business Knowledge Law”: Business knowledge is central to every step of the data mining process. You don’t have to be a fancy statistician to do data mining, but you do have to know something about what the data signifies and how the business works.
  • 5. NINE LAWS OF DATA MINING - 2
    ▸ 3. “Data Preparation Law”: Data preparation is more than half of every data mining process. Pretty much every data miner will spend more time on data preparation than on analysis.
    ▸ 4. “No Free Lunch for the Data Miner”: The right model for a given application can only be discovered by experiment. In data mining, models are selected through trial and error.
    ▸ 5. “Patterns”: There are always patterns in the data. As a data miner, you explore data in search of useful patterns. Understanding patterns in the data enables you to influence what happens in the future.
  • 6. NINE LAWS OF DATA MINING - 3
    ▸ 6. “Insight Law”: Data mining amplifies perception in the business domain. Data mining methods enable you to understand your business better than you could have done without them.
    ▸ 7. “Prediction Law”: Prediction increases information locally by generalization. Data mining helps us use what we know to make better predictions (or estimates) of things we don’t know.
  • 7. NINE LAWS OF DATA MINING - 4
    ▸ 8. “Value Law”: The value of data mining results is not determined by the accuracy or stability of predictive models. Your model must produce good predictions, consistently. That’s it.
    ▸ 9. “Law of Change”: All patterns are subject to change. Any model that gives you great predictions today may be useless tomorrow.
  • 8. PHASES OF THE DATA MINING PROCESS
    The Cross-Industry Standard Process for Data Mining (CRISP-DM) is the dominant data-mining process framework. It’s an open standard; anyone may use it.
  • 9. BUSINESS UNDERSTANDING
    Get a clear understanding of the problem you’re out to solve, how it impacts your organization, and your goals for addressing it. Tasks in this phase include:
    ▸ Identifying your business goals
    ▸ Assessing your situation
    ▸ Defining your data mining goals
    ▸ Producing your project plan
  • 10. DATA UNDERSTANDING
    Review the data that you have, document it, and identify data management and data quality issues. Tasks in this phase include:
    ▸ Gathering data
    ▸ Describing
    ▸ Exploring
    ▸ Verifying quality
  • 11. DATA PREPARATION
    Get your data ready to use for modeling. Tasks in this phase include:
    ▸ Selecting data
    ▸ Cleaning data
    ▸ Constructing
    ▸ Integrating
    ▸ Formatting
  • 12. MODELING
    Use mathematical techniques to identify patterns within your data. Tasks in this phase include:
    ▸ Selecting techniques
    ▸ Designing tests
    ▸ Building models
    ▸ Assessing models
  • 13. EVALUATION
    Review the patterns you have discovered and assess their potential for business use. Tasks in this phase include:
    ▸ Evaluating results
    ▸ Reviewing the process
    ▸ Determining the next steps
  • 14. DEPLOYMENT
    Put your discoveries to work in everyday business. Tasks in this phase include:
    ▸ Planning deployment (your methods for integrating data mining discoveries into use)
    ▸ Reporting final results
    ▸ Reviewing final results
  • 16. DATA AGGREGATION
    Data aggregation is the process where raw data is gathered and expressed in a summary form for statistical analysis. For example, raw data can be aggregated over a given time period to provide statistics. After the data is aggregated and written to a view or report, you can analyze the aggregated data to gain insights about particular resources or resource groups. There are two types of data aggregation:
    ▸ Time aggregation - All data points for a single resource over a specified time period.
    ▸ Spatial aggregation - All data points for a group of resources over a specified geographical area.
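
The two aggregation types can be illustrated with a minimal Python sketch. The readings, resource names, and region labels below are invented for illustration and are not taken from the slides; the sketch also assumes the mean as the summary, though any summary statistic could be substituted.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical raw readings: (resource, region, hour, value).
readings = [
    ("server-a", "north", 0, 10.0),
    ("server-a", "north", 1, 14.0),
    ("server-b", "north", 0, 20.0),
    ("server-c", "south", 0, 30.0),
    ("server-c", "south", 1, 50.0),
]

def aggregate(rows, key_index):
    """Group the values by the field at key_index and average each group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key_index]].append(row[3])
    return {key: mean(values) for key, values in groups.items()}

# Time aggregation: all data points for a single resource over the period.
time_agg = aggregate(readings, 0)

# Spatial aggregation: all data points for the resources in one area.
spatial_agg = aggregate(readings, 1)

print(time_agg)
print(spatial_agg)
```

In both cases the raw rows are replaced by one summary value per group; only the grouping key changes.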
  • 17. SUMMARY STATISTICS
    When data is aggregated, groups of observations are replaced with summary statistics based on those observations. Summary statistics are used to communicate the largest amount of information as simply as possible.
    ▸ Mean
    ▸ Count
    ▸ Maximum
    ▸ Median
    ▸ Minimum
    ▸ Mode
    ▸ Range
    ▸ Sum
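
As a rough illustration, the summary statistics listed above can be computed with Python's standard library. The observation values are made up for the example.

```python
import statistics

# Hypothetical group of observations to be replaced by summary statistics.
observations = [4, 8, 8, 15, 16, 23, 42]

summary = {
    "mean": statistics.mean(observations),
    "count": len(observations),
    "maximum": max(observations),
    "median": statistics.median(observations),
    "minimum": min(observations),
    "mode": statistics.mode(observations),
    "range": max(observations) - min(observations),
    "sum": sum(observations),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```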
  • 18. TABLES
    Tables are the format in which most numerical data are initially stored and analysed and are likely to be the means you use to organise data collected during experiments and dissertation research. Tables are an effective way of presenting data:
    ▸ when you wish to show how a single category of information varies when measured at different points (in time or space).
    ▸ when the dataset contains relatively few numbers.
    ▸ when the precise value is crucial to your argument and a graph would not convey
  • 19. BAR CHARTS
    Bar charts are one of the most commonly used types of graph and are used to display and compare the number, frequency or other measure for different discrete categories or groups. The bars can be drawn either vertically or horizontally depending upon the number of categories and length or complexity of the category labels.
  • 20. HISTOGRAMS
    Histograms are a special form of bar chart where the data represent continuous rather than discrete categories. Since a continuous category may have a large number of possible values, the data are often grouped to reduce the number of data points.
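
The grouping step behind a histogram can be sketched in a few lines of Python. This is only an illustration: the `histogram` helper, the bin width, and the height values are assumptions made for the example, not part of the slides.

```python
from collections import Counter

def histogram(values, bin_width, start=0.0):
    """Count how many values fall into each half-open bin [lo, lo + bin_width)."""
    counts = Counter()
    for v in values:
        index = int((v - start) // bin_width)
        counts[start + index * bin_width] += 1
    return dict(sorted(counts.items()))

# Hypothetical continuous measurements (heights in cm).
heights_cm = [151.2, 158.9, 160.0, 163.4, 167.8, 169.9, 171.5, 178.3]

bins = histogram(heights_cm, bin_width=10.0, start=150.0)

# Text rendering: one bar per bin, keyed by the bin's lower edge.
for lo, n in bins.items():
    print(f"{lo:.0f}-{lo + 10:.0f}: {'#' * n}")
```

Eight distinct measurements collapse into three bars, which is the reduction in data points that the slide describes.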
  • 21. PIE CHARTS
    Pie charts are a visual way of displaying how the total data are distributed between different categories. Pie charts should only be used for displaying nominal data. They are generally best for showing information grouped into a small number of categories and are a graphical way of displaying data that might otherwise be presented as a simple table.
    Figure: Pie chart of populations of English native speakers
  • 22. LINE GRAPHS
    Line graphs are usually used to show time series data, that is, how one or more variables vary over a continuous period of time. Line graphs are particularly useful for identifying patterns and trends in the data such as seasonal effects, large changes and turning points. As well as time series data, line graphs can also be appropriate for displaying data that are measured over other continuous variables such as distance.
  • 23. WHAT IS DATA SCIENCE (Photo by ev on Unsplash)
  • 24. DEFINITION
    Data Science is a blend of various tools, algorithms, and machine learning principles with the goal of discovering hidden patterns in raw data and solving analytically complicated problems.
  • 25. APPLICATION OF DATA SCIENCE
  • 27. EXPLAINING VS PREDICTING
    By 2020 more than 80% of the data will be unstructured. This data is generated from different sources like financial logs, text files, multimedia forms, sensors, and instruments.
  • 28. https://databasetown.com/introduction-to-data-science-a-beginners-guide/#What_is_Data_Science
  • 30. The Data Scientist can handle raw data using the latest technologies and techniques, perform the necessary analysis, and present the acquired knowledge to associates in an informative way.
  • 31. The Data Analyst works with R, Python and SQL; the role combines technical and analytical knowledge.
  • 32. The Data Architect integrates, centralizes, protects and maintains data sources.
  • 33. The Statistician can be seen as the pioneer of the data science field. It is often the statistician who extracts information from the data and transforms it into actionable insights.
  • 34. The Database Administrator ensures that the database is accessible to every stakeholder in the organization and takes the necessary measures to keep the stored data safe.
  • 35. The Business Analyst is probably the least technical profile but has a deep understanding of the various business processes that are in place, and often acts as the intermediary between the business side and the technicians.
  • 36. The Data and Analytics Manager steers the direction of the data science team, combining strong, specialized skills in a range of technologies (SQL, R, SAS, …) with the people skills required to manage a team.
  • 38. THE NY TIMES
    https://www.nytimes.com/interactive/2019/11/02/us/politics/trump-twitter-disinformation.html