It starts with an idea
Data Analysis
Descriptive Statistics and EDA
Giuseppe Procaccianti
Vrije Universiteit Amsterdam
2 Giuseppe Procaccianti / S2 group / The Green Lab
Quick Recap
Idea → Experiment scoping → Experiment planning → Experiment operation → Analysis & interpretation → Presentation & package
Analysis and Interpretation
● Understanding the data
○ descriptive statistics
○ exploratory data analysis (EDA, e.g. boxplots, scatter plots)
● (Optional) data reduction
● Hypothesis testing
● Results interpretation
Descriptive Statistics
● Goal: get a feel for how the data are distributed
● Properties:
○ Central Tendency (e.g. Mean, Median)
○ Dispersion (e.g. Range, Variance, Standard Deviation)
○ Dependency (e.g. Correlation)
Parameter vs. statistic
● Parameter: feature of the population
○ μ: mean
○ σ: standard deviation
● Statistic: feature of the sample
○ x̄: mean
○ s: standard deviation
● Statistics are estimates of parameters
Central Tendency
● Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
● Geometric mean: $\left(\prod_{i=1}^{n} x_i\right)^{1/n}$ (all $x_i > 0$)
Central Tendency: example
● Average of the scores 6, 7, 8, 9, 10
● Arithmetic mean: 8
● Geometric mean: ~7.87
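Both means of the scores above can be checked with Python's standard library. A minimal sketch, assuming Python 3.8+ (for `statistics.fmean` and `statistics.geometric_mean`); the helper `geometric_mean_log` is a hypothetical name, added only to show the usual log-average way of computing a geometric mean.

```python
import math
from statistics import fmean, geometric_mean

def geometric_mean_log(values):
    """Geometric mean computed as exp(mean(log x)); all values must be positive."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))

scores = [6, 7, 8, 9, 10]
arith = fmean(scores)                 # (6 + 7 + 8 + 9 + 10) / 5
geo = geometric_mean(scores)          # (6 * 7 * 8 * 9 * 10) ** (1/5)
geo_manual = geometric_mean_log(scores)

print(f"arithmetic mean: {arith:.2f}")   # 8.00
print(f"geometric mean:  {geo:.2f}")     # 7.87
```

Averaging logarithms instead of multiplying the raw values avoids overflow when many large values are involved.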
Central Tendency: example
● Average of investment returns:
90% ; 10% ; 20% ; 30% ; -90%
● Arithmetic mean:
(90+10+20+30-90)/5= 12%
● Geometric mean:
[(1.9 x 1.1 x 1.2 x 1.3 x 0.1) ^ (1/5)] - 1 = -0.2008 = -20.08%
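The arithmetic mean overstates what actually happened to the invested capital. A minimal Python sketch of the calculation above: each percentage return is turned into a growth factor (1 + r), the factors are compounded, and the geometric mean is the per-period rate that reproduces the final outcome.

```python
import math

returns = [0.90, 0.10, 0.20, 0.30, -0.90]   # +90%, +10%, +20%, +30%, -90%
n = len(returns)

arith = sum(returns) / n                     # about 0.12, i.e. 12%
growth = [1 + r for r in returns]            # 1.9, 1.1, 1.2, 1.3, 0.1
total_growth = math.prod(growth)             # capital multiplier after all periods
geo = total_growth ** (1 / n) - 1            # about -0.2008, i.e. -20.08%

# Compounding the geometric mean reproduces the real outcome;
# compounding the arithmetic mean does not.
print(f"arithmetic mean return: {arith:+.2%}")
print(f"geometric mean return:  {geo:+.2%}")
print(f"actual growth:          {total_growth:.4f}")
print(f"(1 + geo) ** n:         {(1 + geo) ** n:.4f}")
print(f"(1 + arith) ** n:       {(1 + arith) ** n:.4f}")
```

`math.prod` requires Python 3.8+.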
Central Tendency
● Median (or 50th percentile): the middle value separating the
greater and lesser halves of a data set
X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
X_sorted = [13, 13, 13, 13, 14, 14, 16, 18, 21]
Median = 14
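A minimal Python sketch of the median definition above. The `median` helper is written out for illustration; the standard library's `statistics.median` follows the same convention of averaging the two middle values when the number of values is even.

```python
import statistics

def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    if not values:
        raise ValueError("median of empty data")
    s = sorted(values)
    mid = len(s) // 2
    if len(s) % 2 == 1:
        return s[mid]
    return (s[mid - 1] + s[mid]) / 2

X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(median(X))              # 14
print(statistics.median(X))   # 14
```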
Central Tendency
● Mode: the most frequent value in a data set
X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
Mo(X) = 13
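Counting frequencies makes the definition concrete. A minimal Python sketch; `modes` is a hypothetical helper that returns every most-frequent value, because a data set can have more than one mode (`statistics.multimode`, Python 3.8+, behaves the same way).

```python
from collections import Counter
import statistics

def modes(values):
    """All values tied for the highest frequency (a data set can be multimodal)."""
    counts = Counter(values)
    top = max(counts.values())
    return [v for v, c in counts.items() if c == top]

X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(Counter(X).most_common(2))   # [(13, 4), (14, 2)]
print(modes(X))                    # [13]
print(statistics.multimode(X))     # [13]
```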
Central Tendency - Skewness
Dispersion
● Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$
● Standard deviation: $s = \sqrt{s^2}$
● The standard deviation is expressed in the same units as the data
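The variance and standard deviation formulas translate directly into code. A minimal Python sketch; `sample_variance` and `sample_std` are illustrative helpers, checked against the standard library's `statistics.variance` / `statistics.stdev` (sample, divides by n - 1) and `statistics.pvariance` (population, divides by n), which mirrors the statistic vs. parameter distinction made earlier.

```python
import math
import statistics

def sample_variance(values):
    """s^2 = sum((x - mean)^2) / (n - 1); the n - 1 makes s^2 an unbiased estimator of sigma^2."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

def sample_std(values):
    """Sample standard deviation: square root of the sample variance."""
    return math.sqrt(sample_variance(values))

data = [90, 100, 110]
print(sample_variance(data), sample_std(data))            # 100.0 10.0
print(statistics.variance(data), statistics.stdev(data))  # same values (n - 1 denominator)
print(statistics.pvariance(data))                         # population formula (divides by n)
```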
Dispersion - three-sigma-rule
"Empirical Rule" by Dan Kernler - Own work. Licensed under CC BY-SA 4.0 via Wikimedia Commons -
http://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG#/media/File:Empirical_Rule.PNG
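The three-sigma (68-95-99.7) rule pictured above holds for normally distributed data. A minimal Python sketch that computes the exact shares with `math.erf` and compares them with a simulated sample; the mean, standard deviation, seed and sample size are arbitrary illustrative values.

```python
import math
import random

def within_k_sigma(k):
    """P(|X - mu| < k * sigma) for a normal distribution, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} sigma: {within_k_sigma(k):.4f}")
# prints 0.6827, 0.9545 and 0.9973 for k = 1, 2, 3

# Empirical check on simulated normal data (illustrative parameters).
random.seed(42)
mu, sigma = 100.0, 10.0
sample = [random.gauss(mu, sigma) for _ in range(100_000)]
for k in (1, 2, 3):
    share = sum(abs(x - mu) < k * sigma for x in sample) / len(sample)
    print(f"simulated share within {k} sigma: {share:.4f}")
```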
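Dispersion - range and coefficient of variation
● Range: $R = x_{max} - x_{min}$
● Coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\%$
(in percentage of the mean)
● The coefficient of variation is only meaningful when all values are
positive, i.e. on a ratio scale, not on an interval scale such as temperatures in °C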
Dispersion - example
● Dataset: [100, 100, 100]
● Mean: 100
● Variance: 0
● Standard Deviation: 0
● Coeff. Variation: 0
● Range: 0
Dispersion - example
● Dataset: [90, 100, 110]
● Mean: 100
● Sample Variance: 100
● Standard Deviation: 10
● Coeff. Variation: 10%
● Range: 20
Dispersion - example
● Dataset: [1, 5, 6, 8, 10, 40, 65, 88]
● Mean: 27.875
● Sample Variance: 1082.69
● Standard Deviation: 32.9
● Coeff. Variation: 118%
● Range: 87
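The three examples can be reproduced with a few lines of Python. A minimal sketch using the standard `statistics` module (sample variance, n - 1 denominator); `dispersion` is a hypothetical helper, and rejecting non-positive values reflects the ratio-scale condition for the coefficient of variation stated earlier.

```python
import statistics

def dispersion(data):
    """Mean, sample variance, standard deviation, coefficient of variation (%) and range."""
    if min(data) <= 0:
        # The CV is only meaningful on a ratio scale with positive values.
        raise ValueError("coefficient of variation requires positive values")
    mean = statistics.fmean(data)
    var = statistics.variance(data)   # n - 1 denominator
    sd = statistics.stdev(data)
    return {
        "mean": mean,
        "variance": var,
        "sd": sd,
        "cv_percent": sd / mean * 100,
        "range": max(data) - min(data),
    }

for data in ([100, 100, 100], [90, 100, 110], [1, 5, 6, 8, 10, 40, 65, 88]):
    d = dispersion(data)
    print(data, {k: round(v, 2) for k, v in d.items()})
```

For the last data set the standard deviation exceeds the mean, so the coefficient of variation comes out at about 118%.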
Basic visualizations
Box Plot
[Figure: box plot annotated with the median, 1st quartile and 3rd quartile]
Basic visualizations
Box Plot
Basic visualizations
Box Plot
By Gbdivers (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 3.0
(http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
[Figure: box plot showing outliers and positive skewness]
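A box plot is drawn from the quartiles. A minimal Python sketch of the underlying numbers, using the data set from the last dispersion example; it assumes Tukey's common convention of whiskers at 1.5 × IQR, and `statistics.quantiles` uses its default "exclusive" method, so quartiles may differ slightly from tools that interpolate differently.

```python
import statistics

def box_plot_stats(data, whisker=1.5):
    """Five-number summary plus Tukey fences (whisker * IQR beyond the quartiles)."""
    q1, q2, q3 = statistics.quantiles(data, n=4)   # default method="exclusive"
    iqr = q3 - q1
    low_fence, high_fence = q1 - whisker * iqr, q3 + whisker * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {
        "min": min(data), "q1": q1, "median": q2, "q3": q3, "max": max(data),
        "iqr": iqr, "fences": (low_fence, high_fence), "outliers": outliers,
    }

data = [1, 5, 6, 8, 10, 40, 65, 88]
stats = box_plot_stats(data)
print(stats)
# Mean (27.875) well above the median (9.0): the long upper tail signals positive skewness.
print(statistics.fmean(data) > stats["median"])  # True
```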
Dependency: correlation
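● Sample correlation coefficient (Pearson): $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 \sum_{i=1}^{n}(y_i - \bar{y})^2}}$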
● Meaningful when comparing paired values/datasets
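A minimal Python sketch of the Pearson formula on a small hypothetical paired data set (illustrative values only). Python 3.10+ also offers `statistics.correlation`; the version below spells out each sum so it maps onto the formula term by term.

```python
import math

def pearson(x, y):
    """Sample Pearson correlation of paired values x[i], y[i]."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired sequences of equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # co-deviation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical paired measurements, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(round(pearson(x, y), 4))  # 0.7746
```

A constant sequence has zero variance, so r is undefined and this sketch raises `ZeroDivisionError`.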
Dependency: correlation
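● Spearman’s rank correlation coefficient: $\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the ranks of $x_i$ and $y_i$ (no tied ranks)
● Kendall’s rank correlation coefficient: $\tau = \frac{n_c - n_d}{n(n-1)/2}$, where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs
○ typically smaller in magnitude than Spearman’s ρ
○ often preferred for small samples
● The Pearson correlation coefficient assumes a linear relationship and normally distributed
data; the rank-based coefficients do not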
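A minimal Python sketch of both rank-based coefficients on a small hypothetical data set. It computes Spearman's ρ as the Pearson correlation of the ranks (average ranks for ties), which equals the 1 - 6Σd²/(n(n² - 1)) formula when there are no ties, and Kendall's τ in its simplest "tau-a" form, which does not correct for ties (SciPy's `scipy.stats.kendalltau`, for example, defaults to the tie-corrected tau-b).

```python
import math

def ranks(values):
    """1-based ranks; tied values get the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1          # average of positions i..j, 1-based
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks (handles ties)."""
    return pearson(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n(n-1)/2); no tie correction."""
    n = len(x)
    s = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)   # +1 concordant, -1 discordant, 0 tied
    return s / (n * (n - 1) / 2)

x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]
print(spearman(x, y), kendall_tau_a(x, y))  # 0.8 0.6
```

On this data τ (0.6) is smaller than ρ (0.8), in line with the remark about Kendall's coefficient.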
Dependency: example
Age vs. body fat %
● Pearson: r = 0.7921
● Spearman: ρ = 0.7539
● Kendall: τ = 0.5762
Basic Visualizations
Scatter Plot
Basic Visualizations
Image Source:
http://www.cqeacademy.com/cqe-body-of-knowledge/continuous-improvement/quality-control-tools/the-scatter-plot-linear-regression/
Scatter plots for different
values of r
Correlation does NOT imply causation!
● Spurious Correlations: http://tylervigen.com/
Thank you!
g.procaccianti@vu.nl
i.malavolta@vu.nl
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
Beyond_Borders_Understanding_Anime_and_Manga_Fandom_A_Comprehensive_Audience_...
 
Food safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdfFood safety_Challenges food safety laboratories_.pdf
Food safety_Challenges food safety laboratories_.pdf
 
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdfUnit 3 Emotional Intelligence and Spiritual Intelligence.pdf
Unit 3 Emotional Intelligence and Spiritual Intelligence.pdf
 
On National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan FellowsOn National Teacher Day, meet the 2024-25 Kenan Fellows
On National Teacher Day, meet the 2024-25 Kenan Fellows
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)Jamworks pilot and AI at Jisc (20/03/2024)
Jamworks pilot and AI at Jisc (20/03/2024)
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Single or Multiple melodic lines structure
Single or Multiple melodic lines structureSingle or Multiple melodic lines structure
Single or Multiple melodic lines structure
 
Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024Mehran University Newsletter Vol-X, Issue-I, 2024
Mehran University Newsletter Vol-X, Issue-I, 2024
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...Making communications land - Are they received and understood as intended? we...
Making communications land - Are they received and understood as intended? we...
 
Graduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - EnglishGraduate Outcomes Presentation Slides - English
Graduate Outcomes Presentation Slides - English
 

The Green Lab - [07-A] Data Analysis

  • 1. It starts with an idea. Data Analysis: Descriptive Statistics and EDA. Giuseppe Procaccianti
  • 2. Vrije Universiteit Amsterdam, Giuseppe Procaccianti / S2 group / The Green Lab. Quick Recap: Idea → Experiment scoping → Experiment planning → Experiment operation → Analysis & interpretation → Presentation & package
  • 3. Analysis and Interpretation ● Understanding the data ○ descriptive statistics ○ exploratory data analysis (EDA, e.g. boxplots, scatter plots) ● (Optional) data reduction ● Hypothesis testing ● Results interpretation
  • 4. Descriptive Statistics ● Goal: get a ‘feeling’ for how the data is distributed ● Properties: ○ Central Tendency (e.g. Mean, Median) ○ Dispersion (e.g. Frequency, Standard Deviation) ○ Dependency (e.g. Correlation)
  • 5. Parameter vs. statistic ● Parameter: feature of the population ○ μ: mean ○ σ: standard deviation ● Statistic: feature of the sample ○ x̄: mean ○ s: standard deviation ● Statistics are estimates of the corresponding parameters
  • 6. Central Tendency ● Arithmetic mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ ● Geometric mean: $\bar{x}_g = \left(\prod_{i=1}^{n} x_i\right)^{1/n}$, defined for positive values
  • 7. Central Tendency: example ● Average of scores: 6 - 7 - 8 - 9 - 10 ● Arithmetic mean: 8 ● Geometric mean: ~7.87
  • 8. Central Tendency: example ● Average return on investments: 90%; 10%; 20%; 30%; -90% ● Arithmetic mean: (90 + 10 + 20 + 30 - 90)/5 = 12% ● Geometric mean: (1.9 × 1.1 × 1.2 × 1.3 × 0.1)^(1/5) - 1 = -0.2008 = -20.08% ● The geometric mean gives the actual compounded return: the investment ends at about 33% of its starting value (product 0.326), a loss of about 20% per period, even though the arithmetic mean suggests a 12% gain
  • 9. Central Tendency ● Median (or 50th percentile): middle value separating the greater and lesser halves of a data set X = [13, 18, 13, 14, 13, 16, 14, 21, 13] Xsort = [13, 13, 13, 13, 14, 14, 16, 18, 21] ● Median: 14 (the 5th of the 9 sorted values)
  • 10. Central Tendency ● Mode: most frequent value in a data set X = [13, 18, 13, 14, 13, 16, 14, 21, 13] Mo(X) = 13
  • 11. Central Tendency - Skewness
  • 12. Dispersion ● Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$ ● Standard deviation: $s = \sqrt{s^2}$ ● The standard deviation has the same unit as the data (the variance is in squared units)
  • 13. Dispersion - three-sigma rule [Figure: for normally distributed data, about 68%, 95% and 99.7% of the values lie within 1, 2 and 3 standard deviations of the mean] "Empirical Rule" by Dan Kernler - Own work. Licensed under CC BY-SA 4.0 via Wikimedia Commons - http://commons.wikimedia.org/wiki/File:Empirical_Rule.PNG#/media/File:Empirical_Rule.PNG
  • 14. Dispersion - range and coefficient of variation ● Range: $R = x_{max} - x_{min}$ ● Coefficient of variation: $CV = \frac{s}{\bar{x}} \times 100\%$ (standard deviation as a percentage of the mean) ● The coefficient of variation is only meaningful when all values are positive and measured on a ratio scale, not on an interval scale (e.g. temperatures in °C)
  • 15. Dispersion - example ● Dataset: [100, 100, 100] ● Mean: 100 ● Variance: 0 ● Standard Deviation: 0 ● Coeff. Variation: 0% ● Range: 0
  • 16. Dispersion - example ● Dataset: [90, 100, 110] ● Mean: 100 ● Sample Variance: 100 ● Standard Deviation: 10 ● Coeff. Variation: 10% ● Range: 20
  • 17. Dispersion - example ● Dataset: [1, 5, 6, 8, 10, 40, 65, 88] ● Mean: 27.875 ● Sample Variance: 1082.70 ● Standard Deviation: 32.9 ● Coeff. Variation: 118% ● Range: 87
  • 18. Basic visualizations: Box Plot [Figure: box plot with the median, 1st quartile and 3rd quartile marked]
  • 19. Basic visualizations: Box Plot
  • 20. Basic visualizations: Box Plot [Figure: box plot annotated with outliers and positive skewness] By Gbdivers (Own work) [GFDL (http://www.gnu.org/copyleft/fdl.html) or CC BY-SA 3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
  • 21. Dependency: correlation ● Sample correlation coefficient (Pearson): $r = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}\sqrt{\sum_{i=1}^{n}(y_i - \bar{y})^2}}$ ● Meaningful only for paired values/datasets (each $x_i$ measured together with $y_i$)
  • 22. Dependency: correlation ● Spearman’s rank correlation coefficient: $\rho = 1 - \frac{6\sum_{i=1}^{n} d_i^2}{n(n^2 - 1)}$, where $d_i$ is the difference between the ranks of $x_i$ and $y_i$ (exact when there are no ties) ● Kendall’s rank correlation coefficient: $\tau = \frac{n_c - n_d}{n(n-1)/2}$, where $n_c$ and $n_d$ are the numbers of concordant and discordant pairs ○ usually smaller in magnitude than Spearman’s ρ ○ more accurate on small samples ● The Pearson correlation coefficient measures linear association and assumes normally distributed data; Spearman and Kendall are rank-based and do not
  • 23. Dependency: example, age vs. body fat % ● Pearson: r = 0.7921 ● Spearman: ρ = 0.7539 ● Kendall: τ = 0.5762
  • 24. Basic Visualizations: Scatter Plot
  • 25. Basic Visualizations [Figure: scatter plots for different values of r] Image source: http://www.cqeacademy.com/cqe-body-of-knowledge/continuous-improvement/quality-control-tools/the-scatter-plot-linear-regression/
  • 26. Correlation does NOT imply causation! ● Spurious Correlations: http://tylervigen.com/
  • 27. Thank you! g.procaccianti@vu.nl i.malavolta@vu.nl
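Slides 6 to 8 compare the arithmetic and geometric means. A minimal Python sketch, assuming only the standard library, reproduces both worked examples; `arithmetic_mean` and `geometric_mean` are hypothetical helper names, not part of the course material. The geometric mean is computed through logarithms so the product cannot overflow, and investment returns are first converted to growth factors (90% becomes 1.9, -90% becomes 0.1).

```python
import math


def arithmetic_mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)


def geometric_mean(values):
    """n-th root of the product of n positive values, computed via logarithms."""
    if any(v <= 0 for v in values):
        raise ValueError("the geometric mean requires positive values")
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Slide 7: scores
scores = [6, 7, 8, 9, 10]
print(arithmetic_mean(scores))           # 8.0
print(round(geometric_mean(scores), 2))  # 7.87

# Slide 8: returns in percent, converted to growth factors (90% -> 1.9, -90% -> 0.1)
returns = [90, 10, 20, 30, -90]
factors = [1 + r / 100 for r in returns]
print(arithmetic_mean(returns))                       # 12.0
print(round((geometric_mean(factors) - 1) * 100, 2))  # -20.08
```

Python 3.8+ also provides `statistics.geometric_mean`, which should give the same values.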
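The median and mode of slides 9 and 10 can be checked with a short sketch. The hypothetical `median` helper spells out the definition, including the even-length case that the slide does not show, and the standard-library `statistics.median` and `statistics.mode` are used for comparison.

```python
import statistics

X = [13, 18, 13, 14, 13, 16, 14, 21, 13]


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


print(sorted(X))             # [13, 13, 13, 13, 14, 14, 16, 18, 21]
print(median(X))             # 14
print(statistics.median(X))  # 14
print(statistics.mode(X))    # 13 (occurs four times)
```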
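Slide 11 shows skewness only as a figure. One common numeric summary, assumed here rather than taken from the slides, is the Fisher-Pearson coefficient g1 = m3 / m2^(3/2), computed from population moments; `skewness` is a hypothetical helper name. For the data set of slides 9 and 10, the mean (15) lies above the median (14), which lies above the mode (13), and g1 is positive, both consistent with a right (positive) skew.

```python
import statistics


def skewness(values):
    """Fisher-Pearson coefficient g1 = m3 / m2**1.5, using population moments."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(statistics.mean(X), statistics.median(X), statistics.mode(X))  # 15 14 13
print(round(skewness(X), 3))  # 1.23 (positive: longer tail to the right)
```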
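The dispersion measures of slides 12 and 14 can be computed with Python's `statistics` module: `statistics.variance` and `statistics.stdev` use the n - 1 denominator, matching the sample variance in slides 16 and 17. The wrapper `describe_dispersion` is a hypothetical helper; this sketch reproduces the three examples of slides 15 to 17, where the coefficient of variation of the last data set comes out at about 118% of the mean.

```python
import statistics


def describe_dispersion(values):
    """Return mean, sample variance, standard deviation, CV (%) and range."""
    mean = statistics.mean(values)
    var = statistics.variance(values)  # sample variance: divides by n - 1
    sd = statistics.stdev(values)      # square root of the sample variance
    cv = sd / mean * 100               # assumes positive, ratio-scale data
    rng = max(values) - min(values)
    return mean, var, sd, cv, rng


# The three data sets of slides 15 to 17
for data in ([100, 100, 100], [90, 100, 110], [1, 5, 6, 8, 10, 40, 65, 88]):
    mean, var, sd, cv, rng = describe_dispersion(data)
    print(f"mean={mean:.3f} var={var:.2f} sd={sd:.1f} cv={cv:.0f}% range={rng}")
# mean=100.000 var=0.00 sd=0.0 cv=0% range=0
# mean=100.000 var=100.00 sd=10.0 cv=10% range=20
# mean=27.875 var=1082.70 sd=32.9 cv=118% range=87
```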
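The percentages behind the three-sigma rule of slide 13 follow from the standard normal distribution. This sketch uses `statistics.NormalDist` (Python 3.8+) to compute the probability mass within k standard deviations of the mean; it holds for normally distributed data only and says nothing about other distributions.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Share of a normal distribution lying within k standard deviations of the mean
shares = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} sigma: {share:.1%}")
# within 1 sigma: 68.3%
# within 2 sigma: 95.4%
# within 3 sigma: 99.7%
```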
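The box plots of slides 18 to 20 are built from quartiles. This is a sketch under two assumptions: Tukey's common convention that points more than 1.5 × IQR beyond the box are drawn as outliers, and the "inclusive" quartile method of `statistics.quantiles` (Python 3.8+). Plotting libraries may use other quartile definitions, so their boxes can differ slightly. `box_plot_summary` is a hypothetical helper. Applied to the data set of slides 9 and 10, it flags 21 as an outlier above a box that is longer on the upper side, in line with the positive skewness of that data.

```python
import statistics


def box_plot_summary(values, k=1.5):
    """Quartiles, Tukey fences (k * IQR beyond the box), whisker ends and outliers."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - k * iqr, q3 + k * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = sorted(v for v in values if v < low_fence or v > high_fence)
    return {"q1": q1, "median": q2, "q3": q3,
            "whiskers": (min(inside), max(inside)), "outliers": outliers}


X = [13, 18, 13, 14, 13, 16, 14, 21, 13]
print(box_plot_summary(X))
# {'q1': 13.0, 'median': 14.0, 'q3': 16.0, 'whiskers': (13, 18), 'outliers': [21]}
```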
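The three coefficients of slides 21 to 23 can be sketched in plain Python. The data below are hypothetical (the age and body fat measurements of slide 23 are not included in the slides), and the helper names are illustrative. Spearman's ρ is computed as Pearson's r on ranks, with tied values receiving their average rank; `kendall_tau` implements tau-a, which assumes no ties (library versions such as SciPy's `kendalltau` default to tau-b, which corrects for ties).

```python
import math


def pearson(x, y):
    """Sample Pearson correlation coefficient r of paired values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """Ranks 1..n; tied values receive the average of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks of x and y."""
    return pearson(ranks(x), ranks(y))


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n(n-1)/2); assumes no ties."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # hypothetical paired measurements without ties
print(round(pearson(x, y), 3), round(spearman(x, y), 3), round(kendall_tau(x, y), 3))
# 0.8 0.8 0.6

cubes = [v ** 3 for v in x]  # monotonic but non-linear relation
print(round(pearson(x, cubes), 3), spearman(x, cubes), kendall_tau(x, cubes))
# 0.943 1.0 1.0
```

As in slide 23, Kendall's τ comes out smaller than Spearman's ρ. The second example illustrates the remark on slide 22: for a monotonic but non-linear relation, both rank coefficients reach 1 while Pearson's r stays below 1.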