Dr. D. Sugumar
Associate Professor/ECE
Karunya
Overview: Analytics Problem Solving
• Goal:
• Understand the basic characteristics of the data
• Examples for characteristics:
• Structure
• Size
• Completeness
• Relationships
• Usually interactive and semi-automated
• Text editors, command-line tools (head/more/less), etc. to look at raw data directly
• Helps to understand the structure
• Statistics and visualizations to learn about distributions and relationships
• Exploration should also include meta data
• Feature names, trace links, etc.
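A minimal sketch of such a first look in Python, assuming the raw data is a small CSV file. The sample content, the column names, and the helper names `peek` and `profile` are made up for illustration. It shows the first raw lines (similar to `head`) and then reports size and completeness per column.

```python
import csv
import io

# Hypothetical raw data; in practice this would be the content of a file.
RAW = """id,age,city
1,34,Berlin
2,,Hamburg
3,27,
4,45,Munich
"""

def peek(text, n=3):
    """Return the first n raw lines, similar to the `head` command."""
    return text.splitlines()[:n]

def profile(text):
    """Return the row count and, per column, the number of empty values."""
    rows = list(csv.DictReader(io.StringIO(text)))
    columns = list(rows[0].keys()) if rows else []
    missing = {c: sum(1 for r in rows if not r[c]) for c in columns}
    return len(rows), missing

print(peek(RAW))                 # structure: header line and first records
n_rows, missing = profile(RAW)
print(n_rows, missing)           # size and completeness
```

Looking at the raw lines first shows the delimiter and header, which is what makes the later parsing step safe.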
• Overview
• Summary Statistics
• Visualization for Data Exploration
• Summary
• Summarize data through single values
• Purely descriptive: they do not predict anything about the data (prediction → inductive statistics)
• Common statistics covered in this course
• Central tendency (mean/median/mode)
• Variability (standard deviation, interquartile range)
• Range of data (min/max)
• Other important statistics
• Kurtosis and skewness for the shape of distributions
• More measures for central tendency, e.g., trimmed means, harmonic mean
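Skewness and kurtosis can be sketched from central moments. The version below uses the plain moment-based definitions without any bias correction, which is an assumption; statistics packages often apply corrections, so their values may differ slightly. The example data is made up.

```python
def central_moment(x, k):
    """k-th central moment: the mean of (x_i - mean(x)) ** k."""
    m = sum(x) / len(x)
    return sum((xi - m) ** k for xi in x) / len(x)

def skewness(x):
    """Moment-based skewness: 0 for symmetric data, > 0 for a long right tail."""
    return central_moment(x, 3) / central_moment(x, 2) ** 1.5

def excess_kurtosis(x):
    """Moment-based kurtosis minus 3, so a normal distribution is near 0."""
    return central_moment(x, 4) / central_moment(x, 2) ** 2 - 3

symmetric = [1, 2, 3, 4, 5]          # made-up, symmetric around 3
right_skewed = [1, 1, 2, 2, 3, 10]   # made-up, one large value on the right

print(skewness(symmetric), excess_kurtosis(symmetric))
print(skewness(right_skewed))
```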
• „Typical“ value of the data
• Arithmetic mean
• $\operatorname{mean}(x) = \frac{1}{n} \sum_{i=1}^{n} x_i$ with $x = (x_1, \dots, x_n) \in \mathbb{R}^n$
• Median
• The value that separates the higher half of the data from the lower half
• Mode
• The value that appears most in the data
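As a small illustration, the three measures can be computed with Python's standard `statistics` module. The values are made up, with one extreme observation to show how differently the mean and the median react to it.

```python
import statistics

# Made-up example values; the last one is an extreme observation.
values = [2, 3, 3, 4, 5, 5, 5, 100]

mean = statistics.mean(values)      # sum of the values divided by their count
median = statistics.median(values)  # middle of the sorted values
mode = statistics.mode(values)      # most frequent value

print(mean, median, mode)
```

The single value 100 pulls the mean far above most observations, while the median and the mode stay within the bulk of the data.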
• Measure for the spread of the data
• Also called dispersion
• Standard deviation
• Measures how far the observations typically deviate from the arithmetic mean
• $\operatorname{sd}(x) = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \operatorname{mean}(x))^2}{n-1}}$
• Interquartile Range (IQR)
• Percentile: value below which a given percentage of the observations falls
• IQR: difference between the 75% percentile and the 25% percentile
• The median is the 50% percentile
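The two variability measures can be sketched as follows. The function names `sd` and `iqr` and the data are illustrative. Percentiles can be interpolated in several ways, so the choice of `method="inclusive"` is an assumption and other tools may return slightly different quartiles.

```python
import math
import statistics

def sd(x):
    """Sample standard deviation with the n - 1 denominator."""
    m = sum(x) / len(x)
    return math.sqrt(sum((xi - m) ** 2 for xi in x) / (len(x) - 1))

def iqr(x):
    """Difference between the 75% and the 25% percentile."""
    q1, _median, q3 = statistics.quantiles(x, n=4, method="inclusive")
    return q3 - q1

values = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up example values

print(sd(values))                  # hand-written formula
print(statistics.stdev(values))    # library version, also uses n - 1
print(iqr(values))
```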
• Range for which values are observed
• Can be infinite!
• Minimum
• Smallest observed value
• Maximum
• Largest observed value
• May be strongly distorted by invalid data
• This also makes min/max a good tool for discovering invalid data
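A short sketch of how the minimum can expose invalid data. The readings, the placeholder value -999, and the plausibility limits are all hypothetical.

```python
# Made-up sensor readings; -999 is a hypothetical placeholder for "missing".
readings = [21.5, 22.1, 20.8, -999, 23.0, 21.9]

# The minimum immediately exposes the placeholder value.
print(min(readings), max(readings))

# Assumed plausibility limits, chosen for this illustration only.
LOW, HIGH = -50.0, 60.0
valid = [r for r in readings if LOW <= r <= HIGH]

# After filtering, the range describes the actual measurements.
print(min(valid), max(valid))
```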
• Random typing on the keypad
• 𝑥 = (1,2,1,1,3,4,5,2,3,4,5,1,3,2,1,6,5,4,9,4,3,6,1,5,6,8,4,6,5,1,3,2,1,6,8,7,6,1,3,1,6,8,4,7,6,4,3,5,4,9,7,4,3,1,4,6,8,7,9, 1,4,6,1,3,8,6,7,4,9,6,5,1,3,6,8,7)
• central tendency:
• mean: 4.46052631579
• median: 4.0
• mode (count): 1 (14)
• variability
• sd: 2.41944311488 (computed by dividing by n; the n−1 formula gives ≈ 2.4355)
• IQR: 3.0
• range
• min: 1
• max: 9
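The numbers above can be reproduced with Python's standard library. This is a sketch and not necessarily the tool used to produce the slide. The quartile method is an assumption, and both variants of the standard deviation are printed because the value reported above matches the one that divides by n.

```python
import statistics
from collections import Counter

x = [1, 2, 1, 1, 3, 4, 5, 2, 3, 4, 5, 1, 3, 2, 1, 6, 5, 4, 9, 4,
     3, 6, 1, 5, 6, 8, 4, 6, 5, 1, 3, 2, 1, 6, 8, 7, 6, 1, 3, 1,
     6, 8, 4, 7, 6, 4, 3, 5, 4, 9, 7, 4, 3, 1, 4, 6, 8, 7, 9, 1,
     4, 6, 1, 3, 8, 6, 7, 4, 9, 6, 5, 1, 3, 6, 8, 7]

mode, count = Counter(x).most_common(1)[0]
q1, _q2, q3 = statistics.quantiles(x, n=4, method="inclusive")

print("mean:", statistics.mean(x))
print("median:", statistics.median(x))
print("mode (count):", mode, count)
print("sd (divide by n):", statistics.pstdev(x))
print("sd (divide by n - 1):", statistics.stdev(x))
print("IQR:", q3 - q1)
print("min:", min(x), "max:", max(x))
```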
Numbers are made up and pie charts should actually be avoided
Anscombe's Quartet

        i               ii              iii              iv
    x       y       x       y       x       y       x       y
  10.00    8.04   10.00    9.14   10.00    7.46    8.00    6.58
   8.00    6.95    8.00    8.14    8.00    6.77    8.00    5.76
  13.00    7.58   13.00    8.74   13.00   12.74    8.00    7.71
   9.00    8.81    9.00    8.77    9.00    7.11    8.00    8.84
  11.00    8.33   11.00    9.26   11.00    7.81    8.00    8.47
  14.00    9.96   14.00    8.10   14.00    8.84    8.00    7.04
   6.00    7.24    6.00    6.13    6.00    6.08    8.00    5.25
   4.00    4.26    4.00    3.10    4.00    5.39   19.00   12.50
  12.00   10.84   12.00    9.13   12.00    8.15    8.00    5.56
   7.00    4.82    7.00    7.26    7.00    6.42    8.00    7.91
   5.00    5.68    5.00    4.74    5.00    5.73    8.00    6.89

All four data sets have (nearly) the same
• mean
• standard deviation
• correlation between x and y
• linear regression line
Plots of the Boston house prices data set
http://archive.ics.uci.edu/ml/machine-learning-databases/housing/
• Extremely skewed
• Mixture of two normals after taking the logarithm
• Looks like an artificially high value → groups all higher incomes
• Median
• 75% percentile
• 25% percentile
• Range of data except outliers
• Outlier
The outlier definition can change. We used "more than 1.5 times the IQR away from the 25%/75% percentile." You should always check this in the package you use.
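The 1.5 × IQR rule quoted above can be sketched as follows. The function name `iqr_outliers`, the parameter `k`, and the data are made up for illustration, and the quartile method is an assumption, since packages differ in how they interpolate percentiles.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the lower fence, the upper fence, and all values outside them.

    A value counts as an outlier if it lies more than k * IQR below the
    25% percentile or more than k * IQR above the 75% percentile.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

data = [12, 13, 13, 14, 15, 15, 16, 17, 18, 40]  # made-up values
lower, upper, outliers = iqr_outliers(data)
print(lower, upper, outliers)
```

Changing `k` changes which points are flagged, which is why the definition used by a plotting package is worth checking.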
• No correlation visible
• Strong linear correlation
• Histogram of data in the column
• Good separation of blue, but green and orange are overlapping
• Good separation of all three classes
• Density plots of data in the column separated by classes
• Colors show strength of correlation
• Correlation between premiums and losses
• Correlation between reasons for accidents
There are different correlation coefficients. We used Pearson's coefficient, which measures linear correlations.
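A small sketch of Pearson's coefficient, written out from its definition, illustrates what "linear" means here. The data is made up: one variable depends linearly on x, the other depends on x perfectly but not linearly.

```python
import math

def pearson(x, y):
    """Pearson's r: covariance of x and y divided by the product of their spreads."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]   # exact linear relationship
quadratic = [v ** 2 for v in x]   # exact, but non-linear relationship

print(pearson(x, linear))      # close to 1
print(pearson(x, quadratic))   # close to 0 despite the perfect dependence
```

A coefficient near zero therefore only rules out a linear relationship, not a relationship in general.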
• Cannot see structure due to amount of data
• Hexagonal bins reveal the structure
• Linear trend
• Regular noise pattern → seasonal?
• Range of values
• Important to understand the data available
• Summary statistics provide a good overview
• Can be deceptive!
• Visualization is a powerful way to understand data
• Understanding of meta data and of how domain experts understand the data is equally important!

07 Data-Exploration.pdf
