Back to the basics, Part 2
Data exploration:
representing and testing data properties
Spyros Veronikis
Electrical and Electronics Engineer
Dept. of Archives and Library Sciences
Ionian University
spver@ionio.gr
http://dlib.ionio.gr/~spver/seminars/statistics/




                                                   More information
Seminar Content
   Basic statistics concepts
   Data exploration
   Correlation
   Comparing two means
   Comparing several means (ANOVA)
   Non-parametric test
   Nominal data


                                      2
Today's Content
   Statistical significance
   Parametric Data
   Histograms and Boxplots
   Descriptive statistics
   Correcting problems in the data
   Exploring groups of data
   Testing whether a distribution is normal
   Testing for homogeneity of variance
   Summary
                                               3
Statistical significance
      Ronald Fisher and Muriel Bristol: the lady tasting tea
     Given 6 cups of tea with milk (in 3 of which the tea was poured first and in 3 the
     milk was added first)
      “What is the probability that Lady Bristol picks out all 3 cups where the tea was
        poured first purely by guessing?”   (Answer: 1/20 = 0.05 or 5%)
     Fisher suggested that only when we are at least 95% certain that a result is
     genuine (not a chance finding) should we accept it as true, and therefore
     statistically significant.
     Frequency distributions can be used to assess this probability. In a standard normal
     distribution, the chance of a z-score lying further from the mean (in either direction) than:
          z = 1.96 is 0.05 or 5%
          z = 2.58 is 0.01 or 1%
          z = 3.29 is 0.001 or 0.1%
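
The figures on this slide can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the helper name `two_tailed_p` is introduced here for illustration, and the z thresholds are read as two-tailed probabilities (an assumption, since the slide does not say so explicitly).

```python
from math import comb
from statistics import NormalDist

# Tea tasting: there are C(6, 3) ways to choose 3 cups out of 6,
# and only one of them is the correct set of "tea first" cups.
p_all_correct = 1 / comb(6, 3)


def two_tailed_p(z):
    """Probability that a standard normal score lies further than |z| from 0."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


print(f"P(all 3 cups found by guessing) = {p_all_correct:.2f}")
for z in (1.96, 2.58, 3.29):
    print(f"z = {z}: two-tailed p = {two_tailed_p(z):.4f}")
```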



                                                                                         4
Parametric data
 Assumptions of parametric tests:
     1. Normally distributed data
               Checked with histograms and the Kolmogorov-Smirnov or Shapiro-Wilk test
     2. Homogeneity of variance
               In several groups: the samples come from populations with the same variance
               In correlational designs: the variance of one variable remains stable across all levels of
               the other variable(s)
     3. Interval data
               Numeric values, with equal differences between successive points on the measurement scale
     4. Independence
               Data come from different participants who don't influence each other


Only assumptions 1 and 2 are checked with objective tests. Assumptions 3 and 4 are
    checked by common sense.


                                                                                                 5
Graphing and screening data
 Histograms
   Show the number of
    times each recorded
    value occurs
   The horizontal axis
    represents the levels of
    measurement of the
    variable
   They make outliers easy
    to spot
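
As a rough illustration of what a histogram counts, the sketch below tallies how often each value occurs and prints a text histogram. The `hours` list is hypothetical data made up for this example, with one deliberately extreme value.

```python
from collections import Counter

# Hypothetical data (not from the seminar): hours recorded per person.
hours = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]

# Number of times each recorded value occurs.
counts = Counter(hours)

# One row per level of the variable; the isolated bar at 14 is easy to spot.
for value in range(min(hours), max(hours) + 1):
    print(f"{value:>3} | {'#' * counts[value]}")
```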



                               6
Outliers
 These are scores that differ greatly from the rest of the data
 They occur rarely
 They can bias the model we fit to the data
 They need to be identified and dealt with (e.g. removed from the dataset or transformed)




                                                         7
Graphing and screening data
 Boxplots (box-whisker diagrams)
          – Show the lowest and highest scores
          – Show the lower and upper quartiles (the bottom and top 25% of scores)
          – Show the interquartile range (the middle 50% of scores)
          – Show the median*
 A normal distribution has a symmetrical
  boxplot
 Skewed distributions don't
 Platykurtic distributions have wide
  boxplots
 * The median of a list of numbers is the middle value of the sorted list, i.e. the number that splits the dataset in half.
 E.g. for dataset A={1,2,2,4,13,15,51} the median is 4 (the mean is 12.57)
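
The quantities a boxplot draws can be computed directly for dataset A. This is a sketch under two assumptions that do not come from the slides: quartiles use the default "exclusive" method of Python's `statistics.quantiles`, and outliers are flagged with the common 1.5 × IQR convention. Other software may use slightly different quartile definitions.

```python
from statistics import mean, median, quantiles

data = [1, 2, 2, 4, 13, 15, 51]  # dataset A from the slide

# Quartiles (default "exclusive" method); q2 is the median.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1

# Assumed convention: scores beyond 1.5 * IQR from the box count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(f"min = {min(data)}, max = {max(data)}")
print(f"Q1 = {q1}, median = {median(data)}, Q3 = {q3}, IQR = {iqr}")
print(f"mean = {mean(data):.2f}")
print(f"outliers = {outliers}")
```

The single extreme score pulls the mean well above the median, which is why the median is the more robust summary here.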


                                                                                    8
Descriptive statistics
 “An alternative search tool was provided to the library patrons, and
  we assess its adoption by recording the number of hours spent on
  the previous tool before and after 2 demonstration sessions.”
 Descriptive statistics
       Mean, standard error of the mean
       Median
       Mode
       Standard deviation
       Variance
       Skewness and its standard error
       Kurtosis and its standard error
       Range
       Min, Max
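
Most of the statistics listed above can be sketched with Python's standard library. The `hours` values are hypothetical, not the seminar's data. Skewness uses the sample-adjusted formula that many statistics packages report, together with its usual standard error; both formulas are assumptions about what the original output used. Kurtosis is left out to keep the sketch short.

```python
import statistics as st
from math import sqrt

# Hypothetical data (not from the seminar): hours spent on the previous tool.
hours = [2, 3, 3, 4, 4, 4, 5, 5, 6, 14]

n = len(hours)
m = st.mean(hours)
s = st.stdev(hours)  # sample standard deviation (n - 1 in the denominator)

# Sample-adjusted skewness and its standard error (assumed formulas).
skewness = n / ((n - 1) * (n - 2)) * sum(((x - m) / s) ** 3 for x in hours)
se_skewness = sqrt(6 * n * (n - 1) / ((n - 2) * (n + 1) * (n + 3)))

summary = {
    "mean": m,
    "std_error_of_mean": s / sqrt(n),
    "median": st.median(hours),
    "mode": st.mode(hours),
    "std_dev": s,
    "variance": st.variance(hours),
    "skewness": skewness,
    "se_skewness": se_skewness,
    "range": max(hours) - min(hours),
    "min": min(hours),
    "max": max(hours),
}

for name, value in summary.items():
    print(f"{name:>18}: {value:.3f}")
```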

                                                                  10
Descriptive statistics




  [Figure: descriptive statistics for the First Period, Second Period, and Third Period]




                                            11
Skewness and kurtosis
 Both are 0 for a normal distribution
 z-values are more informative because they are
  standardized
      – Skewness:     z_sk = SK / SE_sk
      – Kurtosis:     z_kur = KUR / SE_kur
1st period: z_sk = -.004/.086 = -.047,     z_kur = -2.38
2nd period: z_sk = 1.095/.150 = 7.30,      z_kur = -2.75
3rd period: z_sk = 1.033/.218 = 4.74,      z_kur = 1.69
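
The skewness z-scores can be recomputed from the skewness values and standard errors given for the three periods. In this sketch the helper `z_score` and the 1.96 cut-off (the two-tailed 5% threshold from the significance slide) are illustrative choices, not something the slide prescribes.

```python
def z_score(statistic, std_error):
    """Standardize a skewness or kurtosis value by its standard error."""
    return statistic / std_error


# (skewness, standard error of skewness) per period, as given on the slide.
periods = {
    "1st": (-0.004, 0.086),
    "2nd": (1.095, 0.150),
    "3rd": (1.033, 0.218),
}

z_skew = {name: z_score(sk, se) for name, (sk, se) in periods.items()}

for name, z in z_skew.items():
    verdict = "significant skew" if abs(z) > 1.96 else "no significant skew"
    print(f"{name} period: z_sk = {z:.3f} ({verdict} at p < .05)")
```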

                                                      12
Correcting problems in data
 How do we deal with problems in data (problems in
  distribution, outliers, missing values)?
      – Remove problematic cases
      – Transform the data (X → Y = function(X))
 Transformations can correct the form of the distribution, i.e.,
  remove skewness by pulling outliers closer to the rest of the scores
 Transform ALL data, of ALL variables that are going to be
  compared/related, even if some of the variables aren't
  skewed. This isn't cheating!
 Then run the statistical tests and analysis

                                                        13
Correcting positively skewed data
 Log transformation, Xtr = log(X):
        – Reduces positive skew.
        – X must be greater than 0 (add a constant to shift the scores if needed)
 Square root transformation, Xtr = sqrt(X):
        – Brings large scores closer to the center
        – X must not be negative (add a constant to shift the scores if needed)
 Reciprocal transformation, Xtr = 1/X:
        – Reduces the impact of large scores
        – Reverses the order of the scores. Therefore, prior to the transformation we need to
            reverse the scores ourselves (e.g. Xrev = HighestX – X, plus a constant so that
            no reversed score is 0). Then Xtr = 1/Xrev
 For negatively skewed data, reverse the scores prior to the above
  transformations.
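
A minimal sketch of the three transformations, applied to the small dataset A used earlier for the median. The function names and the `shift` parameter are introduced here for illustration; the base-10 logarithm is an arbitrary choice. The reciprocal is shown in its plain 1/X form so that the order reversal is visible.

```python
from math import log10, sqrt

scores = [1, 2, 2, 4, 13, 15, 51]  # dataset A, positively skewed


def log_transform(xs, shift=0.0):
    """log10(x + shift); every x + shift must be greater than 0."""
    return [log10(x + shift) for x in xs]


def sqrt_transform(xs, shift=0.0):
    """sqrt(x + shift); every x + shift must be non-negative."""
    return [sqrt(x + shift) for x in xs]


def reciprocal_transform(xs):
    """1/x; every x must be non-zero. The order of the scores is reversed."""
    return [1 / x for x in xs]


for label, values in [
    ("raw", scores),
    ("log", log_transform(scores)),
    ("sqrt", sqrt_transform(scores)),
    ("reciprocal", reciprocal_transform(scores)),
]:
    print(f"{label:>10}: " + ", ".join(f"{v:.2f}" for v in values))
```

In the printed rows the extreme score 51 ends up much closer to the other scores after the log and square root transformations, while the reciprocal row runs from largest to smallest.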
                                                                        14
Transformed data




                   15
Explore groups of data
  Example:
 A professor records the score of a project report delivered by
  his students. He also notes:
        – The percentage of the given bibliography studied (biblio)
        – Their attendance at the lectures (lectures)
        – The number of students who collaborated to deliver the report
            (participants).
 He collects the same data a year later (from the students of the
  new class) and looks for differences in performance.


        Score = function(biblio, lectures, participants, year)


                                                                  16
Explore groups of data
  [Figure: four panels: Project score, Bibliography studied, Lectures attended, Participants]




                                                17
Explore groups of data
  [Table: overall descriptive statistics]
                         18
Explore groups of data
  Comments on overall descriptives (previous table):
 The distribution of scores across both years seems bimodal (this could reflect a
  difference in performance between the years)
 We can compare the scores of the two years (because each one comes
  from a normal distribution), but we cannot compare the whole dataset
  of scores to another similar dataset.
 The shape of the participants' distribution might also be due to different
  collaboration patterns between the years.


 We ask for descriptives per year
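
Splitting the data by group before describing it might look like the sketch below. The `records` list is hypothetical (the seminar's actual scores are not reproduced here); it is built so that pooling both years would give two separate clusters of scores.

```python
from collections import defaultdict
from statistics import mean, median, stdev

# Hypothetical (year, project score) pairs, not the seminar's data.
records = [
    (1, 38), (1, 42), (1, 45), (1, 40), (1, 35),
    (2, 62), (2, 65), (2, 58), (2, 70), (2, 60),
]

# Collect the scores of each year separately.
by_year = defaultdict(list)
for year, score in records:
    by_year[year].append(score)

# Descriptive statistics within each group.
descriptives = {
    year: {
        "n": len(scores),
        "mean": mean(scores),
        "median": median(scores),
        "std_dev": stdev(scores),
    }
    for year, scores in sorted(by_year.items())
}

for year, stats in descriptives.items():
    print(f"year {year}: " + ", ".join(f"{k} = {v:.2f}" for k, v in stats.items()))
```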




                                                                19
Explore groups of data
  [Table: descriptive statistics per year]
                         20
Testing normality of a distribution
   Normality should not be judged by eye alone (e.g., “it looks normal to me”)
   We test mathematically whether a given distribution as a whole
    deviates from a comparable normal distribution (one with the same mean
    and the same standard deviation).
   We use the Kolmogorov-Smirnov and Shapiro-Wilk tests:
    “Is the given distribution different from normal?”
   A non-significant test outcome (p > .05) indicates no significant deviation from the
    normal distribution, so we treat the data as normal
   A significant outcome (p < .05) indicates non-normality
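
To make the comparison with "a normal distribution having the same mean and standard deviation" concrete, the sketch below computes only the Kolmogorov-Smirnov distance D, i.e. the largest gap between the sample's cumulative distribution and that of the matching normal. It does not compute a p-value; in practice that would come from a statistics package. Both samples are synthetic and constructed for illustration.

```python
from math import exp
from statistics import NormalDist, mean, stdev


def ks_distance_from_normal(sample):
    """Largest gap between the empirical CDF and a normal CDF
    with the sample's own mean and standard deviation."""
    xs = sorted(sample)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs):
        cdf = reference.cdf(x)
        # The empirical CDF jumps from i/n to (i + 1)/n at x.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


n = 100
# Synthetic, evenly spaced normal quantiles: as close to normal as a sample gets.
normal_like = [NormalDist().inv_cdf((i + 0.5) / n) for i in range(n)]
# Exponentiating them gives a strongly positively skewed sample.
skewed = [exp(x) for x in normal_like]

d_normal = ks_distance_from_normal(normal_like)
d_skewed = ks_distance_from_normal(skewed)
print(f"D (normal-like sample) = {d_normal:.3f}")
print(f"D (skewed sample)      = {d_skewed:.3f}")
```

The larger D is, the further the sample departs from the comparable normal distribution; the test then asks whether that departure is larger than chance would produce.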




                                                                             21
Testing for homogeneity of variance
 Homogeneity of variance is a requirement for parametric tests to
  be applied
 As you go through the levels of one variable, the variance of
  another variable must not change
   “For how many hours do you feel sick after an extra pint of beer?”




                                                                   22
Testing for homogeneity of variance
   For groups of data we use Levene's test, which checks whether the variances are
    homogeneous.
   It tests the null hypothesis that “there is no difference between the variances in the
    groups”, i.e., the difference between variances is zero.
          – If the test outcome is significant (p < .05), the difference between variances is
                unlikely to be a chance finding: we reject the null hypothesis and
                acknowledge heterogeneous variances.
          – If the test outcome is non-significant (p > .05), we do not reject the null
                hypothesis and treat the groups as having homogeneous
                variances.

                      Levene statistic (d)    df1    df2    Significance
      Score                  2.584             1      98    0.111 (> 0.05)
      Participants           7.368             1      98    0.008 (< 0.05)
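
The Levene statistic itself is a one-way ANOVA F ratio computed on the absolute deviations of each score from its group centre. The sketch below uses deviations from the group mean, which is an assumption about the variant used in the table; the two small groups are hypothetical. The p-value would be read from an F distribution with (k − 1, N − k) degrees of freedom, which matches df1 = 1 and df2 = 98 in the table for 2 groups and 100 cases; that step is left to a statistics package.

```python
from statistics import mean


def levene_statistic(*groups):
    """Levene's W using absolute deviations from each group's mean."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)

    # Absolute deviation of every score from its own group mean.
    deviations = []
    for g in groups:
        centre = mean(g)
        deviations.append([abs(x - centre) for x in g])

    group_means = [mean(z) for z in deviations]
    grand_mean = sum(sum(z) for z in deviations) / n_total

    # Between-group and within-group sums of squares of the deviations.
    between = sum(len(z) * (zm - grand_mean) ** 2
                  for z, zm in zip(deviations, group_means))
    within = sum((v - zm) ** 2
                 for z, zm in zip(deviations, group_means) for v in z)

    return (n_total - k) / (k - 1) * between / within


# Hypothetical groups: same spread vs. doubled spread.
same_spread = levene_statistic([1, 2, 3, 4, 5], [11, 12, 13, 14, 15])
different_spread = levene_statistic([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"W (same spread)      = {same_spread:.3f}")
print(f"W (different spread) = {different_spread:.3f}")
```

Some libraries centre on the group median by default instead of the mean, so their statistic can differ slightly from this one.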

                                                                                      23
Summary
 What we've seen
   Examine data properly before proceeding to
    analysis
   Look at the data distribution
   Spot any problems (e.g. outliers)
   In case of non-normality try to transform data
   When comparing data from different groups look
    at distributions within each group
   Also test for homogeneity of variance

                                                24

Contenu connexe

Tendances

Tendances (20)

Malhotra16
Malhotra16Malhotra16
Malhotra16
 
Testing Assumptions in repeated Measures Design using SPSS
Testing Assumptions in repeated Measures Design using SPSSTesting Assumptions in repeated Measures Design using SPSS
Testing Assumptions in repeated Measures Design using SPSS
 
T Test For Two Independent Samples
T Test For Two Independent SamplesT Test For Two Independent Samples
T Test For Two Independent Samples
 
Discriminant Analysis-lecture 8
Discriminant Analysis-lecture 8Discriminant Analysis-lecture 8
Discriminant Analysis-lecture 8
 
Statistical Distributions
Statistical DistributionsStatistical Distributions
Statistical Distributions
 
Analysis of Variance and Repeated Measures Design
Analysis of Variance and Repeated Measures DesignAnalysis of Variance and Repeated Measures Design
Analysis of Variance and Repeated Measures Design
 
1. complete stats notes
1. complete stats notes1. complete stats notes
1. complete stats notes
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 
Malhotra08
Malhotra08Malhotra08
Malhotra08
 
Parametric tests seminar
Parametric tests seminarParametric tests seminar
Parametric tests seminar
 
Logistic regression
Logistic regressionLogistic regression
Logistic regression
 
The two sample t-test
The two sample t-testThe two sample t-test
The two sample t-test
 
The t Test for Two Related Samples
The t Test for Two Related SamplesThe t Test for Two Related Samples
The t Test for Two Related Samples
 
Eda sri
Eda sriEda sri
Eda sri
 
Correlation and Simple Regression
Correlation  and Simple RegressionCorrelation  and Simple Regression
Correlation and Simple Regression
 
T test statistics
T test statisticsT test statistics
T test statistics
 
Statistics
StatisticsStatistics
Statistics
 
Day 12 t test for dependent samples and single samples pdf
Day 12 t test for dependent samples and single samples pdfDay 12 t test for dependent samples and single samples pdf
Day 12 t test for dependent samples and single samples pdf
 
T test independent samples
T test  independent samplesT test  independent samples
T test independent samples
 
Discriminant analysis
Discriminant analysisDiscriminant analysis
Discriminant analysis
 

En vedette

Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Giannis Tsakonas
 
{Tech}changes: the technological state of Greek Libraries.
{Tech}changes: the technological state of Greek Libraries.{Tech}changes: the technological state of Greek Libraries.
{Tech}changes: the technological state of Greek Libraries.Giannis Tsakonas
 
B) implicaciones en el uso curricular de las tic
B) implicaciones en el uso curricular de las ticB) implicaciones en el uso curricular de las tic
B) implicaciones en el uso curricular de las ticsac30
 
Affective relationships between users & libraries in times of economic stress
Affective relationships between users & libraries in times of economic stressAffective relationships between users & libraries in times of economic stress
Affective relationships between users & libraries in times of economic stressGiannis Tsakonas
 
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικέςΑκαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικέςGiannis Tsakonas
 
Increasing traceability of physical library items through Koha: the case of S...
Increasing traceability of physical library items through Koha: the case of S...Increasing traceability of physical library items through Koha: the case of S...
Increasing traceability of physical library items through Koha: the case of S...Giannis Tsakonas
 
We were group no 2: notes for the MLAS2015 workshop
We were group no 2: notes for the MLAS2015 workshopWe were group no 2: notes for the MLAS2015 workshop
We were group no 2: notes for the MLAS2015 workshopGiannis Tsakonas
 
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόηταΒιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόηταGiannis Tsakonas
 

En vedette (8)

Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
Charting the Digital Library Evaluation Domain with a Semantically Enhanced M...
 
{Tech}changes: the technological state of Greek Libraries.
{Tech}changes: the technological state of Greek Libraries.{Tech}changes: the technological state of Greek Libraries.
{Tech}changes: the technological state of Greek Libraries.
 
B) implicaciones en el uso curricular de las tic
B) implicaciones en el uso curricular de las ticB) implicaciones en el uso curricular de las tic
B) implicaciones en el uso curricular de las tic
 
Affective relationships between users & libraries in times of economic stress
Affective relationships between users & libraries in times of economic stressAffective relationships between users & libraries in times of economic stress
Affective relationships between users & libraries in times of economic stress
 
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικέςΑκαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
Ακαδημαϊκές Βιβλιοθήκες στην Πάτρα: Προηγμένες υπηρεσίες, δικτύωση & προοπτικές
 
Increasing traceability of physical library items through Koha: the case of S...
Increasing traceability of physical library items through Koha: the case of S...Increasing traceability of physical library items through Koha: the case of S...
Increasing traceability of physical library items through Koha: the case of S...
 
We were group no 2: notes for the MLAS2015 workshop
We were group no 2: notes for the MLAS2015 workshopWe were group no 2: notes for the MLAS2015 workshop
We were group no 2: notes for the MLAS2015 workshop
 
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόηταΒιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
Βιβλιοθήκες & Πολιτισμός: τα προφανή και τα ευνόητα
 

Similaire à Back to the basics-Part2: Data exploration: representing and testing data properties

Anova lecture
Anova lectureAnova lecture
Anova lecturedoublem44
 
Lect 2 basic ppt
Lect 2 basic pptLect 2 basic ppt
Lect 2 basic pptTao Hong
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysisgokulprasath06
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfkobra22
 
Standard deviation
Standard deviationStandard deviation
Standard deviationM K
 
Quantitative_analysis.ppt
Quantitative_analysis.pptQuantitative_analysis.ppt
Quantitative_analysis.pptmousaderhem1
 
3.2 measures of variation
3.2 measures of variation3.2 measures of variation
3.2 measures of variationleblance
 
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
BRM_Data Analysis, Interpretation and Reporting Part II.pptBRM_Data Analysis, Interpretation and Reporting Part II.ppt
BRM_Data Analysis, Interpretation and Reporting Part II.pptAbdifatahAhmedHurre
 
250Lec5INFERENTIAL STATISTICS FOR RESEARC
250Lec5INFERENTIAL STATISTICS FOR RESEARC250Lec5INFERENTIAL STATISTICS FOR RESEARC
250Lec5INFERENTIAL STATISTICS FOR RESEARCLeaCamillePacle
 
Descriptive And Inferential Statistics for Nursing Research
Descriptive And Inferential Statistics for Nursing ResearchDescriptive And Inferential Statistics for Nursing Research
Descriptive And Inferential Statistics for Nursing Researchenamprofessor
 

Similaire à Back to the basics-Part2: Data exploration: representing and testing data properties (20)

Anova lecture
Anova lectureAnova lecture
Anova lecture
 
Data in science
Data in science Data in science
Data in science
 
Lect 2 basic ppt
Lect 2 basic pptLect 2 basic ppt
Lect 2 basic ppt
 
Basic statistics
Basic statisticsBasic statistics
Basic statistics
 
Descriptive Analytics: Data Reduction
 Descriptive Analytics: Data Reduction Descriptive Analytics: Data Reduction
Descriptive Analytics: Data Reduction
 
Chapter 11 Psrm
Chapter 11 PsrmChapter 11 Psrm
Chapter 11 Psrm
 
statistics
statisticsstatistics
statistics
 
Data analysis
Data analysisData analysis
Data analysis
 
Exploratory data analysis
Exploratory data analysisExploratory data analysis
Exploratory data analysis
 
statistics - Populations and Samples.pdf
statistics - Populations and Samples.pdfstatistics - Populations and Samples.pdf
statistics - Populations and Samples.pdf
 
Standard deviation
Standard deviationStandard deviation
Standard deviation
 
Kinds Of Variable
Kinds Of VariableKinds Of Variable
Kinds Of Variable
 
Ds vs Is discuss 3.1
Ds vs Is discuss 3.1Ds vs Is discuss 3.1
Ds vs Is discuss 3.1
 
Quantitative_analysis.ppt
Quantitative_analysis.pptQuantitative_analysis.ppt
Quantitative_analysis.ppt
 
3.2 measures of variation
3.2 measures of variation3.2 measures of variation
3.2 measures of variation
 
Descriptive Analysis.pptx
Descriptive Analysis.pptxDescriptive Analysis.pptx
Descriptive Analysis.pptx
 
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
BRM_Data Analysis, Interpretation and Reporting Part II.pptBRM_Data Analysis, Interpretation and Reporting Part II.ppt
BRM_Data Analysis, Interpretation and Reporting Part II.ppt
 
Stat2013
Stat2013Stat2013
Stat2013
 
250Lec5INFERENTIAL STATISTICS FOR RESEARC
250Lec5INFERENTIAL STATISTICS FOR RESEARC250Lec5INFERENTIAL STATISTICS FOR RESEARC
250Lec5INFERENTIAL STATISTICS FOR RESEARC
 
Descriptive And Inferential Statistics for Nursing Research
Descriptive And Inferential Statistics for Nursing ResearchDescriptive And Inferential Statistics for Nursing Research
Descriptive And Inferential Statistics for Nursing Research
 

Plus de Giannis Tsakonas

Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο ΙστόΑρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο ΙστόGiannis Tsakonas
 
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...Giannis Tsakonas
 
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο ΚύπρουΔεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο ΚύπρουGiannis Tsakonas
 
FRBR και Linked Data - Σεμινάριο Αθήνας
FRBR και Linked Data -  Σεμινάριο ΑθήναςFRBR και Linked Data -  Σεμινάριο Αθήνας
FRBR και Linked Data - Σεμινάριο ΑθήναςGiannis Tsakonas
 
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...Giannis Tsakonas
 
Policies for geospatial collections: a research in US and Canadian academic l...
Policies for geospatial collections: a research in US and Canadian academic l...Policies for geospatial collections: a research in US and Canadian academic l...
Policies for geospatial collections: a research in US and Canadian academic l...Giannis Tsakonas
 
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...Giannis Tsakonas
 
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Giannis Tsakonas
 
Path-based MXML Storage and Querying
Path-based MXML Storage and QueryingPath-based MXML Storage and Querying
Path-based MXML Storage and QueryingGiannis Tsakonas
 
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked DataΔεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked DataGiannis Tsakonas
 
Open Bibliographic Data and E-LIS
Open Bibliographic Data and E-LISOpen Bibliographic Data and E-LIS
Open Bibliographic Data and E-LISGiannis Tsakonas
 
Dileo Presentation (in English)
Dileo Presentation (in English)Dileo Presentation (in English)
Dileo Presentation (in English)Giannis Tsakonas
 
Evaluation Insights to Key Processes of Digital Repositories
Evaluation Insights to Key Processes of Digital RepositoriesEvaluation Insights to Key Processes of Digital Repositories
Evaluation Insights to Key Processes of Digital RepositoriesGiannis Tsakonas
 
E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...
E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...
E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...Giannis Tsakonas
 
Conceptual similarity: why, where and how
Conceptual similarity: why, where and howConceptual similarity: why, where and how
Conceptual similarity: why, where and howGiannis Tsakonas
 
User studies: enquiry foundations and methodological considerations
User studies: enquiry foundations and methodological considerationsUser studies: enquiry foundations and methodological considerations
User studies: enquiry foundations and methodological considerationsGiannis Tsakonas
 
Developing a formal model for mind maps representation
Developing a formal model for mind maps representation Developing a formal model for mind maps representation
Developing a formal model for mind maps representation Giannis Tsakonas
 
Geographic collections development policies and GIS services: a research in U...
Geographic collections development policies and GIS services: a research in U...Geographic collections development policies and GIS services: a research in U...
Geographic collections development policies and GIS services: a research in U...Giannis Tsakonas
 
Failed queries: a morpho-syntactic analysis based on transaction log files
Failed queries: a morpho-syntactic analysis based on transaction log filesFailed queries: a morpho-syntactic analysis based on transaction log files
Failed queries: a morpho-syntactic analysis based on transaction log filesGiannis Tsakonas
 
The exploitation of social tagging systems in libraries
The exploitation of social tagging systems in librariesThe exploitation of social tagging systems in libraries
The exploitation of social tagging systems in librariesGiannis Tsakonas
 

Plus de Giannis Tsakonas (20)

Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο ΙστόΑρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
Αρχειακά Μεταδεδομένα: Πρότυπα και Διαχείριση στον Παγκόσμιο Ιστό
 
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
The “Nomenclature of Multidimensionality” in the Digital Libraries Evaluation...
 
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο ΚύπρουΔεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - Σεμινάριο Κύπρου
 
FRBR και Linked Data - Σεμινάριο Αθήνας
FRBR και Linked Data -  Σεμινάριο ΑθήναςFRBR και Linked Data -  Σεμινάριο Αθήνας
FRBR και Linked Data - Σεμινάριο Αθήνας
 
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
Automatic Medical Document Generation via Spatial SNOMED Elements in Hysteros...
 
Policies for geospatial collections: a research in US and Canadian academic l...
Policies for geospatial collections: a research in US and Canadian academic l...Policies for geospatial collections: a research in US and Canadian academic l...
Policies for geospatial collections: a research in US and Canadian academic l...
 
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
Developing a Metadata Model for Historic Buildings: Describing and Linking Ar...
 
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
Query Expansion and Context: Thoughts on Language, Meaning and Knowledge Orga...
 
Path-based MXML Storage and Querying
Path-based MXML Storage and QueryingPath-based MXML Storage and Querying
Path-based MXML Storage and Querying
 
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked DataΔεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
Δεδομένα Βιβλιοθηκών στο μελλοντικό ψηφιακό περιβάλλον - FRBR και Linked Data
 
Open Bibliographic Data and E-LIS
Open Bibliographic Data and E-LISOpen Bibliographic Data and E-LIS
Open Bibliographic Data and E-LIS
 
Dileo Presentation (in English)
Dileo Presentation (in English)Dileo Presentation (in English)
Dileo Presentation (in English)
 
Evaluation Insights to Key Processes of Digital Repositories
Evaluation Insights to Key Processes of Digital RepositoriesEvaluation Insights to Key Processes of Digital Repositories
Evaluation Insights to Key Processes of Digital Repositories
 
E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...
E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...
E-LIS, το ηλεκτρονικό αρχείο για τη Βιβλιοθηκονομία και την Επιστήμη της Πληρ...
 
Conceptual similarity: why, where and how
Conceptual similarity: why, where and howConceptual similarity: why, where and how
Conceptual similarity: why, where and how
 
User studies: enquiry foundations and methodological considerations
User studies: enquiry foundations and methodological considerationsUser studies: enquiry foundations and methodological considerations
User studies: enquiry foundations and methodological considerations
 
Developing a formal model for mind maps representation
Developing a formal model for mind maps representation Developing a formal model for mind maps representation
Developing a formal model for mind maps representation
 
Geographic collections development policies and GIS services: a research in U...
Geographic collections development policies and GIS services: a research in U...Geographic collections development policies and GIS services: a research in U...
Geographic collections development policies and GIS services: a research in U...
 
Failed queries: a morpho-syntactic analysis based on transaction log files
Failed queries: a morpho-syntactic analysis based on transaction log filesFailed queries: a morpho-syntactic analysis based on transaction log files
Failed queries: a morpho-syntactic analysis based on transaction log files
 
The exploitation of social tagging systems in libraries
The exploitation of social tagging systems in librariesThe exploitation of social tagging systems in libraries
The exploitation of social tagging systems in libraries
 



Back to the basics-Part2: Data exploration: representing and testing data properties

  • 1. Back to the basics, Part 2. Data exploration: representing and testing data properties. Spyros Veronikis, Electrical and Electronics Engineer, Dept. of Archives and Library Sciences, Ionian University. spver@ionio.gr. More information: http://dlib.ionio.gr/~spver/seminars/statistics/
  • 2. Seminar Content ▪ Basic statistics concepts ▪ Data exploration ▪ Correlation ▪ Comparing two means ▪ Comparing several means (ANOVA) ▪ Non-parametric tests ▪ Nominal data
  • 3. Today's Content ▪ Statistical significance ▪ Parametric data ▪ Histograms and boxplots ▪ Descriptive statistics ▪ Correcting problems in the data ▪ Exploring groups of data ▪ Testing whether a distribution is normal ▪ Testing for homogeneity of variance ▪ Summary
  • 4. Statistical significance ▪ Ronald Fisher and Muriel Bristol, "The Lady Tasting Tea": given 6 cups of tea with milk (3 in which the tea was poured first and 3 in which the milk was added first), "What is the probability that Lady Bristol picks out, by chance, all 3 cups where the tea was poured first?" (Answer: 1/20 = 0.05 or 5%, because there are 20 ways to choose 3 cups out of 6.) Fisher suggested that only when we are at least 95% certain that a result is genuine (not a chance finding) should we accept it as true, and therefore statistically significant. Frequency distributions can be used to assess the probability. In a standard normal distribution, the chance of a z-score whose absolute value exceeds: ▪ 1.96 is 0.05 or 5% ▪ 2.58 is 0.01 or 1% ▪ 3.29 is 0.001 or 0.1%
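
Both probabilities on this slide can be checked numerically. The following is a minimal sketch using only the Python standard library; the name `two_tailed_p` is illustrative and not part of the seminar material, and the z cut-offs are assumed to be two-tailed.

```python
from math import comb
from statistics import NormalDist

# Lady tasting tea: one correct selection out of all ways to pick 3 cups from 6.
p_all_correct = 1 / comb(6, 3)


def two_tailed_p(z: float) -> float:
    """Probability that a standard normal score exceeds |z| in either tail."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    print(f"P(all 3 cups correct by chance) = {p_all_correct:.2f}")
    for z in (1.96, 2.58, 3.29):
        print(f"|z| > {z}: p = {two_tailed_p(z):.4f}")
```

The cut-offs 1.96, 2.58 and 3.29 are simply the points at which this two-tailed probability drops to about 5%, 1% and 0.1%.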
  • 5. Parametric data ▪ Assumptions of parametric tests: 1. Normally distributed data ▪ Checked with histograms and the Kolmogorov-Smirnov or Shapiro-Wilk test 2. Homogeneity of variance ▪ In several groups: the data in the samples come from populations with the same variance ▪ In correlational designs: the variance of one variable remains stable across all levels of the other variable(s) 3. Interval data ▪ Numeric values, with equal differences between successive points on the measurement scale 4. Independence ▪ Data come from different participants who don't influence each other. Only requirements 1 and 2 are tested by objective criteria (tests). Requirements 3 and 4 are checked by common sense.
  • 6. Graphing and screening data ▪ Histograms ▪ Show the number of times each recorded value occurs ▪ The horizontal axis represents the levels of measurement of the variable ▪ They make outliers easy to spot
  • 7. Outliers ▪ These are scores very different from the rest of the data ▪ They occur rarely ▪ They can bias the model we fit to the data ▪ They need to be identified and omitted from the dataset
  • 8. Graphing and screening data ▪ Boxplots (box-whisker diagrams) – Show the lowest and highest scores – Show the quartiles (each covering 25% of the scores) – Show the interquartile range (the middle 50% of the scores) – Show the median* ▪ A normal distribution has a symmetrical boxplot ▪ Skewed distributions don't ▪ Platykurtic distributions have wide boxplots * The median of a list of numbers is the number that splits the sorted dataset in half. E.g. for dataset A={1,2,2,4,13,15,51} the median is 4 (the mean is 12.57)
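
The values a boxplot draws can be computed directly for the dataset A on this slide. This is a sketch under stated assumptions: the helper names are hypothetical, quartiles use the default method of Python's `statistics.quantiles`, and the 1.5 × IQR outlier fence is a common convention that the slides themselves do not prescribe.

```python
from statistics import mean, median, quantiles

A = [1, 2, 2, 4, 13, 15, 51]


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max), the values a boxplot displays."""
    q1, q2, q3 = quantiles(data, n=4)  # default "exclusive" method
    return min(data), q1, q2, q3, max(data)


def fence_outliers(data, k=1.5):
    """Assumed rule: flag scores more than k * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    return [x for x in data if x < q1 - k * iqr or x > q3 + k * iqr]


if __name__ == "__main__":
    print("median:", median(A), "mean:", round(mean(A), 2))
    print("five-number summary:", five_number_summary(A))
    print("flagged as outliers:", fence_outliers(A))
```

The single large score (51) pulls the mean well above the median, which is the asymmetry a skewed boxplot makes visible.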
  • 10. Descriptive statistics ▪ "An alternative search tool was provided to the library patrons, and we assess its adoption by recording the number of hours spent on the previous tool before and after 2 demonstration sessions." ▪ Descriptive statistics: ▪ Mean, standard error of the mean ▪ Median ▪ Mode ▪ Standard deviation ▪ Variance ▪ Skewness and its standard error ▪ Kurtosis and its standard error ▪ Range ▪ Min, Max
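
Most of the descriptives listed on this slide can be obtained with the Python standard library. The sketch below is illustrative only: `describe` is a hypothetical helper, the `hours` values are made up rather than taken from the seminar's dataset, and skewness and kurtosis are left out because the standard library has no functions for them.

```python
from math import sqrt
from statistics import mean, median, mode, stdev, variance


def describe(scores):
    """Return common descriptive statistics (sample SD and variance)."""
    n = len(scores)
    sd = stdev(scores)
    return {
        "n": n,
        "mean": mean(scores),
        "sem": sd / sqrt(n),  # standard error of the mean
        "median": median(scores),
        "mode": mode(scores),
        "sd": sd,
        "variance": variance(scores),
        "range": max(scores) - min(scores),
        "min": min(scores),
        "max": max(scores),
    }


hours = [2, 4, 4, 5, 7, 8]  # made-up hours spent on the previous tool

if __name__ == "__main__":
    for name, value in describe(hours).items():
        print(f"{name:>8}: {value}")
```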
  • 11. Descriptive statistics [Figure panels: First Period, Second Period, Third Period]
  • 12. Skewness and kurtosis ▪ Both are 0 for a normal distribution ▪ z-values are more informative because they are standardized – Skewness: zsk = SK/SEsk – Kurtosis: zkur = KUR/SEkur 1st period: zsk = -.004/.086 = -.047, zkur = -2.38 2nd period: zsk = 1.095/.150 = 7.30, zkur = -2.75 3rd period: zsk = 1.033/.218 = 4.73, zkur = 1.69
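
The standardization on this slide is a single division, and the resulting z-value can be compared with the 1.96 cut-off for p < .05. A minimal sketch using the skewness values and standard errors quoted above; the function names are illustrative.

```python
def z_score(statistic: float, standard_error: float) -> float:
    """Standardize a skewness or kurtosis value by its standard error."""
    return statistic / standard_error


def is_significant(z: float, critical: float = 1.96) -> bool:
    """True when |z| exceeds the two-tailed 5% cut-off."""
    return abs(z) > critical


# (skewness, standard error of skewness) as reported on the slide
skewness = {
    "1st period": (-0.004, 0.086),
    "2nd period": (1.095, 0.150),
    "3rd period": (1.033, 0.218),
}

if __name__ == "__main__":
    for period, (sk, se) in skewness.items():
        z = z_score(sk, se)
        print(f"{period}: z = {z:.3f}, significant skew: {is_significant(z)}")
```

Only the first period's skewness stays within the cut-off; the second and third periods are significantly positively skewed.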
  • 13. Correcting problems in data ▪ How do we deal with problems in the data (problems in the distribution, outliers, missing values)? – Remove problematic cases – Transform the data (X → Y = function(X)) ▪ Transformations can correct the form of the distribution, i.e., remove skewness by pulling outliers closer to the rest of the scores ▪ Transform ALL data of ALL variables that are going to be compared or related, even the variables that aren't skewed. This isn't cheating! ▪ Then run the statistical tests and the analysis
  • 14. Correcting positively skewed data ▪ Log transformation, Xtr = log(X): – Reduces positive skew – X must be greater than 0 (shifting might be needed) ▪ Square root transformation, Xtr = sqrt(X): – Brings large scores closer to the center – X must be greater than 0 (shifting might be needed) ▪ Reciprocal transformation, Xtr = 1/X: – Reduces the impact of large scores – Reverses the scale. Therefore, prior to the transformation we need to reverse the scores ourselves (e.g. Xrev = HighestX – X). Then Xtr = 1/Xrev ▪ For negatively skewed data, reverse the scores prior to the above transformations.
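
The three transformations can be sketched as follows. All helper names are hypothetical. Two details are assumptions that the slide does not specify: the shift moves the lowest score to 1 whenever a score is zero or negative, and the reciprocal reversal adds 1 so that the highest score does not produce a division by zero. Base-10 logarithms are also an arbitrary choice here.

```python
import math


def shift_positive(xs):
    """Shift scores so the lowest becomes 1 if any score is <= 0 (assumed constant)."""
    lowest = min(xs)
    offset = 1 - lowest if lowest <= 0 else 0
    return [x + offset for x in xs]


def log_transform(xs):
    """Log transformation; reduces positive skew."""
    return [math.log10(x) for x in shift_positive(xs)]


def sqrt_transform(xs):
    """Square-root transformation; pulls large scores toward the center."""
    return [math.sqrt(x) for x in shift_positive(xs)]


def reciprocal_transform(xs):
    """Reverse the scores first so that 1/x keeps the original ordering."""
    highest = max(xs)
    reversed_scores = [highest - x + 1 for x in xs]  # +1 avoids 1/0 (assumption)
    return [1 / r for r in reversed_scores]


def reverse_scores(xs):
    """For negatively skewed data: reverse, then apply one of the transforms."""
    highest = max(xs)
    return [highest - x for x in xs]


if __name__ == "__main__":
    scores = [1, 2, 3, 4, 50]  # made-up, positively skewed
    print(log_transform(scores))
    print(sqrt_transform(scores))
    print(reciprocal_transform(scores))
```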
  • 16. Explore groups of data. Example: ▪ A professor records the score of a project report delivered by his students. He also notes: – The percentage of the given bibliography they studied (biblio) – Their attendance at the lectures (lectures) – The number of students who collaborated to deliver the report (participants). ▪ He collects the same data a year later (from the students of the new class) and looks for differences in performance. ▪ Score = function(biblio, lectures, participants, year)
  • 17. Explore groups of data [Figure panels: Project score, Bibliography studied, Lectures attended, Participants]
  • 18. Explore groups of data [Table of overall descriptives]
  • 19. Explore groups of data. Comments on the overall descriptives (previous table): ▪ The distribution of scores across both years seems bimodal (this could be a difference in performance by year) ▪ We can compare the scores of the two years (because each one comes from a normal distribution), but we cannot compare the whole dataset of scores to another similar dataset. ▪ The participants' distribution might also be due to different collaborations between the years. ▪ We ask for descriptives per year
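
Asking for descriptives per year amounts to splitting the records by the grouping variable before summarizing. A minimal sketch with made-up records; the data and the helper name are hypothetical and do not come from the seminar's dataset.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical records: (year, project score)
records = [(1, 40), (1, 45), (1, 50), (2, 70), (2, 75), (2, 80)]


def describe_by_group(rows):
    """Group scores by year and summarize each group separately."""
    groups = defaultdict(list)
    for year, score in rows:
        groups[year].append(score)
    return {
        year: {"n": len(scores), "mean": mean(scores), "sd": stdev(scores)}
        for year, scores in sorted(groups.items())
    }


if __name__ == "__main__":
    print("overall mean:", mean(score for _, score in records))
    for year, stats in describe_by_group(records).items():
        print(f"year {year}: {stats}")
```

In this toy data the overall mean of 60 describes neither class well, which is the situation a bimodal overall distribution signals.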
  • 20. Explore groups of data
  • 21. Testing normality of a distribution ▪ Normality is not assessed visually (e.g. "it looks normal to me") ▪ We mathematically examine whether a given distribution as a whole deviates from a comparable normal distribution (one with the same mean and the same standard deviation). ▪ We use the Kolmogorov-Smirnov and Shapiro-Wilk tests: "Is the given distribution different from normal?" ▪ A non-significant test outcome (p > .05) indicates that the distribution does not differ significantly from normal, so normality is assumed ▪ A significant outcome (p < .05) indicates non-normality
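
The idea behind the Kolmogorov-Smirnov comparison can be sketched with the standard library alone: measure the largest gap between the sample's cumulative distribution and that of a normal distribution with the same mean and standard deviation. This is an illustration, not the procedure of any statistics package. The decision rule below uses the asymptotic 5% critical value 1.358/√n, which strictly applies when the mean and standard deviation are fixed in advance; with estimated parameters it is conservative, and packages typically report a corrected p-value instead. All names and both demo datasets are hypothetical.

```python
from math import log, sqrt
from statistics import NormalDist, mean, stdev


def ks_statistic(data):
    """Largest gap between the empirical CDF and a matched normal CDF."""
    xs = sorted(data)
    n = len(xs)
    reference = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = reference.cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def looks_non_normal(data, coefficient=1.358):
    """Rough 5% decision: D above coefficient / sqrt(n) (asymptotic, assumed)."""
    return ks_statistic(data) > coefficient / sqrt(len(data))


# Deterministic demo data built from quantiles: one normal-shaped sample,
# one strongly positively skewed (exponential-shaped) sample.
n = 200
normal_like = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
skewed = [-log(1 - (i - 0.5) / n) for i in range(1, n + 1)]

if __name__ == "__main__":
    for name, sample in (("normal-like", normal_like), ("skewed", skewed)):
        print(name, round(ks_statistic(sample), 3), looks_non_normal(sample))
```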
  • 22. Testing for homogeneity of variance ▪ Homogeneity of variance is a requirement for parametric tests to be applied ▪ As you go through the levels of one variable, the variance of another variable must not change. "For how many hours do you feel sick after an extra pint of beer?"
  • 23. Testing for homogeneity of variance ▪ For groups of data we use Levene's test, which reveals whether there is homogeneity of variance. ▪ It tests the null hypothesis that "there is no difference between the variances in the groups", i.e., the difference between variances is zero. – If the test outcome is significant (p < .05), the difference is unlikely to be a chance finding, so we reject the null hypothesis and acknowledge heterogeneous variances. – If the test outcome is non-significant (p > .05), we retain the null hypothesis and treat the group data as having homogeneous variance. Levene's test results (Levene statistic, df1, df2, significance): Score: 2.584, 1, 98, 0.111 (> 0.05); Participants: 7.368, 1, 98, 0.008 (< 0.05)
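
Levene's statistic is a one-way ANOVA F ratio computed on the absolute deviations of each score from its group mean. The sketch below implements that textbook definition with the standard library; `levene_statistic` is a hypothetical helper and the two small groups are made up. Because the standard library has no F distribution, the reported statistics are compared with an approximate 5% critical value of 3.94 for df = (1, 98), taken from standard F tables as an assumption.

```python
from statistics import mean


def levene_statistic(groups):
    """Levene's W: ANOVA F ratio on absolute deviations from group means."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    deviations = []
    for g in groups:
        center = mean(g)
        deviations.append([abs(x - center) for x in g])
    group_means = [mean(zs) for zs in deviations]
    grand_mean = mean([z for zs in deviations for z in zs])
    between = sum(
        len(zs) * (m - grand_mean) ** 2 for zs, m in zip(deviations, group_means)
    )
    within = sum(
        (z - m) ** 2 for zs, m in zip(deviations, group_means) for z in zs
    )
    return (n_total - k) / (k - 1) * between / within


F_CRITICAL_1_98 = 3.94  # approximate 5% critical value, df = (1, 98); assumed

if __name__ == "__main__":
    print(levene_statistic([[1, 2, 3], [0, 2, 4]]))  # made-up groups
    for name, w in (("Score", 2.584), ("Participants", 7.368)):
        verdict = "heterogeneous" if w > F_CRITICAL_1_98 else "homogeneous"
        print(f"{name}: W = {w} -> {verdict} variances")
```

The comparison agrees with the significance values on the slide: the Score variances are treated as homogeneous, the Participants variances are not.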
  • 24. Summary ▪ What we've seen: ▪ Examine the data properly before proceeding to analysis ▪ Look at the data distribution ▪ Spot any problems (e.g. outliers) ▪ In case of non-normality, try to transform the data ▪ When comparing data from different groups, look at the distributions within each group ▪ Also test for homogeneity of variance
  • 25. References ▪ Field, A. (2005). Discovering Statistics Using SPSS, 2nd ed., Sage Publications. ▪ StatSoft, Inc. (2011). Electronic Statistics Textbook. Tulsa, OK: StatSoft. Web: http://www.statsoft.com/textbook/