statistics

Intro to Research in Information Studies: Inferential Statistics (Standard Error of the Mean, Significance, Inferential tests you can use)

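Do you speak the language?

$$t = \frac{\bar{X}_A - \bar{X}_B}{\sqrt{\left[\frac{\left(\sum X_A^2 - \frac{(\sum X_A)^2}{n_1}\right) + \left(\sum X_B^2 - \frac{(\sum X_B)^2}{n_2}\right)}{(n_1 - 1) + (n_2 - 1)}\right]\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$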

Don’t Panic! The numerator of t is simply the difference between the two means, $\bar{X}_A - \bar{X}_B$. The bracketed term in the denominator is a pooled variance for the two groups: compare it with the SD formula.
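The pooled-variance t formula can be computed directly from raw scores. The sketch below is a minimal Python version; the function name `pooled_t` and the sample scores are hypothetical, chosen only so the arithmetic is easy to check by hand.

```python
from math import sqrt


def pooled_t(a, b):
    """Independent-samples t from raw scores (pooled-variance form).

    a, b: lists of scores for groups A and B (hypothetical inputs).
    """
    n1, n2 = len(a), len(b)
    mean_a, mean_b = sum(a) / n1, sum(b) / n2
    # Sum of squared deviations per group, in raw-score form:
    # sum(X^2) - (sum(X))^2 / n
    ss_a = sum(x * x for x in a) - sum(a) ** 2 / n1
    ss_b = sum(x * x for x in b) - sum(b) ** 2 / n2
    # Pool the two sums of squares over (n1 - 1) + (n2 - 1) degrees of freedom
    pooled_var = (ss_a + ss_b) / ((n1 - 1) + (n2 - 1))
    # Standard error of the difference between the means
    se_diff = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean_a - mean_b) / se_diff


# Hypothetical scores: means 7 and 3, each group's sum of squares is 8
print(round(pooled_t([5, 7, 9], [1, 3, 5]), 3))
```

With these made-up scores the pooled variance is 4 and t works out to the square root of 6, so the call prints 2.449 (with 3 + 3 - 2 = 4 d.f.).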

Basic types of statistical treatment: descriptive and inferential. Note: statistical tests are inferential.

Two kinds of descriptive statistic: measures of central tendency (where on the measurement scale most of the data fall) and measures of dispersion (how spread out the data are). The different measures have different sensitivities and should be used at the appropriate times.

Symbol check

Mean. Refer to the handout on notation; see the example on the next slide.

Variance and standard deviation. To overcome the problems with the range and similar measures, we need a better measure of spread.

Symbol check

Two ways to get SD
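Assuming the two ways meant here are the usual definitional (deviation) formula and the computational (raw-score) formula, both can be checked in a few lines of Python, and they give the same answer. This sketch divides by N (the N versus N - 1 choice is a separate issue); the function names and the data are hypothetical.

```python
from math import sqrt


def sd_definitional(data):
    # sqrt( sum of squared deviations from the mean / N )
    n = len(data)
    mean = sum(data) / n
    return sqrt(sum((x - mean) ** 2 for x in data) / n)


def sd_computational(data):
    # sqrt( (sum of X^2 - (sum of X)^2 / N) / N ): same value, no deviations needed
    n = len(data)
    return sqrt((sum(x * x for x in data) - sum(data) ** 2 / n) / n)


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(sd_definitional(scores), sd_computational(scores))
```

Both functions print 2.0 for this data: the sum of squared deviations is 32, and 32 / 8 = 4.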

If we recalculate the variance with the 60 instead of the 5 in the data…

If we include a large outlier, note the increase in SD. Like the mean, the standard deviation uses every data point and is therefore sensitive to extreme values.
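The effect of an outlier can be seen with Python's `statistics` module. The data below are hypothetical (not the data set used on the slides): one value of 5 is replaced by 60, and the mean, median and SD are compared.

```python
import statistics

base = [2, 4, 4, 4, 5, 5, 7, 9]            # hypothetical data
with_outlier = [2, 4, 4, 4, 5, 60, 7, 9]   # one 5 replaced by a large outlier (60)

for label, data in [("base", base), ("with outlier", with_outlier)]:
    print(label,
          "mean =", statistics.mean(data),
          "median =", statistics.median(data),
          "SD =", round(statistics.pstdev(data), 2))  # pstdev divides by N
```

The mean jumps from 5 to 11.875 and the SD grows sharply, while the median stays at 4.5, which is why the median is preferred when data contain outliers.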

Two sets of data can have the same mean but different standard deviations. The bigger the SD, the more s-p-r-e-a-d out are the data.

On the use of N or N-1
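Python's standard library provides both versions: `statistics.pstdev` divides by N (treating the data as the whole population) and `statistics.stdev` divides by N - 1 (the usual estimate from a sample). A short comparison on hypothetical data:

```python
import statistics
from math import sqrt

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, N = 8

# pstdev divides the sum of squared deviations (32 here) by N
population_sd = statistics.pstdev(scores)
# stdev divides by N - 1, the usual choice when estimating from a sample
sample_sd = statistics.stdev(scores)

print(population_sd, round(sample_sd, 3))
print(sqrt(32 / 8), round(sqrt(32 / 7), 3))  # the same two values by hand
```

Dividing by N - 1 always gives a slightly larger value, and the difference shrinks as N grows.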

Summary

Measures of Central Tendency
• Mode: most frequent observation. Use with nominal data.
• Median: ‘middle’ of data. Use with ordinal data or when data contain outliers.
• Mean: ‘average’. Use with interval and ratio data if there are no outliers.

Measures of Dispersion
• Range: dependent on two extreme values.
• Interquartile Range: more useful than range. Often used with median.
• Variance / Standard Deviation: same conditions as mean. With mean, provides excellent summary of data.
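All six measures in the summary can be computed with Python's `statistics` module. The helper `summarize` and the data are hypothetical. Note that quartiles can be calculated in several ways; this sketch uses the module's default ("exclusive") method, so other software may report a slightly different IQR.

```python
import statistics


def summarize(data):
    """Descriptive summary of a list of interval/ratio scores (hypothetical helper)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # default 'exclusive' method
    return {
        "mode": statistics.mode(data),
        "median": statistics.median(data),
        "mean": statistics.mean(data),
        "range": max(data) - min(data),
        "iqr": q3 - q1,
        "sd": statistics.pstdev(data),  # divides by N
    }


print(summarize([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical data
```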

Deviation units: Z scores. Any data point can be expressed in terms of its distance from the mean in SD units:

$$z = \frac{X - \bar{X}}{SD}$$

A positive z score implies a value above the mean; a negative z score implies a value below the mean. (Author's note, Andrew Dillon: move this to later in the course, after distributions?)
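Converting raw scores to z scores is a one-line calculation. This sketch assumes the population SD (dividing by N) and uses hypothetical data with mean 5 and SD 2; `z_scores` is a hypothetical helper name.

```python
import statistics


def z_scores(data):
    """Express each score as its distance from the mean in SD units."""
    mean = statistics.mean(data)
    sd = statistics.pstdev(data)  # assumption: population SD (divide by N)
    return [(x - mean) / sd for x in data]


print(z_scores([2, 4, 4, 4, 5, 5, 7, 9]))  # hypothetical data: mean 5, SD 2
```

The score 9 becomes z = 2.0 (two SDs above the mean) and the score 2 becomes z = -1.5 (one and a half SDs below it).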

Interpreting Z scores

Comparing data with Z scores

With normal distributions

Graphing data: the histogram. [Figure: histogram of number of errors. x-axis: the categories of data we are studying, e.g., task, interface, or user group; y-axis: the frequency of occurrence for the measure of interest, e.g., errors, time, or scores on a test.] The graph gives an instant summary of the data: check spread, similarity, outliers, etc.

Very large data sets tend to have a distinct shape:

Normal distribution

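The Normal Curve. [Figure: normal curve, frequency (f) on the y-axis, with the mean, median and mode at the same central point.] NB: note the position of the measures of central tendency. 50% of scores fall below the mean.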

Positively skewed distribution. [Figure: positively skewed curve, frequency (f) on the y-axis, with mode, median and mean marked from left to right.] Note how the measures of central tendency now separate, and note the direction of the change: the mode moves to the left of the other two and the mean stays highest, indicating that most scores fall below the mean.

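Negatively skewed distribution. [Figure: negatively skewed curve, frequency (f) on the y-axis, with mean, median and mode marked from left to right.] Here, because higher values are more common, the mode is pushed up to the highest value of the three.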
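The ordering of the three measures under skew can be checked on small hypothetical data sets: one with a long tail to the right and its mirror image with a long tail to the left.

```python
import statistics

# Hypothetical scores with a long tail to the right (positive skew)
positive = [1, 2, 2, 2, 3, 3, 4, 5, 9, 12]
# Mirror image of the same scores: long tail to the left (negative skew)
negative = [13 - x for x in positive]

for label, data in [("positive", positive), ("negative", negative)]:
    print(label,
          "mode =", statistics.mode(data),
          "median =", statistics.median(data),
          "mean =", statistics.mean(data))
```

For the positively skewed set, mode (2) < median (3) < mean (4.3); for the mirrored set the order reverses: mean (8.7) < median (10) < mode (11).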

Other distributions

Bimodal. [Figure: bimodal curve, frequency (f) on the y-axis, marked with the mean, the median and two modes.] A bimodal distribution will occur in situations where there are distinct groups being tested, e.g., novices and experts. Note how each mode is itself part of a normal distribution (more later).

Standard deviations and the normal curve. [Figure: normal curve, frequency (f) on the y-axis, marked in 1 SD steps either side of the mean.] About 68% of observations fall within ±1 SD of the mean; about 95% fall within ±2 SD.
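These percentages can be recovered from the standard normal distribution with Python's `statistics.NormalDist`, which also serves as a substitute for printed z tables: `cdf(z)` gives the proportion of scores below z. The helper `proportion_within` is a hypothetical name.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)


def proportion_within(k):
    """Proportion of a normal distribution lying within +/- k SDs of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


for k in (1, 1.96, 2):
    print(k, round(proportion_within(k), 4))
```

±1 SD covers about 68.3% and ±2 SD about 95.4%; exactly 95% corresponds to ±1.96 SD.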

Z scores and tables

Remember

Importance of distribution

So, for your research

Inference is built on Probability

Calculating probability. Speaker note: at this point I ask people to take out a coin and toss it 10 times, noting the exact sequence of outcomes (e.g., h, h, t, h, t, t, h, t, t, h). Then I have people compare outcomes.

Sampling distribution for 3 coin tosses
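The sampling distribution for 3 tosses can be built by listing every equally likely sequence and counting heads. This sketch uses `itertools.product` and exact fractions; `heads_distribution` is a hypothetical helper name.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def heads_distribution(n_tosses):
    """Exact sampling distribution of the number of heads in n fair tosses."""
    outcomes = list(product("HT", repeat=n_tosses))  # all 2**n equally likely sequences
    counts = Counter(seq.count("H") for seq in outcomes)
    return {heads: Fraction(c, len(outcomes)) for heads, c in sorted(counts.items())}


for heads, p in heads_distribution(3).items():
    print(heads, "heads:", p)

# Probability of any one exact sequence of 10 tosses (e.g. h,h,t,h,t,t,h,t,t,h)
print(Fraction(1, 2) ** 10)
```

For 3 tosses the probabilities are 1/8, 3/8, 3/8 and 1/8 for 0 to 3 heads. Any single exact sequence of 10 tosses has probability 1/1024, which is why nobody in the room is likely to match anyone else's sequence.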

Probability and normal curves

What use is probability to us?

Determining probability. Speaker note: introduce simple stats tables here.

What is a significance level?

What levels might we choose?

Using other levels

Thinking about p levels

Putting probability to work

Sampling error and the mean. Speaker note: I find that this is the hardest part of stats for novices to grasp, since it is the bridge between descriptive and inferential stats, so it needs to be explained slowly!

How can we relate our sample to everyone else?

The distribution of the means forms a narrower normal distribution around the true mean. [Figure: distribution of sample means on a scale from 2 to 18.]

True for skewed distributions too. [Figure: skewed distribution, frequency (f) on the y-axis, with the plot of means from samples forming a narrow distribution around the mean.]
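A small simulation illustrates both points: even when the population is skewed, the means of repeated samples cluster in a narrow, roughly normal distribution around the true mean, with spread close to SD / sqrt(n). The exponential population (mean 1, SD 1), the sample size and the number of samples are assumptions chosen for illustration.

```python
import random
import statistics
from math import sqrt

random.seed(1)  # fixed seed so the run is repeatable

# Hypothetical skewed population: exponential with mean 1 and SD 1
POP_MEAN, POP_SD = 1.0, 1.0
SAMPLE_SIZE = 25
N_SAMPLES = 2000


def sample_mean(n):
    """Mean of one random sample of size n drawn from the population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


means = [sample_mean(SAMPLE_SIZE) for _ in range(N_SAMPLES)]

print("mean of sample means:", round(statistics.fmean(means), 3))
print("SD of sample means:  ", round(statistics.stdev(means), 3))
print("SD / sqrt(n):        ", POP_SD / sqrt(SAMPLE_SIZE))
```

The mean of the sample means lands very close to 1, and their SD lands close to 1 / sqrt(25) = 0.2, far narrower than the population itself.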

How means behave

But...

Implications

Example

The Standard Error of the Mean

$$SE_{\bar{X}} = \frac{SD}{\sqrt{N}}$$
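In practice the SE of the mean is estimated from a single sample by dividing its SD by the square root of N. This sketch assumes the sample SD (N - 1 form); the helper names and data are hypothetical.

```python
import statistics
from math import sqrt


def standard_error(data):
    """SE of the mean estimated from one sample: sample SD (N - 1) / sqrt(N)."""
    return statistics.stdev(data) / sqrt(len(data))


def se_from_sd(sd, n):
    """SE of the mean when the SD and sample size are already known."""
    return sd / sqrt(n)


print(round(standard_error([2, 4, 4, 4, 5, 5, 7, 9]), 3))  # hypothetical data
# Quadrupling the sample size halves the standard error
print(se_from_sd(10, 25), se_from_sd(10, 100))
```

Because N sits under a square root, halving the SE requires four times as many participants (here 2.0 falls to 1.0).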

If standard error of mean = 0.89

Issues to note

Exercise. Answers: 9-11, 8.66-11.33, 4-16, 2-18.

Exercise answers

Recap

Comparing two means. This is the beginning of significance testing.

SE of the difference between means. This lets us set up confidence limits for the difference between the two means.
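For two independent means, a common form of the SE of the difference is the square root of the sum of the squared SEs, and confidence limits follow as difference ± z × SE. This is a sketch under a normal approximation; it is the unpooled form, which can differ slightly from the pooled-variance calculation in the t formula. Function names and figures are hypothetical.

```python
from math import sqrt


def se_difference(se_a, se_b):
    """SE of the difference between two independent means."""
    return sqrt(se_a ** 2 + se_b ** 2)


def difference_limits(mean_a, mean_b, se_a, se_b, z=1.96):
    """Confidence limits for mean_a - mean_b; z = 1.96 gives ~95% (normal approximation)."""
    diff = mean_a - mean_b
    margin = z * se_difference(se_a, se_b)
    return diff - margin, diff + margin


# Hypothetical interfaces: mean task times 20 and 10, SEs 3 and 4
print(se_difference(3, 4))
print(difference_limits(20, 10, 3, 4))
```

Here the SE of the difference is 5, giving limits of roughly 0.2 to 19.8. Because that interval does not include zero, the two means would be judged different at about the 95% level.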

Regardless of the population mean

Consider two interfaces

Calculate the SE of the difference between the means

But what else?

Hold it!

Why t?

Simple t-test. Mean = 79.17, SD = 13.17.

T-test. From t-tables, we can see that this value of t exceeds the critical t value (with 5 d.f.) for the p = .10 level, so we are confident at the 90% level that our new interface leads to an improvement.

T-test. Using the sample mean and the SE of the mean, we can still talk in confidence intervals, e.g., we are 68% confident that the population mean = 79.17 ± 5.38.
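The ± 5.38 figure can be reproduced from the reported mean and SD. The sample size is not stated explicitly; this sketch assumes n = 6, inferred from the 5 d.f. of the one-sample t-test, and treats 13.17 as the sample SD. It follows the slide's ±1 SE rule; strictly, a t-based 68% interval with only 5 d.f. would be somewhat wider.

```python
from math import sqrt

# Figures reported on the slides
sample_mean = 79.17
sample_sd = 13.17
n = 6  # assumption: inferred from the 5 d.f. reported for this t-test

se_mean = sample_sd / sqrt(n)
low, high = sample_mean - se_mean, sample_mean + se_mean  # +/- 1 SE, ~68%

print(f"SE = {se_mean:.2f}")
print(f"68% interval: {low:.2f} to {high:.2f}")
```

This prints an SE of 5.38 and an interval of 73.79 to 84.55.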

Predicting the direction of the difference

One tail (directional) test
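A directional test puts the whole significance level in one tail, so the critical value is smaller than for a two-tailed test at the same level. Python's standard library has no t distribution, so this sketch uses the normal (z) approximation, which suits large samples; t-tables with few degrees of freedom give larger critical values. `critical_z` is a hypothetical helper name.

```python
from statistics import NormalDist

std_normal = NormalDist()


def critical_z(alpha, tails):
    """Critical z for a given significance level; tails is 1 or 2."""
    if tails not in (1, 2):
        raise ValueError("tails must be 1 or 2")
    # Two-tailed tests split alpha between the two tails
    return std_normal.inv_cdf(1 - alpha / tails)


for alpha in (0.10, 0.05, 0.01):
    print(alpha, round(critical_z(alpha, 1), 3), round(critical_z(alpha, 2), 3))
```

At the .05 level, for example, a one-tailed test needs z of about 1.645, whereas a two-tailed test needs about 1.96.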

So to recap

Why would you predict the direction?
