32. Chi-Square Test: Pepsi Challenge
Observed: Pepsi 85, Coke 57, RC 78
Expected (equal preference) = 220 / 3 = 73.33
Degrees of freedom = categories (rows) - 1 = 3 - 1 = 2
Critical value of χ² at α = 0.05 = 5.99
Observed value of χ² = 5.8
Decision: fail to reject H0 (5.8 < 5.99)

          O     E        O-E      (O-E)²    (O-E)²/E
Pepsi     85    73.33    11.67    136.19    1.86
Coke      57    73.33    -16.33   266.67    3.64
RC        78    73.33    4.67     21.81     0.30
Totals    220   219.99                      χ² = 5.8
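The arithmetic in the table can be reproduced in a few lines. This is a minimal sketch of a chi-square goodness-of-fit test with equal expected counts; the function name `chi_square_gof` is hypothetical. It relies on one shortcut that holds only for 2 degrees of freedom: the chi-square upper tail is exp(-x/2), so the critical value at α = 0.05 is -2 ln(0.05). For other degrees of freedom, use a chi-square table or a statistics library instead.

```python
import math

def chi_square_gof(observed, expected=None):
    """Chi-square goodness-of-fit statistic; equal expected counts by default."""
    n = sum(observed)
    k = len(observed)
    if expected is None:
        expected = [n / k] * k
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = {"Pepsi": 85, "Coke": 57, "RC": 78}
chi2 = chi_square_gof(list(observed.values()))
df = len(observed) - 1  # categories - 1

# Shortcut valid ONLY for df = 2: P(X > x) = exp(-x / 2),
# so the alpha = 0.05 critical value is -2 * ln(0.05).
critical = -2 * math.log(0.05)
p_value = math.exp(-chi2 / 2)

print(f"chi2 = {chi2:.2f}, df = {df}, critical = {critical:.2f}, p = {p_value:.3f}")
print("Reject H0" if chi2 > critical else "Fail to reject H0")
```

Without intermediate rounding the statistic is about 5.79 and the p-value is about 0.055. This is just above 0.05, which matches the slide's decision to fail to reject H0.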
36. “Changing the Face of Instruction…”
Research Question: Is an online tutorial as effective as a classroom setting for library instruction?
Hypotheses:
H1. Students will have higher scores on information literacy tests after library instruction.
H2. Students will have the same or higher scores on information literacy tests after taking online tutorials as students taking traditional instruction.
H3. Students will report as much or more satisfaction with online instruction as students taking traditional instruction.
37. “Changing the Face of Instruction…”
Variables & Data Collection
Variables: test scores and survey results
Data Collection: pretest/posttest and survey
Statistical Tests: descriptive statistics, including mean, standard deviation, and standard error; t-tests (one- and two-tailed)
Conclusions:
Accept H1: instruction improves information literacy.
Accept H2 alternative hypothesis: online instruction shows no significant difference from traditional instruction.
Accept H3 alternative hypothesis: student satisfaction is equal with both methods.
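The slide lists descriptive statistics and t-tests but does not say which t-test variant the study used. The sketch below assumes a paired t-test on pretest/posttest scores from the same students. The scores and the helper names `describe` and `paired_t` are hypothetical and for illustration only. It computes the mean, sample standard deviation, and standard error, then t = mean difference / SE of the differences.

```python
import math
import statistics

def describe(scores):
    """Mean, sample standard deviation (n - 1), and standard error."""
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)
    se = sd / math.sqrt(len(scores))
    return mean, sd, se

def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for pretest/posttest scores."""
    diffs = [after - before for before, after in zip(pre, post)]
    mean_d, _, se_d = describe(diffs)
    return mean_d / se_d, len(diffs) - 1

pre = [10, 12, 9, 14]    # hypothetical pretest scores
post = [13, 15, 10, 16]  # hypothetical posttest scores
t, df = paired_t(pre, post)
print(f"t = {t:.2f}, df = {df}")
```

Compare t against a t table at the computed degrees of freedom. For df = 3, the two-tailed critical value at α = 0.05 is about 3.18. Comparing online and classroom groups would need an independent two-sample test instead, because those are different students.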
39. “Do Open-Access Articles…”
Research Question: Do freely available articles have a greater research impact?
Measures: research impact = citation rates; open access = freely available
Hypotheses:
H1. Scholarly articles have a greater research impact if they are freely available online than if they are not.
H0 (null hypothesis): There is no difference between the mean citation rates: H0: d1 = d0
40. “Do Open-Access Articles…”
Variables & Data Collection
Variables: mean citation rates
Data Collection: at least 50 articles from 10 leading journals in 4 disciplines
Statistical Tests: descriptive statistics, including mean, standard deviation, and standard error; Wilcoxon signed-rank test
Conclusions: Reject H0: open-access (OA) articles are cited more than articles that are not OA.
Discussion: Validity? Reliability of measures? Generalizability? Alternate hypotheses?
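The Wilcoxon signed-rank test compares paired values without assuming a normal distribution. The slide names the test but not how articles were paired, so the pairing and citation figures below are hypothetical. This minimal sketch computes the two rank sums, W+ and W-. It takes the differences, drops zeros, ranks the absolute differences (ties share the average rank), and sums the ranks of the positive and negative differences separately.

```python
def signed_rank_sums(differences):
    """Return (W_plus, W_minus), the rank sums used by the Wilcoxon signed-rank test."""
    d = [x for x in differences if x != 0]  # zero differences carry no sign and are dropped
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    ranks = [0.0] * len(d)
    i = 0
    while i < len(order):
        # Find the run of tied absolute values starting at position i
        j = i
        while j + 1 < len(order) and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # tied values share the average of their 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, x in zip(ranks, d) if x > 0)
    w_minus = sum(r for r, x in zip(ranks, d) if x < 0)
    return w_plus, w_minus

# Hypothetical citation rates for matched OA / non-OA article pairs
oa = [8, 4, 9, 6, 10, 3, 5]
non_oa = [5, 5, 5, 4, 5, 5, 5]
w_plus, w_minus = signed_rank_sums([a - b for a, b in zip(oa, non_oa)])
print(w_plus, w_minus)
```

The smaller of the two sums is then compared against a signed-rank table for the number of nonzero pairs (6 here).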
Science & Electronic Resources Librarian Libraries of the Claremont Colleges
Part I will be an overview of developing a research project with the aim of using statistics as a methodology in the analysis. Part II will be an overview of statistical concepts and language. Part III will be an exercise in evaluating library statistics.
Three key concepts to remember when designing a research project of any kind, but especially a statistical project, are Validity, Reliability, and Generalizability. Validity is how well a variable measures a particular concept. For example, if we are measuring use, is it valid to count reshelving figures as use? Full-text downloads? Reliability is the consistency of the variable, measurement, or test. One basic principle of scientific and statistical analysis is that others can confirm the results by repeating the experiment or data analysis; without reliability, results cannot be confirmed. Generalizability is whether the results can be applied to other situations. For example, if you observed students using group study rooms in this library, could you generalize that use across all hours, days, buildings, user groups, or other institutions? If you can, your research can become a general model, a universal law, or an immutable truth. If you can't, that just means the results apply only to situations like the one you studied.
Designing research includes formulating initial hypotheses (statements about what the researcher thinks the data will show), collecting (and manipulating) data through a variety of techniques, and performing statistical analysis suited to the hypotheses and data. Here are the key stages for doing research. Statistics are only a tool to help us understand the outcome of the research. Much research can be done without statistical techniques; most ethnographic research relies on direct observation rather than on statistical analysis. Take medicine, for example: the drug works, the drug is safe, so prescribe the drug. Observational or microscopic data may suffice. But most research relies on statistical analysis of research data, no matter how the data are collected.
There are two basic sampling designs, a simple random sample and a stratified random sample, and they are pretty similar. Draw a single circle and a circle composed of smaller circles as a visual aid. A simple random sample is what it sounds like: a group of subjects is chosen from the whole population, and each subject has an equal chance of being sampled. If you took the campus directory and randomly selected 100 names, that would be a simple random sample. Draw examples comparing simple and stratified samples. A stratified random sample is a bit more complex. It assumes that your population is composed of different types of individuals and that you want some knowledge about each group. For example, libraries often want to know how well they serve their communities and want to know something about students, faculty, and staff: are they meeting each group's needs? The solution is to divide the population into these groups and then randomly sample each group. The number sampled from each group is generally proportional to that group's size in the population.
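The two designs can be contrasted in a short sketch. The campus directory below is hypothetical (800 students, 150 faculty, 50 staff), and the function names are illustrative. The simple random sample draws 100 names from everyone, so each person has the same chance of selection. The proportional stratified sample draws from each group in proportion to its share of the population.

```python
import random

def simple_random_sample(population, n, rng):
    """Every member has the same chance of being selected."""
    return rng.sample(population, n)

def proportional_stratified_sample(strata, n, rng):
    """Randomly sample each group in proportion to its share of the population.

    Rounding can make the group sizes add up to slightly more or less than n.
    """
    total = sum(len(members) for members in strata.values())
    return {
        group: rng.sample(members, round(n * len(members) / total))
        for group, members in strata.items()
    }

# Hypothetical campus directory split into groups
strata = {
    "students": [f"student{i}" for i in range(800)],
    "faculty": [f"faculty{i}" for i in range(150)],
    "staff": [f"staff{i}" for i in range(50)],
}
directory = [name for members in strata.values() for name in members]

rng = random.Random(1)  # fixed seed so the example is repeatable
srs = simple_random_sample(directory, 100, rng)
stratified = proportional_stratified_sample(strata, 100, rng)
print({group: len(chosen) for group, chosen in stratified.items()})
```

The stratified sample always contains exactly 5 staff members here. In a simple random sample of 100, the number of staff varies from draw to draw, and a small group can be badly under-represented.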
Now comes a really fun and interactive part of the workshop. In this study, we are going to sample M&Ms and try to figure out the frequency of each color. Not only that, but we're going to test our results against the frequencies the Mars Candy Company says we should see. Let's think about our M&M packs. At the plant, the company loads millions of these little candies into a big hopper and tries to mix them so that they are randomly distributed. When they get packaged, the company wants you to get some of each color, but it does not regulate how many candies of each color go into your package; some may have more blues, some may have more yellows. You can consider each of these packages a random sample of the large hopper of M&M candies. And if we sample enough of these packages, we should start getting close to the distribution of colors at the company. Remember, we are sampling because we don't have enough money to count all the M&Ms sold in every store.
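Testing our tallies against the company's figures is the same goodness-of-fit calculation as the Pepsi Challenge, with expected counts taken from the claimed shares. In this sketch, both the pack tallies and the claimed shares are placeholders. The equal 1/6 shares are an assumption, not company data; replace them with the percentages the company publishes. The pooled counts from all packs are compared with n × (claimed share) for each color.

```python
from collections import Counter

# Placeholder claimed shares (equal for six colors): an assumption, not company data
CLAIMED = {c: 1 / 6 for c in ["blue", "orange", "green", "yellow", "red", "brown"]}

def pool_packs(packs):
    """Add up color counts across packs (each pack is one random sample)."""
    pooled = Counter()
    for pack in packs:
        pooled.update(pack)
    return pooled

def chi_square_vs_claimed(observed, claimed):
    """Chi-square goodness of fit of pooled counts against claimed shares."""
    n = sum(observed[color] for color in claimed)
    stat = sum((observed[color] - n * share) ** 2 / (n * share)
               for color, share in claimed.items())
    return stat, len(claimed) - 1

# Hypothetical tallies from two packs
packs = [
    {"blue": 7, "orange": 6, "green": 5, "yellow": 4, "red": 5, "brown": 3},
    {"blue": 7, "orange": 6, "green": 5, "yellow": 4, "red": 4, "brown": 4},
]
pooled = pool_packs(packs)
stat, df = chi_square_vs_claimed(pooled, CLAIMED)
print(dict(pooled), round(stat, 2), df)
```

With six colors there are 5 degrees of freedom, and the critical value at α = 0.05 is about 11.07. A common rule of thumb is to pool enough packs that each expected count is at least 5.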
How is accuracy affected by size of sample? What would explain a difference between our observed results and M&M’s reported figures? Was our sample a good representation of the population? Is our methodology valid? Are our results generalizable?
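One way to explore the first question is to simulate the hopper. This sketch assumes a hopper with equal shares of six colors (a placeholder, not company data). It draws random samples of increasing size and reports the largest gap between an observed color share and the assumed share. The function name is hypothetical. The error tends to shrink roughly in proportion to 1/√n, so quadrupling the sample only halves the typical error.

```python
import random
from collections import Counter

# Placeholder hopper shares (equal for six colors): an assumption, not company data
CLAIMED = {c: 1 / 6 for c in ["blue", "orange", "green", "yellow", "red", "brown"]}

def worst_color_error(claimed, n, rng):
    """Draw n candies at random and return the largest gap between an
    observed color share and its claimed share."""
    colors = list(claimed)
    draws = rng.choices(colors, weights=[claimed[c] for c in colors], k=n)
    counts = Counter(draws)
    return max(abs(counts[c] / n - claimed[c]) for c in colors)

rng = random.Random(2024)  # fixed seed so the run is repeatable
errors = {n: worst_color_error(CLAIMED, n, rng) for n in (50, 500, 5_000, 50_000)}
for n, err in errors.items():
    print(f"{n:>6} candies: largest error {err:.3f}")
```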
A review of the Basic Statistics for Librarians workshop. The five components were statistical concepts, evaluation of literature, sampling, an introduction to usage statistics, and designing a research study. Concepts included frequency distributions: flat (no change), normal (a bell-curve shape), and skewed (many values sloping off to very few, or vice versa). The mean is the average of a group, and the median is the middle value of a set of ordered values. The standard deviation is a measure of the dispersion, or variation, in a sample. For a normal distribution, about 68% of the data fall within ±1 SD of the mean, 95% within ±2 SD, and over 99% within ±3 SD. Three key concepts to remember when evaluating literature are Validity, how well a variable measures the concept being studied; Reliability, the consistency of the variable, measurement, or test; and Generalizability, whether the results can be applied to other situations. Sampling is the act of drawing a portion of subjects to measure from a larger population. A random sample is the strongest type of sample, since it is assumed to be a fair representation of the population. More complex sampling includes stratified random sampling, where a portion of each group in the population is sampled, and convenience sampling, where the sample is chosen non-randomly. Sample size is important, and for small populations a larger proportion of the population must be sampled. Usage statistics are very important in applied librarianship today, but researchers need to ask questions such as what is being measured, who did the thing being measured, why they did it, and how many of them did it. Most datasets include outliers and missing data that can affect statistical tests, but there are many techniques for dealing with these problem data.
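The summary measures above map directly onto Python's standard `statistics` module. This sketch uses a small hypothetical set of checkout counts with one heavy user, which shows how a skewed value pulls the mean away from the median. It then computes, from the standard normal distribution, the shares within ±1, ±2, and ±3 SD that the 68/95/99 rule summarizes.

```python
import statistics

# Hypothetical checkouts per patron: one heavy user skews the data to the right
checkouts = [1, 2, 2, 3, 10]

mean = statistics.mean(checkouts)      # 3.6, pulled up by the outlier
median = statistics.median(checkouts)  # 2, the middle of the ordered values
sd = statistics.stdev(checkouts)       # sample SD (divides by n - 1), about 3.65

# Share of a normal distribution within +/- 1, 2, and 3 SD of the mean
std_normal = statistics.NormalDist()
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}
print(mean, median, round(sd, 2), {k: round(v, 4) for k, v in within.items()})
```

The normal shares come out to about 0.6827, 0.9545, and 0.9973.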
Quiz: what scale does each of these types of data use? Take a few minutes to write down what type of data each one is, then we'll go over them. Salary: ratio; Author: nominal; Hours: ratio; Patron: nominal; Publication: interval; Ranked: ordinal; Tests: interval; Articles: interval; FTE: ratio.
This is a histogram of full-time enrollments at ARL schools. Full-time enrollment averages about 22 thousand students, and the standard deviation is about 10 thousand. QUIZ: What share of schools falls between 12 and 32 thousand students? Answer: about 68% (within ±1 SD of the mean), assuming the distribution is roughly normal.
Let's now look at some real data from libraries to apply the concepts of mean and standard deviation. This is a histogram I generated from data I collected from the Association of Research Libraries (ARL) on total salaries and wages. There are 114 libraries included in this histogram. Mean salaries and wages at ARL libraries are about 10 million, and the SD is about 6.5 million.
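The 68% answer comes from assuming a normal distribution, and the same reasoning turns the salary figures into an expected count of libraries. This sketch uses only the rounded means and SDs quoted in the notes; the helper name `share_within` is hypothetical. Salaries are in millions, with no currency assumed.

```python
from statistics import NormalDist

def share_within(mean, sd, low, high):
    """Share of a normal distribution with this mean and SD between low and high."""
    dist = NormalDist(mu=mean, sigma=sd)
    return dist.cdf(high) - dist.cdf(low)

# Enrollment quiz: mean about 22,000 students, SD about 10,000
enroll_share = share_within(22_000, 10_000, 12_000, 32_000)

# Salaries and wages: mean about 10 million, SD about 6.5 million, 114 libraries
salary_mean, salary_sd, n_libraries = 10.0, 6.5, 114
salary_share = share_within(salary_mean, salary_sd,
                            salary_mean - salary_sd, salary_mean + salary_sd)
expected_libraries = round(n_libraries * salary_share)

# A normal model with these figures also puts some libraries below zero,
# which real salary totals cannot be, so the normal model is only a rough fit.
below_zero = NormalDist(salary_mean, salary_sd).cdf(0)

print(round(enroll_share, 3), expected_libraries, round(below_zero, 3))
```

Under the normal assumption, about 78 of the 114 libraries would fall between 3.5 and 16.5 million. The same model would place about 6% of libraries below zero, which suggests the real histogram is not symmetric. Counting the libraries actually inside that range is a quick check of how well the 68% rule fits.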
How do you get a law named after you? These are the key stages of statistical research, for collection and analysis; I've listed just a few examples and will briefly go over them.
I’ll explain this in depth: how to get the degrees of freedom (DF), how to do a chi-square test, etc.
I’ll give a general introduction to each type of analysis and then introduce the study by Nichols et al. as the first example.
Everyone will read the article and then we’ll go through these together, with each item coming out after someone states it.
After going through this, we’ll discuss what the study did right (pretest, posttest, survey) and what it did wrong, including its assumptions (not stating the null hypotheses, and accepting the alternative hypothesis when it should have been rejected).
As a group, the participants will read through this study and come up with the answers to the five questions, with discussion centering on reliability, validity, and generalizability and a focus on whether the methods, variables, and tests fit the question.