Explore, analyze and present your data
Guillaume Calmettes
“Bonjour”, I am Guillaume!
Sacre Bleu!

Bordeaux

gcalmettes@mednet.ucla.edu
Office: MRL 3645
Disclaimer

I am not a
statistician
Statistics are scary

Statistics

(You at the beginning of the talk)
Statistics are scary
not so

Statistics

(You at the middle of the talk)
Statistics are scary
cool

Statistics

(You at the end of the talk)
Statistics are scary
cool

We have to deal with
them anyways, so we
had better enjoy them!

Statistics

(You at the end of the talk)
Press the 	

t-test button and
you’ll be done!

Did you check
the normality of
your data first?
Why should you care about statistics?

http://www.nature.com/nature/authors/gta/2e_Statistical_checklist.pdf
Why should you care about statistics?
Advances in Physiology Education

“Explorations in Statistics” series (2008-present)	

(Douglas Curran-Everett)
Why should you care about statistics?
“Statistical Perspectives” series (2011-present)	

(Gordon Drummond)
The Journal of Physiology	

Experimental Physiology	

The British Journal of Pharmacology	

Microcirculation	

The British Journal of Nutrition
http://jp.physoc.org/cgi/collection/stats_reporting
Why should you care about statistics?

Importance of being uncertain – September 2013

How samples are used to estimate population statistics and what this means in terms of
uncertainty.

Error Bars – October 2013

The use of error bars to represent uncertainty and advice on how to interpret them.

Significance, P values and t-tests – November 2013

Introduction to the concept of statistical significance and the one-sample t-test.
http://blogs.nature.com/methagora/2013/08/giving_statistics_the_attention_it_deserves.html
Why should you care about statistics?

“Journals […] fail to exert sufficient scrutiny over the results
that they publish”
“Nature research journals will introduce editorial measures to
address the problem by improving the consistency and quality of
reporting in life-sciences articles”
“We will examine statistics more closely and encourage authors
to be transparent, for example by including their raw data”
Look at

your data
A picture is worth a thousand words

John Snow
(1813-1858)
Location of deaths in the 1854 London Cholera Epidemic
Why visualize your data?
The Anscombe’s quartet example
Dataset #1      Dataset #2      Dataset #3      Dataset #4
x     y         x     y         x     y         x     y
10    8.04      10    9.14      10    7.46      8     6.58
8     6.95      8     8.14      8     6.77      8     5.76
13    7.58      13    8.74      13    12.74     8     7.71
9     8.81      9     8.77      9     7.11      8     8.84
11    8.33      11    9.26      11    7.81      8     8.47
14    9.96      14    8.1       14    8.84      8     7.04
6     7.24      6     6.13      6     6.08      8     5.25
4     4.26      4     3.1       4     5.39      19    12.5
12    10.84     12    9.13      12    8.15      8     5.56
7     4.82      7     7.26      7     6.42      8     7.91
5     5.68      5     4.74      5     5.73      8     6.89


Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
Why visualize your data?
The Anscombe’s quartet example
Property in each case       Value
Mean of x                   9 (exact)
Variance of x               11 (exact)
Mean of y                   7.5
Variance of y               4.122 or 4.127
Correlation of x and y      0.816
Linear regression line      y = 3.00 + 0.500x

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
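
The summary values above can be checked directly from the four datasets. The following is a minimal sketch using only the Python standard library; the function name `summarize` and the dictionary layout are illustrative choices, not part of the original slides.

```python
from math import sqrt

# Anscombe's quartet, using the values listed in the slides above.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]  # shared by datasets #1 to #3
quartet = {
    "Dataset #1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96,
                          7.24, 4.26, 10.84, 4.82, 5.68]),
    "Dataset #2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10,
                          6.13, 3.10, 9.13, 7.26, 4.74]),
    "Dataset #3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84,
                          6.08, 5.39, 8.15, 6.42, 5.73]),
    "Dataset #4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
                   [6.58, 5.76, 7.71, 8.84, 8.47, 7.04,
                    5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary statistics shown in the slide for one dataset."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return {
        "mean_x": mean_x,
        "var_x": sxx / (n - 1),        # sample variance
        "mean_y": mean_y,
        "var_y": syy / (n - 1),
        "r": sxy / sqrt(sxx * syy),    # Pearson correlation
        "slope": slope,                # least-squares regression line
        "intercept": mean_y - slope * mean_x,
    }


summaries = {name: summarize(x, y) for name, (x, y) in quartet.items()}
for name, s in summaries.items():
    print(name, {key: round(value, 3) for key, value in s.items()})
```

The four rows printed are nearly identical, which is the point of the example: only a plot of the raw values shows how different the datasets are.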
Why visualize your data?
The Anscombe’s quartet example
[Figure: plots of Dataset #1, Dataset #2, Dataset #3 and Dataset #4]

Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
Visualize your data in their raw form!
Aim for revelation rather than mere summary
A great graphic with raw data reveals
unexpected patterns and invites us to
make comparisons we might not have
thought of beforehand.
If you are still not convinced …
Mean: 16 / Stdv: 5
[Figure, panel e, from Daniel’s Journal Club paper: bar graph of donor engraftment (%), 0 to 80, after WBM secondary transplantation (16 weeks); P < 0.05; x-axis labels: flDMR/+, DMR/+, mH19]
Avoid making bar graphs
“To maintain the highest level of trustworthiness of data,
we are encouraging authors to display data in their raw
form and not in a fashion that conceals their variance.
Presenting data as columns with error bars (dynamite
plunger plots) conceals data. We recommend that
individual data be presented as dot plots shown next to
the average for the group with appropriate error bars
(Figure 1).”
Rockman H.A. (2012). "Great expectations". J Clin Invest 122 (4): 1133
Avoid making bar graphs

Error bars

Different types, different meanings
• descriptive statistics (Range, SD)
• inferential statistics (SE, CI)


Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11
Avoid making bar graphs

Error bars

Different types, different meanings

• descriptive statistics (Range, SD)
• inferential statistics (SE, CI)
Often, they also imply a
symmetrical distribution of the
data.

Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11
Avoid making bar graphs
Mean and standard deviation are only useful in the context of a “normal distribution”
About 95% of a normal distribution lies within two
standard deviations (σ) of the mean (µ)
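
As a quick check of the “two standard deviations” rule, here is a minimal sketch using the standard library's `statistics.NormalDist`. The helper name `fraction_within` is an illustrative choice.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)


def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


within_two_sd = fraction_within(2.0)    # slightly more than 95%
within_196_sd = fraction_within(1.96)   # the usual "exact" 95% multiplier

print(f"within 2 SD:    {within_two_sd:.4f}")
print(f"within 1.96 SD: {within_196_sd:.4f}")
```

The rule only holds when the data are close to normally distributed; for a skewed sample the same interval can cover a very different share of the values.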
Avoid making bar graphs
symmetrical
distribution

skewed
distribution

Data presentation to reveal the distribution of the data
• Display data in their raw form.
• A dot plot is a good start.
• “Dynamite plunger plots” conceal data.
• Check the pattern of distribution of the values.
Avoid making bar graphs
symmetrical
distribution

skewed
distribution

• First set: Gaussian (or normal) distribution (symmetrically distributed)
• Second set: right skewed, lognormal (few large values)

“ This type of distribution of values is quite common in biology (ex: plasma concentrations
of immune or inflammatory mediators)”
“Plunger plots only: who would know that the values were skewed – ...
... and that the common statistical tests would be inappropriate?”
Avoid making bar graphs
Don't tell me no one warned you before!

Bar graph

Dynamite plunger
Summary
Why visualize your data?

For others ...
Providing a narrative for the reader
But primarily for you ...
Looking for patterns and relationships
Summarize complex data structures
Help avoid erroneous conclusions based upon questionable or
unexpected data
Choose the right descriptor for

your data
Averages can be misleading
Is the mean always a good descriptor?
# of children per household in China (2012)

• mean: 1.35

http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
Is the mean always a good descriptor?
# of children per household in China (2012)

• mean: 1.35	

• median: 1
more representative of the 	

“typical” family (One child policy)

http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
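
To see how a mean of 1.35 and a median of 1 can coexist, here is a small sketch. The household counts below are invented so that the mean matches the slide; they are not the real survey data.

```python
from statistics import mean, median

# Hypothetical sample of 100 households (NOT real data): most have one child,
# a minority have two or three. Chosen only so that the mean equals 1.35.
children_per_household = [1] * 70 + [2] * 25 + [3] * 5

sample_mean = mean(children_per_household)
sample_median = median(children_per_household)

print(f"mean:   {sample_mean}")
print(f"median: {sample_median}")
```

No household has 1.35 children; the median describes the “typical” family, while the mean is pulled upward by the few larger households.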
Any measure is wrong!
“Whenever you make a measurement, you must
know the uncertainty, otherwise it is meaningless”
Walter Lewin (MIT)

183.3cm

185.7cm
http://www.youtube.com/watch?v=JUxHebuXviM
Any measure is wrong!
“Whenever you make a measurement, you must
know the uncertainty, otherwise it is meaningless”
Walter Lewin (MIT)

The same concept applies when you
report your data!
Provide the uncertainty of your descriptor	

hint: this is NOT the standard deviation
Report the Confidence Interval of your descriptor
The Bootstrap: origin

Modern electronic computation has encouraged a host of new statistical methods
that require fewer distributional assumptions than their predecessors and
can be applied to more complicated statistical estimators. These methods allow
[...] to explore and describe data and draw valid statistical inferences without the
usual concerns for mathematical tractability.
Efron B. and Tibshirani R. (1991), Science, Jul 26;253(5018):390-5
Computing the bootstrap 95% CI
Original sample A0 (mean m0): a1, a2, a3, a4, a5, ..., an

Resample A0 with replacement to build pseudo-samples, and compute the mean of each:
A1: a4, a3, a1, a2, a2, a1   (mean mA1)
A2: a5, a2, an, a1, a3, a5   (mean mA2)
A3: an, a1, an, a1, a3, a4   (mean mA3)
A4: a4, a3, an, a5, a1, a3   (mean mA4)
...

5.18 [4.91, 4.47]
Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406
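
The resampling scheme above can be sketched in a few lines of Python. This is a minimal percentile-bootstrap illustration using only the standard library, not the code from the cited paper; the function name `bootstrap_ci`, the seed and the sample values are all hypothetical.

```python
import random


def bootstrap_ci(sample, n_resamples=10000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(sample)
    pseudo_means = []
    for _ in range(n_resamples):
        # Draw n values from the original sample, with replacement.
        pseudo_sample = rng.choices(sample, k=n)
        pseudo_means.append(sum(pseudo_sample) / n)
    pseudo_means.sort()
    alpha = 1.0 - confidence
    lower = pseudo_means[round(alpha / 2 * n_resamples)]
    upper = pseudo_means[round((1 - alpha / 2) * n_resamples) - 1]
    return sum(sample) / n, lower, upper


# Hypothetical measurements, for illustration only.
measurements = [4.1, 5.3, 4.8, 5.9, 5.0, 6.2, 4.4, 5.6, 5.1, 4.7, 5.8, 5.2]
estimate, ci_low, ci_high = bootstrap_ci(measurements)
print(f"{estimate:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

Any other descriptor (median, ratio, slope) can be substituted for the mean inside the loop, which is the main attraction of the method.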
Analyze

your data
Choose your statistical test wisely
Authors Guidelines
Every paper that contains statistical testing should state
[...] a justification for the use of that test (including, for
example, a discussion of the normality of the data when the
test is appropriate only for normal data), [...], whether the
tests were one-tailed or two-tailed, and the actual P value
for each test (not merely "significant" or "P < 0.05").
http://www.nature.com/nature/authors/gta/#a5.6
The simple case (How to)
Female: mean ± SD = 135.9 ± 19.0
Male: mean ± SD = 187.0 ± 19.8
Difference [CI]: 51.2 [50.4, 51.9]

Distribution of the data?
Visual inspection:
• fit of the histogram
• QQ plot: the ith ordered point A(i) is plotted against the theoretical quantile of the distribution, $\Phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$ (points that leave the straight line indicate data that are not “normal”)
Test:
• Shapiro-Wilk test

Null hypothesis for the SW test: data are normally distributed
Female p-value: 0.9195
Male p-value: 0.3866
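
The QQ-plot coordinates described above can be computed without any plotting library. The sketch below uses the plotting position (i − 3/8)/(n + 1/4) from the slide and the standard library's inverse normal CDF; the function name `normal_qq_points` and the sample values are hypothetical.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Pair each ordered value A(i) with its theoretical normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    standard_normal = NormalDist()
    theoretical = [
        standard_normal.inv_cdf((i - 3 / 8) / (n + 1 / 4))
        for i in range(1, n + 1)
    ]
    return list(zip(theoretical, ordered))


# Hypothetical measurements, for illustration only.
values = [131.0, 118.5, 152.3, 140.1, 127.8, 161.9, 109.4, 136.6, 145.2]
qq_points = normal_qq_points(values)
for quantile, observed in qq_points:
    print(f"{quantile:+.3f}  {observed:.1f}")
```

Plotting the second column against the first gives the QQ plot: points close to a straight line are compatible with a normal distribution, while a curved pattern suggests skew or heavy tails.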
The simple case (How to)
Female: mean ± SD = 135.9 ± 19.0
Male: mean ± SD = 187.0 ± 19.8
Difference [CI]: 51.2 [50.4, 51.9]

Distribution of the data?
Normally distributed
Statistical test?
t-test
Null hypothesis for the t-test: data belong to the same population
t-test p-value < 2.2e-16
Usually it is not so simple
The “not so simple” case
[Figure: samples S1 and S2]

Shapiro-Wilk test:
S1 p-value: 7.4e-05
S2 p-value: 6.7e-06
What to do?
Non-parametric alternatives to the t-test:
• Mann-Whitney U (independent samples)
• Wilcoxon signed-rank (dependent samples)
Choose a new statistical hero
Bootstrapman

t-test
Computing the bootstrap p-value
Are the two samples different?
Observed difference = 0.44

If the two samples were from the same population,
what would be the probability that the observed
difference arose from chance alone?
Computing the bootstrap p-value
Sample A0: a1, a2, a3, a4, a5, ..., an
Sample B0: b1, b2, b3, b4, b5, ..., bn
Observed difference D0 = mA - mB (0.44)

1. Pool all the values of A0 and B0 together.
2. Resample the pool with replacement to build two pseudo-samples and compute their means:
   A1: a4, b3, a1, a2, b2, b1   (mean mA1)
   B1: b5, b2, an, b1, a3, a5   (mean mB1)
3. Compute the pseudo-difference D1 = mA1 - mB1 (for example D1 = -0.83, D2 = 0.84).
4. Repeat 10000 times (D1 ... D10000).
5. How many pseudo-differences are greater than or equal to the observed difference D0 (0.44)?

9829 < D0
171 > D0

p = 171 / 10000 = 0.0171 (one-tailed)
MW: p = 0.0169
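
The pooled resampling procedure can be sketched as follows. This is a minimal standard-library illustration, not the author's code; the function name `bootstrap_p_value`, the seed and the two samples are hypothetical.

```python
import random


def bootstrap_p_value(a, b, n_resamples=10000, seed=0):
    """One-tailed bootstrap p-value for the difference of means mA - mB."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(a) / len(a) - sum(b) / len(b)
    pooled = list(a) + list(b)  # null hypothesis: a single population
    at_least_as_large = 0
    for _ in range(n_resamples):
        # Draw both pseudo-samples from the pool, with replacement.
        pseudo_a = rng.choices(pooled, k=len(a))
        pseudo_b = rng.choices(pooled, k=len(b))
        pseudo_diff = sum(pseudo_a) / len(a) - sum(pseudo_b) / len(b)
        if pseudo_diff >= observed:
            at_least_as_large += 1
    return observed, at_least_as_large / n_resamples


# Hypothetical skewed samples, for illustration only.
sample_a = [1.9, 2.4, 0.8, 3.1, 1.2, 5.6, 1.5, 2.2, 0.9, 4.3]
sample_b = [1.1, 0.7, 1.8, 0.5, 2.9, 0.9, 1.3, 0.6, 2.1, 1.0]
observed_difference, p_value = bootstrap_p_value(sample_a, sample_b)
print(f"D0 = {observed_difference:.2f}, one-tailed p = {p_value:.4f}")
```

The comparison uses “greater than or equal to”, matching the question asked on the slide; a two-tailed version would compare absolute values instead.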
Summary
What do my data look like?
Distribution?
• visual inspection (hist. / QQ plot)
• normality test

What do I want to compare?
Right statistical test?
• parametric test
• non-parametric test
• resampling statistics
The dark side of the

p-value
Statistical significance
“The effect of the drug was statistically significant.”

so what?
Statistical significance (example)
“The percentage of neurons showing cue-related activity
increased with training in the mutant mice (P<0.05) but
not in the control mice (P>0.05).”
Tempting conclusion: training has a larger effect in the mutant
mice than in the control mice!
Extreme scenario: training-induced activity barely reaches
significance in mutant mice (e.g., P = 0.049) and barely fails
to reach significance in control mice (e.g., P = 0.051).
[Figure: activity in control and mutant mice, - and + conditions]
This comparison does not test whether the training effect for mutant mice differs
statistically from that for control mice.
When making a comparison between two
effects, always report the statistical
significance of their difference rather than
the difference between significance levels.
Nieuwenhuis S. et al. (2011), “Erroneous analyses of interactions in neuroscience: a problem of significance”,
Nat Neuroscience, 14(9):1105-1107
P-values do not convey information
Mean: 16, SD: 5
Mean: 20, SD: 5
Difference = 4
Same means, SDs and difference, three different p-values: 0.1090, 0.0367, 0.0009
P-values do not convey information
Fact: Most applied scientists use p-values as a measure of evidence
and of the size of the effect
- The probability of hypotheses depends on much more than just the p-value.
- This topic has renewed importance with the advent of the massive multiple
testing often seen in genomics studies
[Figure: “Manhattan plot”, -log10(P) from 0 to 8 against positions 1 to 20]
Ioannidis JP, (2005) PLoS Med 2(8):e124
Report effect size and CIs instead
The P-value is a function of the sample size
Measured effect size: difference = 0.018 mV
p = 10^-5
[Figure: control and atropine traces (scale bars: 0.5 mV, 100 ms) and amplitude (mV) for control (n=6777) and atropine (n=5272)]
[Figure: P (t-test), from 10^0 (not significant) down to 10^-4 (significant), and Hedges' g (-0.4 to 0.4, with the 0.018 mV difference marked), both plotted against sample size (10^1 to 10^3)]
Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
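
The dependence of the p-value on sample size can be illustrated with the two groups shown earlier (means 16 and 20, SD 5). The sketch below is an assumption-laden simplification: it uses a normal (z) approximation rather than a t-test, assumes equal group sizes, and the sample sizes are invented.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_normal_approx(difference, sd, n_per_group):
    """Two-sided p-value for a difference of means, normal approximation."""
    standard_error = sd * sqrt(2.0 / n_per_group)  # equal n and SD in both groups
    z = abs(difference) / standard_error
    return 2.0 * (1.0 - NormalDist().cdf(z))


# Same effect (difference = 4, SD = 5), hypothetical sample sizes.
sample_sizes = [5, 10, 20, 50]
p_values = [two_sided_p_normal_approx(4.0, 5.0, n) for n in sample_sizes]
for n, p in zip(sample_sizes, p_values):
    print(f"n = {n:3d} per group  ->  p = {p:.5f}")
```

The effect size never changes, yet the p-value moves from “not significant” to “highly significant” as n grows, which is why the effect size and its confidence interval are the more informative quantities to report.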
Bootstrap effect size and 95% CIs
Sample A: a1, a2, a3, a4, a5, ..., an
Sample B: b1, b2, b3, b4, b5, ..., bn

1. Resample A with replacement (10000 times) and compute the mean of each pseudo-sample: mA1, mA2, mA3, mA4, mA5, ...
2. Resample B with replacement (10000 times) and compute the mean of each pseudo-sample: mB1, mB2, mB3, mB4, mB5, ...
3. Compute the effect size of each pair: E1 = mA1 - mB1, E2 = mA2 - mB2, ..., E10000 = mA10000 - mB10000
4. Sort the 10000 effect sizes: the 250th and the 9750th values bound the 95% CI.

Do the 95% confidence intervals of
the observed effect size include
zero (no difference)?
Eff. size = 0.44 [0.042, 0.853]
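
The steps for the bootstrap effect size can be sketched as below. This is a minimal standard-library illustration that takes the 250th and 9750th sorted values out of 10000, as in the slides; the function name `bootstrap_effect_size_ci`, the seed and the samples are hypothetical.

```python
import random


def bootstrap_effect_size_ci(a, b, n_resamples=10000, seed=0):
    """Difference of means with a percentile bootstrap 95% CI."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = sum(a) / len(a) - sum(b) / len(b)
    effects = []
    for _ in range(n_resamples):
        # Each group is resampled from itself, with replacement.
        pseudo_a = rng.choices(a, k=len(a))
        pseudo_b = rng.choices(b, k=len(b))
        effects.append(sum(pseudo_a) / len(a) - sum(pseudo_b) / len(b))
    effects.sort()
    lower = effects[round(0.025 * n_resamples) - 1]  # 250th of 10000
    upper = effects[round(0.975 * n_resamples) - 1]  # 9750th of 10000
    return observed, lower, upper


# Hypothetical samples, for illustration only.
group_a = [1.9, 2.4, 0.8, 3.1, 1.2, 5.6, 1.5, 2.2, 0.9, 4.3]
group_b = [1.1, 0.7, 1.8, 0.5, 2.9, 0.9, 1.3, 0.6, 2.1, 1.0]
effect, effect_low, effect_high = bootstrap_effect_size_ci(group_a, group_b)
print(f"{effect:.2f} [{effect_low:.3f}, {effect_high:.3f}]")
```

Unlike the p-value procedure, the two groups are not pooled here: the goal is to estimate the size of the difference, not to simulate the case where there is none. If the interval excludes zero, the data are not compatible with “no difference” at the chosen level.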
Statistical vs Biological

significance
Statistical vs Biological significance
“The P value reported by tests is a probabilistic significance, not a
biological one.”
“Statistical significance suggests but does not imply biological
significance.”

Krzywinski M and Altman N (2013) "Points of significance: Significance, P values and t-tests”.
Nature Methods 10, 1041–1042
Statistical vs Biological significance
Statistical significance has a meaning in a specific context

No change
Small change
Large change

Biological consequences?
Statistical vs Biological significance
“Good enough” solutions
Stomatogastric ganglion
[Figure: neurons AB, PD, LP and PY; recordings LP 1 and LP 2]
[Figure: conductances at +15 mV (µS/nF), from 0 to 0.60, for Kd, KCa and A-type; mRNA copy number, from 0 to 1,600, for shab, BK-KC and shal]


Schulz D.J. et al. (2006) "Variable channel expression in identified single and electrically coupled neurons
in different animals". Nat Neurosci. 9: 356– 362
Statistical vs Biological significance

Madhvani R.V. et al. (2011) "Shaping a new Ca2+ conductance to suppress early afterdepolarizations in
cardiac myocytes". J Physiol 589(Pt 24):6081-92
Statistical vs Biological significance
Breast cancer study	

Difference in cancer returning between control vs
low-fat diet groups.
Authors' conclusions:

People with low-fat diets had a 25% lower chance of cancer returning
Statistical vs Biological significance
Actual return rates:
- control: 12.4%
- low-fat diet: 9.8%

Difference: 2.6%
2.6 / 9.8 = 26.5%
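
The arithmetic behind these percentages is worth spelling out, since the headline figure depends entirely on the denominator. The sketch below uses only the two rates from the slide; the variable names are illustrative, and dividing by the control rate is the usual convention for a relative risk reduction.

```python
control_rate = 12.4   # % of patients whose cancer returned, control group
low_fat_rate = 9.8    # % of patients whose cancer returned, low-fat diet group

# Absolute difference, in percentage points.
absolute_difference = control_rate - low_fat_rate

# Relative difference, using the low-fat rate as denominator (as on the slide).
relative_to_low_fat = absolute_difference / low_fat_rate * 100

# Relative difference, using the control rate as denominator
# (the usual definition of a relative risk reduction).
relative_to_control = absolute_difference / control_rate * 100

print(f"absolute difference:       {absolute_difference:.1f} percentage points")
print(f"relative to low-fat group: {relative_to_low_fat:.1f}%")
print(f"relative to control group: {relative_to_control:.1f}%")
```

Whichever denominator is chosen, the relative figure sounds far larger than the 2.6-point absolute change, which is the quantity that matters when judging biological relevance.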
Beware of false positives

(from the authors)
Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic
Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
Beware of false positives

2012
Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic
Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
Beware of false positives

http://xkcd.com/882/
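
Why multiple comparisons produce false positives can be shown with one line of probability. The sketch below assumes independent tests, each run at a significance level of 0.05; the count of 20 tests is only an illustrative number, and the Bonferroni correction is one common remedy among several.

```python
def family_wise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the overall rate near alpha."""
    return alpha / n_tests


alpha = 0.05
n_tests = 20  # illustrative number of comparisons

fwer = family_wise_error_rate(alpha, n_tests)
corrected_alpha = bonferroni_threshold(alpha, n_tests)
fwer_corrected = family_wise_error_rate(corrected_alpha, n_tests)

print(f"chance of at least one false positive: {fwer:.3f}")
print(f"Bonferroni per-test threshold:         {corrected_alpha:.4f}")
print(f"chance after correction:               {fwer_corrected:.3f}")
```

With 20 uncorrected tests, a “significant” result somewhere is more likely than not even when nothing is going on.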
Present

your data
Know your audience
Who? Who is my audience? What is their level of understanding? What do they already know?
Why? Why am I presenting? What does my audience want to achieve?
What? What do I want my audience to know? Which story will captivate the audience?
How? Which medium will best support the message? Which format/layout will appeal to the audience?
Color blindness is a common disease
Males: one in 12 (8%) / Females: one in 200 (0.5%)
Color blindness is a common disease
“Anyone who needs to be convinced that making scientific
images more accessible is a worthwhile task [...]: if your next
grant or manuscript submission contains color figures, what if
some of your reviewers are color blind? Will they be able to
appreciate your figures? Considering the competition for funding
and for publication, can you afford the possibility of frustrating
your audience? The solution is at hand."
Clarke, M. (2007). "Making figures comprehensible for color-blind readers" Nature blog
(http://blogs.nature.com/nautilus/2007/02/post_4.html)
Making figures for color blind people

Wong, B. (2011). "Points of view: Color blindness". Nature Methods 8, 441
Making figures for color blind people

http://colororacle.org/
Telling stories with data
“The Martini Glass Structure”

http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf
Telling stories with data
“The Martini Glass Structure”
[Figure labels: START, GUIDED NARRATIVE, EXPLORE]

http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf
Aesthetic minimalism

Suda B. (2010). "A practical guide to Designing with Data"
Common mistakes in data reporting

Welcome to the FOX “Dishonest Charts” gallery
Common mistakes in data reporting
E. Tufte’s “Lie Factor”
Make things appear to be “better” than they are
by fiddling with the scales of things
Common mistakes in data reporting
Fig 1I

“We found that relative to WT mice, the luminal
microbiota of Il10−/− mice exhibited a ~100-fold
increase in E. coli (Fig. 1I)”

Arthur et al, (2012) Science 5;338(6103):120-3
Common mistakes in data reporting
[Figure: chart with categories A, B, C, D and E, each 20%]
Common mistakes in data reporting
[Figure: two charts of Percent Return on Investment (0 to 40) for Group A and Group B over year1 to year4]
Thank you!

“The important thing is not to stop questioning.
Curiosity has its own reason for existing”
- Albert Einstein-

Presenting statistics in social media 2012
 
Introduction to Statistics - Part 1
Introduction to Statistics - Part 1Introduction to Statistics - Part 1
Introduction to Statistics - Part 1
 

Dernier

How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseCeline George
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfPrerana Jadhav
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsPooky Knightsmith
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvRicaMaeCastro1
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptxmary850239
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleCeline George
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17Celine George
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptxmary850239
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQuiz Club NITW
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfJemuel Francisco
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptxJonalynLegaspi2
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfVanessa Camilleri
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfPatidar M
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQuiz Club NITW
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDhatriParmar
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...DhatriParmar
 

Dernier (20)

How to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 DatabaseHow to Make a Duplicate of Your Odoo 17 Database
How to Make a Duplicate of Your Odoo 17 Database
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Narcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdfNarcotic and Non Narcotic Analgesic..pdf
Narcotic and Non Narcotic Analgesic..pdf
 
Mental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young mindsMental Health Awareness - a toolkit for supporting young minds
Mental Health Awareness - a toolkit for supporting young minds
 
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnvESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
ESP 4-EDITED.pdfmmcncncncmcmmnmnmncnmncmnnjvnnv
 
4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx4.16.24 Poverty and Precarity--Desmond.pptx
4.16.24 Poverty and Precarity--Desmond.pptx
 
prashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Professionprashanth updated resume 2024 for Teaching Profession
prashanth updated resume 2024 for Teaching Profession
 
Multi Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP ModuleMulti Domain Alias In the Odoo 17 ERP Module
Multi Domain Alias In the Odoo 17 ERP Module
 
How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17How to Fix XML SyntaxError in Odoo the 17
How to Fix XML SyntaxError in Odoo the 17
 
4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx4.11.24 Poverty and Inequality in America.pptx
4.11.24 Poverty and Inequality in America.pptx
 
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITWQ-Factor General Quiz-7th April 2024, Quiz Club NITW
Q-Factor General Quiz-7th April 2024, Quiz Club NITW
 
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdfGrade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
Grade 9 Quarter 4 Dll Grade 9 Quarter 4 DLL.pdf
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
week 1 cookery 8 fourth - quarter .pptx
week 1 cookery 8  fourth  -  quarter .pptxweek 1 cookery 8  fourth  -  quarter .pptx
week 1 cookery 8 fourth - quarter .pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
Paradigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTAParadigm shift in nursing research by RS MEHTA
Paradigm shift in nursing research by RS MEHTA
 
Active Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdfActive Learning Strategies (in short ALS).pdf
Active Learning Strategies (in short ALS).pdf
 
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITWQ-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
Q-Factor HISPOL Quiz-6th April 2024, Quiz Club NITW
 
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptxDecoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
Decoding the Tweet _ Practical Criticism in the Age of Hashtag.pptx
 
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
Beauty Amidst the Bytes_ Unearthing Unexpected Advantages of the Digital Wast...
 

Explore, Analyze and Present your data

  • 2. “Bonjour”, I am Guillaume! Sacre Bleu! Bordeaux gcalmettes@mednet.ucla.edu Office: MRL 3645
  • 3. Disclaimer I am not a statistician
  • 4. Statistics are scary Statistics (You at the beginning of the talk)
  • 5. Statistics are scary not so Statistics (You at the middle of the talk)
  • 7. Statistics are scary cool We have to deal with them anyways, so we had better enjoy them! Statistics (You at the end of the talk)
  • 8. Press the t-test button and you’ll be done! Did you check the normality of your data first?
  • 9. Why should you care about statistics? http://www.nature.com/nature/authors/gta/2e_Statistical_checklist.pdf
  • 10. Why should you care about statistics? Advances in Physiology Education “Explorations in Statistics” series (2008-present) (Douglas Curran-Everett)
  • 11. Why should you care about statistics? “Statistical Perspectives” series (2011-present) (Gordon Drummond) The Journal of Physiology Experimental Physiology The British Journal of Pharmacology Microcirculation The British Journal of Nutrition http://jp.physoc.org/cgi/collection/stats_reporting
  • 12. Why should you care about statistics?
       Importance of being uncertain – September 2013: How samples are used to estimate population statistics and what this means in terms of uncertainty.
       Error Bars – October 2013: The use of error bars to represent uncertainty and advice on how to interpret them.
       Significance, P values and t-tests – November 2013: Introduction to the concept of statistical significance and the one-sample t-test.
       http://blogs.nature.com/methagora/2013/08/giving_statistics_the_attention_it_deserves.html
  • 13. Why should you care about statistics? “Journals […] fail to exert sufficient scrutiny over the results that they publish” “Nature research journals will introduce editorial measures to address the problem by improving the consistency and quality of reporting in life-sciences articles” “We will examine statistics more closely and encourage authors to be transparent, for example by including their raw data”
  • 15. A picture is worth a thousand words John Snow (1813-1858) Location of deaths in the 1854 London Cholera Epidemic
  • 16. Why visualize your data? The Anscombe’s quartet example
       Dataset #1      Dataset #2      Dataset #3      Dataset #4
        x      y        x      y        x      y        x      y
       10   8.04       10   9.14       10   7.46        8   6.58
        8   6.95        8   8.14        8   6.77        8   5.76
       13   7.58       13   8.74       13  12.74        8   7.71
        9   8.81        9   8.77        9   7.11        8   8.84
       11   8.33       11   9.26       11   7.81        8   8.47
       14   9.96       14   8.1        14   8.84        8   7.04
        6   7.24        6   6.13        6   6.08        8   5.25
        4   4.26        4   3.1         4   5.39       19  12.5
       12  10.84       12   9.13       12   8.15        8   5.56
        7   4.82        7   7.26        7   6.42        8   7.91
        5   5.68        5   4.74        5   5.73        8   6.89
       Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
  • 17. Why visualize your data? The Anscombe’s quartet example
       Property in each case      Value
       Mean of x                  9 (exact)
       Variance of x              11 (exact)
       Mean of y                  7.5
       Variance of y              4.122 or 4.127
       Correlation of x and y     0.816
       Linear regression line     y = 3.00 + 0.500x
       Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
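The shared summary statistics can be checked directly from the tabulated values. The following is a minimal sketch using only the Python standard library; the helper name `summarize` and the dictionary layout are illustrative choices, not part of the original slides.

```python
from statistics import mean, variance

# Anscombe's quartet, values as tabulated on the slide (Anscombe, 1973).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    1: (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    2: (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    3: (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    4: ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
        [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(x, y):
    """Return the summary properties listed on the slide for one dataset."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx                      # least-squares slope
    return {
        "mean_x": mx,
        "var_x": variance(x),              # sample variance (n - 1 denominator)
        "mean_y": my,
        "var_y": variance(y),
        "r": sxy / (sxx * syy) ** 0.5,     # Pearson correlation
        "slope": slope,
        "intercept": my - slope * mx,
    }


for label, (x, y) in quartet.items():
    s = summarize(x, y)
    print(label, {k: round(v, 3) for k, v in s.items()})
```

The four printed rows agree to about two decimal places, even though the scatter plots of the four datasets look nothing alike.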
  • 18. Why visualize your data? The Anscombe’s quartet example Dataset #1 Dataset #2 Dataset #3 Dataset #4 Anscombe, F. J. (1973). "Graphs in Statistical Analysis". American Statistician 27 (1): 17–21
  • 20. Visualize your data in their raw form! Aim for revelation rather than mere summary. A great graphic with raw data reveals unexpected patterns and invites us to make comparisons we might not have thought of beforehand.
  • 21. If you are still not convinced … Mean: 16 / Stdv: 5
  • 23. If you are still not convinced … Mean: 16 / Stdv: 5. [Figure from Daniel’s Journal Club paper, panel e: WBM secondary transplantation (16 weeks); y-axis: donor engraftment (%), 0 to 80; groups: flDMR/+, DMR/+ mH19; P < 0.05]
  • 24. Avoid making bar graphs “To maintain the highest level of trustworthiness of data, we are encouraging authors to display data in their raw form and not in a fashion that conceals their variance. Presenting data as columns with error bars (dynamite plunger plots) conceals data. We recommend that individual data be presented as dot plots shown next to the average for the group with appropriate error bars (Figure 1).” Rockman H.A. (2012). "Great expectations". J Clin Invest 122 (4): 1133
  • 25. Avoid making bar graphs. Error bars: different types, different meanings • descriptive statistics (Range, SD) • inferential statistics (SE, CI) Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11
  • 26. Avoid making bar graphs Error bars Different types, different meanings • descriptive statistics (Range, SD) • inferential statistics (SE, CI) Often, they also imply a symmetrical distribution of the data. Cumming, G. et al. (2007). "Error bars in experimental biology". J Cell Biol 177 (1): 7–11
  • 27. Avoid making bar graphs. Mean and standard deviation are only useful in the context of a “normal distribution”. About 95% of a normal distribution lies within two standard deviations (σ) of the mean (µ).
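The two-standard-deviation rule can be verified numerically. This is a small sketch with the standard library's `NormalDist`; the function name `coverage_within` is an illustrative choice.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0.0, sigma=1.0)


def coverage_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    return std_normal.cdf(k) - std_normal.cdf(-k)


within_two_sd = coverage_within(2)       # slightly more than 95%
z_for_95 = std_normal.inv_cdf(0.975)     # number of SDs that encloses exactly 95%

print(f"within 2 SD: {within_two_sd:.4f}")
print(f"SDs for exactly 95%: {z_for_95:.2f}")
```

This only holds for normally distributed values; for a skewed sample the interval mean ± 2 SD can cover a very different fraction of the data.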
  • 28. Avoid making bar graphs symmetrical distribution skewed distribution Data presentation to reveal the distribution of the data • Display data in their raw form. • A dot plot is a good start. • “Dynamite plunger plots” conceal data. • Check the pattern of distribution of the values.
  • 29. Avoid making bar graphs symmetrical distribution skewed distribution • First set: Gaussian (or normal) distribution (symmetrically distributed) • Second set: right skewed, lognormal (few large values) “ This type of distribution of values is quite common in biology (ex: plasma concentrations of immune or inflammatory mediators)” “Plunger plots only: who would know that the values were skewed – ... ... and that the common statistical tests would be inappropriate?”
  • 30. Avoid making bar graphs Don't tell me no one warned you before! Bar graph Dynamite plunger
  • 31. Summary. Why visualize your data? For others: providing a narrative for the reader. But primarily for you: • looking for patterns and relationships • summarizing complex data structures • helping avoid erroneous conclusions based upon questionable or unexpected data
  • 32. Choose the right descriptor for your data
  • 33. Averages can be misleading
  • 37. Is the mean always a good descriptor? # of children per household in China (2012) • mean: 1.35 http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
  • 38. Is the mean always a good descriptor? # of children per household in China (2012) • mean: 1.35 • median: 1 more representative of the “typical” family (One child policy) http://www.globalhealthfacts.org/data/topic/map.aspx?ind=87
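To see how a mean of 1.35 and a median of 1 can coexist, here is a short sketch. The household distribution below is hypothetical, chosen only so that the mean matches the value on the slide; it is not the actual survey data.

```python
from statistics import mean, median

# Hypothetical distribution of children per household (NOT the real data):
# 70 households with 1 child, 25 with 2 children, 5 with 3 children.
children_per_household = [1] * 70 + [2] * 25 + [3] * 5

avg = mean(children_per_household)       # pulled upward by the few larger families
typical = median(children_per_household) # the value of the "middle" household

print(f"mean = {avg:.2f}, median = {typical}")
```

No household actually has 1.35 children, which is why the median describes the typical family better in this kind of skewed count data.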
  • 39. Any measure is wrong! “Whenever you make a measurement, you must know the uncertainty otherwise it is meaningless” Walter Lewin (MIT) 183.3cm 185.7cm http://www.youtube.com/watch?v=JUxHebuXviM
  • 40. Any measure is wrong! “Whenever you make a measurement, you must know the uncertainty otherwise it is meaningless” Walter Lewin (MIT) The same concept applies when you report your data! Provide the uncertainty of your descriptor (hint: this is NOT the standard deviation).
  • 41. Any measure is wrong! “Whenever you make a measurement, you must know the uncertainty otherwise it is meaningless” Walter Lewin (MIT) The same concept applies when you report your data! Provide the uncertainty of your descriptor (hint: this is NOT the standard deviation). Report the Confidence Interval of your descriptor.
  • 42. The Bootstrap: origin Modern electronic computation has encouraged a host of new statistical methods that require fewer distributional assumptions than their predecessors and can be applied to more complicated statistical estimators. These methods allow [...] to explore and describe data and draw valid statistical inferences without the usual concerns for mathematical tractability. Efron B. and Tibshirani R. (1991), Science, Jul 26;253(5018):390-5
  • 43. Computing the bootstrap 95% CI. Start from the original sample A0 (values a1, a2, ..., an; mean m0). Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406
  • 44. Computing the bootstrap 95% CI. Draw pseudo-samples A1, A2, A3, A4, ... from A0 with replacement (a value can appear more than once in a pseudo-sample) and compute the mean of each: mA1, mA2, mA3, mA4, ... Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406
  • 47. Computing the bootstrap 95% CI. Result for A0 from the distribution of the pseudo-sample means: 5.18 [4.91, 4.47] Calmettes G. et al. (2012), “Making do with what we have: use your bootstrap”, J Physiol, 590(15):3403-3406
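The resampling scheme sketched on these slides can be written in a few lines. The following is a minimal percentile-bootstrap sketch, not the authors' implementation; the function name `bootstrap_ci`, the seed, and the example sample are all hypothetical.

```python
import random
from statistics import mean


def bootstrap_ci(sample, stat=mean, n_resamples=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a descriptor of one sample.

    Returns (observed descriptor, lower bound, upper bound).
    """
    rng = random.Random(seed)
    n = len(sample)
    # Each pseudo-sample has the same size as the original and is drawn
    # with replacement; its descriptor is stored and the list is sorted.
    estimates = sorted(stat(rng.choices(sample, k=n)) for _ in range(n_resamples))
    lo_rank = max(round(n_resamples * alpha / 2), 1)   # 250th value for 10000 resamples
    hi_rank = round(n_resamples * (1 - alpha / 2))     # 9750th value for 10000 resamples
    return stat(sample), estimates[lo_rank - 1], estimates[hi_rank - 1]


# Hypothetical measurements, for illustration only.
sample = [4.1, 5.3, 4.8, 6.0, 5.5, 4.9, 5.2, 5.7, 4.6, 5.1]
m0, lower, upper = bootstrap_ci(sample)
print(f"{m0:.2f} [{lower:.2f}, {upper:.2f}]")
```

Because `stat` is a parameter, the same function gives an interval for the median or any other descriptor (for example `bootstrap_ci(sample, stat=median)` after importing `median`).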
  • 49. Choose your statistical test wisely. Authors Guidelines: Every paper that contains statistical testing should state [...] a justification for the use of that test (including, for example, a discussion of the normality of the data when the test is appropriate only for normal data), [...], whether the tests were one-tailed or two-tailed, and the actual P value for each test (not merely "significant" or "P < 0.05"). http://www.nature.com/nature/authors/gta/#a5.6
  • 50. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8.
  • 51. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Distribution of the data?
  • 52. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data?
  • 53. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? • fit of the histogram
  • 55. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? • fit of the histogram • QQ plot: the ith ordered point A(i) is plotted against the theoretical quantile of the distribution, $\Phi^{-1}\left(\frac{i - 3/8}{n + 1/4}\right)$
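The QQ-plot coordinates follow directly from that formula. Below is a minimal sketch that pairs each ordered value with its theoretical normal quantile; the function name `normal_qq_points` and the example values are illustrative, and plotting is left to whichever tool you prefer.

```python
from statistics import NormalDist


def normal_qq_points(sample):
    """Return (theoretical quantile, ordered value) pairs for a normal QQ plot.

    The ith ordered value (i = 1..n) is paired with
    Phi^-1((i - 3/8) / (n + 1/4)), the plotting position given on the slide.
    """
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i - 3 / 8) / (n + 1 / 4)) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical values, for illustration only.
points = normal_qq_points([3.0, 1.0, 2.0, 5.0, 4.0])
for q, value in points:
    print(f"{q:+.3f}  {value}")
```

If the data are close to normal, these points fall near a straight line; systematic curvature suggests skew or heavy tails.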
  • 56. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? • fit of the histogram • QQ plot (example of a plot that is not “normal”)
  • 57. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? • fit of the histogram • QQ plot (Female, Male)
  • 58. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? Visual inspection: • fit of the histogram • QQ plot (Female, Male)
  • 59. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? Visual inspection: • fit of the histogram • QQ plot. Test: • Shapiro-Wilk test
  • 60. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? Visual inspection: • fit of the histogram • QQ plot. Test: • Shapiro-Wilk test. Null hypothesis for the SW test: data are normally distributed. Female p-value: 0.9195. Male p-value: 0.3866.
  • 61. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? Normally distributed.
  • 64. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? Normally distributed. Statistical test? t-test.
  • 65. The simple case (How to). Female: mean/std 135.9 ± 19.0. Male: mean/std 187.0 ± 19.8. Difference/CI: 51.2 [50.4, 51.9]. Distribution of the data? Normally distributed. Statistical test? t-test. Null hypothesis for the t-test: data belong to the same population. t-test p-value < 2.2e-16
  • 66. Usually it is not so simple
  • 67. The “not so simple” case S1 S2
  • 68. The “not so simple” case S1 S2
  • 69. The “not so simple” case S1 S2 S1 S2
  • 70. The “not so simple” case S1 S2 Shapiro-Wilk test: S1 p-value: 7.4e-05 S2 p-value: 6.7e-06 S1 S2
  • 72. What to do? For the t-test, non-parametric alternatives: • Mann-Whitney U (independent samples) • Wilcoxon (dependent samples)
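As an illustration of the first alternative, the Mann-Whitney U statistic simply counts, over all pairs, how often a value from one sample exceeds a value from the other. The sketch below is a simplified version written for this transcript: the p-value uses the large-sample normal approximation with no tie or continuity correction, which is an assumption of this example and is rough for small samples; a statistics package should be preferred for real analyses.

```python
from statistics import NormalDist


def mann_whitney_u(a, b):
    """Return (U_a, U_b, approximate two-sided p-value) for independent samples.

    U_a counts the pairs (x in a, y in b) with x > y; ties count one half.
    The p-value uses the normal approximation without tie or continuity
    correction (a simplification made for this sketch).
    """
    n1, n2 = len(a), len(b)
    u_a = sum(1.0 if x > y else 0.5 if x == y else 0.0 for x in a for y in b)
    u_b = n1 * n2 - u_a
    mu = n1 * n2 / 2                                  # expected U under the null
    sigma = (n1 * n2 * (n1 + n2 + 1) / 12) ** 0.5     # SD of U under the null
    z = (min(u_a, u_b) - mu) / sigma                  # always <= 0
    p = min(2 * NormalDist().cdf(z), 1.0)
    return u_a, u_b, p


# Hypothetical samples, for illustration only.
u_a, u_b, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
print(u_a, u_b, round(p, 3))
```

Because only the ordering of the values matters, the test makes no assumption about the shape of the distribution.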
  • 73. Choose a new statistical hero Bootstrapman t-test
  • 74. Computing the bootstrap p-value Are the two samples different? Observed difference = 0.44
  • 75. Computing the bootstrap p-value. Are the two samples different? Observed difference = 0.44. If the two samples were from the same population, what would be the probability that the observed difference arose from chance alone?
  • 76. Computing the bootstrap p-value. Samples A0 (a1, a2, ..., an) and B0 (b1, b2, ..., bn). Observed difference: D0 = mA − mB (0.44).
  • 77. Computing the bootstrap p-value. Pool all the values of A0 and B0 together.
  • 78. Computing the bootstrap p-value. Draw two pseudo-samples A1 and B1 from the pool, with replacement, compute their means mA1 and mB1, and the pseudo-difference D1 = mA1 − mB1.
  • 79. Computing the bootstrap p-value. D0 = 0.44, D1 = −0.83.
  • 80. Computing the bootstrap p-value. Draw A2 and B2 from the pool in the same way: D2 = mA2 − mB2. D0 = 0.44, D1 = −0.83, D2 = 0.84.
  • 81. Computing the bootstrap p-value. Repeat 10000 times (D1 ... D10000).
  • 82. Computing the bootstrap p-value. How many pseudo-differences are greater than or equal to the observed difference D0 (0.44)?
  • 83. Computing the bootstrap p-value. Out of 10000 pseudo-differences: 9829 < D0, 171 > D0.
  • 84. Computing the bootstrap p-value. p = 171/10000 = 0.0171 (one-tailed).
  • 85. Computing the bootstrap p-value. Bootstrap: p = 171/10000 = 0.0171 (one-tailed). Mann-Whitney (MW): p = 0.0169.
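The pooling procedure described on these slides can be sketched as follows. This is an illustrative implementation under stated assumptions, not the author's code: the function name `bootstrap_p_value`, the fixed seed, and the two example samples are hypothetical.

```python
import random
from statistics import mean


def bootstrap_p_value(a, b, n_resamples=10_000, seed=0):
    """One-tailed bootstrap p-value for the difference of means mean(a) - mean(b).

    Under the null hypothesis both samples come from the same population,
    so pseudo-samples are drawn with replacement from the pooled values.
    Returns (observed difference D0, fraction of pseudo-differences >= D0).
    """
    rng = random.Random(seed)
    d0 = mean(a) - mean(b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_resamples):
        pseudo_a = rng.choices(pooled, k=len(a))
        pseudo_b = rng.choices(pooled, k=len(b))
        if mean(pseudo_a) - mean(pseudo_b) >= d0:
            count += 1
    return d0, count / n_resamples


# Hypothetical, clearly separated samples, for illustration only.
group_a = [10.0, 11.0, 12.0, 13.0, 14.0, 15.0, 16.0, 17.0]
group_b = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
d0, p_one_tailed = bootstrap_p_value(group_a, group_b)
print(f"D0 = {d0:.2f}, one-tailed p = {p_one_tailed:.4f}")
```

The count is one-tailed, as on the slides; a two-tailed version would compare absolute pseudo-differences with the absolute observed difference.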
  • 86. Summary. What do my data look like? Distribution? • visual inspection (hist. / QQ plot) • normality test. What do I want to compare? Right statistical test? • parametric test • non-parametric test • resampling statistics
  • 87. The dark side of the p-value
  • 88. Statistical significance “The effect of the drug was statistically significant.”
  • 89. Statistical significance “The effect of the drug was statistically significant.” so what?
  • 90. Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).”
  • 91. Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).” Training has a larger effect in the mutant mice than in the control mice!
  • 92. Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).” Training has a larger effect in the mutant mice than in the control mice!
  • 93. Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).” Extreme scenario: training-induced activity barely reaches significance in mutant mice (e.g., 0.049) and barely fails to reach significance for control mice (e.g., 0.051). [Figure: activity for control and mutant mice, − and + conditions, with a significance star.] This does not test whether the training effect for mutant mice differs statistically from that for control mice.
  • 94. Statistical significance (example) “The percentage of neurons showing cue-related activity increased with training in the mutant mice (P<0.05) but not in the control mice (P>0.05).” When making a comparison between two effects, always report the statistical significance of their difference rather than the difference between significance levels. Nieuwenhuis S. et al. (2011), “Erroneous analyses of interactions in neuroscience: a problem of significance”, Nat Neuroscience, 14(9):1105-1107
  • 95. P-values do not convey information Mean: 16 SD: 5 Mean: 20 SD: 5 Difference = 4 p-value = 0.1090
  • 96. P-values do not convey information Mean: 16 SD: 5 Mean: 20 SD: 5 Difference = 4 p-value = 0.1090 0.0367
  • 97. P-values do not convey information Mean: 16 SD: 5 Mean: 20 SD: 5 Difference = 4 p-value = 0.1090 0.0367 0.0009
  • 98. P-values do not convey information. Fact: most applied scientists use p-values as a measure of evidence and of the size of the effect. - The probability of hypotheses depends on much more than just the p-value. - This topic has renewed importance with the advent of the massive multiple testing often seen in genomics studies. [Figure: “Manhattan plot”, −log10(P) on the y-axis, positions 1 to 20 on the x-axis.] Ioannidis JP (2005), PLoS Med 2(8):e124
  • 99. Report effect size and CIs instead
  • 100. P-value is a function of the sample size. Measured effect size: difference = 0.018 mV. [Figure: example traces (scale bars 0.5 mV, 100 ms) and amplitude (mV) for control (n=6777) vs atropine (n=5272).] Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
  • 101. P-value is a function of the sample size. Measured effect size: difference = 0.018 mV, $p = 10^{-5}$. [Figure: example traces (scale bars 0.5 mV, 100 ms) and amplitude (mV) for control (n=6777) vs atropine (n=5272).] Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
  • 102. P-value is a function of the sample size. [Figure: two panels plotted against sample size ($10^1$ to $10^3$): P (t-test), with “not significant” and “significant” regions, and Hedges' g, with the 0.018 mV difference marked.] Hentschke, H. et al. (2011). "Computation of measures of effect size for neuroscience data sets". Eur J Neurosci. 34(12):1887–94
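The dependence of the p-value on sample size can be illustrated with a fixed effect and growing groups. The sketch below assumes a large-sample z approximation instead of the t-test used in the cited figure, and the standardized effect size of 0.1 is hypothetical; both are simplifications made for this example.

```python
from statistics import NormalDist


def two_sided_p(effect_size, n_per_group):
    """Approximate two-sided p-value for a standardized mean difference.

    Assumes two groups of equal size and equal SD, and uses the normal
    (z) approximation: z = d * sqrt(n / 2).
    """
    z = effect_size * (n_per_group / 2) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


HYPOTHETICAL_EFFECT = 0.1   # small standardized difference, held constant
p_by_n = {n: two_sided_p(HYPOTHETICAL_EFFECT, n) for n in (10, 100, 1000, 10000)}

for n, p in p_by_n.items():
    print(f"n per group = {n:>5}: p = {p:.3g}")
```

The effect never changes, yet the p-value crosses 0.05 once the groups are large enough, which is why an effect size with its confidence interval is more informative than the p-value alone.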
  • 103. Bootstrap effect size and 95% CIs. Resample A (a1, a2, ..., an) with replacement 10000 times and compute the mean of each pseudo-sample (mA1, mA2, mA3, ...); do the same for B (b1, b2, ..., bn; mB1, mB2, mB3, ...). Pseudo effect sizes: E1 = mA1 − mB1, E2 = mA2 − mB2, ..., E10000 = mA10000 − mB10000.
  • 104. Bootstrap effect size and 95% CIs. Observed effect size: 0.44, shown on the distribution of the 10000 pseudo effect sizes E1 ... E10000.
  • 106. Bootstrap effect size and 95% CIs. The 250th and 9750th values of the ordered pseudo effect sizes E1 ... E10000 give the bounds of the 95% CI around the observed effect size (0.44).
  • 107. Bootstrap effect size and 95% CIs. Does the 95% confidence interval of the observed effect size include zero (no difference)? Effect size (A vs B) = 0.44, 95% CI [0.042, 0.853], bounded by the 250th and 9750th bootstrap values.
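The resampling procedure on the bootstrap slides can be sketched as follows. This is a minimal sketch under stated assumptions: the effect size is taken as the raw difference of means (as the mA - mB labels in the diagram suggest), the two samples are made-up numbers, and the function name and `seed` parameter are hypothetical. It does not reproduce the 0.44 [0.042, 0.853] result from the slide, whose data are not given.

```python
import random
import statistics


def bootstrap_mean_difference(a, b, n_boot=10000, seed=0):
    """Return (observed difference of means, lower, upper) where the bounds
    are the percentile-bootstrap 95% confidence interval."""
    rng = random.Random(seed)
    effects = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        resample_a = rng.choices(a, k=len(a))
        resample_b = rng.choices(b, k=len(b))
        effects.append(statistics.fmean(resample_a) - statistics.fmean(resample_b))
    effects.sort()
    # With n_boot = 10000 these are the 250th and 9750th sorted values.
    lower = effects[n_boot * 25 // 1000 - 1]
    upper = effects[n_boot * 975 // 1000 - 1]
    observed = statistics.fmean(a) - statistics.fmean(b)
    return observed, lower, upper


# Hypothetical measurements, for illustration only.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3, 5.7, 5.2]
group_b = [4.2, 4.8, 4.4, 4.9, 4.1, 4.6, 4.3, 4.7]

observed, lower, upper = bootstrap_mean_difference(group_a, group_b)
print(f"effect size = {observed:.2f}, 95% CI [{lower:.2f}, {upper:.2f}]")
print("CI includes zero:", lower <= 0 <= upper)
```

If the interval excludes zero, the data are not compatible with "no difference" at the 95% level, and the interval itself shows how large or small the effect could plausibly be.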
  • 109. Statistical vs Biological significance “The P value reported by tests is a probabilistic significance, not a biological one.” “Statistical significance suggests but does not imply biological significance.” Krzywinski M and Altman N (2013) "Points of significance: Significance, P values and t-tests". Nature Methods 10, 1041–1042
  • 110. Statistical vs Biological significance. Statistical significance has a meaning in a specific context. No change, small change, large change: biological consequences?
  • 111. Statistical vs Biological significance. “Good enough” solutions in the stomatogastric ganglion. [Figure: neurons AB, PD, LP, PY (LP 1, LP 2); conductances at +15 mV (µS/nF) for Kd, KCa and A-type currents, and mRNA copy number for shab, BK-KC and shal.] Schulz D.J. et al. (2006) "Variable channel expression in identified single and electrically coupled neurons in different animals". Nat Neurosci. 9: 356–362
  • 112. Statistical vs Biological significance Madhvani R.V. et al. (2011) "Shaping a new Ca2+ conductance to suppress early afterdepolarizations in cardiac myocytes". J Physiol 589(Pt 24):6081-92
  • 113. Statistical vs Biological significance. Breast cancer study: difference in cancer returning between control and low-fat diet groups. Authors' conclusion: people on a low-fat diet had a 25% lower chance of cancer returning.
  • 114. Statistical vs Biological significance. Breast cancer study: difference in cancer returning between control and low-fat diet groups. Authors' conclusion: people on a low-fat diet had a 25% lower chance of cancer returning. Actual return rates: control 12.4%, low-fat diet 9.8%. Difference: 2.6 percentage points; 2.6 / 9.8 = 26.5%.
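The arithmetic behind the breast cancer example can be written out explicitly. The two return rates are the ones quoted on the slide; the variable names are illustrative, and the ratio against the control rate is added here only for comparison.

```python
control_rate = 12.4   # % of control patients whose cancer returned (from the slide)
low_fat_rate = 9.8    # % of low-fat diet patients whose cancer returned (from the slide)

# Absolute difference, in percentage points.
absolute_difference = control_rate - low_fat_rate

# Relative difference as computed on the slide (divided by the low-fat rate).
relative_to_low_fat = absolute_difference / low_fat_rate * 100

# Relative difference using the control rate as the baseline, for comparison.
relative_to_control = absolute_difference / control_rate * 100

print(f"absolute difference: {absolute_difference:.1f} percentage points")
print(f"relative to low-fat rate: {relative_to_low_fat:.1f}%")
print(f"relative to control rate: {relative_to_control:.1f}%")
```

A relative change in the 20 to 27% range sounds large, while the absolute change of 2.6 percentage points is what a patient would actually experience, so it helps to report both.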
  • 115. Beware of false positives (from the authors) Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
  • 116. Beware of false positives Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
  • 117. Beware of false positives 2012 Bennett C. et al. (2010) “Neural Correlates of Interspecies Perspective Taking in the Post-Mortem Atlantic Salmon: An Argument For Proper Multiple Comparisons Correction”. JSUR, 2010. 1(1):1-5
  • 118. Beware of false positives http://xkcd.com/882/
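Why uncorrected multiple comparisons produce false positives can be shown with a short calculation. This is a minimal sketch assuming independent tests and a true null hypothesis in every one of them; the choice of 20 tests is illustrative, and the Bonferroni correction is shown as one common remedy, not as the method used in the cited paper.

```python
def familywise_error_rate(alpha, n_tests):
    """Probability of at least one false positive among n_tests independent
    tests when every null hypothesis is true."""
    return 1 - (1 - alpha) ** n_tests


def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error
    rate at or below alpha."""
    return alpha / n_tests


alpha = 0.05
n_tests = 20  # illustrative number of comparisons

print(f"chance of at least one false positive: {familywise_error_rate(alpha, n_tests):.3f}")
print(f"Bonferroni per-test threshold: {bonferroni_threshold(alpha, n_tests):.4f}")
```

With 20 uncorrected tests at the 0.05 level, a "significant" result somewhere is more likely than not even when nothing is going on.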
  • 122. Know your audience. Who? Who is my audience? What is their level of understanding? What do they already know? Why? What? How?
  • 123. Know your audience. Who? Who is my audience? What is their level of understanding? What do they already know? Why? Why am I presenting? What does my audience want to achieve? What? How?
  • 124. Know your audience. Who? Who is my audience? What is their level of understanding? What do they already know? Why? Why am I presenting? What does my audience want to achieve? What? What do I want my audience to know? Which story will captivate the audience? How?
  • 125. Know your audience. Who? Who is my audience? What is their level of understanding? What do they already know? Why? Why am I presenting? What does my audience want to achieve? What? What do I want my audience to know? Which story will captivate the audience? How? What medium will best support the message? What format/layout will appeal to the audience?
  • 126. Color blindness is a common disease Males: one in 12 (8%) / Females: one in 200 (0.5%)
  • 127. Color blindness is a common disease “Anyone who needs to be convinced that making scientific images more accessible is a worthwhile task [...]: if your next grant or manuscript submission contains color figures, what if some of your reviewers are color blind? Will they be able to appreciate your figures? Considering the competition for funding and for publication, can you afford the possibility of frustrating your audience? The solution is at hand." Clarke, M. (2007). "Making figures comprehensible for color-blind readers" Nature blog (http://blogs.nature.com/nautilus/2007/02/post_4.html)
  • 128. Making figures for color blind people Wong, B. (2011). "Points of view: Color blindness". Nature Methods 8, 441
  • 129. Making figures for color blind people http://colororacle.org/
  • 130. Making figures for color blind people http://colororacle.org/
  • 131. Telling stories with data “The Martini Glass Structure” http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf
  • 132. Telling stories with data “The Martini Glass Structure”. [Diagram labels: GUIDED NARRATIVE, START!, EXPLORE] http://vis.stanford.edu/files/2010-Narrative-InfoVis.pdf
  • 133. Aesthetic minimalism Suda B. (2010). "A practical guide to Designing with Data"
  • 134. Aesthetic minimalism Suda B. (2010). "A practical guide to Designing with Data"
  • 135. Aesthetic minimalism Suda B. (2010). "A practical guide to Designing with Data"
  • 136. Aesthetic minimalism Suda B. (2010). "A practical guide to Designing with Data"
  • 137. Aesthetic minimalism Suda B. (2010). "A practical guide to Designing with Data"
  • 138. Aesthetic minimalism Suda B. (2010). "A practical guide to Designing with Data"
  • 139. Common mistakes in data reporting Welcome to the FOX “Dishonest Charts” gallery
  • 140. Common mistakes in data reporting
  • 141. Common mistakes in data reporting E. Tufte’s “Lie Factor” Make things appear to be “better” than they are by fiddling with the scales of things
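Tufte's Lie Factor is the size of the effect shown in the graphic divided by the size of the effect in the data, where each "size of effect" is a relative change. A minimal sketch, using made-up numbers for a bar chart whose y-axis is truncated (all values and names below are hypothetical):

```python
def relative_change(first, second):
    """Relative change from first to second, e.g. 0.1 for a 10% increase."""
    return abs(second - first) / abs(first)


def lie_factor(graphic_first, graphic_second, data_first, data_second):
    """Tufte's Lie Factor: effect shown in the graphic / effect in the data.
    A value close to 1 means the graphic represents the data faithfully."""
    return relative_change(graphic_first, graphic_second) / relative_change(
        data_first, data_second
    )


# Hypothetical data: the value rises from 50 to 55 (a 10% increase).
data_first, data_second = 50, 55

# Hypothetical chart: the y-axis starts at 45, so the drawn bar heights
# are 5 and 10 units (a 100% increase in bar height).
axis_start = 45
bar_first = data_first - axis_start
bar_second = data_second - axis_start

print("Lie Factor:", lie_factor(bar_first, bar_second, data_first, data_second))
```

In this example the truncated axis makes a 10% change look like a doubling, a tenfold exaggeration; drawing the bars from zero would bring the Lie Factor back to 1.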
  • 142. Common mistakes in data reporting
  • 143. Common mistakes in data reporting
  • 144. Common mistakes in data reporting
  • 145. Common mistakes in data reporting
  • 146. Common mistakes in data reporting
  • 147. Common mistakes in data reporting Fig 1I “We found that relative to WT mice, the luminal microbiota of Il10−/− mice exhibited a ~100-fold increase in E. coli (Fig. 1I)” Arthur et al, (2012) Science 5;338(6103):120-3
  • 148. Common mistakes in data reporting A B C D E
  • 149. Common mistakes in data reporting A B C D E 20% 20% 20% 20% 20%
  • 150. Common mistakes in data reporting
  • 151. Common mistakes in data reporting
  • 152. Common mistakes in data reporting. [Figure: two charts of Percent Return on Investment (0 to 40) for Group A and Group B over year1 to year4.]
  • 153. Thank you! “The important thing is not to stop questioning. Curiosity has its own reason for existing.” (Albert Einstein)