A Geometric Note on a Type of Multiple Testing
Dipak K Dey, Junfeng Liu, Nalini Ravishanker, Edwards Qiang Zhang (07-24-2015)
ABSTRACT. For a collection of subjects, the within-subject replicate measurements are usually modeled as a subject-specific mean (zero or non-zero) plus random noise. For the problem of selecting a set of potentially significant subjects (likely with non-zero means) out of all subjects, we study some new aspects of the elegant false discovery rate (FDR) control procedure proposed by Benjamini and Hochberg (1995).
1 Introduction
We model the collected measurements as

$$y_{i,j} = \mu_i + \epsilon_{i,j}, \qquad \epsilon_{i,j} \sim N(0, \sigma_i^2), \qquad j = 1, \ldots, m, \; i = 1, \ldots, n,$$
where n is the total number of subjects and m is the sample size (number of replicates) for each
subject. A type of confidence interval for each subject mean (µi) could be constructed as
$$\mu_i \in \bar{y}_i \pm C\, t_{m-1,\,1-\alpha/2}\, \hat{\sigma}_{m-1}/\sqrt{m}, \qquad i = 1, \ldots, n,$$
where 1 − α is the prescribed confidence level, the subject-specific variance is estimated as $\hat{\sigma}^2_{m-1} = \frac{1}{m-1}\sum_{j=1}^{m}(y_{i,j} - \bar{y}_i)^2$ involving the subject-specific mean estimator $\bar{y}_i = \frac{1}{m}\sum_{j=1}^{m} y_{i,j}$, $t_{m-1,\,1-\alpha/2}$ is the $1 - \frac{\alpha}{2}$ quantile of the central Student's t-distribution with m − 1 degrees of freedom, and C is an across-the-board tuning parameter. Simply employing the following rule
$$\left|\frac{\sqrt{m}\,\bar{y}_i}{\hat{\sigma}_{m-1}}\right| > C\, t_{m-1,\,1-\alpha/2} \;\rightarrow\; \text{reject } \mu_i = 0, \qquad
\left|\frac{\sqrt{m}\,\bar{y}_i}{\hat{\sigma}_{m-1}}\right| \leq C\, t_{m-1,\,1-\alpha/2} \;\rightarrow\; \text{accept } \mu_i = 0 \qquad (1)$$
amounts to checking the t-statistic based p-value, $p_i = 1 - F\left(\left|\frac{\sqrt{m}\,\bar{y}_i}{\hat{\sigma}_{m-1}}\right|\right)$, where F is the probability distribution function of a certain random variable. For instance, F could be the probability distribution of $|T_{m-1}|$ ($T_{m-1}$ is the central Student's t-statistic with m − 1 degrees of freedom). Rule (1) then becomes
$$p_i < 1 - F_{m-1}(C\, t_{m-1,\,1-\alpha/2}) \;\rightarrow\; \text{reject } \mu_i = 0, \qquad
p_i \geq 1 - F_{m-1}(C\, t_{m-1,\,1-\alpha/2}) \;\rightarrow\; \text{accept } \mu_i = 0.$$
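As a minimal numerical sketch of rule (1) (not part of the original note; the helper name and the use of scipy are our own choices), the per-subject t-statistics, p-values and decisions can be computed as follows.

```python
import numpy as np
from scipy import stats

def test_subjects(y, C=1.0, alpha=0.10):
    """Apply rule (1) to an (n, m) array of replicate measurements.

    Returns the per-subject p-values and reject decisions for H0: mu_i = 0.
    """
    n, m = y.shape
    ybar = y.mean(axis=1)                      # subject-specific means
    s = y.std(axis=1, ddof=1)                  # sigma-hat_{m-1}
    t_stat = np.sqrt(m) * ybar / s             # per-subject t-statistics
    # p_i = 1 - F(|t_i|) with F the distribution of |T_{m-1}|,
    # i.e. the usual two-sided one-sample t-test p-value.
    p = 2 * stats.t.sf(np.abs(t_stat), df=m - 1)
    reject = np.abs(t_stat) > C * stats.t.ppf(1 - alpha / 2, df=m - 1)
    return p, reject
```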
We first look at an example where three groups have subject mean profiles (indexed by i = 1, . . . , n = 100 for each group) $\mu_i = 0$ (Group 1), $\mu_i = 0.01(1 + \sin\frac{10i}{n})$ (Group 2) and $\mu_i = 0.10(1 + \sin\frac{10i}{n})$ (Group 3), respectively. The within-subject variation is σ = 1. Under rule (1), the rejection proportion profiles (α = 0.10, m varying from 6 to 10, $C = 1 + \frac{j}{20}$, j = 1, . . . , 16) are plotted in Figure 1. Groups 1 and 2 have similar rejection proportion profiles since these two subject mean profiles are substantially close to each other. Thus, the resultant false discovery rate in this setting is roughly 1/2 when we combine these two groups (H0 = Group 1 (zero mean), Ha = Group 2 (non-zero mean)) under any values of m and C.
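A rough, self-contained reproduction of this configuration (a single Monte Carlo draw; the seed, generator and variable names are ours) illustrates why Groups 1 and 2 are nearly indistinguishable under rule (1):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, m, alpha, C, sigma = 100, 6, 0.10, 1.0, 1.0
i = np.arange(1, n + 1)

for label, mu in [("Group 1", np.zeros(n)),
                  ("Group 2", 0.01 * (1 + np.sin(10 * i / n))),
                  ("Group 3", 0.10 * (1 + np.sin(10 * i / n)))]:
    y = mu[:, None] + sigma * rng.standard_normal((n, m))   # replicates
    t_stat = np.sqrt(m) * y.mean(axis=1) / y.std(axis=1, ddof=1)
    reject = np.abs(t_stat) > C * stats.t.ppf(1 - alpha / 2, df=m - 1)
    print(label, "rejection proportion:", reject.mean())
```

With means as small as those of Group 2, the rejection proportion stays near that of the null Group 1, which is the source of the roughly 1/2 false discovery rate noted above.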
2 New perspectives
Built upon the ordered-p-value set $\{p_{(j)}, 1 \leq j \leq n\}$, with p-values (indexed by rank j) arranged from the smallest to the largest, the elegant false discovery rate (FDR) control procedure (Benjamini and Hochberg, 1995) would

$$\text{reject all subjects with rank} \leq \max\Big\{j : p_{(j)} \leq \frac{j}{n}\, q, \; 1 \leq j \leq n\Big\}, \qquad (2)$$
where n is the total number of hypotheses (subjects), H0 and Ha combined. If rejections are found, then the instant FDR is calculated as the proportion of wrong rejections out of all rejections (H0 and Ha combined). If no rejections are found, then the instant FDR is defined as 0. The so-called FDR, defined as the expectation of the instant FDRs, is controlled at π0q, where π0 is the proportion of H0 hypotheses (subjects) out of all hypotheses. For illustration purposes, we set the subject mean function (Ha) as

$$f(u, x) = 0.08u(1 + |\sin(6x)|^u), \qquad x \in [0, 1], \; u = 1, 2, \ldots. \qquad (3)$$

The subject means under H0 are implemented by setting u = 0 at $x = \frac{i}{n_0}$ (i = 1, . . . , n0), where n0 is the number of subjects (hypotheses) under H0. The subject means under Ha are implemented by setting $x = \frac{i}{n_1}$ (i = 1, . . . , n1), where n1 is the number of subjects (hypotheses) under Ha.
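A minimal sketch of procedure (2), together with the instant-FDR bookkeeping just described (the helper names are ours; the step-up rule itself is the standard Benjamini-Hochberg procedure):

```python
import numpy as np

def bh_reject(p, q):
    """Step-up rule (2): reject the k smallest p-values, where
    k = max{j : p_(j) <= (j/n) q}, and k = 0 if that set is empty."""
    n = len(p)
    order = np.argsort(p)
    below = np.nonzero(p[order] <= (np.arange(1, n + 1) / n) * q)[0]
    reject = np.zeros(n, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True
    return reject

def instant_fdr(reject, is_null):
    """Proportion of wrong rejections out of all rejections; 0 if none."""
    r = reject.sum()
    return (reject & is_null).sum() / r if r else 0.0
```

Averaging `instant_fdr` over repeated simulated data sets estimates the FDR, which procedure (2) keeps at about π0q.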
Under any numerical simulation configuration (subject group sizes (n0, n1), within-subject variation (σ), within-subject replicate/sample size (m)), separating Ha subjects from H0 is expected to become easier as we increase the Ha subject mean profile toward ∞. We take a look at the resultant specificity profiles and find that they approach a limit (regulated by q) as the Ha mean profile increases. Such a limit is achieved exactly once the Ha mean profile is sufficiently large. We are thus motivated to take a geometric view by juxtaposing the ordered-p-value profiles (H0 and Ha) along with an overriding adaptive hypothesis rejection cut-off route (indexed by subjects, H0 and Ha combined) for sequential p-value checking. In Figure 2, the ordered-p-value profile under H0 roughly resembles a straight line connecting the points (π1, 0) and (1, 1). As the ordered-p-value profile under Ha approaches the bottom (mean profile increases), the rejected hypothesis set includes all Ha and those H0 subjects with p-values located from D to B (Rule (2)). The limiting specificity is subsequently calculated. Along the cut route (the solid line spanning from (0,0) to (x1,y1)) in Figure 2, each check point $j^* \in \{1, \ldots, n (= n_0 + n_1)\}$ corresponds to a number $n_0(j^*)$ of p-values ($\leq \frac{j^*}{n} q$, under H0) and another number $n_1(j^*)$ of p-values ($\leq \frac{j^*}{n} q$, under Ha) (Figure 3). All hypotheses linked to these $n_0(j^*) + n_1(j^*)$ p-values will be rejected as long as $n_0(j^*) + n_1(j^*) \geq j^*$. However, any check point $j^*$ along the cut route (Figure 2) beyond the one ($j^*_B$) corresponding to point B cannot collect a sufficient number of hypotheses (H0 and Ha combined) such that $n_0(j^*) + n_1(j^*) \geq j^*$. The set $\{j^* - n_0(j^*) : 1 \leq j^* \leq j^*_B\}$ roughly formulates a no-rejection region boundary prescribed for the Ha hypotheses (the bold dash line, Figure 3), i.e., there will be no discovery (rejection) unless the ordered-p-value profile under Ha crosses this boundary from the upper portion ("NO REJECTION region", Figure 3) to the lower portion. When there is such a crossing, geometric arguments show that the instant FDR is always around π0q no matter where the crossing point is located along the no-rejection boundary. Numerical simulation discloses some operating characteristics under different specifications of the experimental factors (e.g., within-subject variation (σ), within-subject sample size (m), Ha subject mean profile, population size (n0 + n1), H0 proportion (π0 = n0/(n0 + n1)), etc.). Moreover, we also try applying a quadratic cut route:
$$\text{reject all subjects with rank} \leq \max\Big\{j : p_{(j)} \leq \Big(\frac{j}{n}\Big)^2 q, \; 1 \leq j \leq n\Big\}. \qquad (4)$$
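Procedure (4) is a one-line change to the step-up sketch above; a hedged generalization (the parameterization is ours) accepting either cut route:

```python
import numpy as np

def cut_reject(p, q, power=1):
    """Step-up rule with threshold (j/n)^power * q:
    power=1 gives the linear cut (2), power=2 the quadratic cut (4)."""
    n = len(p)
    order = np.argsort(p)
    below = np.nonzero(p[order] <= (np.arange(1, n + 1) / n) ** power * q)[0]
    reject = np.zeros(n, dtype=bool)
    if below.size:
        reject[order[: below[-1] + 1]] = True
    return reject
```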
We summarize some observations.
• In Figure 2, the intersection (B) between the H0 p-value profile ($y = (x - \pi_1)/\pi_0$) and the linear cut route ($y = xq$) has location (x1, y1) with $x_1 = (1 - \pi_0)/(1 - q\pi_0)$; the intersection (C) between the H0 p-value profile and the quadratic cut route ($y = x^2 q$) has location (x2, y2) with $x_2 = \big(1 - \sqrt{1 - 4\pi_0(1 - \pi_0)q}\big)/(2q\pi_0)$. A short derivation sketch follows this list.
• From Figure 3, when the probability of discovery = 1, FDR = pFDR (positive false discovery rate) = π0q (a constant under the linear cut) no matter where the ordered-p-value profile (Ha) crosses the no-rejection boundary. The no-rejection boundary function is $g(x) = qx/(1 - q\pi_0)$ ($0 \leq x \leq \pi_1$, under the linear cut) and $g(x) = \frac{(1 - 2q\pi_0 x) - \sqrt{1 - 4q\pi_0 x}}{2q\pi_0^2}$ ($0 \leq x \leq \pi_1$, under the quadratic cut). The relationship between the instant FDR (= pFDR) and the no-rejection boundary function g(x) is pFDR $= \pi_0 g(x)/(x + \pi_0 g(x))$ ($0 \leq x \leq \pi_1$).
• In Figure 4, at each q, the instant FDR (= pFDR) under the quadratic cut increases with the location (x ∈ (0, π1), the x-axis) where the ordered-p-value profile (Ha) crosses the no-rejection boundary. When q = 1, FDR = π0 for either cut route (linear, quadratic).
• In Figure 5, under the linear cut, when the probability of discovery is less than one (e.g., the ordered-p-value profiles of H0 and Ha are close to each other), pFDR > FDR and FDR = π0q. pFDR is less sensitive to q than FDR is. This is relevant to the observation in Figure 1 (Groups 1 and 2). In Figure 5, under the quadratic cut, the FDR is much lower than in the linear cut case. When the Ha mean profiles are close to zero, the pFDR is more volatile than in the linear cut case.
• Under the linear cut, the specificity approaches $(1 - q)/(1 - q\pi_0)$ as µ increases. Under the quadratic cut, the specificity approaches $\frac{1}{\pi_0} - \frac{1 - (1 - 4q\pi_0(1 - \pi_0))^{1/2}}{2q\pi_0^2}$ as µ increases. See Figures 5, 6, 7, 10, 11, 12, and the derivation sketch after this list.
• When the ordered-p-value profile under Ha sits high (mean profile close to zero), the number of discoveries is very small; the number increases with the Ha subject mean profile. The expected number of discoveries under the linear cut is higher than that under the quadratic cut, and the difference is larger as π0 gets larger. See Figures 8, 9.
• As n increases, the limiting specificity profile approaches the aforementioned calculated curve more closely. See the left panels in Figures 5, 10.
• As π0 decreases, the limiting specificity profile approaches the aforementioned calculated curve more closely. See the left panels in Figures 10, 11.
• The specificity under the linear cut is lower than that under the quadratic cut, and the difference lessens as π0 decreases. The sensitivity under the linear cut is higher than that under the quadratic cut. The probability of discovery under the linear cut is higher than that under the quadratic cut.
• We consider an unrealistic case where the H0 ordered-p-value profile is not random: {i/n0, i = 1, . . . , n0}. The FDR and pFDR are less than π0q when the Ha mean profile is close to zero (Figure 13).
• When σ (homogeneous among subjects) increases, the resultant cluster of profiles (collected from the mean profile set) behaves similarly to a sub-cluster of profiles (collected from the mean profile set with small values) with small σ (Figure 14).
• When σ is heterogeneous across subjects (roughly independent of the subject mean), the probability of discovery tends to be larger (closer to one) than under the homogeneous σ case when the mean profile is close to zero. The pFDR under heterogeneous σ is closer to the FDR compared to the case with homogeneous σ. All other profiles (sensitivity, specificity) are similar between these two cases (homogeneous and heterogeneous σ) (Figure 15).
• If all Ha p-values are ≤ p, we reject all p-values ≤ p. The false rejection rate ≤ π0q amounts to $p \leq \frac{\pi_1 q}{\pi_1 + (1 - q)\pi_0}$ (Figures 16, 17).
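For completeness, here is a short derivation sketch (ours, following the Figure 2 geometry with H0 profile y = (x − π1)/π0) of the intersection points and limiting specificities quoted above:

```latex
% Intersection B with the linear cut y = qx:
\[
\frac{x-\pi_1}{\pi_0} = qx
  \;\Longrightarrow\; x(1-q\pi_0)=\pi_1
  \;\Longrightarrow\; x_1=\frac{\pi_1}{1-q\pi_0}=\frac{1-\pi_0}{1-q\pi_0}.
\]
% Intersection C with the quadratic cut y = qx^2 (smaller root):
\[
\frac{x-\pi_1}{\pi_0} = qx^2
  \;\Longrightarrow\; q\pi_0 x^2 - x + \pi_1 = 0
  \;\Longrightarrow\; x_2=\frac{1-\sqrt{1-4q\pi_0(1-\pi_0)}}{2q\pi_0}.
\]
% The fraction of H_0 p-values rejected at an intersection x is
% (x-\pi_1)/\pi_0, so the limiting specificity is 1-(x-\pi_1)/\pi_0
% = (1-x)/\pi_0, which gives (1-q)/(1-q\pi_0) at x_1 (linear cut) and
% 1/\pi_0 - \bigl(1-(1-4q\pi_0(1-\pi_0))^{1/2}\bigr)/(2q\pi_0^2) at x_2.
```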
3 A note on p-value
We numerically study the ranking of p-values by varying the set size (n), the Ha subject mean profile (µ), the noise variance (σ²) and other factors. Stochastic p-value rankings from both H0 and Ha are shown in Figure 18. Although the subject means are clearly ordered across the domain [0, 1] and the within-subject variation is moderate (= 1) or minor (= 1/100), the rankings of p-values fluctuate substantially around a trend. The degree of shuffling seems to be similar between the two cases (σ = 1 and 1/100). The p-values are individually calculated for each subject without considering the overall model structure (e.g., mean profile function, homogeneous variation, etc.). Each p-value is associated with a probability function, $\Pr(T_{m-1} > \sqrt{m}\,\bar{x}_m/\hat{\sigma}_m)$, where $\bar{x}_m$ and $\hat{\sigma}_m$ are independent of each other. This pair of statistics (sample standard deviation, sample mean) is also used to estimate the population coefficient of variation (σ/µ). The stochastic $\hat{\sigma}_m$ has a substantial shuffling impact on the ranking of $\bar{x}_m$. For instance, given another subject *, the comparison between $\sqrt{m}\,\bar{x}_m/\hat{\sigma}_m$ and $\sqrt{m}\,\bar{x}^*_m/\hat{\sigma}^*_m$ may be confused by the stochastic relative magnitude between $\hat{\sigma}_m$ and $\hat{\sigma}^*_m$. The distribution of the estimated coefficient of variation is available (e.g., Hendricks and Robey (1936), Vangel (1996)). Even pairwise comparison between any two subject means is generally complicated under certain circumstances, and numerical investigation is usually needed (e.g., Hsu (1938)).
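A small simulation sketch (our own construction, not from the note) of this shuffling effect: even with subject means ordered as i/n and tiny noise, the t-based p-value ranks wander around the trend because $\hat{\sigma}_m$ varies from subject to subject.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, m, sigma = 100, 6, 1 / 100
mu = np.arange(1, n + 1) / n                    # ordered subject means i/n

y = mu[:, None] + sigma * rng.standard_normal((n, m))
t_stat = np.sqrt(m) * y.mean(axis=1) / y.std(axis=1, ddof=1)
p = stats.t.sf(t_stat, df=m - 1)                # one-sided p-values

# Spearman-type correlation between the subject order and the
# evidence ranking (rank of -p): noticeably below 1, because the
# stochastic sigma-hat shuffles the t-statistics around the trend.
evidence_rank = stats.rankdata(-p)
print(np.corrcoef(np.arange(n), evidence_rank)[0, 1])
```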
References
[1] W.A. Hendricks, K.W. Robey (1936). The sampling distribution of the coefficient of variation.
The Annals of Mathematical Statistics 7(3): 129-132.
[2] P.L. Hsu (1938). Contribution to the theory of “Student’s” t-test as applied to the problem of
two samples. Statistical Research Memoirs 2: 1-24.
[3] Y. Benjamini and Y. Hochberg (1995). Controlling the false discovery rate: A practical and
powerful approach to multiple testing. Journal of the Royal Statistical Society (B) 57: 289-300.
[4] M.G. Vangel (1996). Confidence intervals for a normal coefficient of variation. The American
Statistician 50(1): 21-26.
4 APPENDIX
Figure 1: The rejection proportion profiles arising from applying testing rule (1) (α = 0.10). Three groups (1, 2, 3) have subject mean (subject index i = 1, . . . , 100) profiles $\mu_i = 0$, $\mu_i = 0.01(1 + \sin\frac{10i}{n})$ and $\mu_i = 0.10(1 + \sin\frac{10i}{n})$, respectively (the top-left panel). The within-subject variation (σ) = 1. The tuning parameter (C) in rule (1) = $1 + \frac{j}{20}$ (j = 1, . . . , 16), with the resultant rejection proportion profiles (with m spanning from 6 to 10) located from top to bottom in each panel (top-right, bottom-left, bottom-right).
[Figure 2 here: "Geometry of false discovery rate control"; points A, B(x1, y1), C(x2, y2), D; Specificity = AB/AD (linear cut) and Specificity = AC/AD (quadratic cut); regions π1 and π0; level q; y-axis: p-value; profiles for H0 and Ha.]
Figure 2: The bold dash line represents the ordered p-values from Ha with large positive means (π1 = 0.7). The bold dot line represents the ordered p-values from H0 (π0 = 0.3). The solid lines represent the linear and quadratic cut routes (x-axis: the ordered p-value index; y-axis: the threshold for H0 rejection). Under the Benjamini-Hochberg (1995) FDR control procedure, the specificity approaches its limit as the alternative means increase. The intersection points between the linear and quadratic cut routes and the H0 ordered p-value profile are the final p-value cut-off points for rejecting H0, labeled as B (location = (x1, y1)) and C (location = (x2, y2)), respectively. The specificities are calculated.
[Figure 3 here: two panels, "FDR control (linear cut)" and "FDR control (quadratic cut)", each showing the "NO REJECTION region" for Ha, the regions π1 and π0, and the level q; y-axis: p-value.]
Figure 3: The left panel shows the geometry of the Benjamini-Hochberg (1995) FDR control procedure. The bold solid line represents the linear cut route (x-axis: the ordered p-value index; y-axis: the threshold for H0 rejection). The bold dot line represents the ordered p-value profile under H0 (group size ∝ π0). The bold dash line represents the no-rejection region boundary for the ordered p-values from Ha (group size ∝ π1). In the horizontal direction, the distance between the bold dash and the solid lines equals the distance between the bold dot line and the point which separates the two regions labeled by "π1" and "π0", respectively. The right panel shows the geometry of the FDR control procedure under the quadratic cut route.
[Figure 4 here: "Positive false discovery rate (linear and quadratic cut)"; FDR (y-axis) versus the exceeding point (0 to π1) of the Ha profile (x-axis); curves for the linear and quadratic cuts with q = 1/10 to 10/10 by 1/10.]
Figure 4: The FDR under the linear and quadratic cut routes. The FDR under the linear cut is constant across exceeding points; the FDR under the quadratic cut is an increasing function of the exceeding position.
Figure 5: The FDR, pFDR, specificity, sensitivity and Pr(discovery) profiles under the linear and quadratic cut routes (n0 = 900 (H0), n1 = 100 (Ha)). σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 6: The FDR, pFDR, specificity, sensitivity and Pr(discovery) profiles under the linear and quadratic cut routes (n0 = 500 (H0), n1 = 500 (Ha)). σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 7: The FDR, pFDR, specificity, sensitivity and Pr(discovery) profiles under the linear and quadratic cut routes (n0 = 100 (H0), n1 = 900 (Ha)). σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 8: The number of discoveries under the linear and quadratic cut routes (n0 = 90 (H0), n1 = 10 (Ha)). σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 9: The number of discoveries under the linear and quadratic cut routes (n0 = 10 (H0), n1 = 90 (Ha)). σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 10: The FDR, pFDR, specificity, sensitivity and Pr(discovery) profiles under the linear and quadratic cut routes (n0 = 90 (H0), n1 = 10 (Ha)). σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 11: The FDR, pFDR, specificity, sensitivity and Pr(discovery) profiles under the linear and quadratic cut routes (n0 = 50 (H0), n1 = 50 (Ha)). σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 12: The FDR, pFDR, specificity, sensitivity and Pr(discovery) profiles under the linear and quadratic cut routes (n0 = 10 (H0), n1 = 90 (Ha)). σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 13: The FDR under the linear and quadratic cut routes with the ordered H0 p-values forming a non-random equal partition of [0, 1]. σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
Figure 14: The FDR, pFDR, specificity, sensitivity and Pr(discovery) profiles under the linear and quadratic cut routes (n0 = 90 (H0), n1 = 10 (Ha)). σ = 10, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35.
[Figure 15 here: two "Linear cut" panels, (n0, n1) = (90, 10) and (50, 50), both with σ diverse; curves FDR, pFDR, Sensitivity, Specificity and Pr(discovery) versus q.]
Figure 15: The FDR, pFDR, specificity and sensitivity profiles under linear and quadratic cut routes (n0 = 90 (H0), n1 = 10 (Ha)). σ is heterogeneous among subjects, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, . . . , 35. Subject variation = 2|cos(1000i)| (i = 1, . . . , n0) (under H0) and subject variation = 2|cos(1000i)| (i = 1, . . . , n1) (under Ha).
[Figure 16 here: two p-value histograms (panels A and B; 5 classes; n0, n1 = 5 × 10^4), showing the H0 and Ha frequencies with the threshold p marked.]
Figure 16: Histogram of p-values. σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 1, 5 (A, B).
[Figure 17 here: two p-value histograms (panels C and D; 5 classes; n0, n1 = 5 × 10^4), showing the H0 and Ha frequencies with the threshold p marked.]
Figure 17: Histogram of p-values. σ = 1, m = 6 and Ha subject mean profile = $0.08u(1 + |\sin(6x)|^u)$, u = 10, 20 (C, D).
[Figure 18 here: p-value rank versus subject index (with rank fit, rank and mean) plus autocorrelation of the rank and of the rank residual; two configurations, f(i) = i/n, σ = 1/100, m = 6, with n = 100 and n = 1000.]
Figure 18: Rankings of p-values. Subject mean profile is modeled as i/n (i = 1, . . . , n). σ = 1/100.