R exam (B) given in Paris-Dauphine, Licence Mido, Jan. 11, 2013

Université Paris-Dauphine, academic year 2012-2013
Département de Mathématique

NOISE exam, paper B
Preliminaries

This exam is to be taken on a computer using the R language. The detailed answers must be handed in on paper and, at the same time, the R functions used must be handed in as the computer file Examen. The computer files must be saved following the procedure below and will count towards the final grade. Any duplication of R files will be subject to disciplinary action. Failure to save a file will result in a grade of zero, with no possibility of appeal.
1. Save your files regularly on the computer, without using accents, spaces or special characters in their names.
2. If you use Rkward, you must save with the “Save script” (or “Save script as”) button, not “Save”.
3. Check that your files have been saved by reopening them before logging out. Feel free to reopen your file with another text editor to check that it contains all of your R code.
4. In case of a problem or concern, contact an instructor without logging out. Otherwise we cannot recover the automatic backup files.

No electronic documents are allowed; only R books are.
The use of any messaging or email service is forbidden and, if such use is established, will be sanctioned.
The problems are independent and can be treated in any order. Solve three, and only three, exercises of your choice.

Exercise 1
Given the probability density
$$ f(x\mid k,\theta) = \frac{C}{\theta^k}\,|x|^{k-1}\,e^{-|x|/\theta}, \qquad x\in\mathbb{R}, $$
1. Explain why an importance sampling technique based on the Normal density, designed to approximate the constant C, cannot work. Illustrate this lack of convergence with a numerical experiment using k = 1 and θ = 2.
2. Propose a more suitable importance distribution.
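A minimal R sketch of questions 1 and 2, assuming k = 1 and θ = 2. Since the unnormalised part integrates to 2Γ(k), importance sampling estimates 1/C = 2 here. With a Normal proposal the weights grow like exp(x²/2 − |x|/2), so their variance is infinite and the running estimate jumps erratically. The Cauchy proposal below is only one possible choice (an assumption): its tails are heavier than those of f, which keeps the weights bounded. The helper name `h` is hypothetical.

```r
# Hypothetical helper: unnormalised density |x|^(k-1) exp(-|x|/theta) / theta^k (f without C)
h <- function(x, k = 1, theta = 2) abs(x)^(k - 1) * exp(-abs(x) / theta) / theta^k

set.seed(1)
n <- 1e5
# Normal proposal: weights h/dnorm grow like exp(x^2/2 - |x|/2), so their variance is infinite
xn <- rnorm(n)
wn <- h(xn) / dnorm(xn)
# Cauchy proposal (assumed choice): tails heavier than f, so the weights stay bounded
xc <- rcauchy(n)
wc <- h(xc) / dcauchy(xc)

# Both averages target 1/C = 2 * gamma(k) = 2, so C is estimated by the inverse
C_norm   <- 1 / mean(wn)
C_cauchy <- 1 / mean(wc)

# Running estimates of C: erratic jumps for the Normal proposal, stable for the Cauchy one
plot(seq_len(n) / cumsum(wn), type = "l", log = "x", ylim = c(0, 1),
     xlab = "number of simulations", ylab = "estimate of C")
lines(seq_len(n) / cumsum(wc), col = "blue")
abline(h = 1 / (2 * gamma(1)), lty = 2)
```

The dashed line marks the exact value C = 1/2 for k = 1.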
We now focus on the integral
$$ I = \int_{\mathbb{R}} x^2 f(x\mid 1,2)\,dx, $$
using samples of size $n = 10^2$.

3. Propose a Monte Carlo approximation of I. (Hint: the integral over ℝ is twice the integral over ℝ+; relate f to a standard distribution on ℝ+.)
4. Approximate I by importance sampling using the same distribution g as in question
2.
5. Compute a 95% confidence interval on I for each of your methods. Which of the two estimates has the lower precision?
6. Design a Monte Carlo experiment to check whether or not the asymptotic coverage level of the CI holds. Repeat the experiment with samples of size $n = 10^3$.
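A sketch of questions 3 to 6 in R. On ℝ+, f(x|1, 2) = e^{−x/2}/4 is half of an exponential density with mean 2, so I = E[Y²] with Y ~ Exp(rate 1/2), which gives I = 8. The importance sampling part reuses the Cauchy proposal, which is an assumed choice. The helper names `mc_ci` and `coverage`, and the 1000 replications in the coverage study, are also hypothetical.

```r
# f(x | 1, 2): Laplace density with scale 2 (C = 1/2)
f <- function(x) exp(-abs(x) / 4 * 2) / 4
I_true <- 8  # E[X^2] for a Laplace(0, 2) variable: 2 * 2^2

# Hypothetical helper: both estimators with their 95% normal confidence intervals
mc_ci <- function(n) {
  # Plain Monte Carlo: by symmetry, I = E[Y^2] with Y ~ Exp(rate 1/2)
  y <- rexp(n, rate = 1/2)^2
  # Importance sampling with a Cauchy proposal (assumed choice)
  x <- rcauchy(n)
  z <- x^2 * f(x) / dcauchy(x)
  est <- c(mc = mean(y), is = mean(z))
  se  <- c(mc = sd(y), is = sd(z)) / sqrt(n)
  q <- qnorm(0.975)
  cbind(est = est, lower = est - q * se, upper = est + q * se)
}

set.seed(2)
mc_ci(100)  # one pair of 95% intervals with n = 10^2

# Hypothetical helper: fraction of replicated intervals that contain the true value
coverage <- function(n, reps = 1000) {
  hits <- replicate(reps, {
    ci <- mc_ci(n)
    ci[, "lower"] <= I_true & I_true <= ci[, "upper"]
  })
  rowMeans(hits)
}
cov2 <- coverage(100)   # n = 10^2
cov3 <- coverage(1000)  # n = 10^3
```

Comparing the widths of the two intervals returned by `mc_ci()` answers the precision question empirically. The coverage values should approach 0.95 as n grows.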

Exercise 2
Given the density on $\mathbb{R}_+^*$,
$$ f(x\mid\alpha,\beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{-\alpha-1}\,e^{-\beta/x}, $$
1. Determine which of the following densities can be used as the proposal g in an A/R (accept-reject) algorithm designed to sample from f(x|2, 4):
$$ g_1(x) = \frac{k}{\lambda}\left(\frac{x}{\lambda}\right)^{k-1} e^{-(x/\lambda)^k}, \qquad g_2(x) = \frac{1}{\theta^k\,\Gamma(k)}\,x^{k-1} e^{-x/\theta}, \qquad g_3(x) = (1+\alpha x)^{-1/\alpha-1}, $$
which are respectively a Weibull, a Gamma and a generalized Pareto density. (Justify your choice.)
2. Using the inversion method write an algorithm that samples from the selected g.
3. Write an R function AR() that samples from f(x|2, 4). (Extra-credit: optimize the parameters of the proposal density g.)
4. Based on a sample of size $10^4$ from f(x|2, 4), estimate by Monte Carlo the mean and variance of h(X) = 1/X and give a confidence interval at level 95% for both quantities.
5. The distribution associated with f can be obtained as that of 1/Z, where Z ∼ Gamma(α, 1/β) (shape α, scale 1/β). Establish this result and test it on the sample used in question 4.
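A possible R sketch of questions 2 to 5, under stated assumptions. f(x|2, 4) has a polynomial x⁻³ tail, which the Weibull and Gamma tails cannot dominate. The generalized Pareto tail behaves like x^{−1/α−1}, so any shape of at least 1/2 works. The shape `a = 1/2` below is an assumed choice, not an optimized one. The bound M is found numerically; for this shape it equals 16 e⁻¹ (3/4)³ ≈ 2.483, reached at x = 4. `rg` is a hypothetical helper name.

```r
alpha_f <- 2; beta_f <- 4
# Target density f(x | 2, 4) on (0, Inf)
f <- function(x) beta_f^alpha_f / gamma(alpha_f) * x^(-alpha_f - 1) * exp(-beta_f / x)

# Generalized Pareto proposal; the shape a = 1/2 is an assumed choice (x^-3 tail like f)
a <- 1/2
g <- function(x) (1 + a * x)^(-1 / a - 1)
# Inversion: G(x) = 1 - (1 + a x)^(-1/a), so G^{-1}(u) = ((1 - u)^(-a) - 1) / a;
# since 1 - U is also uniform, runif() values are plugged in directly
rg <- function(n) (runif(n)^(-a) - 1) / a

# Bound M = sup f/g, computed numerically
M <- optimize(function(x) f(x) / g(x), c(0.01, 100), maximum = TRUE)$objective

AR <- function(n) {
  out <- numeric(0)
  while (length(out) < n) {
    y <- rg(n)
    keep <- runif(n) * M * g(y) <= f(y)  # accept with probability f(y) / (M g(y))
    out <- c(out, y[keep])
  }
  out[seq_len(n)]
}

set.seed(3)
x <- AR(1e4)
h <- 1 / x
n <- length(h)
m_hat <- mean(h)
v_hat <- var(h)
se_m <- sd(h) / sqrt(n)
se_v <- sd((h - m_hat)^2) / sqrt(n)  # rough plug-in standard error for the variance
ci_mean <- m_hat + c(-1, 1) * qnorm(0.975) * se_m
ci_var  <- v_hat + c(-1, 1) * qnorm(0.975) * se_v

# Question 5: if Z ~ Gamma(shape 2, rate 4), then P(1/Z <= x) = P(Z >= 1/x)
ks <- ks.test(x, function(q) pgamma(1 / q, shape = alpha_f, rate = beta_f, lower.tail = FALSE))
```

If question 5 holds, 1/X ~ Gamma(2, rate 4), so the two intervals should be close to the mean 1/2 and the variance 1/8.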

Exercise 3
If $X_1, X_2, \ldots, X_k$ is a sample from the $\mathcal{N}(0,1)$ distribution, then $Y_k = \sum_{i=1}^k X_i^2$ follows the $\chi^2(k)$ distribution. We wish to verify a convergence theorem due to R. A. Fisher, which states that
$$ \sqrt{2Y_k} - \sqrt{2k-1} \xrightarrow[k\to\infty]{\mathcal{L}} \mathcal{N}(0,1). $$
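A minimal R sketch of the objects in this statement. Each χ²(k) draw is the row sum of k squared standard normals, so n draws use nk normals. The Fisher transform is then compared with N(0, 1) through a normal QQ-plot, one possible graphical check for k = 50. The helper name `fisher` is hypothetical.

```r
# n draws of chi^2(k), each the sum of k squared standard normals (n * k normals in total)
rchisq2 <- function(n, k) rowSums(matrix(rnorm(n * k), nrow = n, ncol = k)^2)

# Hypothetical helper: Fisher's transform sqrt(2 Y) - sqrt(2k - 1), roughly N(0, 1) for large k
fisher <- function(y, k) sqrt(2 * y) - sqrt(2 * k - 1)

set.seed(4)
y <- rchisq2(1000, 50)
z <- fisher(y, 50)
qqnorm(z); qqline(z)  # points close to the line indicate a good normal fit
```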

1. Create a function rchisq2(n,k) which simulates n realizations of the χ²(k) distribution, using nk realizations of the standard normal distribution. (Note: if you cannot answer this question, you can use the R function rchisq() for the remainder of the exercise.)
2. For k = 50 and n = 1000, propose a graphical way to verify the fit of $\sqrt{2Y_k} - \sqrt{2k-1}$ to the $\mathcal{N}(0,1)$ distribution.
3. Using ks.test() and n = 1000, check whether the normal distribution is an acceptable fit when k = 3, k = 30 and k = 300.

4. From now on, k = 300 and n = 1000. We now have a test to check the fit of a sample x to the χ²(k) distribution: we accept the null hypothesis that x comes from the χ²(k) distribution iff the Kolmogorov-Smirnov test accepts the hypothesis that $\sqrt{2Y_k} - \sqrt{2k-1}$ fits the $\mathcal{N}(0,1)$ distribution. Perform a bootstrap experiment to calculate the probability of accepting the null hypothesis for a sample which comes from the Beta(1, k) distribution.
5. Perform another bootstrap experiment to calculate the same probability when directly using the Kolmogorov-Smirnov test for fit to the χ²(k) distribution (whose cdf is available in R as pchisq).
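A sketch of questions 3 to 5 in R. "Accepting" is taken to mean a p-value above 0.05, which is an assumption since the exercise fixes no level. B = 200 replications is also an arbitrary choice. What the exercise calls a bootstrap experiment is implemented as repeated simulation of fresh Beta(1, k) samples. The helper names `rchisq2` and `fisher` repeat the previous sketch so that this block runs on its own.

```r
rchisq2 <- function(n, k) rowSums(matrix(rnorm(n * k), nrow = n)^2)
fisher  <- function(y, k) sqrt(2 * y) - sqrt(2 * k - 1)  # hypothetical helper

set.seed(5)
# Question 3: KS p-values of the Fisher transform against N(0, 1) for k = 3, 30, 300
pvals <- sapply(c(3, 30, 300), function(k)
  ks.test(fisher(rchisq2(1000, k), k), "pnorm")$p.value)

# Questions 4-5: acceptance probability (5% level assumed) for Beta(1, k) samples
k <- 300; n <- 1000; B <- 200
accept_fisher <- replicate(B, {
  x <- rbeta(n, 1, k)
  ks.test(fisher(x, k), "pnorm")$p.value > 0.05
})
accept_direct <- replicate(B, {
  x <- rbeta(n, 1, k)
  ks.test(x, "pchisq", df = k)$p.value > 0.05
})
c(fisher = mean(accept_fisher), direct = mean(accept_direct))
```

Beta(1, 300) values lie in (0, 1), far below the bulk of χ²(300), so both procedures are expected to reject essentially every sample.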

Exercise 4
The Fréchet(α, s, m) distribution defines a random variable X which takes values in ]m, +∞[ and has cumulative distribution function
$$ F(x) = \exp\left\{-\left(\frac{x-m}{s}\right)^{-\alpha}\right\}. $$
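The generic inversion method relies on the fact that F⁻¹(U) has cdf F when U ~ U(0, 1). Solving u = F(x) for x gives the quantile function, and u = 1/2 gives the theoretical median asked for in question 2:

```latex
u = F(x) = \exp\left\{-\left(\frac{x-m}{s}\right)^{-\alpha}\right\}
\iff \left(\frac{x-m}{s}\right)^{-\alpha} = -\log u
\iff x = F^{-1}(u) = m + s\,(-\log u)^{-1/\alpha}, \quad u \in (0,1),
\qquad \text{median}(X) = F^{-1}(1/2) = m + s\,(\log 2)^{-1/\alpha}.
```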

1. Using the generic inversion method, write a function rfrechet(n,α,s,m) which outputs n realizations of the Fréchet(α, s, m) distribution.
2. For α = 5, s = 1, m = −3, design a Monte Carlo experiment to estimate Var(X) and the median of X. Calculate (on paper) the theoretical value of the median and compare it to your estimate.
3. Propose a bootstrap experiment to evaluate the bias of your variance and median
estimators.
4. For α = 5, s = 1, m = −3, use the Kolmogorov-Smirnov test to verify that the variable
$$ Y = \left(\frac{1}{X-m}\right)^{\alpha} $$
follows an Exp(1) distribution.
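A sketch of the four questions in R, with α = 5, s = 1, m = −3. The Monte Carlo size 10⁵, the bootstrap base sample of size 1000 and B = 1000 resamples are assumed choices. The bias estimate is the usual bootstrap one: mean of the resampled statistics minus the statistic on the base sample.

```r
# Inversion of F(x) = exp(-((x - m)/s)^(-alpha)): x = m + s * (-log(u))^(-1/alpha)
rfrechet <- function(n, alpha, s, m) m + s * (-log(runif(n)))^(-1 / alpha)

set.seed(6)
alpha <- 5; s <- 1; m <- -3
x <- rfrechet(1e5, alpha, s, m)
var_hat <- var(x)
med_hat <- median(x)
med_theory <- m + s * log(2)^(-1 / alpha)  # solves F(x) = 1/2, about -1.924

# Bootstrap bias of the variance and median estimators, on a smaller base sample
x0 <- rfrechet(1000, alpha, s, m)
B  <- 1000
boot <- replicate(B, {
  xb <- sample(x0, replace = TRUE)
  c(var = var(xb), median = median(xb))
})
bias <- rowMeans(boot) - c(var = var(x0), median = median(x0))

# Question 4: Y = (1 / (X - m))^alpha should follow an Exp(1) distribution (here s = 1)
y <- (1 / (x0 - m))^alpha
ks <- ks.test(y, "pexp", rate = 1)
```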

Exercise 5
Consider the density function on the real line ℝ
$$ f_k(x) = \frac{(2k+1)!}{(k!)^2}\,\frac{\Phi(x)^k\,\Phi(-x)^k}{\sqrt{2\pi}\,\exp(x^2/2)}, $$
where k ≥ 1 is an integer and Φ is the normal cdf.
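A minimal R sketch of the numerical check in question 1. The factor 1/(√(2π) exp(x²/2)) is the standard normal density dnorm(x). The factorial ratio is computed on the log scale with lfactorial() to avoid overflow. The name `fk` is a hypothetical helper.

```r
# Hypothetical helper: f_k(x) = (2k+1)! / (k!)^2 * Phi(x)^k * Phi(-x)^k * dnorm(x)
fk <- function(x, k) {
  exp(lfactorial(2 * k + 1) - 2 * lfactorial(k)) * pnorm(x)^k * pnorm(-x)^k * dnorm(x)
}

# Integrate over the whole real line for k = 6, 12, 24; each value should be close to 1
vals <- sapply(c(6, 12, 24), function(k) integrate(fk, -Inf, Inf, k = k)$value)
vals
```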
1. Check by numerical integration that $f_k$ is a proper density for k = 6, 12, 24.
2. Design an accept-reject algorithm, implemented as an R function, that produces an iid sample of arbitrary size m for an arbitrary parameter k. (Hint: notice that either Φ(x) or Φ(−x) is necessarily at most 1/2 and that Φ(−x) = 1 − Φ(x). Deduce that Φ(x)Φ(−x) ≤ 1/4.) Produce a graphical verification of the fit for $m = 10^3$ and k = 6, 12, 24.
3. We want to use the acceptance rate of this accept-reject algorithm to check that the normalisation above is correct. Produce 250 realisations of an empirical acceptance rate, each based on 100 proposals, and deduce a 97% confidence interval on the expected acceptance rate. Check whether or not it contains the inverse normalising constant.

4. This density is actually the distribution of the median of a normal sample of size n = 2k + 1. (Extra-credit: establish this rigorously.) Generate a sample from the above accept-reject algorithm with m = 250 and k = 10, then another sample of m = 250 medians from samples of 21 normal variates. Test whether they have the same distribution.
5. Check whether or not the p-value of the above test is distributed as a U(0, 1) random variable.
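A possible R sketch of questions 2 to 5 following the hint. Since Φ(x)Φ(−x) ≤ 1/4, we have f_k ≤ M_k φ with M_k = (2k+1)!/((k!)² 4^k). A N(0, 1) proposal y is therefore accepted when U ≤ (4Φ(y)Φ(−y))^k, and the expected acceptance rate is 1/M_k. The helper names `Mk` and `rmedk`, and the 200 repetitions used for question 5, are hypothetical choices.

```r
# Hypothetical helper: bound M_k = (2k+1)! / ((k!)^2 4^k), computed on the log scale
Mk <- function(k) exp(lfactorial(2 * k + 1) - 2 * lfactorial(k) - k * log(4))

# Hypothetical helper: accept-reject with a N(0, 1) proposal
rmedk <- function(m, k) {
  out <- numeric(0)
  while (length(out) < m) {
    y <- rnorm(m)
    keep <- runif(m) <= (4 * pnorm(y) * pnorm(-y))^k  # f_k(y) / (M_k dnorm(y))
    out <- c(out, y[keep])
  }
  out[seq_len(m)]
}

set.seed(7)
# Question 3: 250 empirical acceptance rates, each based on 100 proposals (k = 6 here)
k <- 6
rates <- replicate(250, {
  y <- rnorm(100)
  mean(runif(100) <= (4 * pnorm(y) * pnorm(-y))^k)
})
ci97 <- mean(rates) + c(-1, 1) * qnorm(0.985) * sd(rates) / sqrt(250)
inside <- ci97[1] <= 1 / Mk(k) && 1 / Mk(k) <= ci97[2]

# Question 4: accept-reject draws versus medians of 21 normals (k = 10)
x_ar  <- rmedk(250, 10)
x_med <- replicate(250, median(rnorm(21)))
ks.test(x_ar, x_med)$p.value

# Question 5: repeat the two-sample test and inspect the p-values
pv <- replicate(200, ks.test(rmedk(250, 10), replicate(250, median(rnorm(21))))$p.value)
hist(pv)
ks.test(pv, "punif")
```

With two samples of size 250, the two-sample statistic only takes multiples of 1/250. The p-values are therefore discrete, so the last ks.test() may warn about ties and the uniformity check is only approximate.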
Exercise 6
Load the Nile dataset:
> data(Nile)
> nile = jitter(as.vector(Nile))
We will assume that these are iid realisations of a random variable X, forming a sample $X_n = (X_1, \ldots, X_n)$.

We denote by $IQ_{0.8}(X_n)$ the inter-quantile range of the sample, defined by
$$ IQ_{0.8}(X_n) = Q_{0.9}(X_n) - Q_{0.1}(X_n), $$
where $Q_{0.9}(X_n)$ and $Q_{0.1}(X_n)$ are the empirical quantiles of the sample at levels 90% and 10%. We would like to calibrate $IQ_{0.8}(X_n)$ by a coefficient α so that it becomes an unbiased estimator of the standard deviation of the distribution of the $X_i$'s.
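A minimal R sketch of the statistic, run on nile. quantile() is used with its default type-7 interpolation, which is an assumption, since the exercise does not say which empirical quantile definition to use. The helper name `iq` is hypothetical; `iqan` is the function asked for in question 1.

```r
# Hypothetical helper: inter-quantile range IQ_0.8 = Q_0.9 - Q_0.1 (default type-7 quantiles)
iq   <- function(x) unname(diff(quantile(x, c(0.1, 0.9))))
iqan <- function(x) iq(x)^2  # the squared statistic of question 1

data(Nile)
set.seed(8)
nile <- jitter(as.vector(Nile))
iqan(nile)
```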
1. Write an R function iqan(x) which produces the statistic $IQ_{0.8}^2(X_n)$ associated with the sample x. Compute the outcome of your function on the dataset nile.
2. Simulate $10^4$ replicas of a Cauchy C(µ, σ) sample $X_n$ of size n = 100 (µ being the location and σ the scale) and deduce a Monte Carlo evaluation of the coefficient α such that $\alpha\,\mathbb{E}[IQ_{0.8}^2(X_n)] = \sigma^2$. (Extra-credit: explain why the values of µ and σ can be chosen arbitrarily.)
3. Based on the previous experiment, and using the $10^4$ realisations of $IQ_{0.8}(X_n)$ generated in question 2, deduce a 93% confidence interval on $IQ_{0.8}(X_n)/\sigma$. (Hint: use the empirical cdf rather than the bootstrap.) Compare with the asymptotically normal 93% confidence interval on $\mathbb{E}[IQ_{0.8}(X_n)]/\sigma$. Check whether or not 6.1554 belongs to these intervals. (Extra-credit: justify the choice α = 1/6.1554.)
4. By running a Monte Carlo experiment based on $10^5$ replications of Cauchy random variates with location µ and scale σ of your choice, check whether or not the 93% confidence interval on $\mathbb{E}[\log|X - \mu|]$ contains log(σ).
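A sketch of questions 2 to 4 in R with µ = 0 and σ = 1, an arbitrary choice since IQ₀.₈/σ is location and scale free. For a standard Cauchy, Q₀.₉ − Q₀.₁ = 2 tan(0.4π) ≈ 6.1554, which is where that constant comes from. The helper `iq` and all variable names are hypothetical.

```r
iq <- function(x) unname(diff(quantile(x, c(0.1, 0.9))))  # hypothetical helper

set.seed(9)
mu <- 0; sigma <- 1  # arbitrary choice: IQ_0.8 / sigma does not depend on them
R <- 1e4; n <- 100
iqs <- replicate(R, iq(rcauchy(n, location = mu, scale = sigma)))

# Question 2: alpha such that alpha * E[IQ^2] = sigma^2
alpha_hat <- sigma^2 / mean(iqs^2)

# Question 3: empirical-quantile interval versus normal interval, both at level 93%
ratio   <- iqs / sigma
ci_emp  <- unname(quantile(ratio, c(0.035, 0.965)))
ci_norm <- mean(ratio) + c(-1, 1) * qnorm(0.965) * sd(ratio) / sqrt(R)
theory  <- 2 * tan(0.4 * pi)  # Q_0.9 - Q_0.1 of a standard Cauchy, about 6.1554

# Question 4: 93% interval on E[log|X - mu|], to be compared with log(sigma) = 0
lx <- log(abs(rcauchy(1e5, mu, sigma) - mu))
ci_log <- mean(lx) + c(-1, 1) * qnorm(0.965) * sd(lx) / sqrt(1e5)
```

The interval `ci_emp` describes the spread of a single IQ₀.₈/σ, while `ci_norm` only concerns its expectation, so the latter is much narrower.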
We will assume in the rest of the exercise that $\tilde\mu = Q_{0.5}(X_n)$, the median of the sample, and
$$ \tilde\sigma(X_n) = \exp\left\{\frac{1}{n}\left(\log|X_1 - Q_{0.5}(X_n)| + \cdots + \log|X_n - Q_{0.5}(X_n)|\right)\right\} $$
are acceptable estimators of µ and σ. (Extra-credit: explain why the usual empirical moments do not work for the Cauchy distribution.)
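The two estimators translate directly into R. This sketch checks them on a simulated Cauchy sample with location 2 and scale 3, which are arbitrary values. The helper names `mu_tilde` and `sigma_tilde` are hypothetical.

```r
# Hypothetical helpers: sample median for mu, geometric mean of |X_i - median| for sigma
mu_tilde    <- function(x) median(x)
sigma_tilde <- function(x) exp(mean(log(abs(x - median(x)))))
# Caveat: for odd n the median is one of the observations, so one log term is -Inf and
# sigma_tilde() returns 0; the sample size used here (and nile, n = 100) is even

set.seed(10)
x <- rcauchy(1e4, location = 2, scale = 3)
c(mu = mu_tilde(x), sigma = sigma_tilde(x))
```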
5. Check whether or not nile is a sample from a Cauchy distribution (with unknown location and scale).
6. Since nile is not necessarily a Cauchy sample, denoting by σ the standard deviation of the distribution of the $X_i$'s, construct by bootstrap a 93% confidence interval on $IQ_{0.8}(X_n)/\sigma$, using $\tilde\sigma(X_n)$ computed on nile as the estimate of σ. Does it still contain 6.1554?
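A possible R sketch of questions 5 and 6. For question 5, the Kolmogorov-Smirnov test is run against a Cauchy distribution with the estimated location and scale plugged in. Because the parameters are estimated from the same data, the nominal p-value is only approximate. For question 6, σ̃ is computed once on nile and kept fixed, while IQ₀.₈ is recomputed on 10⁴ nonparametric bootstrap resamples. The 93% percentile interval is an assumed construction. The helper names are hypothetical.

```r
data(Nile)
set.seed(11)
nile <- jitter(as.vector(Nile))

iq <- function(x) unname(diff(quantile(x, c(0.1, 0.9))))          # hypothetical helper
sigma_tilde <- function(x) exp(mean(log(abs(x - median(x)))))      # hypothetical helper

# Question 5: KS test against a Cauchy with plugged-in estimates (approximate p-value)
mu_hat <- median(nile)
s_hat  <- sigma_tilde(nile)
ks <- ks.test(nile, "pcauchy", location = mu_hat, scale = s_hat)

# Question 6: bootstrap of IQ_0.8 / sigma_tilde(nile), 93% percentile interval
B <- 1e4
boot_ratio <- replicate(B, iq(sample(nile, replace = TRUE)) / s_hat)
ci93 <- unname(quantile(boot_ratio, c(0.035, 0.965)))
covers_theory <- ci93[1] <= 6.1554 && 6.1554 <= ci93[2]
```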
