Determinantal point processes (DPPs) are specific repulsive point processes, introduced in the 1970s by Macchi to model fermion beams in quantum optics. More recently, they have been studied as models and sampling tools by statisticians and machine learners. Important statistical quantities associated with DPPs have geometric and algebraic interpretations, which makes them a fun object to study and a powerful algorithmic building block.
After a quick introduction to determinantal point processes, I will discuss some of our recent statistical applications of DPPs. First, we used DPPs to sample nodes in numerical integration, resulting in Monte Carlo integration with fast convergence with respect to the number of integrand evaluations. Second, we used DPP machinery to characterize the distribution of the zeros of time-frequency transforms of white noise, a recent challenge in signal processing. Third, we turned DPPs into low-error variable selection procedures in linear regression.
DPPs everywhere: repulsive point processes for Monte Carlo integration, signal processing and machine learning

Rémi Bardenet¹, joint work with Ayoub Belhadji, Pierre Chainais, Julien Flamant, Guillaume Gautier, Adrien Hardy, Michal Valko

¹ CNRS & CRIStAL, Univ. Lille, France

Rémi Bardenet (CNRS & Univ. Lille), DPPs everywhere
Outline

- Projection determinantal point processes
- Monte Carlo with DPPs
- The zeros of time-frequency transforms of white noise
- DPPs for feature selection
Projection DPPs

Let (φ_k)_{k=0,…,N−1} be an orthonormal sequence in L²(ω). Let

    K(x, y) = \sum_{k=0}^{N-1} \varphi_k(x) \varphi_k(y).

Then

    p(x_1, \dots, x_N) = \frac{1}{N!} \det \big[ K(x_i, x_j) \big]_{i,j=1}^{N} \, \omega(x_1) \cdots \omega(x_N)

is a probability density. In particular,

    P(there is one particle in B(x, dx)) = K(x, x) ω(x) dx,

    P(there is one particle in B(x, dx) and one in B(y, dy))
        = K(x, x) ω(x) dx \, K(y, y) ω(y) dy − K(x, y)² ω(x) ω(y) dx dy.
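On a finite ground set, projection DPPs can be sampled exactly with the standard chain-rule algorithm (sample a point from the current intensity, then condition on it and repeat). Below is a minimal numpy sketch; the function name and interface are mine, not from the talk.

```python
import numpy as np

def sample_projection_dpp(Phi, rng):
    """Chain-rule sampler for the projection DPP with kernel K = Phi @ Phi.T,
    where Phi is (M, N) with orthonormal columns. Returns N distinct indices."""
    M, N = Phi.shape
    V = Phi.copy()
    sample = []
    for t in range(N):
        # inclusion probabilities at this step: squared row norms of V
        p = np.clip(np.einsum('ij,ij->i', V, V), 0.0, None)
        i = rng.choice(M, p=p / p.sum())
        sample.append(i)
        if t == N - 1:
            break
        # condition on point i: project the coefficient space onto the
        # orthocomplement of the row V[i]
        e = V[i] / np.linalg.norm(V[i])
        V = V - np.outer(V @ e, e)
        # re-orthonormalize and drop the lost dimension
        U, _, _ = np.linalg.svd(V, full_matrices=False)
        V = U[:, : N - t - 1]
    return sorted(sample)

# toy usage: a random rank-2 projection kernel on M = 6 points
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 2)))
S = sample_projection_dpp(Q, rng)
```

The single-point marginals of the returned samples approach diag(K), matching the first intensity formula on the slide.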
A first Monte Carlo result

Theorem. Let µ(dx) = ω(x) dx with ω separable, C¹, positive on the open set (−1, 1)^d, and satisfying a technical regularity assumption. Let ε > 0. If x_1, …, x_N stands for the associated multivariate OP ensemble, then for every f ∈ C¹ vanishing outside [−1 + ε, 1 − ε]^d,

    \sqrt{N^{1+1/d}} \left( \sum_{i=1}^{N} \frac{f(x_i)}{K_N(x_i, x_i)} - \int f(x)\,\mu(dx) \right) \xrightarrow[N\to\infty]{\text{law}} \mathcal{N}\!\left(0, \Omega_{f,\omega}^2\right),

where

    \Omega_{f,\omega}^2 = \frac{1}{2} \sum_{k_1,\dots,k_d = 0}^{\infty} (k_1 + \cdots + k_d) \, \widehat{\left( \frac{f \omega}{\omega_{\mathrm{eq}}^{\otimes d}} \right)}(k_1, \dots, k_d)^2,

and ω_eq^{⊗d}(x) = π^{−d} \prod_{i=1}^{d} (1 − x_i²)^{−1/2}.
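The estimator in the theorem weights each node by 1/K_N(x_i, x_i). A discrete, counting-measure analogue makes the unbiasedness transparent: under a projection DPP on a finite set, E[Σ_{x∈S} f(x)/K(x, x)] = Σ_x f(x), since P(x ∈ S) = K(x, x). The sketch below (my own toy setup, not the multivariate OP ensemble) verifies this exactly by enumerating all subsets.

```python
import numpy as np
from itertools import combinations

M, N = 7, 3
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((M, N)))   # orthonormal columns
K = Q @ Q.T                                        # projection kernel, rank N
f = rng.standard_normal(M)                         # an arbitrary integrand

# For |S| = N, the projection DPP puts mass det(K_S) on S.
total, expectation = 0.0, 0.0
for S in combinations(range(M), N):
    pS = np.linalg.det(K[np.ix_(S, S)])
    total += pS
    expectation += pS * sum(f[i] / K[i, i] for i in S)

# Cauchy-Binet: the masses sum to 1; the weighted sum is exactly sum(f)
print(total, expectation, f.sum())
```

Both identities hold exactly, with no Monte Carlo error: the enumeration computes the expectation in closed form.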
Fewer assumptions on µ with importance sampling

Wait a minute... we implicitly assumed that all multivariate moments of µ are known.

Theorem. Let µ(dx) = ω(x) dx with ω C¹ on (−1, 1)^d. Consider a measure q(x) dx satisfying the assumptions of the previous theorem, let K_N(x, y) be the corresponding kernel, and x_1, …, x_N the associated multivariate OP ensemble. Then, for every f as before,

    \sqrt{N^{1+1/d}} \left( \sum_{i=1}^{N} \frac{f(x_i)}{K_N(x_i, x_i)} \frac{\omega(x_i)}{q(x_i)} - \int f(x)\,\mu(dx) \right) \xrightarrow[N\to\infty]{\text{law}} \mathcal{N}\!\left(0, \Omega_{f,\omega}^2\right),

where Ω²_{f,ω} is unchanged.

From an importance sampling perspective, this asymptotic variance is puzzling: it does not depend on the proposal q.
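The reweighting also has an exact discrete analogue: build the projection kernel orthonormally with respect to a proposal q, and weight each selected point by ω/q. Again a toy enumeration of my own, checking the identity E[Σ_{x∈S} f(x) ω(x) / (K(x, x) q(x))] = Σ_x f(x) ω(x) in closed form.

```python
import numpy as np
from itertools import combinations

M, N = 7, 3
rng = np.random.default_rng(2)
omega = rng.random(M); omega /= omega.sum()   # target weights
q = rng.random(M); q /= q.sum()               # proposal weights

# orthonormalize N random columns w.r.t. the inner product <u, v>_q = sum q u v
A = rng.standard_normal((M, N))
Qmat, _ = np.linalg.qr(np.sqrt(q)[:, None] * A)
Phi = Qmat / np.sqrt(q)[:, None]              # Phi.T @ diag(q) @ Phi = I
K = Phi @ Phi.T

f = rng.standard_normal(M)
total, expectation = 0.0, 0.0
for S in combinations(range(M), N):
    pS = np.linalg.det(K[np.ix_(S, S)]) * np.prod(q[list(S)])
    total += pS
    expectation += pS * sum(f[i] / K[i, i] * omega[i] / q[i] for i in S)

print(total, expectation, float(f @ omega))
```

The estimator stays exactly unbiased for any positive q; the theorem's stronger (and puzzling) statement is that, in the continuous OP-ensemble setting, the asymptotic variance does not change either.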
The windowed Fourier transform (STFT)

[Figure: a noisy signal x(t) + n(t) and a sliding window g(t), sampled with step ∆t over [0, T = N∆t].]

Let

    V_g f(u, v) = \int f(t) \, g(t - u) \, e^{-2i\pi t v} \, dt.
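A direct discretization of V_g f makes the definition concrete. This is a naive sketch of mine (a Riemann sum with a Gaussian window, not an FFT-based implementation); as a sanity check, the spectrogram of a pure tone peaks at the tone's frequency.

```python
import numpy as np

def stft(f, g, times, freqs, t_grid):
    """Naive discretization of V_g f(u, v) = \int f(t) g(t-u) e^{-2i pi t v} dt."""
    dt = t_grid[1] - t_grid[0]
    V = np.empty((len(times), len(freqs)), dtype=complex)
    for a, u in enumerate(times):
        win = f(t_grid) * g(t_grid - u)
        for b, v in enumerate(freqs):
            V[a, b] = np.sum(win * np.exp(-2j * np.pi * t_grid * v)) * dt
    return V

# Gaussian window and a pure tone at 3 Hz
g = lambda t: np.exp(-np.pi * t**2)
f = lambda t: np.cos(2 * np.pi * 3.0 * t)
t_grid = np.linspace(-10, 10, 4001)
freqs = np.linspace(0, 6, 61)
V = stft(f, g, [0.0], freqs, t_grid)
spec = np.abs(V[0])**2
print(freqs[np.argmax(spec)])  # peaks at the tone frequency
```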
The zeros of the STFT of white noise

[Figure: zeros of the spectrogram of white noise in the time-frequency plane, over 0 ≤ time ≤ 16 and −8 ≤ frequency ≤ 8.]

A "repulsive" point process. There are kernels behind the STFT.
But...

The zeros are not a DPP with Hermitian kernel.

[Figure: pair correlation function g₀(r) over 0 ≤ r ≤ 4, comparing Ginibre, Poisson, the white-Gaussian-noise spectrogram, and the planar GAF.]

Ginibre is the DPP you could have expected. The zeros have the same distribution as the zeros of the planar Gaussian analytic function

    \sum_{n \ge 0} a_n \frac{z^n}{\sqrt{n!}}, \qquad a_n \text{ i.i.d. } \mathcal{N}_{\mathbb{C}}(0, 1).
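One can simulate this point process by truncating the planar GAF series and computing the roots of the resulting random polynomial. A sketch under my own choices of degree and radius: the truncation is faithful inside a disk of radius about √deg, where the (untruncated) planar GAF has zero intensity 1/π, i.e. roughly r² expected zeros in |z| ≤ r.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
deg = 30
# i.i.d. standard complex Gaussian coefficients a_n
a = (rng.standard_normal(deg + 1) + 1j * rng.standard_normal(deg + 1)) / np.sqrt(2)
fact = np.array([math.factorial(n) for n in range(deg + 1)], dtype=float)
coeffs = a / np.sqrt(fact)            # coefficients of sum_n a_n z^n / sqrt(n!)

# numpy.roots expects the highest-degree coefficient first
zeros = np.roots(coeffs[::-1])
n_inner = int(np.sum(np.abs(zeros) <= 4.0))
print(zeros.size, n_inner)            # 30 roots; about 16 expected in |z| <= 4
```

Plotting `zeros` in the complex plane reproduces the repulsive cloud of the figure above; roots far outside the disk are truncation artifacts and should be discarded.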
Reconstruction using the empty space function

[Figure: a noisy signal (amplitude vs. time), with the original and reconstructed signals overlaid.]
DPPs crawl back in

The wavelets of Daubechies and Paul [6]. Let α ≥ 0, ψ_α(t) = (t + i)^{−(α+1)}, and consider

    W_\alpha f : (x, s) \mapsto \int f(t) \, \psi_\alpha\!\left( \frac{t - x}{s} \right) \frac{dt}{\sqrt{s}}.

Theorem [3]. When α = 0, and up to a conformal transformation, the zeros of the scalogram of white noise on H² are the zeros of

    \sum_{k \ge 0} a_k z^k,

with a_k i.i.d. N_C(0, 1). Those are a DPP, whose kernel is the Bergman kernel.
The zeros of the hyperbolic GAF

[Figure: a realization of the analytic wavelet transform of analytic white noise.]

More transforms and more zeros to be found in [3].
DPPs for feature selection [4]

Assume y_i = X_{i,:} w^* + ξ_i, i = 1, …, N, where ξ_i ∼ N(0, v) i.i.d. For a given estimator w = w(X, y), its excess risk is

    E(w) = E_ξ \, ‖X w^* − X w‖²_2 / N.    (1)

For the ordinary least squares estimator ŵ = X⁺ y, one has

    E(ŵ) = v × \frac{rk(X)}{N}.

For principal component regression (PCR) with k components,

    E(w^*_k) ≤ \frac{‖w^*‖² σ²_{k+1}}{N} + \frac{v k}{N},

where σ_{k+1} denotes the (k+1)-th largest singular value of X and V_k the matrix of its top k right singular vectors. Under the projection DPP with kernel V_k V_k^T,

    E \, E(w_S) ≤ \frac{1}{N} \big( 1 + k(p − k) \big) ‖w^*‖² σ²_{k+1} + \frac{v k}{N}.
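The projection DPP of kernel V_k V_k^T lives on the p column indices and always selects exactly k of them. For small p one can enumerate all k-subsets; the sketch below (my toy dimensions, not the paper's experiments) samples a column subset and fits least squares on it.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n, p, k = 20, 6, 2
X = rng.standard_normal((n, p))
w_star = rng.standard_normal(p)
y = X @ w_star + 0.1 * rng.standard_normal(n)

# V_k: top-k right singular vectors of X, giving a rank-k projection kernel
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T
K = Vk @ Vk.T

# the projection DPP puts mass det(K_S) on each k-subset S of columns
subsets = list(combinations(range(p), k))
probs = np.array([np.linalg.det(K[np.ix_(S, S)]) for S in subsets])
print(probs.sum())                       # Cauchy-Binet: the masses sum to 1

S = subsets[rng.choice(len(subsets), p=probs / probs.sum())]
w_S, *_ = np.linalg.lstsq(X[:, list(S)], y, rcond=None)
```

Subsets of columns that are well aligned with the top-k singular subspace get larger determinants, hence are favored, which is what drives the excess-risk bound above.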
Discussion

- DPPs were first formalized in quantum optics [13].
- Monte Carlo with DPPs is stochastic Gaussian quadrature [2]: it has an explicit error rate in \sqrt{N^{-1-1/d}}, is defined in any dimension, and couples repulsiveness and sampling; see also [5, 11, 8].
- The zeros of some time-frequency transforms of white noise are well-known point processes [1, 3]. What can we do with it? Spatial statistics! Can every point process be associated with a transform? Which ones are DPPs?
- DPPs are good sketching tools [4]. Algebraic problem + summarization task → DPP!
References I

[1] R. Bardenet, J. Flamant, and P. Chainais. On the zeros of the spectrogram of white noise. Applied and Computational Harmonic Analysis, 2018.

[2] R. Bardenet and A. Hardy. Monte Carlo with determinantal point processes. Under revision for Annals of Applied Probability; arXiv preprint arXiv:1605.00361, 2016.

[3] R. Bardenet and A. Hardy. From random matrices to Monte Carlo integration via Gaussian quadrature. In Proceedings of the IEEE Statistical Signal Processing workshop (SSP), 2018.

[4] A. Belhadji, R. Bardenet, and P. Chainais. A determinantal point process for column subset selection. arXiv preprint arXiv:1812.09771, 2018.
References II

[5] Y. Chen, M. Welling, and A. Smola. Super-samples from kernel herding. In Proceedings of the conference on Uncertainty in Artificial Intelligence (UAI), 2010.

[6] I. Daubechies and T. Paul. Time-frequency localization operators – a geometric phase space approach: II. The use of dilations. Inverse Problems, 4:661–680, 1988.

[7] P. J. Davis and P. Rabinowitz. Methods of numerical integration. Academic Press, New York, 2nd edition, 1984.

[8] B. Delyon and F. Portier. Integral approximation by kernel smoothing. To appear in Bernoulli, 2016.
References III

[9] J. Dick, F. Y. Kuo, and I. H. Sloan. High-dimensional integration: the quasi-Monte Carlo way. Acta Numerica, 22:133–288, 2013.

[10] W. Gautschi. Orthogonal polynomials: computation and approximation. Oxford University Press, USA, 2004.

[11] F. Huszár and D. Duvenaud. Optimally-weighted herding is Bayesian quadrature. In Uncertainty in Artificial Intelligence (UAI), 2012.

[12] K. Johansson. On fluctuations of eigenvalues of random Hermitian matrices. Duke Math. J., 91(1):151–204, 1998.

[13] O. Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7:83–122, 1975.
References IV

[14] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, New York, 2004.
Some background on numerical integration
Numerical integration

Let µ be a finite positive measure, supported in [−1, 1]^d. Our goal is to find N nodes (x_i) and weights (w_i) so that

    \int f(x) \mu(dx) \approx \sum_{i=1}^{N} w_i f(x_i), \quad \forall f \in \mathcal{C},

where C is a large class of functions. Typical methods include

- variants of Riemann integration [7],
- Gaussian quadrature [10],
- Monte Carlo methods [14]: importance sampling, MCMC,
- quasi-Monte Carlo methods [9] such as scrambled nets,

each coming with its own guarantees [blackboard].
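As a baseline among the methods above, Gaussian quadrature converges extremely fast for smooth integrands in d = 1. A quick check with numpy's Gauss–Legendre nodes (my example, exp chosen for its known integral):

```python
import numpy as np

# 20-node Gauss-Legendre rule on [-1, 1]: exact for polynomials up to
# degree 39, near machine precision for analytic integrands such as exp
nodes, weights = np.polynomial.legendre.leggauss(20)
approx = np.sum(weights * np.exp(nodes))
exact = np.e - 1 / np.e               # \int_{-1}^{1} e^x dx
print(abs(approx - exact))
```

By contrast, plain Monte Carlo with 20 i.i.d. nodes would give an error of order N^{−1/2} ≈ 0.2; the DPP results above interpolate between these regimes in general dimension.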
The starting point

Johansson [12] gives an example of a random N × N unitary matrix with eigenvalues e^{iθ_1}, …, e^{iθ_N}, such that, for all f smooth enough, that is,

    σ² := 2 \sum_{k=0}^{\infty} k |\hat{f}_k|^2 < ∞,

then

    \sum_{i=1}^{N} f(θ_i) − N \int_{[0, 2π]} f(θ) \frac{dθ}{2π} \xrightarrow[N\to\infty]{\text{law}} \mathcal{N}(0, σ²).

No rescaling is necessary. In other words,

    \frac{1}{N} \sum_{i=1}^{N} f(θ_i) − \int_{[0, 2π]} f(θ) \frac{dθ}{2π} \sim \frac{1}{N}.
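This can be checked numerically: sample Haar-distributed unitary matrices (QR of a complex Ginibre matrix with the standard phase correction, as in Mezzadri's recipe) and look at the linear statistic for f = cos, i.e. Re tr(U). For this f, σ² = 2 · 1 · |f̂_1|² = 1/2, and indeed the fluctuations do not grow with N. A sketch of mine:

```python
import numpy as np

def haar_unitary(N, rng):
    """Haar-distributed unitary via QR of a complex Ginibre matrix,
    with the phase correction that makes the distribution Haar."""
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

rng = np.random.default_rng(5)
N, reps = 15, 3000
# linear statistic sum_i cos(theta_i) = Re tr(U); for f = cos, sigma^2 = 1/2
stats = np.array([np.real(np.trace(haar_unitary(N, rng))) for _ in range(reps)])
print(stats.var())                    # close to 1/2, with no N rescaling
```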
Orthogonal polynomial ensembles

Conversely, how do we build generic, multivariate DPPs for which such a fast CLT holds? How fast a rate can we achieve?

We understand orthogonal polynomials [10] and related kernels very well, so let us give them a try. In d = 1, we pick for (φ_k) the orthonormal polynomials with respect to µ, that is, φ_k is of degree k and ∫ φ_k φ_ℓ dµ = δ_{kℓ}.

This is starting to look very much like Gaussian quadrature... [blackboard]
Multivariate orthogonal polynomial ensembles

- Let µ be such that µ(A) > 0 for some non-empty open set A ⊂ [−1, 1]^d.
- Choose an ordering b : N → N^d for the multi-indices (α_1, …, α_d) ∈ N^d.
- Apply Gram–Schmidt to the ordered list of monomial functions

      \left( (x_1, \dots, x_d) \mapsto x_1^{α_1} \cdots x_d^{α_d} \right)_{α = b(1), \dots, b(N)}.

- We have built a sequence (φ_k) of multivariate orthonormal polynomials, which we can use to define

      K_N(x, y) = \sum_{k \in b(\{1, \dots, N\})} \varphi_k(x) \varphi_k(y).

We use a particular ordering b that is crucial to our proofs.
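The construction can be carried out numerically on a quadrature grid: order the monomials, orthonormalize them against the discretized measure with a weighted QR, and form K_N. A sketch of mine in d = 2 with the uniform measure and a graded ordering (one valid choice of b, not necessarily the one used in the proofs); the integral of K_N(x, x) against ω comes out exactly N, as it must for a rank-N projection kernel.

```python
import numpy as np
from itertools import product

# tensor-grid Gauss-Legendre quadrature on [-1, 1]^2, uniform density 1/4
nodes_1d, w_1d = np.polynomial.legendre.leggauss(20)
xs = np.array(list(product(nodes_1d, nodes_1d)))            # (400, 2) nodes
w = np.array([wa * wb for wa, wb in product(w_1d, w_1d)]) / 4.0

# monomials x^a y^b in a graded order; keep the first N of them
N = 6
alphas = sorted(product(range(4), range(4)), key=lambda ab: (sum(ab), ab))[:N]
Mono = np.column_stack([xs[:, 0]**a * xs[:, 1]**b for a, b in alphas])

# Gram-Schmidt w.r.t. the quadrature inner product, via a weighted QR
Q, _ = np.linalg.qr(np.sqrt(w)[:, None] * Mono)
Phi = Q / np.sqrt(w)[:, None]          # Phi.T @ diag(w) @ Phi = I_N

# diagonal of the kernel K_N(x, x) = sum_k phi_k(x)^2; its integral is N
K_diag = np.sum(Phi**2, axis=1)
print(np.sum(K_diag * w))
```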