Determinantal point processes (DPPs) are specific repulsive point processes, introduced in the 1970s by Macchi to model fermion beams in quantum optics. More recently, they have been studied as models and sampling tools by statisticians and machine learners. Important statistical quantities associated with DPPs have geometric and algebraic interpretations, which makes them a fun object to study and a powerful algorithmic building block.
After a quick introduction to determinantal point processes, I will discuss some of our recent statistical applications of DPPs. First, we used DPPs to sample nodes in numerical integration, resulting in Monte Carlo integration with fast convergence with respect to the number of integrand evaluations. Second, we used DPP machinery to characterize the distribution of the zeros of time-frequency transforms of white noise, a recent challenge in signal processing. Third, we turned DPPs into low-error variable selection procedures in linear regression.
DPPs everywhere: repulsive point processes for Monte Carlo integration, signal processing and machine learning

Rémi Bardenet¹, joint work with Ayoub Belhadji, Pierre Chainais, Julien Flamant, Guillaume Gautier, Adrien Hardy, Michal Valko

¹ CNRS & CRIStAL, Univ. Lille, France

Rémi Bardenet (CNRS & Univ. Lille), DPPs everywhere
Outline

- Projection determinantal point processes
- Monte Carlo with DPPs
- The zeros of time-frequency transforms of white noise
- DPPs for feature selection
Projection DPPs

Let (φ_k)_{k=0,…,N−1} be an orthonormal sequence in L²(ω). Let

    K(x, y) = \sum_{k=0}^{N-1} \varphi_k(x) \varphi_k(y).

Then

    p(x_1, \dots, x_N) = \frac{1}{N!} \det \big[ K(x_i, x_j) \big]_{i,j=1}^{N} \, \omega(x_1) \cdots \omega(x_N)

is a probability density. In particular,

    P(there is one particle in B(x, dx)) = K(x, x) ω(x) dx,

    P(there is one particle in B(x, dx) and one in B(y, dy))
        = K(x, x) ω(x) dx \, K(y, y) ω(y) dy − K(x, y)² ω(x) ω(y) dx dy.
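On a finite ground set, projection DPPs can be sampled exactly with the standard chain-rule algorithm (sample a point from the current intensity, then condition on it and repeat). Below is a minimal numpy sketch; the function name and interface are mine, not from the talk.

```python
import numpy as np

def sample_projection_dpp(Phi, rng):
    """Chain-rule sampler for the projection DPP with kernel K = Phi @ Phi.T,
    where Phi is (M, N) with orthonormal columns. Returns N distinct indices."""
    M, N = Phi.shape
    V = Phi.copy()
    sample = []
    for t in range(N):
        # inclusion probabilities at this step: squared row norms of V
        p = np.clip(np.einsum('ij,ij->i', V, V), 0.0, None)
        i = rng.choice(M, p=p / p.sum())
        sample.append(i)
        if t == N - 1:
            break
        # condition on point i: project the coefficient space onto the
        # orthocomplement of the row V[i]
        e = V[i] / np.linalg.norm(V[i])
        V = V - np.outer(V @ e, e)
        # re-orthonormalize and drop the lost dimension
        U, _, _ = np.linalg.svd(V, full_matrices=False)
        V = U[:, : N - t - 1]
    return sorted(sample)

# toy usage: a random rank-2 projection kernel on M = 6 points
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 2)))
S = sample_projection_dpp(Q, rng)
```

The single-point marginals of the returned samples approach diag(K), matching the first intensity formula on the slide.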
A first Monte Carlo result

Theorem. Let µ(dx) = ω(x) dx with ω separable, C¹, positive on the open set (−1, 1)^d, and satisfying a technical regularity assumption. Let ε > 0. If x_1, …, x_N stands for the associated multivariate OP ensemble, then for every f ∈ C¹ vanishing outside [−1 + ε, 1 − ε]^d,

    \sqrt{N^{1+1/d}} \left( \sum_{i=1}^{N} \frac{f(x_i)}{K_N(x_i, x_i)} - \int f(x)\,\mu(dx) \right) \xrightarrow[N\to\infty]{\text{law}} \mathcal{N}\!\left(0, \Omega_{f,\omega}^2\right),

where

    \Omega_{f,\omega}^2 = \frac{1}{2} \sum_{k_1,\dots,k_d = 0}^{\infty} (k_1 + \cdots + k_d) \, \widehat{\left( \frac{f \omega}{\omega_{\mathrm{eq}}^{\otimes d}} \right)}(k_1, \dots, k_d)^2,

and ω_eq^{⊗d}(x) = π^{−d} \prod_{i=1}^{d} (1 − x_i²)^{−1/2}.
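The estimator in the theorem weights each node by 1/K_N(x_i, x_i). A discrete, counting-measure analogue makes the unbiasedness transparent: under a projection DPP on a finite set, E[Σ_{x∈S} f(x)/K(x, x)] = Σ_x f(x), since P(x ∈ S) = K(x, x). The sketch below (my own toy setup, not the multivariate OP ensemble) verifies this exactly by enumerating all subsets.

```python
import numpy as np
from itertools import combinations

M, N = 7, 3
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((M, N)))   # orthonormal columns
K = Q @ Q.T                                        # projection kernel, rank N
f = rng.standard_normal(M)                         # an arbitrary integrand

# For |S| = N, the projection DPP puts mass det(K_S) on S.
total, expectation = 0.0, 0.0
for S in combinations(range(M), N):
    pS = np.linalg.det(K[np.ix_(S, S)])
    total += pS
    expectation += pS * sum(f[i] / K[i, i] for i in S)

# Cauchy-Binet: the masses sum to 1; the weighted sum is exactly sum(f)
print(total, expectation, f.sum())
```

Both identities hold exactly, with no Monte Carlo error: the enumeration computes the expectation in closed form.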
Fewer assumptions on µ with importance sampling

Wait a minute... we implicitly assumed that all multivariate moments of µ are known.

Theorem. Let µ(dx) = ω(x) dx with ω C¹ on (−1, 1)^d. Consider a measure q(x) dx satisfying the assumptions of the previous theorem, let K_N(x, y) be the corresponding kernel, and x_1, …, x_N the associated multivariate OP ensemble. Then, for every f as before,

    \sqrt{N^{1+1/d}} \left( \sum_{i=1}^{N} \frac{f(x_i)}{K_N(x_i, x_i)} \frac{\omega(x_i)}{q(x_i)} - \int f(x)\,\mu(dx) \right) \xrightarrow[N\to\infty]{\text{law}} \mathcal{N}\!\left(0, \Omega_{f,\omega}^2\right),

where Ω²_{f,ω} is unchanged.

From an importance sampling perspective, this asymptotic variance is puzzling: it does not depend on the proposal q.
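The reweighting also has an exact discrete analogue: build the projection kernel orthonormally with respect to a proposal q, and weight each selected point by ω/q. Again a toy enumeration of my own, checking the identity E[Σ_{x∈S} f(x) ω(x) / (K(x, x) q(x))] = Σ_x f(x) ω(x) in closed form.

```python
import numpy as np
from itertools import combinations

M, N = 7, 3
rng = np.random.default_rng(2)
omega = rng.random(M); omega /= omega.sum()   # target weights
q = rng.random(M); q /= q.sum()               # proposal weights

# orthonormalize N random columns w.r.t. the inner product <u, v>_q = sum q u v
A = rng.standard_normal((M, N))
Qmat, _ = np.linalg.qr(np.sqrt(q)[:, None] * A)
Phi = Qmat / np.sqrt(q)[:, None]              # Phi.T @ diag(q) @ Phi = I
K = Phi @ Phi.T

f = rng.standard_normal(M)
total, expectation = 0.0, 0.0
for S in combinations(range(M), N):
    pS = np.linalg.det(K[np.ix_(S, S)]) * np.prod(q[list(S)])
    total += pS
    expectation += pS * sum(f[i] / K[i, i] * omega[i] / q[i] for i in S)

print(total, expectation, float(f @ omega))
```

The estimator stays exactly unbiased for any positive q; the theorem's stronger (and puzzling) statement is that, in the continuous OP-ensemble setting, the asymptotic variance does not change either.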
The windowed Fourier transform (STFT)

[Figure: a noisy signal x(t) + n(t) and a sliding window g(t), sampled with step ∆t over [0, T = N∆t].]

Let

    V_g f(u, v) = \int f(t) \, g(t - u) \, e^{-2i\pi t v} \, dt.
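A direct discretization of V_g f makes the definition concrete. This is a naive sketch of mine (a Riemann sum with a Gaussian window, not an FFT-based implementation); as a sanity check, the spectrogram of a pure tone peaks at the tone's frequency.

```python
import numpy as np

def stft(f, g, times, freqs, t_grid):
    """Naive discretization of V_g f(u, v) = \int f(t) g(t-u) e^{-2i pi t v} dt."""
    dt = t_grid[1] - t_grid[0]
    V = np.empty((len(times), len(freqs)), dtype=complex)
    for a, u in enumerate(times):
        win = f(t_grid) * g(t_grid - u)
        for b, v in enumerate(freqs):
            V[a, b] = np.sum(win * np.exp(-2j * np.pi * t_grid * v)) * dt
    return V

# Gaussian window and a pure tone at 3 Hz
g = lambda t: np.exp(-np.pi * t**2)
f = lambda t: np.cos(2 * np.pi * 3.0 * t)
t_grid = np.linspace(-10, 10, 4001)
freqs = np.linspace(0, 6, 61)
V = stft(f, g, [0.0], freqs, t_grid)
spec = np.abs(V[0])**2
print(freqs[np.argmax(spec)])  # peaks at the tone frequency
```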
The zeros of the STFT of white noise

[Figure: zeros of the spectrogram of white noise in the time-frequency plane, over 0 ≤ time ≤ 16 and −8 ≤ frequency ≤ 8.]

A "repulsive" point process. There are kernels behind the STFT.
But...

The zeros are not a DPP with Hermitian kernel.

[Figure: pair correlation function g₀(r) over 0 ≤ r ≤ 4, comparing Ginibre, Poisson, the white-Gaussian-noise spectrogram, and the planar GAF.]

Ginibre is the DPP you could have expected. The zeros have the same distribution as the zeros of the planar Gaussian analytic function

    \sum_{n \ge 0} a_n \frac{z^n}{\sqrt{n!}}, \qquad a_n \text{ i.i.d. } \mathcal{N}_{\mathbb{C}}(0, 1).
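One can simulate this point process by truncating the planar GAF series and computing the roots of the resulting random polynomial. A sketch under my own choices of degree and radius: the truncation is faithful inside a disk of radius about √deg, where the (untruncated) planar GAF has zero intensity 1/π, i.e. roughly r² expected zeros in |z| ≤ r.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
deg = 30
# i.i.d. standard complex Gaussian coefficients a_n
a = (rng.standard_normal(deg + 1) + 1j * rng.standard_normal(deg + 1)) / np.sqrt(2)
fact = np.array([math.factorial(n) for n in range(deg + 1)], dtype=float)
coeffs = a / np.sqrt(fact)            # coefficients of sum_n a_n z^n / sqrt(n!)

# numpy.roots expects the highest-degree coefficient first
zeros = np.roots(coeffs[::-1])
n_inner = int(np.sum(np.abs(zeros) <= 4.0))
print(zeros.size, n_inner)            # 30 roots; about 16 expected in |z| <= 4
```

Plotting `zeros` in the complex plane reproduces the repulsive cloud of the figure above; roots far outside the disk are truncation artifacts and should be discarded.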
Reconstruction using the empty space function

[Figure: a noisy signal (amplitude vs. time), with the original and reconstructed signals overlaid.]
DPPs crawl back in

The wavelets of Daubechies and Paul [6]. Let α ≥ 0, ψ_α(t) = (t + i)^{−(α+1)}, and consider

    W_\alpha f : (x, s) \mapsto \int f(t) \, \psi_\alpha\!\left( \frac{t - x}{s} \right) \frac{dt}{\sqrt{s}}.

Theorem [3]. When α = 0, and up to a conformal transformation, the zeros of the scalogram of white noise on H² are the zeros of

    \sum_{k \ge 0} a_k z^k,

with a_k i.i.d. N_C(0, 1). Those are a DPP, whose kernel is the Bergman kernel.
The zeros of the hyperbolic GAF

[Figure: a realization of the analytic wavelet transform of analytic white noise.]

More transforms and more zeros to be found in [3].
DPPs for feature selection [4]

Assume y_i = X_{i,:} w^* + ξ_i, i = 1, …, N, where ξ_i ∼ N(0, v) i.i.d. For a given estimator w = w(X, y), its excess risk is

    E(w) = E_ξ \, ‖X w^* − X w‖²_2 / N.    (1)

For the ordinary least squares estimator ŵ = X⁺ y, one has

    E(ŵ) = v × \frac{rk(X)}{N}.

For principal component regression (PCR) with k components,

    E(w^*_k) ≤ \frac{‖w^*‖² σ²_{k+1}}{N} + \frac{v k}{N},

where σ_{k+1} denotes the (k+1)-th largest singular value of X and V_k the matrix of its top k right singular vectors. Under the projection DPP with kernel V_k V_k^T,

    E \, E(w_S) ≤ \frac{1}{N} \big( 1 + k(p − k) \big) ‖w^*‖² σ²_{k+1} + \frac{v k}{N}.
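The projection DPP of kernel V_k V_k^T lives on the p column indices and always selects exactly k of them. For small p one can enumerate all k-subsets; the sketch below (my toy dimensions, not the paper's experiments) samples a column subset and fits least squares on it.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(4)
n, p, k = 20, 6, 2
X = rng.standard_normal((n, p))
w_star = rng.standard_normal(p)
y = X @ w_star + 0.1 * rng.standard_normal(n)

# V_k: top-k right singular vectors of X, giving a rank-k projection kernel
_, _, Vt = np.linalg.svd(X, full_matrices=False)
Vk = Vt[:k].T
K = Vk @ Vk.T

# the projection DPP puts mass det(K_S) on each k-subset S of columns
subsets = list(combinations(range(p), k))
probs = np.array([np.linalg.det(K[np.ix_(S, S)]) for S in subsets])
print(probs.sum())                       # Cauchy-Binet: the masses sum to 1

S = subsets[rng.choice(len(subsets), p=probs / probs.sum())]
w_S, *_ = np.linalg.lstsq(X[:, list(S)], y, rcond=None)
```

Subsets of columns that are well aligned with the top-k singular subspace get larger determinants, hence are favored, which is what drives the excess-risk bound above.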
Discussion

- DPPs were first formalized in quantum optics [13].
- Monte Carlo with DPPs is stochastic Gaussian quadrature [2]: it has an explicit error rate in \sqrt{N^{-1-1/d}}, is defined in any dimension, and couples repulsiveness and sampling; see also [5, 11, 8].
- The zeros of some time-frequency transforms of white noise are well-known point processes [1, 3]. What can we do with it? Spatial statistics! Can every point process be associated with a transform? Which ones are DPPs?
- DPPs are good sketching tools [4]. Algebraic problem + summarization task → DPP!
References I

[1] R. Bardenet, J. Flamant, and P. Chainais. On the zeros of the spectrogram of white noise. Applied and Computational Harmonic Analysis, 2018.

[2] R. Bardenet and A. Hardy. Monte Carlo with determinantal point processes. Under revision for Annals of Applied Probability; arXiv preprint arXiv:1605.00361, 2016.

[3] R. Bardenet and A. Hardy. From random matrices to Monte Carlo integration via Gaussian quadrature. In Proceedings of the IEEE Statistical Signal Processing workshop (SSP), 2018.

[4] A. Belhadji, R. Bardenet, and P. Chainais. A determinantal point process for column subset selection. arXiv preprint arXiv:1812.09771, 2018.
References II

[5] Y. Chen, M. Welling, and A. Smola. Super-samples from kernel herding. In Proceedings of the conference on Uncertainty in Artificial Intelligence (UAI), 2010.

[6] I. Daubechies and T. Paul. Time-frequency localization operators – a geometric phase space approach: II. The use of dilations. Inverse Problems, 4:661–680, 1988.

[7] P. J. Davis and P. Rabinowitz. Methods of numerical integration. Academic Press, New York, 2nd edition, 1984.

[8] B. Delyon and F. Portier. Integral approximation by kernel smoothing. To appear in Bernoulli, 2016.
References III

[9] J. Dick, F. Y. Kuo, and I. H. Sloan. High-dimensional integration: the quasi-Monte Carlo way. Acta Numerica, 22:133–288, 2013.

[10] W. Gautschi. Orthogonal polynomials: computation and approximation. Oxford University Press, USA, 2004.

[11] F. Huszár and D. Duvenaud. Optimally-weighted herding is Bayesian quadrature. In Uncertainty in Artificial Intelligence (UAI), 2012.

[12] K. Johansson. On fluctuations of eigenvalues of random Hermitian matrices. Duke Math. J., 91(1):151–204, 1998.

[13] O. Macchi. The coincidence approach to stochastic point processes. Advances in Applied Probability, 7:83–122, 1975.
References IV

[14] C. P. Robert and G. Casella. Monte Carlo Statistical Methods. Springer-Verlag, New York, 2004.
Some background on numerical integration
Numerical integration

Let µ be a finite positive measure, supported in [−1, 1]^d. Our goal is to find N nodes (x_i) and weights (w_i) so that

    \int f(x) \mu(dx) \approx \sum_{i=1}^{N} w_i f(x_i), \quad \forall f \in \mathcal{C},

where C is a large class of functions. Typical methods include

- variants of Riemann integration [7],
- Gaussian quadrature [10],
- Monte Carlo methods [14]: importance sampling, MCMC,
- quasi-Monte Carlo methods [9] such as scrambled nets,

each coming with its own guarantees [blackboard].
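As a baseline among the methods above, Gaussian quadrature converges extremely fast for smooth integrands in d = 1. A quick check with numpy's Gauss–Legendre nodes (my example, exp chosen for its known integral):

```python
import numpy as np

# 20-node Gauss-Legendre rule on [-1, 1]: exact for polynomials up to
# degree 39, near machine precision for analytic integrands such as exp
nodes, weights = np.polynomial.legendre.leggauss(20)
approx = np.sum(weights * np.exp(nodes))
exact = np.e - 1 / np.e               # \int_{-1}^{1} e^x dx
print(abs(approx - exact))
```

By contrast, plain Monte Carlo with 20 i.i.d. nodes would give an error of order N^{−1/2} ≈ 0.2; the DPP results above interpolate between these regimes in general dimension.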
The starting point

Johansson [12] gives an example of a random N × N unitary matrix with eigenvalues e^{iθ_1}, …, e^{iθ_N}, such that, for all f smooth enough, that is,

    σ² := 2 \sum_{k=0}^{\infty} k |\hat{f}_k|^2 < ∞,

then

    \sum_{i=1}^{N} f(θ_i) − N \int_{[0, 2π]} f(θ) \frac{dθ}{2π} \xrightarrow[N\to\infty]{\text{law}} \mathcal{N}(0, σ²).

No rescaling is necessary. In other words,

    \frac{1}{N} \sum_{i=1}^{N} f(θ_i) − \int_{[0, 2π]} f(θ) \frac{dθ}{2π} \sim \frac{1}{N}.
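This can be checked numerically: sample Haar-distributed unitary matrices (QR of a complex Ginibre matrix with the standard phase correction, as in Mezzadri's recipe) and look at the linear statistic for f = cos, i.e. Re tr(U). For this f, σ² = 2 · 1 · |f̂_1|² = 1/2, and indeed the fluctuations do not grow with N. A sketch of mine:

```python
import numpy as np

def haar_unitary(N, rng):
    """Haar-distributed unitary via QR of a complex Ginibre matrix,
    with the phase correction that makes the distribution Haar."""
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diagonal(R)
    return Q * (d / np.abs(d))

rng = np.random.default_rng(5)
N, reps = 15, 3000
# linear statistic sum_i cos(theta_i) = Re tr(U); for f = cos, sigma^2 = 1/2
stats = np.array([np.real(np.trace(haar_unitary(N, rng))) for _ in range(reps)])
print(stats.var())                    # close to 1/2, with no N rescaling
```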
Orthogonal polynomial ensembles

Conversely, how do we build generic, multivariate DPPs for which such a fast CLT holds? How fast a rate can we achieve?

We understand orthogonal polynomials [10] and related kernels very well, so let us give them a try. In d = 1, we pick for (φ_k) the orthonormal polynomials with respect to µ, that is, φ_k is of degree k and ∫ φ_k φ_ℓ dµ = δ_{kℓ}.

This is starting to look very much like Gaussian quadrature... [blackboard]
Multivariate orthogonal polynomial ensembles

- Let µ be such that µ(A) > 0 for some non-empty open set A ⊂ [−1, 1]^d.
- Choose an ordering b : N → N^d for the multi-indices (α_1, …, α_d) ∈ N^d.
- Apply Gram–Schmidt to the ordered list of monomial functions

      \left( (x_1, \dots, x_d) \mapsto x_1^{α_1} \cdots x_d^{α_d} \right)_{α = b(1), \dots, b(N)}.

- We have built a sequence (φ_k) of multivariate orthonormal polynomials, which we can use to define

      K_N(x, y) = \sum_{k \in b(\{1, \dots, N\})} \varphi_k(x) \varphi_k(y).

We use a particular ordering b that is crucial to our proofs.
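The construction can be carried out numerically on a quadrature grid: order the monomials, orthonormalize them against the discretized measure with a weighted QR, and form K_N. A sketch of mine in d = 2 with the uniform measure and a graded ordering (one valid choice of b, not necessarily the one used in the proofs); the integral of K_N(x, x) against ω comes out exactly N, as it must for a rank-N projection kernel.

```python
import numpy as np
from itertools import product

# tensor-grid Gauss-Legendre quadrature on [-1, 1]^2, uniform density 1/4
nodes_1d, w_1d = np.polynomial.legendre.leggauss(20)
xs = np.array(list(product(nodes_1d, nodes_1d)))            # (400, 2) nodes
w = np.array([wa * wb for wa, wb in product(w_1d, w_1d)]) / 4.0

# monomials x^a y^b in a graded order; keep the first N of them
N = 6
alphas = sorted(product(range(4), range(4)), key=lambda ab: (sum(ab), ab))[:N]
Mono = np.column_stack([xs[:, 0]**a * xs[:, 1]**b for a, b in alphas])

# Gram-Schmidt w.r.t. the quadrature inner product, via a weighted QR
Q, _ = np.linalg.qr(np.sqrt(w)[:, None] * Mono)
Phi = Q / np.sqrt(w)[:, None]          # Phi.T @ diag(w) @ Phi = I_N

# diagonal of the kernel K_N(x, x) = sum_k phi_k(x)^2; its integral is N
K_diag = np.sum(Phi**2, axis=1)
print(np.sum(K_diag * w))
```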