Data Analysis and Its Applications in Asteroseismology

Olga Moreira
April 2005

DEA en Sciences, "Astérosismologie"
Lectured by Anne Thoul
Outline

Part I: Principles of data analysis
- Introduction
- Merit functions and parameter fitting: Maximum Likelihood Estimator
- Maximization/minimization problem: ordinary methods, exotic methods
- Goodness-of-fit: chi-square test, K-S test
- The beauty of synthetic data: Monte-Carlo simulations, Hare-and-Hounds game

Part II: Introduction to spectral analysis
- Fourier analysis: Fourier transform, power spectrum estimation
- Deconvolution analysis: CLEAN, all poles
- Phase dispersion minimization: period search
- Wavelet analysis: wavelet transform and its applications
Part I
Principles of data analysis
Introduction
What do you think of when someone says "data"?

[Figure: examples of "data". Credit: Roxbourg & Paternó, Eddington Workshop (Italy)]
What do all those definitions of data have in common?

Data → Tools → Analysis
(incomplete information) → (probability theory) → (inferences)

Data carry incomplete information; probability theory provides the tools; the analysis draws the inferences.
Analysis Method

Merit function → Best fit → Goodness-of-fit

A complete analysis should provide:

- Parameters;
- Error estimates on the parameters;
- A statistical measure of the goodness-of-fit.

Ignoring the 3rd step can have drastic consequences.
Merit functions and parameter fitting
Maximum Likelihood Estimators (MLE)

$\lambda = (\lambda_1, \ldots, \lambda_p)$ : set of $p$ parameters
$x = (x_1, \ldots, x_N)$ : set of $N$ random variables
$f(x \mid \lambda)$ : probability distribution characterized by $\lambda$ and $x$

The posterior probability of a single measurement is given by:

$$P(x_i) = f(x_i \mid \lambda)$$

If the $x_i$ are a set of independent and identically distributed (i.i.d.) variables, then the joint probability function becomes:

$$P(x) = \prod_{i=1}^{N} f(x_i \mid \lambda)$$

where $L(\lambda) = \prod_{i=1}^{N} f(x_i \mid \lambda)$ is defined as the likelihood.

• The best fit of parameters is the one that maximizes the likelihood.

It is common to find $\ell$ defined as the likelihood, but in fact it is just the logarithm of the likelihood, which is easier to work with:

$$\ell = \ln L = \sum_{i=1}^{N} \ln f(x_i \mid \lambda), \qquad \text{or} \qquad S = -\ell$$

The posterior probability is the probability after the event; under no circumstances should the likelihood be confused with a probability density.
Error Estimate

The shape of $L(\lambda)$ determines how well the parameters can be constrained:

- Gaussian shape: $\lambda = \hat{\lambda} \pm \Delta\lambda$ with $\Delta\lambda \approx \sigma$.
- Non-Gaussian shape with a single extremum: no problem in determining the maximum, although it can present some difficulties for the error-bar estimates.
- Non-Gaussian shape with several local extrema: problems both in determining the maximum and in estimating the error bars.
Estimator: Desirable properties

Unbiased: $b(\hat{\lambda}) = E[\hat{\lambda} - \lambda] = 0$.
Minimum variance: $\sigma^2(\hat{\lambda}) \rightarrow \min$.

Information inequality / Cramér-Rao inequality:

$$\sigma^2(\hat{\lambda}) \ge \frac{\left[1 + b'(\hat{\lambda})\right]^2}{I(\lambda)}, \qquad I(\lambda) = E\!\left[\left(\frac{\partial \ln L}{\partial \lambda}\right)^{2}\right] = -E\!\left[\frac{\partial^2 \ln L}{\partial \lambda^2}\right]$$

If $b'(\hat{\lambda}) = 0$, then

$$\sigma^2(\hat{\lambda}) \ge \frac{1}{I(\lambda)}$$

• The larger the information, the smaller the variance.
MLE asymptotically unbiased

Expanding the derivative of the log-likelihood around the maximum:

$$\ell'(\lambda) = \ell'(\hat{\lambda}) + (\lambda - \hat{\lambda})\,\ell''(\hat{\lambda}) + \cdots$$

Neglecting the higher orders and taking $N \rightarrow \infty$, with $I(\hat{\lambda}) = -\ell''(\hat{\lambda})$:

$$\ell(\lambda) = \ell(\hat{\lambda}) - \tfrac{1}{2}\, I(\hat{\lambda})\, (\lambda - \hat{\lambda})^2 + \cdots \;\Rightarrow\; L(\lambda) \approx L(\hat{\lambda})\, \exp\!\left[-\frac{(\lambda - \hat{\lambda})^2}{2\,\sigma^2(\hat{\lambda})}\right]$$

The likelihood thus takes the form of a normal distribution with $\sigma^2(\hat{\lambda}) = 1/I(\hat{\lambda})$, and:

$$\lambda = \hat{\lambda} \pm \sigma(\hat{\lambda})$$
In multi-dimensions:

$x = (x_1, \ldots, x_N)$, $\lambda = (\lambda_1, \ldots, \lambda_p)$, and $\ell(\lambda) = \ln L(\lambda) = \sum_{i=1}^{N} \ln f(x_i \mid \lambda)$.

Expanding around the maximum:

$$\ell(\lambda) = \ell(\hat{\lambda}) + \sum_{i=1}^{p} \frac{\partial \ell(\hat{\lambda})}{\partial \lambda_i}\,(\lambda_i - \hat{\lambda}_i) - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} H_{ij}\, (\lambda_i - \hat{\lambda}_i)(\lambda_j - \hat{\lambda}_j) + \cdots$$

At the maximum the first-derivative term vanishes:

$$\ell(\lambda) = \ell(\hat{\lambda}) - \tfrac{1}{2}\, (\lambda - \hat{\lambda})^T H\, (\lambda - \hat{\lambda}) + \cdots$$

For $N \rightarrow \infty$, $I_{ij} = E[H_{ij}] = -E\!\left[\dfrac{\partial^2 \ell}{\partial \lambda_i\, \partial \lambda_j}\right]$ (the Hessian matrix), and

$$L(\lambda) \approx L(\hat{\lambda})\, \exp\!\left[-\tfrac{1}{2}\, (\lambda - \hat{\lambda})^T I\, (\lambda - \hat{\lambda})\right]$$

is a multivariate Gaussian distribution. The covariance and correlation of the parameters follow from the inverse of $I$:

$$\mathrm{cov}(\lambda_i, \lambda_j) = (I^{-1})_{ij}, \qquad \rho(\lambda_i, \lambda_j) = \frac{\mathrm{cov}(\lambda_i, \lambda_j)}{\sigma(\lambda_i)\,\sigma(\lambda_j)}, \qquad \sigma^2(\hat{\lambda}_i) = (I^{-1})_{ii}$$

If $\mathrm{cov}(\lambda_i, \lambda_j) = 0$ for $i \ne j$, the errors decouple and $\lambda_i \approx \hat{\lambda}_i \pm \sigma(\hat{\lambda}_i)$.

If $\mathrm{cov}(\lambda_i, \lambda_j) \ne 0$ for $i \ne j$, there is an error region defined not only by $\sigma(\hat{\lambda}_i)$ but by the complete covariance matrix. For instance, in 2D the error region defines an ellipse.
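To make the recipe concrete, here is a minimal numerical sketch (with a synthetic Gaussian sample, an assumption for illustration): minimize $S = -\ln L$, then read the error bars off the inverse Hessian at the maximum.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=500)   # synthetic i.i.d. Gaussian sample

def S(p):
    """S = -ln L for a Gaussian model with parameters lambda = (mu, sigma)."""
    mu, sigma = p
    if sigma <= 0:
        return np.inf
    return x.size * np.log(sigma) + 0.5 * np.sum(((x - mu) / sigma) ** 2)

res = minimize(S, x0=[1.0, 1.0], method="Nelder-Mead")
res = minimize(S, x0=res.x, method="BFGS")  # refine; BFGS tracks an inverse Hessian
cov = res.hess_inv                          # I^{-1}: covariance matrix of the estimates
err = np.sqrt(np.diag(cov))                 # sigma(lambda_i) = sqrt((I^{-1})_ii)
print(res.x, err)                           # lambda_hat +/- 1-sigma error bars
```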
Least-square and Chi-square fit

1. Consider measurements $y_i$ with errors that are independently and normally distributed around the true value $y(t_i \mid \lambda)$.

2. The standard deviations $\sigma$ are the same for all points.

Then the joint probability for the $N$ measurements is:

$$L \propto \prod_{i=1}^{N} \exp\!\left[-\frac{1}{2}\left(\frac{y_i - y(t_i \mid \lambda)}{\sigma}\right)^{2}\right] \Delta y$$

Maximizing $L$ is the same as minimizing

$$\sum_{i=1}^{N} \left(\frac{y_i - y(t_i \mid \lambda)}{\sigma}\right)^{2}$$

The least-square fit is the MLE of the fitted parameters if the measurements are independent and normally distributed.

3. If the deviations are different ($\sigma_i \ne \sigma_j$), then:

$$\chi^2 = \sum_{i=1}^{N} \left(\frac{y_i - y(t_i \mid \lambda)}{\sigma_i}\right)^{2}$$
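As a sketch of this merit function in practice (the linear model and data are hypothetical), scipy.optimize.curve_fit performs exactly this weighted least-square minimization:

```python
import numpy as np
from scipy.optimize import curve_fit

def model(t, a, b):                 # hypothetical model y(t) = a*t + b
    return a * t + b

rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 20)
sigma = np.full(t.size, 0.1)        # per-point standard deviations sigma_i
y = model(t, 2.0, 1.0) + rng.normal(0.0, sigma)

popt, pcov = curve_fit(model, t, y, sigma=sigma, absolute_sigma=True)
chi2 = np.sum(((y - model(t, *popt)) / sigma) ** 2)   # the chi^2 merit function
dof = t.size - len(popt)
print(popt, np.sqrt(np.diag(pcov)), chi2 / dof)       # reduced chi^2 should be ~1
```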
Limitations:

- Real data most of the time violate the i.i.d. condition.
- Sometimes one has only a limited sample.
- In practice everything depends on the behaviour of $L(\lambda)$.

The MLE grants a unique solution, invariant under a change of parametrization $\alpha = \alpha(\lambda)$:

$$\frac{\partial L}{\partial \lambda} = \frac{\partial L}{\partial \alpha} \cdot \frac{\partial \alpha}{\partial \lambda} = 0 \iff \frac{\partial L}{\partial \alpha} = 0$$

But the uncertainty of an estimate depends on the specific choice of $\lambda$: the probability content transforms with the Jacobian,

$$P(\lambda_1 < \lambda < \lambda_2) = \int_{\lambda_1}^{\lambda_2} L(\lambda)\, d\lambda, \qquad \int_{-\infty}^{+\infty} L(\lambda)\, d\lambda = \int_{-\infty}^{+\infty} L(\alpha) \left|\frac{\partial \lambda}{\partial \alpha}\right| d\alpha \ne \int_{-\infty}^{+\infty} L(\alpha)\, d\alpha$$
Example: modes stochastically excited

The observed power spectrum is distributed as $P(\nu) = M(\nu)\, \chi_2^2(\nu)$, where $M(\nu)$ is the mean (model) spectrum.

For a single mode, $M(\nu)$ is a Lorentzian profile plus a flat background $B$:

$$M(\nu) = \frac{A\, (\Gamma/2)^2}{(\nu - \nu_0)^2 + (\Gamma/2)^2} + B$$

The probability of observing the power $P(\nu_i)$ follows exponential statistics,

$$f(P(\nu_i) \mid \lambda) = \frac{1}{M(\nu_i \mid \lambda)} \exp\!\left[-\frac{P(\nu_i)}{M(\nu_i \mid \lambda)}\right]$$

so the function to minimize is:

$$S = -\ln L = \sum_{i=1}^{N} \left[\ln M(\nu_i \mid \lambda) + \frac{P(\nu_i)}{M(\nu_i \mid \lambda)}\right]$$

Minimization over $\lambda = (A, \Gamma, \nu_0, B)$ yields the estimates $\hat{\lambda} = (\hat{A}, \hat{\Gamma}, \hat{\nu}_0, \hat{B})$.
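A sketch of this fit in Python, using the Lorentzian-plus-background parametrisation above; the variable names and starting values are placeholders:

```python
import numpy as np
from scipy.optimize import minimize

def M(nu, A, gamma, nu0, B):
    """Mean spectrum: Lorentzian mode profile plus flat background."""
    return A * (gamma / 2) ** 2 / ((nu - nu0) ** 2 + (gamma / 2) ** 2) + B

def S(params, nu, P):
    """S = -ln L = sum_i [ ln M(nu_i) + P(nu_i)/M(nu_i) ] for chi^2_2 statistics."""
    m = M(nu, *params)
    if np.any(m <= 0):
        return np.inf
    return np.sum(np.log(m) + P / m)

# nu, P = observed frequency grid and power spectrum (not shown here)
# res = minimize(S, x0=[10.0, 1.0, 3000.0, 1.0], args=(nu, P), method="Nelder-Mead")
# A_hat, gamma_hat, nu0_hat, B_hat = res.x
```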
Maximization/Minimization Problem
Going “Downhill” Methods

Finding a global extremum is in general very difficult.

For one-dimensional minimization there are usually two types of methods:
• Methods that bracket the minimum: golden section search and parabolic interpolation (Brent’s method).
• Methods that also use the first-derivative information.

In multi-dimensions there are three kinds of methods:
• Direction-set methods; Powell’s method is the prototype.
• The downhill simplex method.
• Methods that use the gradient information.

[Figures adapted from Press et al. (1992).]
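Both families are available in scipy.optimize; a minimal sketch on the Rosenbrock test function (a standard benchmark, not from the slides):

```python
import numpy as np
from scipy.optimize import minimize

def rosenbrock(p):              # standard 2-D test function, minimum at (1, 1)
    return (1 - p[0]) ** 2 + 100.0 * (p[1] - p[0] ** 2) ** 2

x0 = np.array([-1.2, 1.0])
simplex = minimize(rosenbrock, x0, method="Nelder-Mead")  # downhill simplex
grad = minimize(rosenbrock, x0, method="BFGS")            # gradient-based
print(simplex.x, grad.x)        # both should converge to [1, 1]
```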
Falling in the wrong valley

The downhill methods lack efficiency/robustness. For instance, the simplex method can be very fast for some functions and very slow for others.

They depend on a priori knowledge of the overall structure of the vector space, and require repeated manual intervention.

If the function to minimize is not well known then, numerically speaking, even a smooth hill can become a headache.

They also don’t solve the famous combinatorial analysis problem: the traveling salesman problem.
Exotic Methods

Solving “the traveling salesman problem”: a salesman has to visit each city on a given list; knowing the distance between all cities, he will try to minimize the length of his tour.

Methods available:

• Simulated annealing: based on an analogy with thermodynamics (see the sketch below).
• Genetic algorithms: based on an analogy with evolutionary selection rules.
• Nearest neighbor.
• Neural networks: based on the observation of biological neural networks (brains).
• Knowledge-based systems, etc.

(Adapted from Charbonneau 1995)
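A toy sketch of simulated annealing applied to the salesman's tour; the temperature, cooling rate and step count are arbitrary choices:

```python
import numpy as np

def anneal_tsp(cities, n_steps=50_000, t0=1.0, cooling=0.9999, seed=0):
    """Toy simulated annealing for the traveling salesman problem."""
    rng = np.random.default_rng(seed)

    def tour_length(order):
        pts = cities[order]
        return np.sum(np.hypot(*(pts - np.roll(pts, 1, axis=0)).T))

    order = rng.permutation(len(cities))
    temp = t0
    for _ in range(n_steps):
        i, j = np.sort(rng.integers(0, len(cities), size=2))
        trial = order.copy()
        trial[i:j + 1] = trial[i:j + 1][::-1]      # reverse a segment (2-opt move)
        delta = tour_length(trial) - tour_length(order)
        # accept all downhill moves; accept uphill moves with Boltzmann probability
        if delta < 0 or rng.random() < np.exp(-delta / temp):
            order = trial
        temp *= cooling                            # thermodynamic cooling schedule
    return order, tour_length(order)

cities = np.random.default_rng(1).random((30, 2))  # 30 random cities, unit square
best_order, best_len = anneal_tsp(cities)
print(best_len)
```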
Goodness-of-fit
Chi-square test

$$\chi^2 = \sum_{i=1}^{k} \frac{(n_i - e_i)^2}{e_i}$$

$n_i$ : the number of events observed in the i-th bin.
$e_i$ : the number expected according to some known distribution.

[Table: example comparing observed and expected counts per bin.]

H0: The data follow a specified distribution.

The significance level is determined by $Q(\chi^2 \mid d)$, where $d$ is the number of degrees of freedom: $d$ = (number of bins) minus (number of fitted parameters and constraints).

Normally acceptable models have $Q > 0.05$, but day-in and day-out we find accepted models with considerably smaller $Q$.
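A sketch of the test (the bin counts are made up; scipy.stats supplies the significance $Q$):

```python
import numpy as np
from scipy import stats

observed = np.array([18, 55, 73, 54, 20])            # hypothetical bin counts n_i
expected = np.array([16.5, 54.5, 78.0, 54.5, 16.5])  # e_i from the assumed model
chi2 = np.sum((observed - expected) ** 2 / expected)
d = observed.size - 1                    # one constraint: totals are normalized
Q = stats.chi2.sf(chi2, d)               # significance Q(chi^2 | d)
print(chi2, Q)                           # small Q means reject H0
```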
Kolmogorov-Smirnov (K-S) test

(Press et al. 1992)

$S_N(x)$ : cumulative distribution of the data.
$F(x)$ : known cumulative distribution.
$D$ : maximum absolute difference between the two cumulative functions:

$$D = \max_{-\infty < x < +\infty} \left| S_N(x) - F(x) \right|$$

The significance of an observed value of $D$ is given approximately by:

$$\wp(D > \text{observed}) = Q_{KS}\!\left(\left[\sqrt{N} + 0.12 + \frac{0.11}{\sqrt{N}}\right] D\right)$$

with

$$Q_{KS}(\xi) = 2 \sum_{j=1}^{\infty} (-1)^{j-1} e^{-2 j^2 \xi^2}$$

$Q_{KS}$ is a monotonic function with limiting values:

$$Q_{KS}(0) = 1 \;\text{(largest agreement)}, \qquad Q_{KS}(\infty) = 0 \;\text{(smallest agreement)}$$
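In practice the whole procedure is one call to scipy.stats (the sample here is synthetic):

```python
import numpy as np
from scipy import stats

x = np.random.default_rng(2).normal(size=200)   # synthetic sample
# H0: the data follow a standard normal cumulative distribution F(x)
D, p = stats.kstest(x, "norm")
print(D, p)     # large p: no evidence against H0; small p: reject H0
```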
Synthetic data

Monte-Carlo simulations

If one knows something about the process that generated the data then, given an assumed set of parameters $\lambda$, one can figure out how to simulate synthetic realizations of these parameters. The procedure is to draw random numbers from appropriate distributions so as to mimic our best understanding of the underlying processes and measurement errors.

(Stello et al. 2004, xi Hya)
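A minimal sketch of the procedure; the "true" parameters, sampling and noise level are invented for illustration. Many synthetic realizations are generated, each is re-analysed, and the spread of the recovered parameter serves as its error estimate:

```python
import numpy as np

rng = np.random.default_rng(3)
dt, n = 0.01, 1000
t = np.arange(n) * dt
f_true, amp, noise = 2.3, 1.0, 0.5          # assumed "true" parameters

recovered = []
for _ in range(500):                        # 500 synthetic realizations
    y = amp * np.sin(2 * np.pi * f_true * t) + rng.normal(0, noise, n)
    freqs = np.fft.rfftfreq(n, dt)
    power = np.abs(np.fft.rfft(y - y.mean())) ** 2
    recovered.append(freqs[np.argmax(power)])

# spread of the recovered frequencies = Monte-Carlo error bar on f_true
print(np.mean(recovered), np.std(recovered))
```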
Hare-and-Hounds game

Team A: generates theoretical mode frequencies and synthetic time series.
Team B: analyses the time series, performs the mode identification and fitting, and does the structure inversion.

Rules: the teams only have access to the time series. Nothing else is allowed.
End of Part I
Options available:
 • Questions
 • Coffee break
 • “Get on with it !!!”
Part II
Introduction to spectral analysis
Fourier transform

Properties:

$$\mathcal{F}\{f(t)\} = F(\nu) = \int_{-\infty}^{+\infty} f(t)\, e^{-2\pi i \nu t}\, dt$$

- Linearity: $\mathcal{F}\{f(t) + g(t)\} = F(\nu) + G(\nu)$
- Scaling: $\mathcal{F}\{f(at)\} = \frac{1}{|a|}\, F\!\left(\frac{\nu}{a}\right)$
- Convolution theorem: $h(t) = \int_{-\infty}^{+\infty} f(\tau)\, g(t - \tau)\, d\tau \iff H(\nu) = F(\nu) \cdot G(\nu)$, i.e. $\mathcal{F}\{f(t) \otimes g(t)\} = F(\nu) \cdot G(\nu)$

Parseval’s Theorem:
The power of a signal represented by $f(t)$ is the same whether computed in time space or in frequency space:

$$\int_{-\infty}^{+\infty} |f(t)|^2\, dt = \int_{-\infty}^{+\infty} |F(\nu)|^2\, d\nu$$
Sampling theorem

[Figure: a signal $f(t)$, the sampling function, and the sampled signal $f(t)$ times the sampling function, together with their transforms $F(\nu)$, $\Upsilon(\nu)$ and $F(\nu) \otimes \Upsilon(\nu)$. Adapted from Bracewell (1986).]

For a bandlimited signal $f(t)$, which has no components above the frequency $\nu_c$, the sampling theorem states that the real signal can be reconstructed without error from samples taken uniformly at a rate $\nu \ge 2\nu_c$. The minimum sampling frequency, $\nu_{\rm Nyq} = 2\nu_c$, is called the Nyquist frequency, corresponding to the sampling interval $\Delta t = 1/(2\nu_c)$ (where $\Delta t = t_{i+1} - t_i$).
Undersampling

The sampling theorem assumes that the signal is limited in frequency, but in practice the signal is time limited. For $\Delta t > 1/(2\nu_c)$ the signal is undersampled: overlapping tails appear in the spectrum, producing an alias spectrum.

[Figure: true spectrum and overlapping alias spectrum. Adapted from Bracewell (1986).]

Aliasing:
Examining the terms of the undersampled Fourier transform (FT) (Bracewell 1986):

- The undersampled FT is more even than the complete FT; as a consequence the sampling procedure discriminates against the components at $\nu = \nu_c$.
- There is a leakage of the high frequencies (aliasing).
Discrete Fourier transform

$$F(\nu_n) = \sum_{k=0}^{N-1} f(t_k)\, e^{-2\pi i \nu_n t_k}, \qquad \nu_n = \frac{n}{N\,\delta t}, \qquad t_k = k\,\delta t$$

Discrete form of Parseval’s theorem:

$$\sum_{k=0}^{N-1} |f(t_k)|^2 = \frac{1}{N} \sum_{n=0}^{N-1} |F(\nu_n)|^2$$

Fast Fourier Transform (FFT):

The FFT is a discrete Fourier transform algorithm which reduces the number of computations for $N$ points from $O(N^2)$ to $O(N \log_2 N)$. This is done by means of the Danielson-Lanczos lemma, whose basic idea is to break a transform of length $N$ into two transforms of length $N/2$.
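A quick numerical check of the discrete Parseval relation using numpy's FFT (sketch):

```python
import numpy as np

x = np.random.default_rng(4).standard_normal(1024)
X = np.fft.fft(x)                        # computed in O(N log2 N) via the FFT
lhs = np.sum(np.abs(x) ** 2)             # power summed in the time domain
rhs = np.sum(np.abs(X) ** 2) / x.size    # power summed in the frequency domain
print(np.allclose(lhs, rhs))             # True: discrete Parseval theorem
```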
Power spectrum estimation

Periodogram:

$$P(\nu) = \frac{1}{N}\, |F(\nu)|^2 = \frac{1}{N} \left|\sum_{k=0}^{N-1} f(t_k)\, e^{-2\pi i \nu t_k}\right|^2 = \frac{1}{N}\left[\left(\sum_{k} f(t_k) \cos 2\pi\nu t_k\right)^{2} + \left(\sum_{k} f(t_k) \sin 2\pi\nu t_k\right)^{2}\right]$$

If $f$ contains a periodic signal, i.e.

$$f(t_k) = g(t_k) + n(t_k), \qquad g(t_k) = A \sin(2\pi\nu_0 t_k + \varphi)$$

with $n$ random noise, then at $\nu = \nu_0$ there is a large contribution to the sum, while for other values the terms in the sum will be randomly negative and positive, yielding a small contribution. Thus a peak in the periodogram reveals the existence of an embedded periodic signal.
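A sketch of the periodogram recovering an embedded periodic signal (all signal and noise parameters invented):

```python
import numpy as np

dt, n = 0.01, 2048
t = np.arange(n) * dt
x = 1.0 * np.sin(2 * np.pi * 7.0 * t + 0.3)        # periodic signal, nu0 = 7
x += np.random.default_rng(5).normal(0, 2.0, n)    # buried in random noise

freqs = np.fft.rfftfreq(n, dt)
P = np.abs(np.fft.rfft(x - x.mean())) ** 2 / n     # periodogram P(nu) = |F|^2 / N
print(freqs[np.argmax(P)])                         # the peak recovers nu0 ~ 7
```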
Frequency leakage:

- Leakage from nearby frequencies, usually described by a spectral window; primarily a product of the finite length of the data.
- Leakage from high frequencies, due to the data sampling: the aforementioned aliasing.

Tapering functions: sometimes also called data windowing. These functions smooth the leakage between frequencies by bringing the interference slowly back to zero. The main goal is to narrow the peak and suppress the side lobes. Smoothing can in certain cases represent a loss of information.

[Figures from Press et al. (1992).]
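A sketch of the effect of a Hann taper on a tone that falls between Fourier bins (values invented):

```python
import numpy as np

n = 1024
t = np.arange(n)
x = np.sin(2 * np.pi * 0.1237 * t)            # tone between Fourier bins: leakage
hann = 0.5 * (1 - np.cos(2 * np.pi * t / n))  # Hann (cosine) tapering function
raw = np.abs(np.fft.rfft(x)) ** 2
tapered = np.abs(np.fft.rfft(x * hann)) ** 2
# The tapered spectrum has strongly suppressed side lobes (less leakage into
# distant frequencies) at the cost of a slightly broader main peak.
```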
Further complications

Closely spaced frequencies: a direct contribution to the first kind of leakage mentioned above, since the powers do not simply add:

$$|F_1(\nu) + F_2(\nu)|^2 = |F_1(\nu)|^2 + |F_2(\nu)|^2 + 2\,\mathrm{Re}\!\left[F_1(\nu)\, F_2^{*}(\nu)\right]$$

Damping: for a damped oscillation

$$f(t) = A \sin(2\pi\nu_0 t - \varphi)\, e^{-\eta t}$$

the peak in the power spectrum will have a Lorentzian profile.
Power spectrum of random noise

$$f(t_k) = g(t_k) + n(t_k)$$

where $n(t_k)$ is random noise (zero mean, variance $\sigma_n^2$) and $g(t_k)$ is the deterministic signal.

The estimate of the spectral density is the Fourier transform of the autocovariance $\gamma_n$:

$$\hat{\rho}(\nu) = \sum_{k} \hat{\gamma}_n(k)\, e^{-2\pi i \nu k}, \qquad \hat{\gamma}_n(k) \rightarrow E[n(t)\, n(t+k)]$$

Thus:

$$\langle P_n(\nu) \rangle = \sigma_n^2 \qquad (1)$$

No matter how much one increases the number of points, $N$, the signal-to-noise ratio will tend to a constant.

For unevenly spaced data (missing data) equation (1) isn't always valid; indeed, it is only valid for homogeneous white noise (independent and identically distributed normal random variables).
Filling gaps

The unevenly spaced data problem can be solved by (a few suggestions):

- Finding a way to reduce the unevenly spaced sample to an evenly spaced one. Basic idea: interpolation of the missing points (problem: doesn't work for long gaps).
- Using the Lomb-Scargle periodogram.
- Doing a deconvolution analysis (filters).
Lomb-Scargle periodogram

$$P(\nu) = \frac{1}{2}\left\{\frac{\left[\sum_{k} x_k \cos 2\pi\nu (t_k - \tau)\right]^{2}}{\sum_{k} \cos^{2} 2\pi\nu (t_k - \tau)} + \frac{\left[\sum_{k} x_k \sin 2\pi\nu (t_k - \tau)\right]^{2}}{\sum_{k} \sin^{2} 2\pi\nu (t_k - \tau)}\right\}$$

where $\tau$ is defined by

$$\tan(4\pi\nu\tau) = \frac{\sum_k \sin 4\pi\nu t_k}{\sum_k \cos 4\pi\nu t_k}$$

- It is like weighting the data on a "per point" basis instead of a "per time interval" basis, which makes the estimate independent of sampling irregularity.
- It has an exponential probability distribution with unit mean, which means one can establish a false-alarm probability of the null hypothesis (significance level):

$$P(>z) = 1 - \left(1 - e^{-z}\right)^{M} \approx M e^{-z}$$

where $M$ is the number of independent frequencies.
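A direct numpy implementation of the formula above (a sketch; for production work the astropy package provides a tested LombScargle implementation):

```python
import numpy as np

def lomb_scargle(t, x, freqs):
    """Classical Lomb-Scargle periodogram for unevenly sampled data."""
    x = x - np.mean(x)
    P = np.empty(freqs.size)
    for i, f in enumerate(freqs):
        w = 2 * np.pi * f
        # tau from tan(2 w tau) = sum sin(2 w t) / sum cos(2 w t)
        tau = np.arctan2(np.sum(np.sin(2 * w * t)),
                         np.sum(np.cos(2 * w * t))) / (2 * w)
        c = np.cos(w * (t - tau))
        s = np.sin(w * (t - tau))
        P[i] = 0.5 * ((x @ c) ** 2 / (c @ c) + (x @ s) ** 2 / (s @ s))
    return P
```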
Deconvolution analysis

Deconvolution

The observed transform is the true one convolved with the window function, and contains signal plus noise:

$$F(\nu) \otimes W(\nu) = G(\nu) + \varepsilon(\nu)$$

with $G(\nu)$ the signal and $\varepsilon(\nu)$ the noise.

- Linear algorithms: inverse filtering or Wiener filtering. They are inapplicable to incomplete sampling (irregular sampling) of spatial frequency.
- Non-linear algorithms: CLEAN, all poles.
- Problem: the deconvolution usually does not have a unique solution.
Hogbom CLEAN algorithm

The first CLEAN method was developed by Hogbom (1974). Writing $B$ for the beam (window), $I$ for the clean map and $D$ for the dirty map, it constructs discrete approximations $I_n$ of the clean map from the convolution equation:

$$B \otimes I = D$$

Starting with $I_0 = 0$, it searches for the largest value in the residual map:

$$R_n = D - B \otimes I_{n-1}$$

After locating the largest residual of given amplitude, it subtracts it to yield $I_n$. The iteration continues until the root-mean-square (RMS) of the residuals decreases to some level. Each subtracted location is saved in the so-called CLEAN map; in the resulting final residual map, what remains is assumed to be mainly noise.
CLEAN algorithm

The basic steps of the CLEAN algorithm used in asteroseismology (sketched in code below) are:

1. Compute the power spectrum of the signal and identify the dominant period.

2. Perform a least-square fit to the data to obtain the amplitude and phase of the identified mode.

3. Construct the time series corresponding to that single mode and subtract it from the original signal to obtain a new signal.

4. Repeat all steps until all that is left is noise.

Stello et al. (2004) proposed an improvement to this algorithm: after subtracting a frequency, it recalculates the amplitudes, phases and frequencies of the previously subtracted peaks while fixing the frequency of the latest extracted peak.
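A minimal sketch of these four steps (iterative prewhitening) in Python; the fixed trial-frequency grid and fixed mode count stand in for a real stopping criterion:

```python
import numpy as np

def clean(t, x, freq_grid, n_modes):
    """CLEAN by iterative prewhitening: find peak, fit, subtract, repeat."""
    resid = x - np.mean(x)
    modes = []
    for _ in range(n_modes):
        # 1. periodogram of the residuals; locate the dominant frequency
        power = np.array([np.abs(np.sum(resid * np.exp(-2j * np.pi * f * t))) ** 2
                          for f in freq_grid])
        f0 = freq_grid[np.argmax(power)]
        # 2. least-square fit of amplitude and phase at the fixed frequency f0
        A = np.column_stack([np.cos(2 * np.pi * f0 * t),
                             np.sin(2 * np.pi * f0 * t)])
        coef, *_ = np.linalg.lstsq(A, resid, rcond=None)
        # 3. subtract the single-mode time series from the signal
        resid = resid - A @ coef
        modes.append((f0, np.hypot(*coef)))
        # 4. repeat until only noise is left (here: a fixed number of modes)
    return modes, resid
```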
All poles

Writing $z \equiv e^{2\pi i \nu\, \delta t}$, the periodogram reads $P(\nu) = |F(\nu)|^2 = \left|\sum_{k} f_k\, z^k\right|^2$.

The discrete FT is a particular case of the (unilateral) z-transform:

$$F(z) = \sum_{k=0}^{+\infty} f_k\, z^{-k}$$

It turns out that one can gain some advantages by making the following approximation:

$$P(\nu) \approx \frac{a_0}{\left|1 + \sum_{k=1}^{M} a_k\, z^k\right|^{2}}$$

The notable fact is that this equation allows poles, corresponding to infinite spectral power density, on the unit z-circle (at the real frequencies of the Nyquist interval), and such poles can provide an accurate representation for underlying power spectra that have sharp discrete "lines" or delta-functions. $M$ is called the number of poles. This approximation goes under several names: all-poles model, Maximum Entropy Method (MEM), autoregressive model (AR). (Press et al. 1992)
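As a sketch, the all-poles coefficients can be obtained from the Yule-Walker equations (a common route; Press et al. 1992 describe a Burg-type recursion instead). The model order M and the unit sampling interval below are arbitrary assumptions:

```python
import numpy as np

def allpoles_psd(x, M, n_freq=512):
    """All-poles / AR(M) power spectrum estimate via the Yule-Walker equations."""
    x = np.asarray(x, float) - np.mean(x)
    N = x.size
    r = np.array([x[:N - k] @ x[k:] / N for k in range(M + 1)])  # autocovariance
    R = np.array([[r[abs(i - j)] for j in range(M)] for i in range(M)])
    a = np.linalg.solve(R, -r[1:])         # AR coefficients a_1 .. a_M
    sigma2 = r[0] + a @ r[1:]              # driving-noise variance (a_0)
    nu = np.linspace(0.0, 0.5, n_freq)     # Nyquist interval for delta t = 1
    z = np.exp(-2j * np.pi * nu)
    denom = np.abs(1 + sum(a[k] * z ** (k + 1) for k in range(M))) ** 2
    return nu, sigma2 / denom              # near-poles produce sharp lines
```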
Phase dispersion minimization (PDM)

Definitions

A discrete set of observations can be represented by two vectors: the magnitudes $x_i$ and the observation times $t_i$ (with $i = 1, \ldots, N$). The variance of $x$ is given by:

$$\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \bar{x})^2}{N - 1}$$

Suppose that one divides the initial set into several subsets/samples. If $M$ is the number of samples, having variances $s_j^2$ and containing $n_j$ data points, then the overall variance for all the samples is given by:

$$s^2 = \frac{\sum_{j=1}^{M} (n_j - 1)\, s_j^2}{\sum_{j=1}^{M} n_j - M}$$
PDM as a period search method

Suppose that one wants to minimize the variance of a data set with respect to the mean light curve. The phase vector is given by:

$$\phi_i = \frac{t_i - t_0}{P} - \left\lfloor \frac{t_i - t_0}{P} \right\rfloor$$

Considering $x$ as a function of the phase, the variance of the phase-bin samples gives the scatter around the mean light curve. Defining:

$$\Theta = \frac{s^2}{\sigma^2}$$

If $P$ is not the true period, then $s^2 \approx \sigma^2$ and $\Theta \approx 1$. If $P$ is the true value, then $\Theta$ will reach a local minimum.

Mathematically, the PDM is a least-square fitting, but rather than fitting a given curve, it fits relative to the mean curve defined by the means of each bin; simultaneously one obtains the best period.
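A sketch of the Θ statistic for a single trial period (the bin count is an arbitrary choice); scanning a grid of trial periods and locating the minimum of Θ performs the period search:

```python
import numpy as np

def pdm_theta(t, x, period, n_bins=10):
    """PDM statistic Theta = s^2 / sigma^2 for one trial period."""
    sigma2 = np.var(x, ddof=1)                   # overall variance sigma^2
    phase = ((t - t[0]) / period) % 1.0          # phase vector
    bins = np.minimum((phase * n_bins).astype(int), n_bins - 1)
    num = den = 0.0
    for j in range(n_bins):                      # pooled bin variance s^2
        xj = x[bins == j]
        if xj.size > 1:
            num += (xj.size - 1) * np.var(xj, ddof=1)
            den += xj.size - 1
    return (num / den) / sigma2                  # ~1 off-period, minimum at true P

# periods = np.linspace(p_min, p_max, 5000)     # trial periods (not shown)
# best = periods[np.argmin([pdm_theta(t, x, p) for p in periods])]
```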
Wavelets

Wavelet transform

Wavelets are a class of functions used to localize a given function in both space and scaling. A family of wavelets can be constructed from a function $\Psi(t)$, sometimes known as the "mother wavelet", which is confined to a finite interval. The "daughter wavelets" $\Psi^{a,b}(t)$ are then formed by translation ($b$) and contraction ($a$).

An individual wavelet can be written as:

$$\Psi^{a,b}(t) = \frac{1}{\sqrt{a}}\, \Psi\!\left(\frac{t - b}{a}\right)$$

Then the wavelet transform is given by:

$$W(a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\, \Psi^{*}\!\left(\frac{t - b}{a}\right) dt$$

and the signal can be recovered from the inverse transform,

$$f(t) = \frac{1}{C_\Psi} \int_{-\infty}^{+\infty}\!\!\int_{-\infty}^{+\infty} W(a, b)\, \Psi^{a,b}(t)\, \frac{da\, db}{a^2}$$

where $C_\Psi$ is the admissibility constant of the mother wavelet.
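A sketch of the transform on a discrete series, using a Morlet mother wavelet; the choice of Morlet and the parameter w0 = 6 are conventional assumptions, not from the slides:

```python
import numpy as np

def morlet_cwt(x, dt, scales, w0=6.0):
    """Continuous wavelet transform of x(t) with a Morlet mother wavelet."""
    scales = np.atleast_1d(scales)
    n = x.size
    tau = (np.arange(n) - n // 2) * dt            # wavelet support, centred
    W = np.empty((scales.size, n), dtype=complex)
    for i, a in enumerate(scales):
        u = tau / a
        psi = np.pi ** -0.25 * np.exp(1j * w0 * u - u ** 2 / 2)  # Morlet wavelet
        # W(a, b) = a^{-1/2} * integral x(t) psi*((t - b)/a) dt, as a correlation
        W[i] = np.convolve(x, np.conj(psi)[::-1], mode="same") * dt / np.sqrt(a)
    return W   # |W(a, b)|^2 localizes power in both time (b) and scale (a)
```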
Applications in variable stars

[Figure: Szatmáry et al. (1994), fig. 17: double-mode oscillation.]
Conclusion

Short overview

- Data analysis results must never be subjective: an analysis should return the best-fit parameters, the underlying errors, and the accuracy of the fitted model. All the statistical information provided must be clear.

- Because data are necessary in all scientific fields, there is a multitude of methods for optimization, merit functions, spectral analysis, and so on. It is therefore sometimes not easy to decide which method is the ideal one; most of the time the decision depends on the data to be analyzed.

- All that has been considered here is the case of a deterministic signal (a fixed amplitude) added to random noise. Sometimes the signal itself is probabilistic.
 
04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrv04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrv
 
Properties of discrete probability distribution
Properties of discrete probability distributionProperties of discrete probability distribution
Properties of discrete probability distribution
 
Bias-variance decomposition in Random Forests
Bias-variance decomposition in Random ForestsBias-variance decomposition in Random Forests
Bias-variance decomposition in Random Forests
 
Lesson 3: The Limit of a Function (slides)
Lesson 3: The Limit of a Function (slides)Lesson 3: The Limit of a Function (slides)
Lesson 3: The Limit of a Function (slides)
 
Continuous Random variable
Continuous Random variableContinuous Random variable
Continuous Random variable
 
2 random variables notes 2p3
2 random variables notes 2p32 random variables notes 2p3
2 random variables notes 2p3
 
Random Variables
Random VariablesRandom Variables
Random Variables
 
Estimation Theory, PhD Course, Ghent University, Belgium
Estimation Theory, PhD Course, Ghent University, BelgiumEstimation Theory, PhD Course, Ghent University, Belgium
Estimation Theory, PhD Course, Ghent University, Belgium
 
Understanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized treesUnderstanding variable importances in forests of randomized trees
Understanding variable importances in forests of randomized trees
 
Statistics Applied to Biomedical Sciences
Statistics Applied to Biomedical SciencesStatistics Applied to Biomedical Sciences
Statistics Applied to Biomedical Sciences
 

En vedette

Мобильный YouTube
Мобильный YouTubeМобильный YouTube
Мобильный YouTubeShukhrat Yakubov
 
ประเภทของโครงงานคอมพิวเตอร์
ประเภทของโครงงานคอมพิวเตอร์ประเภทของโครงงานคอมพิวเตอร์
ประเภทของโครงงานคอมพิวเตอร์122 Chen
 
Jovenes emprendedores
Jovenes emprendedoresJovenes emprendedores
Jovenes emprendedoresrockeritho
 
Saturday afternoon at garcelon bog
Saturday afternoon at garcelon bogSaturday afternoon at garcelon bog
Saturday afternoon at garcelon bogMamie Anthoine Ney
 
How to use Autoboss V30 Tool ?
How to use Autoboss V30 Tool ? How to use Autoboss V30 Tool ?
How to use Autoboss V30 Tool ? Amy joe
 
Towards Privacy Aware Pseudonymless Strategy for Avoiding Profile Generation ...
Towards Privacy Aware Pseudonymless Strategy for Avoiding Profile Generation ...Towards Privacy Aware Pseudonymless Strategy for Avoiding Profile Generation ...
Towards Privacy Aware Pseudonymless Strategy for Avoiding Profile Generation ...Innopolis University
 
Where do insects live
Where do insects liveWhere do insects live
Where do insects liveMika Agcaoili
 
American freedom defense_initiative_group_2_final paper
American freedom defense_initiative_group_2_final paperAmerican freedom defense_initiative_group_2_final paper
American freedom defense_initiative_group_2_final paperUna Hrnjak
 
Daughters Without Dads Inc
Daughters Without Dads IncDaughters Without Dads Inc
Daughters Without Dads Incarmstrongdoresa
 
Pakiet polski
Pakiet polskiPakiet polski
Pakiet polskifederacja
 
BigData in Insurance. Breakfast@Google. Insurance (2015 10)
BigData in Insurance. Breakfast@Google. Insurance (2015 10)BigData in Insurance. Breakfast@Google. Insurance (2015 10)
BigData in Insurance. Breakfast@Google. Insurance (2015 10)Shukhrat Yakubov
 
Организация массовых и зрелищных мероприятий.
Организация массовых и зрелищных мероприятий.Организация массовых и зрелищных мероприятий.
Организация массовых и зрелищных мероприятий.deniskazakov3979
 
4CNW discovery session for Business in North West Ireland
4CNW discovery session for Business in North West Ireland4CNW discovery session for Business in North West Ireland
4CNW discovery session for Business in North West IrelandThe Creative State North West
 
Privacy-Aware VANET Security: Putting Data-Centric Misbehavior and Sybil Atta...
Privacy-Aware VANET Security: Putting Data-Centric Misbehavior and Sybil Atta...Privacy-Aware VANET Security: Putting Data-Centric Misbehavior and Sybil Atta...
Privacy-Aware VANET Security: Putting Data-Centric Misbehavior and Sybil Atta...Innopolis University
 
ประเภทของโครงงานคอมพิวเตอร์
ประเภทของโครงงานคอมพิวเตอร์ประเภทของโครงงานคอมพิวเตอร์
ประเภทของโครงงานคอมพิวเตอร์122 Chen
 
THE LAST 15 YEARS ON WALL STREET PART II
THE LAST 15 YEARS ON WALL STREET PART IITHE LAST 15 YEARS ON WALL STREET PART II
THE LAST 15 YEARS ON WALL STREET PART IIBen Esget
 
Successful Investing in a Low Growth Economy: A Historical Perspective
Successful Investing in a Low Growth Economy: A Historical PerspectiveSuccessful Investing in a Low Growth Economy: A Historical Perspective
Successful Investing in a Low Growth Economy: A Historical PerspectiveBen Esget
 
Anacapri brand
Anacapri brandAnacapri brand
Anacapri brandArezzori
 

En vedette (20)

Мобильный YouTube
Мобильный YouTubeМобильный YouTube
Мобильный YouTube
 
ประเภทของโครงงานคอมพิวเตอร์
ประเภทของโครงงานคอมพิวเตอร์ประเภทของโครงงานคอมพิวเตอร์
ประเภทของโครงงานคอมพิวเตอร์
 
Scheduling
SchedulingScheduling
Scheduling
 
Jovenes emprendedores
Jovenes emprendedoresJovenes emprendedores
Jovenes emprendedores
 
Saturday afternoon at garcelon bog
Saturday afternoon at garcelon bogSaturday afternoon at garcelon bog
Saturday afternoon at garcelon bog
 
How to use Autoboss V30 Tool ?
How to use Autoboss V30 Tool ? How to use Autoboss V30 Tool ?
How to use Autoboss V30 Tool ?
 
Towards Privacy Aware Pseudonymless Strategy for Avoiding Profile Generation ...
Towards Privacy Aware Pseudonymless Strategy for Avoiding Profile Generation ...Towards Privacy Aware Pseudonymless Strategy for Avoiding Profile Generation ...
Towards Privacy Aware Pseudonymless Strategy for Avoiding Profile Generation ...
 
Where do insects live
Where do insects liveWhere do insects live
Where do insects live
 
American freedom defense_initiative_group_2_final paper
American freedom defense_initiative_group_2_final paperAmerican freedom defense_initiative_group_2_final paper
American freedom defense_initiative_group_2_final paper
 
Daughters Without Dads Inc
Daughters Without Dads IncDaughters Without Dads Inc
Daughters Without Dads Inc
 
Pakiet polski
Pakiet polskiPakiet polski
Pakiet polski
 
Telemedicina
TelemedicinaTelemedicina
Telemedicina
 
BigData in Insurance. Breakfast@Google. Insurance (2015 10)
BigData in Insurance. Breakfast@Google. Insurance (2015 10)BigData in Insurance. Breakfast@Google. Insurance (2015 10)
BigData in Insurance. Breakfast@Google. Insurance (2015 10)
 
Организация массовых и зрелищных мероприятий.
Организация массовых и зрелищных мероприятий.Организация массовых и зрелищных мероприятий.
Организация массовых и зрелищных мероприятий.
 
4CNW discovery session for Business in North West Ireland
4CNW discovery session for Business in North West Ireland4CNW discovery session for Business in North West Ireland
4CNW discovery session for Business in North West Ireland
 
Privacy-Aware VANET Security: Putting Data-Centric Misbehavior and Sybil Atta...
Privacy-Aware VANET Security: Putting Data-Centric Misbehavior and Sybil Atta...Privacy-Aware VANET Security: Putting Data-Centric Misbehavior and Sybil Atta...
Privacy-Aware VANET Security: Putting Data-Centric Misbehavior and Sybil Atta...
 
ประเภทของโครงงานคอมพิวเตอร์
ประเภทของโครงงานคอมพิวเตอร์ประเภทของโครงงานคอมพิวเตอร์
ประเภทของโครงงานคอมพิวเตอร์
 
THE LAST 15 YEARS ON WALL STREET PART II
THE LAST 15 YEARS ON WALL STREET PART IITHE LAST 15 YEARS ON WALL STREET PART II
THE LAST 15 YEARS ON WALL STREET PART II
 
Successful Investing in a Low Growth Economy: A Historical Perspective
Successful Investing in a Low Growth Economy: A Historical PerspectiveSuccessful Investing in a Low Growth Economy: A Historical Perspective
Successful Investing in a Low Growth Economy: A Historical Perspective
 
Anacapri brand
Anacapri brandAnacapri brand
Anacapri brand
 

Similaire à Dataanalysis2

Maths Symbols
Maths SymbolsMaths Symbols
Maths Symbolsjanobearn
 
Accuracy
AccuracyAccuracy
Accuracyesraz
 
Sampling based approximation of confidence intervals for functions of genetic...
Sampling based approximation of confidence intervals for functions of genetic...Sampling based approximation of confidence intervals for functions of genetic...
Sampling based approximation of confidence intervals for functions of genetic...prettygully
 
Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help HelpWithAssignment.com
 
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)Mel Anthony Pepito
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data AnalysisNBER
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxDevendraRavindraPati
 
Seismic data processing (mathematical foundations)
Seismic data processing (mathematical foundations)Seismic data processing (mathematical foundations)
Seismic data processing (mathematical foundations)Amin khalil
 
A comparative analysis of predictve data mining techniques3
A comparative analysis of predictve data mining techniques3A comparative analysis of predictve data mining techniques3
A comparative analysis of predictve data mining techniques3Mintu246
 
Factor analysis
Factor analysis Factor analysis
Factor analysis Mintu246
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxPatilDevendra5
 
4 linear regeression with multiple variables
4 linear regeression with multiple variables4 linear regeression with multiple variables
4 linear regeression with multiple variablesTanmayVijay1
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validationgmorishita
 

Similaire à Dataanalysis2 (20)

2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data 2. diagnostics, collinearity, transformation, and missing data
2. diagnostics, collinearity, transformation, and missing data
 
Maths Symbols
Maths SymbolsMaths Symbols
Maths Symbols
 
Calculus 11 sequences_and_series
Calculus 11 sequences_and_seriesCalculus 11 sequences_and_series
Calculus 11 sequences_and_series
 
Machine learning mathematicals.pdf
Machine learning mathematicals.pdfMachine learning mathematicals.pdf
Machine learning mathematicals.pdf
 
JISA_Paper
JISA_PaperJISA_Paper
JISA_Paper
 
Accuracy
AccuracyAccuracy
Accuracy
 
R meetup lm
R meetup lmR meetup lm
R meetup lm
 
Sampling based approximation of confidence intervals for functions of genetic...
Sampling based approximation of confidence intervals for functions of genetic...Sampling based approximation of confidence intervals for functions of genetic...
Sampling based approximation of confidence intervals for functions of genetic...
 
Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help Get Multiple Regression Assignment Help
Get Multiple Regression Assignment Help
 
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
Lesson 17: Indeterminate forms and l'Hôpital's Rule (slides)
 
Big Data Analysis
Big Data AnalysisBig Data Analysis
Big Data Analysis
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
 
Seismic data processing (mathematical foundations)
Seismic data processing (mathematical foundations)Seismic data processing (mathematical foundations)
Seismic data processing (mathematical foundations)
 
A comparative analysis of predictve data mining techniques3
A comparative analysis of predictve data mining techniques3A comparative analysis of predictve data mining techniques3
A comparative analysis of predictve data mining techniques3
 
Factor analysis
Factor analysis Factor analysis
Factor analysis
 
Heteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptxHeteroscedasticity Remedial Measures.pptx
Heteroscedasticity Remedial Measures.pptx
 
project report(1)
project report(1)project report(1)
project report(1)
 
4 linear regeression with multiple variables
4 linear regeression with multiple variables4 linear regeression with multiple variables
4 linear regeression with multiple variables
 
Midterm I Review
Midterm I ReviewMidterm I Review
Midterm I Review
 
Model Selection and Validation
Model Selection and ValidationModel Selection and Validation
Model Selection and Validation
 

Dataanalysis2

  • 1. Data analysis and Its applications on Asteroseismology Olga Moreira April 2005 DEA en Sciences “Astérosismologie” Lectured by Anne Thoul
  • 2. Outline Principles of data analysis Introduction to spectral analysis Introduction Fourier analysis Fourier transform Power spectrum estimation Merit functions an parameters fitting Deconvolution analysis Maximum Likelihood Estimator CLEAN All poles Maximization/Minimization Problem Ordinary methods Phase dispersion Minimization Exotic methods Period search Goodness-of-fit Wavelet analysis Chi-square test Wavelets transform and Its applications K-S test The beauty of synthetic data Monte-Carlo simulations Hare-and-Hounds game
  • 3. Part I Principles of data analysis
  • 5. What do you think of when someone say “data”? Roxbourg & Paternó - Eddington Workshop (Italy)
  • 6. What do you think of when someone say “data”?
  • 7. What do you think of when someone say “data”?
  • 8. What do you think of when someone say “data”?
  • 9. What do you think of when someone say “data”?
  • 10. What all those definitions of data have in common? Incomplete Probability Inferences information theory Data Tools Analysis
  • 11. Analysis Method Merit function Best fit Goodness-of-fit
  • 12. Analysis Method A complete analysis should provide: Parameters; Error estimates on the parameters; A statistical measure of the goodness-of-fit Ignoring the 3rd step will bring drastical consequences
  • 14. Maximum Likelihood Estimators (MLE) λ = λ λ λ : Set of parameters = : Set of random variables = λ : Probability distribution characterized by λ and The posteriori probability of a single measurement is given by: = λ If are a set of independents and identical distributed (i.i.d) then the joint probability function becomes: = ∏ = λ Where λ =∏ λ is defined as the Likelihood = • The best fit of parameters is the one that maximizes the likelihood.
  • 15. It is common to find defined the as the likelihood., but in fact is just the logarithm of the likelihood, which more easy to work with. = = λ or = − = Posteriori probability is the probability after the event under no circuntances should the likelihood be confused with probability density.
  • 16. Error Estimate λ Gaussian shape Non-guassian shape Non-guassian shape with a single with several local λ = λ ± ∆λ extreme: extremes: ∆λ ≈ σ • No problem on the • Problems on the determination of determination of maximum, although it maximum can represent some • Problems on the error difficulties to the error bars estimative. bars estimative.
  • 17. Estimator: Desirable properties Unbiased: Minimum variance : λ = λ −λ = σ λ → Information inequality/Cramer-Rao inequality: + ′λ σ λ ≥ λ = ′ =− λ ′ λ = σ λ ≥ λ • The larger the information the smaller the variance
  • 18. MLE asymptotically unbiased ′ λ = λ + λ − λ λ + Neglecting the largers orders and →∞ λ =− λ − λ −λ λ =− λ −λ + λ ≈ σ λ λ The MLE function has the form of normal distribution with σ λ = and : λ λ = λ ± σ λ
  • 19. In multi-dimensions: = λ= λ λ λ λ = λ = λ = ∂ λ ∂ λ λ = λ + λ −λ − λ −λ λ −λ + = ∂λ = = ∂λ ∂λ λ = λ + λ −λ λ −λ + ∂ λ →∞ = = Hessian matrix ∂λ ∂λ = λ −λ λ −λ Multivariate gaussian distribution λ λ λ λ = − ρ λ λ = σ λ = − σ λ σ λ
  • 20. If λ λ = ↔ ≠ λ ≈ λ ± σ λ = If λ λ ≠ ↔ ≠ Tere is an error region not only defined by σ λ but by the complete covariance matrix . For instance in 2D the error region defines an elipse.
  • 21. Least-square and Chi-square fit 1. Considering one measures with errors that are independently and normal distributed around the true value 2. The standard deviations σ are the same for all points. Then joint probability for !"# is given by $ ## ∝∏ − − ∆ = σ Maximazing is the same as minimazing − = σ The least-square fitting is the MLE of the fitted parameters if the measurements are independent and normally distributed. 3. If the deviations are different σ ! σ then : − =χ = σ
  • 22. Limitations: Real data most of the time violate the i.i.d condition Sometimes one have a limited sample In practice depends on the λ behaviour The MLE grants a unique solution. α =α λ ∂ ∂ ∂α ∂ ∂ = ⋅ = = ∂λ ∂α ∂λ ∂α ∂λ But the uncertity of an estimate depends on the specifique choice of λ λ α α ∂λ λ λ α α λ ∂α λ <λ<λ = λ +∞ = α +∞ ≠ α +∞ ∂λ λ λ α α λ −∞ −∞ ∂α −∞
  • 23. Example: modes stochastically excited ν =%ν χ ν For a single mode: Γ % ν = + Γ ν −ν + ν ν λ = − % ν λ % ν λ ν =− = %ν λ + = %ν Minimization λ= Γν →λ = Γν
  • 25. Going “Downhill” Methods Finding a global extreme is general very Press et al.(1992) difficult. For one dimensional minimization Usually there are two types of methods: • Methods that bracket the minimum: Golden section search, and parabolic interpolation (Brent’s Method) • Methods that use the first derivative Press et al.(1992) information. Multidimensional there are three kind of methods: • Direction-set methods. Powell’s method is the prototype. • Downhill Simplex method. • Methods that use the gradient information. Adapted from Press et al.(1992)
  • 26. Falling in the wrong valley The downhill methods a lack on efficiency/robustness. For instance the simplex method can very fast for some functions and very slow for others. They depend on priori knowledge of the overall structure of vector space, and require repeated manual intervention. If the function to minimize is not well-known, sometimes, numerically speaking, a smooth hill can become an headache. They also don’t solve the famous combinatorial analysis problem : The traveling salesman problem
  • 27. Exotic Methods Solving “The traveling salesman problem”: A salesman has to visit each city on a given list, knowing the distance between all cities will try to minimize the length of his tour. Methods available: Simulated Annealing: based on an analogy with thermodynamics. Genetic algorithms: based on an analogy to evolutionary selection rules. Nearest Neighbor Neural networks :based on the observation of biological neural network (brains). Knowledge-based systems, etc … Adapted from Charbonneau (1995)
  • 29. Chi-square test − : is the number of events observed in χ = the ith bin : is the number expected according to = some known distribution + + - +# , ,# + ! , #$% , ) ( " ( !" #$% & " $ ' " !" #$% & " $ ' * * " H0: The data follow a specified distribution Significance level is determined by - χ & & is the degree of freedom: & ! ' ( + +, ) + )* )+ + , Normally acceptable models have - . /# , but day-in and day-out we find //" accepted models with -!" / '0
  • 30. Kolmogorov-Smirnov (K-S) test Press et al.(1992) % ( ) : Cumulative distribution ( ) : Known cumulative distribution 1 : Maximum absolute difference between the two cumulative functions The significance of an observed value of D is given approximately by: 1 = % − ℘ 1> =- , + + + 1 −∞ > > +∞ + ∞ − - = − − = + , + = - , is a monotonic function with limits values: - , = : Largest agreement - , ∞ = : Smallest agreement
  • 32. Monte-Carlo simulations If one know something about the process that generated our data , given an assumed set of parameters l then one can figure out how to simulate our own sets of “synthetic” realizations of these parameters. The procedure is to draw random numbers from appropriate distribution so as to mimic our best understanding of the underlying processes and measurement errors. Stello et al. (2004) xi-hya
  • 33. Hare-and-Hounds game Team A: generates theoretical mode frequencies and synthetic time series. Team B: analyses the time series, performs the mode identification and fitting, does the structure inversion Rules: The teams only have access to time series. Nothing else is allowed.
  • 34. End of Part I Options available : • Questions • Coffee break • “Get on with it !!!”
  • 35. Part II Introduction to spectral analysis
  • 36. Fourier transform Properties: +∞ 2 = 2( & ) = π& −∞ 2 ( ( )+ 3 ) = 2 & +4 & & 2 =2& 2 ( )= 2 +∞ ( )= 3 +( ⇔ 2 ( )=2& ⋅4 & −∞ 2 ( ( )⊗ 3 ) = 2 & ⋅4 & Parseval’s Theorem: The power of a signal represented by f(t) is the same whether computed in time space or in frequency space: +∞ +∞ = 2 & & −∞ −∞
  • 37. Sampling theorem γ ⋅γ 2& ϒ& 2 & ⊗ϒ & Adapted from Bracewell (1986) For a bandlimited signal, which has no components the frequency & , the sampling theorem states that a real signal can be reconstructed without error from samples taken uniformly at &.5& . The minimum sampling frequency, 2 & !5& is called the Nyquist frequency, corresponding to the sampling interval !"5& ( where ! ). 6
  • 38. Undersampling The sampling theorem assumes that a signal is limited in frequency but in practice the signal is time limited. For 1 alias spectrum ."5& then signal the signal is 6 Spectrum undersampled. Overlying tails appear in spectrum, spectrum alias. Adapted from Bracewell (1986) Aliasing : Examining the terms of undersampled Fourier transform (FT) (Bracewell (1986)): The undersampled FT is evener than the complete FT as consequence the sampling procedure discriminates the zero components at &!& There is a leakage of the high frequencies (aliasing)
  • 39. Discrete Fourier transform Discrete Fourier transform 2 & = π& = δ = = Discrete form of Parseval’s theorem: = 2 & = = Fast Fourier Transform (FFT): The FFT is a discrete Fourier transform algorithm which reduces the number of computation of N pints from 5 5 to 5 3 . This is done by means of Danielson- Lanczos lemma, which basic idea id to break a transform of length to 5 transforms of length 6 5.
  • 40. Power spectrum estimation Periodogram: & = 2 & = π& = & = π& + π& = = If contains periodic signal i.e.: =3 +7 Random noise 3 = π& +ϕ Then at & =&/ there is a large contribution in the sum , for other values the terms in the sum will be randomly negative and positive, yielding to small contribution. Thus a peak in the periodogram reveals the existence of a periodic embedded signal.
  • 41. Frequencies leakage: Leakage from nearby frequencies, which is described usually as a spectral window and is a primarily product of the finite length of data. Leakage from high frequencies, due to data sampling, the aforementioned aliasing. Tapering functions: Sometimes also called as data windowing. These functions try to smooth the leakage between frequencies bringing the interference slowly back to zero. The main goal is to narrow the peak and vanish the side lobes. Smoothing can represents in certain cases loss of information. Press et al.(1992) Press et al.(1992)
  • 42. Futher complications Closely spaced frequencies: Direct contribution for the first aforementioned leakage. 2( + ) = 2 & +2 & = 2 & + 2 & + 2 & 2 & Damping: =( π& −ϕ ) −η The peak in power spectrum will have a Lorentzian profile
  • 43. Power spectrum of random noise =3 +7 7 →, , , +,, 3 → + ,3 The estimation of spectral density: ρ & = γ 7 π& = γ 7 → ( + 7 Thus : No matter how much one increase the number of 7 & =σ 7 points, N, the signal-to-noise will tend to be constant. For unevenly spaced data (missing data) the equation (1) isn’t always valid, indeed it’s only valid for homogeneous white noise (independent and identically distributed normal random variables)
  • 44. Filling gaps The unevenly spaced data problem can be solve by (few suggestions): Finding a way to reduce the unevenly spaced sample into a evenly spaced. Basic idea: Interpolation of the missing points (problem: Doesn’t work for long gaps) Using the Lomb-Scargle periodogram Doing a deconvolution analysis (Filters)
  • 45. Lomb-Scargle peridogram & − τ & − τ = = & = + & − τ & − τ = = & τ = − = & & = It’s like weighting the data on a “per point” basis instead on a “per time interval” basis, which make independent on sampling irregularity. It has an exponential probability distribution with unit mean, which means one can establish a false-alarm probability of the null hypothesis (significance level). 9> 9 = − − −9 8 ≈ 8 −9
  • 47. Deconvolution 2 & ⊗ % & = 3 & + ε & signal noise Linear algorithms: inverse filtering or Wiener filtering. The are inapplicable to incomplete sampling (irregular sampling) of spatial frequency. Non-linear algorithm: CLEAN, All poles. Problem : The deconvolution usually does not a unique solutions.
  • 48. Hogbom CLEAN algorithm The first CLEAN method was developed by Hogbom (1974). It constructs discrete approximations of the clean map from the convolution equation: ⊗ = Starting with /=/ , it searches for the largest value in the residual map: = − ⊗ − After locating the largest residual of given amplitude, it subtracts it from to to yield to . The iteration continues until root-mean-square (RMS) decreases to some level. Each subtracted location is saved in so-called CLEAN map. The resulting final map denoted by it is assumed that is mainly in noise.
  • 49. CLEAN algorithm The basic steps of the CLEAN algorithm used in asteroseismology are: 1. Compute the power spectra of the signal and identify the dominant period 2. Perform a least-square fit to the data to obtain the amplitude and phase of the identified mode. 3. Constructs the time series corresponding to that single mode and subtracts from the original signal to obtain a new signal 4. Repeats all steps until all its left is noise. Stello et al. (2004) proposed a improvement to this algorithm, by after subtracting the frequency it recalculates the amplitude, phase and frequencies of the previous subtracted peaks while fixing the frequency of the latest extracted peak.
  • 50. All poles : = π &δ & = 2 & = : = The discrete FT is a particular case of the Z-transform (unilateral): +∞ : = : = It turns up that one can have some advantages by doing the following approximation: & ≈ 8 Press et al.(1992) + : = The notable fact is that the equation allows to have poles, corresponding to infinite spectral power density, on the unit z-circle (at the real frequencies of the Nyquist interval), and such poles can provide an accurate representation for underlying power spectra that have sharp discrete “lines” or delta-functions. M is called the number of poles. This approximation does under several names all-poles model, Maximum Entropy method (MEM), auto regressive model (AR).
  • 52. Definitions A discrete set of observations can be represented by to vectors, the magnitudes and the observation times ( with !"; ). Thus the variance of is given: − σ = = = − = Suppose that one divides the initial set into several subsets/samples. If M are the number samples, having , variances, and containing data points then the over all variance for all the samples is given by: − , = % = − 8 =
  • 53. PDM as period search method Suppose that one want to minimize the variance of a data set with respect to the mean light curve. The phase vector is given: − φ = Considering as a function of the phase, the variance of these samples gives a scatter around the mean light curve. Defining : % Θ = σ If P is not the true period % ≈ σ Θ ≈ If P is true value then Θ will reach a local minimum. Mathematically, the PDM is a least-square fitting, but rather than fitting a given curve, is a fitting relatively to mean curve as defined by means of each bin, simultaneously one obtain the best period.
  • 55. Wavelets transform Wavelets are a class of functions used to localize a given function in both space and scaling. A family of wavelets can be constructed from a function Ψ * sometimes known as the “mother wavelet” which is confined in a finite interval. The “daughter wavelets” Ψ are then formed by translation of (b) and contraction of (a). An individual wavelet can be written as: − * Ψ * = Ψ Then the wavelet transform is given by: +∞ − * < * = Ψ −∞ +∞ +∞ − = Ψ Ψ * Ψ * * −∞ −∞
  • 56. Applications in variable stars Szatmáry et al. (1994) - fig. 17: Double mode oscillation.
  • 58. Short overview Data analysis results must never be subjective, it should return the best fitting parameters, the underlying errors, accuracy of the fitted model. All the provided statistical information must be clear. Because data is necessary in all scientific fields there a bunch methods for optimization, merit functions, spectral analysis… Therefore, sometimes is not easy to decided which method is the ideal method. Most of the time it the decision dependents on the data to be analyzed. All that has been considering here, was the case of a deterministic signal (a fixed amplitude) add to random noise. Sometimes the signal itself is probabilistic
  • 59. Data analysis and Its applications on Asteroseismology Olga Moreira April 2005 DEA en Sciences “Astérosismologie” Lectured by Anne Thoul
  • 60. Outline Principles of data analysis Introduction to spectral analysis Introduction Fourier analysis Fourier transform Power spectrum estimation Merit functions an parameters fitting Deconvolution analysis Maximum Likelihood Estimator CLEAN All poles Maximization/Minimization Problem Ordinary methods Phase dispersion Minimization Exotic methods Period search Goodness-of-fit Wavelet analysis Chi-square test Wavelets transform and Its applications K-S test The beauty of synthetic data Monte-Carlo simulations Hare-and-Hounds game
  • 61. Part I Principles of data analysis
  • 63. What do you think of when someone say “data”? Roxbourg & Paternó - Eddington Workshop (Italy)
  • 64. What do you think of when someone say “data”?
  • 65. What do you think of when someone say “data”?
  • 66. What do you think of when someone say “data”?
  • 67. What do you think of when someone say “data”?
  • 68. What all those definitions of data have in common? Incomplete Probability Inferences information theory Data Tools Analysis
  • 69. Analysis Method Merit function Best fit Goodness-of-fit
  • 70. Analysis Method A complete analysis should provide: Parameters; Error estimates on the parameters; A statistical measure of the goodness-of-fit Ignoring the 3rd step will bring drastical consequences
  • 72. Maximum Likelihood Estimators (MLE) λ = λ λ λ : Set of parameters = : Set of random variables = λ : Probability distribution characterized by λ and The posteriori probability of a single measurement is given by: = λ If are a set of independents and identical distributed (i.i.d) then the joint probability function becomes: = ∏ = λ Where λ =∏ λ is defined as the Likelihood = • The best fit of parameters is the one that maximizes the likelihood.
  • 73. It is common to find defined the as the likelihood., but in fact is just the logarithm of the likelihood, which more easy to work with. = = λ or = − = Posteriori probability is the probability after the event under no circuntances should the likelihood be confused with probability density.
  • 74. Error Estimate λ Gaussian shape Non-guassian shape Non-guassian shape with a single with several local λ = λ ± ∆λ extreme: extremes: ∆λ ≈ σ • No problem on the • Problems on the determination of determination of maximum, although it maximum can represent some • Problems on the error difficulties to the error bars estimative. bars estimative.
  • 75. Estimator: Desirable properties Unbiased: Minimum variance : λ = λ −λ = σ λ → Information inequality/Cramer-Rao inequality: + ′λ σ λ ≥ λ = ′ =− λ ′ λ = σ λ ≥ λ • The larger the information the smaller the variance
  • 76. MLE asymptotically unbiased ′ λ = λ + λ − λ λ + Neglecting the largers orders and →∞ λ =− λ − λ −λ λ =− λ −λ + λ ≈ σ λ λ The MLE function has the form of normal distribution with σ λ = and : λ λ = λ ± σ λ
  • 77. In multi-dimensions: = λ= λ λ λ λ = λ = λ = ∂ λ ∂ λ λ = λ + λ −λ − λ −λ λ −λ + = ∂λ = = ∂λ ∂λ λ = λ + λ −λ λ −λ + ∂ λ →∞ = = Hessian matrix ∂λ ∂λ = λ −λ λ −λ Multivariate gaussian distribution λ λ λ λ = − ρ λ λ = σ λ = − σ λ σ λ
  • 78. If λ λ = ↔ ≠ λ ≈ λ ± σ λ = If λ λ ≠ ↔ ≠ Tere is an error region not only defined by σ λ but by the complete covariance matrix . For instance in 2D the error region defines an elipse.
  • 79. Least-square and Chi-square fit 1. Considering one measures with errors that are independently and normal distributed around the true value 2. The standard deviations σ are the same for all points. Then joint probability for !"# is given by $ ## ∝∏ − − ∆ = σ Maximazing is the same as minimazing − = σ The least-square fitting is the MLE of the fitted parameters if the measurements are independent and normally distributed. 3. If the deviations are different σ ! σ then : − =χ = σ
  • 80. Limitations: Real data most of the time violate the i.i.d condition Sometimes one have a limited sample In practice depends on the λ behaviour The MLE grants a unique solution. α =α λ ∂ ∂ ∂α ∂ ∂ = ⋅ = = ∂λ ∂α ∂λ ∂α ∂λ But the uncertity of an estimate depends on the specifique choice of λ λ α α ∂λ λ λ α α λ ∂α λ <λ<λ = λ +∞ = α +∞ ≠ α +∞ ∂λ λ λ α α λ −∞ −∞ ∂α −∞
  • 81. Example: modes stochastically excited ν =%ν χ ν For a single mode: Γ % ν = + Γ ν −ν + ν ν λ = − % ν λ % ν λ ν =− = %ν λ + = %ν Minimization λ= Γν →λ = Γν
  • 83. Going “Downhill” Methods Finding a global extreme is general very Press et al.(1992) difficult. For one dimensional minimization Usually there are two types of methods: • Methods that bracket the minimum: Golden section search, and parabolic interpolation (Brent’s Method) • Methods that use the first derivative Press et al.(1992) information. Multidimensional there are three kind of methods: • Direction-set methods. Powell’s method is the prototype. • Downhill Simplex method. • Methods that use the gradient information. Adapted from Press et al.(1992)
  • 84. Falling in the wrong valley The downhill methods a lack on efficiency/robustness. For instance the simplex method can very fast for some functions and very slow for others. They depend on priori knowledge of the overall structure of vector space, and require repeated manual intervention. If the function to minimize is not well-known, sometimes, numerically speaking, a smooth hill can become an headache. They also don’t solve the famous combinatorial analysis problem : The traveling salesman problem
  • 85. Exotic Methods Solving “The traveling salesman problem”: A salesman has to visit each city on a given list, knowing the distance between all cities will try to minimize the length of his tour. Methods available: Simulated Annealing: based on an analogy with thermodynamics. Genetic algorithms: based on an analogy to evolutionary selection rules. Nearest Neighbor Neural networks :based on the observation of biological neural network (brains). Knowledge-based systems, etc … Adapted from Charbonneau (1995)
  • 87. Chi-square test − : is the number of events observed in χ = the ith bin : is the number expected according to = some known distribution + + - +# , ,# + ! , #$% , ) ( " ( !" #$% & " $ ' " !" #$% & " $ ' * * " H0: The data follow a specified distribution Significance level is determined by - χ & & is the degree of freedom: & ! ' ( + +, ) + )* )+ + , Normally acceptable models have - . /# , but day-in and day-out we find //" accepted models with -!" / '0
  • 88. Kolmogorov-Smirnov (K-S) test Press et al.(1992) % ( ) : Cumulative distribution ( ) : Known cumulative distribution 1 : Maximum absolute difference between the two cumulative functions The significance of an observed value of D is given approximately by: 1 = % − ℘ 1> =- , + + + 1 −∞ > > +∞ + ∞ − - = − − = + , + = - , is a monotonic function with limits values: - , = : Largest agreement - , ∞ = : Smallest agreement
  • 90. Monte-Carlo simulations If one know something about the process that generated our data , given an assumed set of parameters l then one can figure out how to simulate our own sets of “synthetic” realizations of these parameters. The procedure is to draw random numbers from appropriate distribution so as to mimic our best understanding of the underlying processes and measurement errors. Stello et al. (2004) xi-hya
  • 91. Hare-and-Hounds game Team A: generates theoretical mode frequencies and synthetic time series. Team B: analyses the time series, performs the mode identification and fitting, does the structure inversion Rules: The teams only have access to time series. Nothing else is allowed.
  • 92. End of Part I Options available : • Questions • Coffee break • “Get on with it !!!”
  • 93. Part II Introduction to spectral analysis
  • 94. Fourier transform Properties: +∞ 2 = 2( & ) = π& −∞ 2 ( ( )+ 3 ) = 2 & +4 & & 2 =2& 2 ( )= 2 +∞ ( )= 3 +( ⇔ 2 ( )=2& ⋅4 & −∞ 2 ( ( )⊗ 3 ) = 2 & ⋅4 & Parseval’s Theorem: The power of a signal represented by f(t) is the same whether computed in time space or in frequency space: +∞ +∞ = 2 & & −∞ −∞
  • 95. Sampling theorem γ ⋅γ 2& ϒ& 2 & ⊗ϒ & Adapted from Bracewell (1986) For a bandlimited signal, which has no components the frequency & , the sampling theorem states that a real signal can be reconstructed without error from samples taken uniformly at &.5& . The minimum sampling frequency, 2 & !5& is called the Nyquist frequency, corresponding to the sampling interval !"5& ( where ! ). 6
  • 96. Undersampling The sampling theorem assumes that a signal is limited in frequency but in practice the signal is time limited. For 1 alias spectrum ."5& then signal the signal is 6 Spectrum undersampled. Overlying tails appear in spectrum, spectrum alias. Adapted from Bracewell (1986) Aliasing : Examining the terms of undersampled Fourier transform (FT) (Bracewell (1986)): The undersampled FT is evener than the complete FT as consequence the sampling procedure discriminates the zero components at &!& There is a leakage of the high frequencies (aliasing)
  • 97. Discrete Fourier transform Discrete Fourier transform 2 & = π& = δ = = Discrete form of Parseval’s theorem: = 2 & = = Fast Fourier Transform (FFT): The FFT is a discrete Fourier transform algorithm which reduces the number of computation of N pints from 5 5 to 5 3 . This is done by means of Danielson- Lanczos lemma, which basic idea id to break a transform of length to 5 transforms of length 6 5.
  • 98. Power spectrum estimation Periodogram: & = 2 & = π& = & = π& + π& = = If contains periodic signal i.e.: =3 +7 Random noise 3 = π& +ϕ Then at & =&/ there is a large contribution in the sum , for other values the terms in the sum will be randomly negative and positive, yielding to small contribution. Thus a peak in the periodogram reveals the existence of a periodic embedded signal.
  • 99. Frequencies leakage: Leakage from nearby frequencies, which is described usually as a spectral window and is a primarily product of the finite length of data. Leakage from high frequencies, due to data sampling, the aforementioned aliasing. Tapering functions: Sometimes also called as data windowing. These functions try to smooth the leakage between frequencies bringing the interference slowly back to zero. The main goal is to narrow the peak and vanish the side lobes. Smoothing can represents in certain cases loss of information. Press et al.(1992) Press et al.(1992)
  • 100. Futher complications Closely spaced frequencies: Direct contribution for the first aforementioned leakage. 2( + ) = 2 & +2 & = 2 & + 2 & + 2 & 2 & Damping: =( π& −ϕ ) −η The peak in power spectrum will have a Lorentzian profile
  • 101. Power spectrum of random noise =3 +7 7 →, , , +,, 3 → + ,3 The estimation of spectral density: ρ & = γ 7 π& = γ 7 → ( + 7 Thus : No matter how much one increase the number of 7 & =σ 7 points, N, the signal-to-noise will tend to be constant. For unevenly spaced data (missing data) the equation (1) isn’t always valid, indeed it’s only valid for homogeneous white noise (independent and identically distributed normal random variables)
  • 102. Filling gaps The unevenly spaced data problem can be solve by (few suggestions): Finding a way to reduce the unevenly spaced sample into a evenly spaced. Basic idea: Interpolation of the missing points (problem: Doesn’t work for long gaps) Using the Lomb-Scargle periodogram Doing a deconvolution analysis (Filters)
  • 103. Lomb-Scargle peridogram & − τ & − τ = = & = + & − τ & − τ = = & τ = − = & & = It’s like weighting the data on a “per point” basis instead on a “per time interval” basis, which make independent on sampling irregularity. It has an exponential probability distribution with unit mean, which means one can establish a false-alarm probability of the null hypothesis (significance level). 9> 9 = − − −9 8 ≈ 8 −9
  • 105. Deconvolution 2 & ⊗ % & = 3 & + ε & signal noise Linear algorithms: inverse filtering or Wiener filtering. The are inapplicable to incomplete sampling (irregular sampling) of spatial frequency. Non-linear algorithm: CLEAN, All poles. Problem : The deconvolution usually does not a unique solutions.
  • 106. Hogbom CLEAN algorithm The first CLEAN method was developed by Hogbom (1974). It constructs discrete approximations of the clean map from the convolution equation: ⊗ = Starting with /=/ , it searches for the largest value in the residual map: = − ⊗ − After locating the largest residual of given amplitude, it subtracts it from to to yield to . The iteration continues until root-mean-square (RMS) decreases to some level. Each subtracted location is saved in so-called CLEAN map. The resulting final map denoted by it is assumed that is mainly in noise.
  • 107. CLEAN algorithm The basic steps of the CLEAN algorithm used in asteroseismology are: 1. Compute the power spectra of the signal and identify the dominant period 2. Perform a least-square fit to the data to obtain the amplitude and phase of the identified mode. 3. Constructs the time series corresponding to that single mode and subtracts from the original signal to obtain a new signal 4. Repeats all steps until all its left is noise. Stello et al. (2004) proposed a improvement to this algorithm, by after subtracting the frequency it recalculates the amplitude, phase and frequencies of the previous subtracted peaks while fixing the frequency of the latest extracted peak.
  • 108. All poles : = π &δ & = 2 & = : = The discrete FT is a particular case of the Z-transform (unilateral): +∞ : = : = It turns up that one can have some advantages by doing the following approximation: & ≈ 8 Press et al.(1992) + : = The notable fact is that the equation allows to have poles, corresponding to infinite spectral power density, on the unit z-circle (at the real frequencies of the Nyquist interval), and such poles can provide an accurate representation for underlying power spectra that have sharp discrete “lines” or delta-functions. M is called the number of poles. This approximation does under several names all-poles model, Maximum Entropy method (MEM), auto regressive model (AR).
  • 110. Definitions A discrete set of observations can be represented by to vectors, the magnitudes and the observation times ( with !"; ). Thus the variance of is given: − σ = = = − = Suppose that one divides the initial set into several subsets/samples. If M are the number samples, having , variances, and containing data points then the over all variance for all the samples is given by: − , = % = − 8 =
  • 111. PDM as period search method Suppose that one want to minimize the variance of a data set with respect to the mean light curve. The phase vector is given: − φ = Considering as a function of the phase, the variance of these samples gives a scatter around the mean light curve. Defining : % Θ = σ If P is not the true period % ≈ σ Θ ≈ If P is true value then Θ will reach a local minimum. Mathematically, the PDM is a least-square fitting, but rather than fitting a given curve, is a fitting relatively to mean curve as defined by means of each bin, simultaneously one obtain the best period.
  • 113. Wavelets transform Wavelets are a class of functions used to localize a given function in both space and scaling. A family of wavelets can be constructed from a function Ψ * sometimes known as the “mother wavelet” which is confined in a finite interval. The “daughter wavelets” Ψ are then formed by translation of (b) and contraction of (a). An individual wavelet can be written as: − * Ψ * = Ψ Then the wavelet transform is given by: +∞ − * < * = Ψ −∞ +∞ +∞ − = Ψ Ψ * Ψ * * −∞ −∞
  • 114. Applications in variable stars Szatmáry et al. (1994) - fig. 17: Double mode oscillation.
  • 116. Short overview Data analysis results must never be subjective, it should return the best fitting parameters, the underlying errors, accuracy of the fitted model. All the provided statistical information must be clear. Because data is necessary in all scientific fields there a bunch methods for optimization, merit functions, spectral analysis… Therefore, sometimes is not easy to decided which method is the ideal method. Most of the time it the decision dependents on the data to be analyzed. All that has been considering here, was the case of a deterministic signal (a fixed amplitude) add to random noise. Sometimes the signal itself is probabilistic