IVFC Signal Denoising
1. Introduction
Methods and Results
Summary
Cell Counting on In Vivo Flow Cytometry
Time Series data
Chaofeng Wang
March 25, 2011
Chaofeng Wang Cell Counting on In Vivo Flow Cytometry Time Series data
Abstract
This presentation introduces three methods for IVFC data analysis.
The Line Separating Method (LSM) is the earliest and conventional method.
Wavelet-based peak picking is an adaptive method inspired by audio processing.
The Statistical Thresholding Method (STM) uses a Gaussian Mixture Model to count cells automatically and consistently.
In Vivo Flow Cytometry (IVFC)
Excitation and detection occur at the same confocal plane.
Output: time series data.
For IVFC settings, refer to [9].
In Vivo Flow Cytometry (IVFC)
Capabilities [9]:
Real-time cell counting (vs. hemocytometer).
Suitable for fast-moving cells and low-SNR signals (vs. confocal and two-photon imaging): 5 to 100 kHz sampling rate.
Monitoring cell kinetics in vivo (without blood extraction).
Most applications are in metastasis research [7, 13].
Causes of Low SNR
1. Autofluorescence
2. Nonspecific labeling from incomplete cleansing
3. Labeled cells deviating from the confocal plane
4. Non-uniform staining
5. Instability of fluorescent dyes during long assays
6. Aggregation of labeled cells: 2 out of 119 images of labeled cells show potentially clustered cells [8]
7. Instrumental noise and white noise
Conventional Gating: Line Separating Method (LSM)
Line Separating GUI V2.0, with adjustable thresholds.
FWHM is computed on the sampled signal, so it takes only discrete values.
MATLAB scripts by Chaofeng Wang.
Discrete FWHM
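The slides do not spell out how the discrete FWHM is computed. One plausible reading, sketched below, counts the contiguous samples around a peak that lie at or above half of the peak height. The function name `discrete_fwhm` and the assumption that the baseline is already at 0 are illustrative and not taken from the original scripts.

```python
def discrete_fwhm(signal, peak_index):
    """Count contiguous samples around peak_index at or above half the peak height.

    ASSUMPTION: the baseline is at 0, so half maximum is signal[peak_index] / 2.
    """
    half = signal[peak_index] / 2.0
    left = peak_index
    while left > 0 and signal[left - 1] >= half:
        left -= 1
    right = peak_index
    while right < len(signal) - 1 and signal[right + 1] >= half:
        right += 1
    # The width is a whole number of samples, hence the discreteness.
    return right - left + 1


trace = [0.0, 0.1, 0.6, 1.0, 0.7, 0.2, 0.0]
width = discrete_fwhm(trace, 3)
```

Because the width is an integer number of samples, peaks line up on a few discrete FWHM values in the height vs. FWHM feature space.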
Line Separating Method (LSM)
Gating Strategies
Background assaying: control data
Manual pickup of noise segments from experiment data
Expert adjustment (subjective)
Feature space: peak height vs. Full Width at Half Maximum (FWHM), separated by a straight line (which may underfit; would a hyperbola such as $y = x^{-1} + a$ fit better?)
LSM was proposed with the invention of IVFC by Novak et al. [10].
Wavelet Based Peak Picking
Two steps:
1. Wavelet denoising.
2. Adaptive peak picking.
This work is credited to David Damm and was presented at the BMEI 2009 conference [4].
Wavelet Denoising
Noise model: recover an unknown function $f$ on $[0, 1]$ from noisy data
$d_i = f(t_i) + \sigma z_i, \quad i = 0, \dots, n - 1$
where $t_i = i/n$, $z_i$ is standard Gaussian white noise ($z_i \sim N(0, 1)$, i.i.d.), and $\sigma$ is the noise level.
Denoising aim: optimize the mean squared error subject to the condition that $\hat{f}$ is at least as smooth as $f$ with high probability.
Reference: [6, 5]
Soft thresholding
Apply the soft thresholding nonlinearity coordinatewise to the empirical wavelet coefficients:
$\eta_t(y) = \mathrm{sgn}(y)\,(|y| - t)_+$
where $(x)_+ = 0$ if $x < 0$ and $(x)_+ = x$ if $x \ge 0$, and $t$ is a specially chosen threshold:
$t_n = \gamma_1 \sigma \sqrt{2 \log(n) / n}$
$\gamma_1$ is a constant, set to 1 in simpler situations.
In practical situations where $\sigma$ is unknown, the estimate $\hat{\sigma} = \mathrm{MAD} / 0.6745$ is used.
Reference: [5]
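A minimal sketch of the soft thresholding rule, the MAD-based noise estimate, and the threshold, taken here as $\gamma_1 \sigma \sqrt{2 \log(n)/n}$ following [5]. The function names are illustrative. The wavelet transform itself is not shown, and the MAD is assumed to be the median absolute deviation from the median of the coefficients passed in.

```python
import math
import statistics


def soft_threshold(y, t):
    """eta_t(y) = sgn(y) * (|y| - t)_+ : shrink y toward 0 by t."""
    return math.copysign(max(abs(y) - t, 0.0), y)


def mad_sigma(coeffs):
    """Robust noise level estimate: MAD / 0.6745.

    ASSUMPTION: MAD is the median absolute deviation from the median.
    """
    med = statistics.median(coeffs)
    mad = statistics.median([abs(c - med) for c in coeffs])
    return mad / 0.6745


def universal_threshold(sigma, n, gamma1=1.0):
    """t_n = gamma1 * sigma * sqrt(2 * log(n) / n)."""
    return gamma1 * sigma * math.sqrt(2.0 * math.log(n) / n)


# Usage: shrink a handful of (hypothetical) wavelet coefficients.
coeffs = [0.02, -0.01, 0.9, -0.03, 0.015, -1.2, 0.0, 0.01]
t = universal_threshold(mad_sigma(coeffs), len(coeffs))
denoised = [soft_threshold(c, t) for c in coeffs]
```

Coefficients smaller in magnitude than the threshold become 0, and larger ones are pulled toward 0 by the threshold amount.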
Adaptive Peak Picking
Finite State Automaton
In states A1 and P1, the accumulated discrete derivative is reset to 0.
A peak is reported whenever state D2 is reached.
Adaptive Peak Picking
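The threshold baseline is calculated in a rolling window $[t - l/2, t + l/2]$ with a fixed (even integer) window size $l$:
$B(t) = \mathrm{Median}_w + \mathrm{Std}_w$
where $\mathrm{Median}_w$ and $\mathrm{Std}_w$ are the median and standard deviation of the samples in the window.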
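The rolling baseline can be sketched as follows. The handling of the signal edges (the window is simply truncated) and the use of the population standard deviation are assumptions, since the slides do not specify either.

```python
import statistics


def rolling_baseline(signal, window):
    """B(t) = median + standard deviation over [t - window/2, t + window/2].

    ASSUMPTIONS: the window is truncated at the signal edges, and the
    population standard deviation (pstdev) is used.
    """
    half = window // 2
    baseline = []
    for t in range(len(signal)):
        segment = signal[max(0, t - half): t + half + 1]
        baseline.append(statistics.median(segment) + statistics.pstdev(segment))
    return baseline


# Usage: a flat signal with one spike; the median keeps the spike
# from dominating the baseline.
flat = [1.0] * 10
spiky = [1.0] * 4 + [9.0] + [1.0] * 5
b_flat = rolling_baseline(flat, 4)
b_spiky = rolling_baseline(spiky, 4)
```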
Wavelet Based Peak Picking
The MATLAB Wavelet Toolbox was used for this research.
Wavelet Method in comparison to LSM
Table: Comparison of cell counts by the wavelet method and LSM
Dataset    LSM    Wavelet    Consensus
1-1.dcf     80      162         79
1-2.dcf     71      153         70
2-1.dcf     30       42         13
2-2.dcf     41       59         20
3-1.dcf    175      175        135
3-2.dcf     81      157         77
5-1.dcf     36       67         34
5-6.dcf     59       69         46
Statistical Modeling for IVFC data peaks
Disadvantages of LSM:
Subjective and labour-intensive: a control measurement is always needed.
Susceptible to outliers in the control.
The control loses thresholding power in long assays lasting for days.
Experts may give inconsistent thresholds.
We propose a thresholding method based on statistical modeling that achieves consistency and robustness and provides a kind of ground truth for other fast cell counting methods.
The histogram of IVFC data
Skewed to the right.
The histogram of IVFC log(data)
All the values ≤ 0 are discarded.
Automatic classifiers for Flow cytometry
Pyne et al.: robust skew-t distribution mixture models (FLAME) [12].
Chan et al.: extracted biologically meaningful cell subsets by defining putative cell subsets as groups of mixture components [2].
In the machine learning category, vector quantization methods are used [3, 11].
Statistical Thresholding Method (STM)
Assumptions:
Noise peaks are the majority and cluster well.
Cell peaks are the minority and appear as outliers.
All peaks can be modeled by a Gaussian mixture with 2 or more components.
Gaussian Mixture Model (GMM)
Assume there are $K$ groups in the data, and accordingly $K$ components in the GMM:
$p(x) = \sum_{k=1}^{K} p(k)\, p(x \mid k)$
$p(x) = \sum_{k=1}^{K} \pi_k\, \mathcal{N}(x \mid \mu_k, \Sigma_k)$
where $\pi_k$ is the proportion of component $k$ in the whole data.
Reference: [1]
Expectation Maximization for GMM
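1. Expectation step:
$\gamma(i, k) = \frac{\pi_k\, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{\sum_{j=1}^{K} \pi_j\, \mathcal{N}(x_i \mid \mu_j, \Sigma_j)}$
where $\gamma(i, k)$ is the probability that $x_i$ comes from component $k$.
2. Likelihood maximization step:
$\mu_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, x_i$
$\Sigma_k = \frac{1}{N_k} \sum_{i=1}^{N} \gamma(i, k)\, (x_i - \mu_k)(x_i - \mu_k)^T$
where $N_k = \sum_{i=1}^{N} \gamma(i, k)$, and $\pi_k$ can be estimated as $N_k / N$.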
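The two steps can be sketched for one-dimensional data, where $\Sigma_k$ reduces to a scalar variance. This is an illustrative pure-Python version with a fixed iteration count, not the MATLAB implementation behind the slides. The function names, the underflow guard, and the variance floor are assumptions added for numerical safety.

```python
import math


def normal_pdf(x, mu, var):
    """Density of the one-dimensional normal N(mu, var) at x."""
    return math.exp(-((x - mu) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_gmm_1d(data, mus, variances, weights, iterations=100):
    """Run EM for a 1-D Gaussian mixture, starting from the given initial guess."""
    mus, variances, weights = list(mus), list(variances), list(weights)
    n, k = len(data), len(mus)
    for _ in range(iterations):
        # E-step: gamma(i, k) = pi_k N(x_i | k) / sum_j pi_j N(x_i | j)
        gamma = []
        for x in data:
            joint = [weights[j] * normal_pdf(x, mus[j], variances[j]) for j in range(k)]
            total = sum(joint) or 1e-300  # guard against underflow to zero
            gamma.append([p / total for p in joint])
        # M-step: responsibility-weighted mean, variance, and proportion
        for j in range(k):
            n_k = sum(g[j] for g in gamma)
            mus[j] = sum(g[j] * x for g, x in zip(gamma, data)) / n_k
            var = sum(g[j] * (x - mus[j]) ** 2 for g, x in zip(gamma, data)) / n_k
            variances[j] = max(var, 1e-9)  # floor keeps a component from collapsing
            weights[j] = n_k / n
    return mus, variances, weights
```

A call such as `em_gmm_1d(peaks, [0.1, 0.3, 0.8], [0.01] * 3, [1 / 3] * 3)` would fit three components to a list of peak heights; the initial values shown here are placeholders.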
Bayesian/Akaike Information Criterion (BIC, AIC)
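AIC and BIC are model selection criteria that balance goodness of fit against model complexity, to avoid both overfitting and underfitting:
$\mathrm{AIC} = 2k - 2\ln(L)$
$\mathrm{BIC} = k \ln(n) - 2\ln(L)$
where $k$ is the number of parameters, $L$ is the maximized likelihood, and $n$ is the sample size. The model with the lowest criterion value is preferred.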
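The two criteria are straightforward to compute once the maximized log-likelihood is known. The sketch below also counts the free parameters of a one-dimensional GMM with $K$ components ($K$ means, $K$ variances, and $K - 1$ free mixing proportions); that helper and all function names are illustrative additions.

```python
import math


def aic(log_likelihood, num_params):
    """AIC = 2k - 2 ln(L), with ln(L) passed in directly."""
    return 2.0 * num_params - 2.0 * log_likelihood


def bic(log_likelihood, num_params, sample_size):
    """BIC = k ln(n) - 2 ln(L)."""
    return num_params * math.log(sample_size) - 2.0 * log_likelihood


def gmm_param_count_1d(components):
    """K means + K variances + (K - 1) free mixing proportions."""
    return 3 * components - 1


# Usage with a made-up log-likelihood for a 3-component fit on 1000 peaks.
k = gmm_param_count_1d(3)
scores = (aic(-100.0, k), bic(-100.0, k, 1000))
```

To choose the number of components, the mixture would be fitted for several values of $K$ and the value with the lowest AIC or BIC kept.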
BIC, AIC for k in GMM
Thresholding strategy
A 3-component GMM is chosen for IVFC data.
The cell peak component is too small and is treated as outliers, so the threshold is set on the noise component with the largest $\mu$.
The threshold is set at $\mu_2 + a \sigma_2$, where $\mu_2$ and $\sigma_2$ are the mean and standard deviation of the second component, and $a$ is called the sigma factor.
Sigma Factor Picker
The picker aims to keep the False Positive Number (FPN) as small as possible.
Sample number N    µ + aσ    Φ(µ + aσ)         FPN for cell peaks
<= 1               a = 1     0.841344746069    N.A.
<= 100             a = 2     0.977249868052    <= 2
<= 1000            a = 3     0.998650101968    <= 1
<= 10^5            a = 4     0.999968328758    <= 3
<= 10^7            a = 5     0.999999713348    <= 3
<= 10^9            a = 6     0.999999999013    <= 1
<= 10^12           a = 7     0.999999999999    <= 1
Table: Sigma Factor Picker
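The Φ column is the standard normal cumulative distribution evaluated at the sigma factor $a$. The sketch below reproduces those values and estimates the expected number of noise peaks above the threshold as $N (1 - \Phi(a))$, assuming the noise component is exactly Gaussian. The function names are illustrative.

```python
import math


def normal_cdf(a):
    """Phi(a): standard normal cumulative distribution function."""
    return 0.5 * math.erfc(-a / math.sqrt(2.0))


def expected_false_positives(n, a):
    """Expected number of noise peaks above mu + a * sigma among n noise peaks.

    Uses the upper tail 0.5 * erfc(a / sqrt(2)) directly to avoid the
    precision loss of computing 1 - Phi(a) for large a.
    """
    return n * 0.5 * math.erfc(a / math.sqrt(2.0))


# Usage: reproduce two rows of the table.
phi_3 = normal_cdf(3)                          # about 0.99865
fp_1000 = expected_false_positives(1000, 3)    # about 1.35
```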
Keep FPN low
STM procedures
1. Bring the baseline down to 0:
$v = v - b$
where $v$ is the input data and $b$ is the estimated baseline.
2. Smooth with a filter that introduces no time shift: $v_s = \mathrm{Convolve}(v, \mathrm{GKern}(l_{gk}))$, where $\mathrm{GKern}(l_{gk})$ is a Gaussian kernel of length $l_{gk}$.
3. Find all the peaks (local maxima) of $v_s$, denoted $p$. They are the cell peak candidates.
4. Use the [0.75 0.95] quantiles as bounds to generate an initial guess, and use it to fit a 3-component Gaussian mixture model to $p$. In descending order, the components are $D_1$, $D_2$, $D_3$.
5. $t = D_2.\mu + sf \cdot D_2.\sigma$. The sigma factor $sf$ is determined by the sample number of $D_2$ according to the Sigma Factor Picker table.
6. All the peaks in $p$ higher than $t$ are picked as cell peaks.
A MATLAB script providing a graphical user interface for STM is available.
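Steps 1 to 3, 5, and 6 can be sketched in a few lines. The mixture fit of step 4 is taken as given here: the mean and standard deviation of $D_2$ are passed in as `mu2` and `sigma2`. The kernel width (length / 6), the zero padding at the edges, and all function names are assumptions for illustration, not details of the original MATLAB script.

```python
import math


def gaussian_kernel(length, sigma=None):
    """Normalized Gaussian kernel; ASSUMPTION: sigma defaults to length / 6."""
    sigma = sigma if sigma is not None else length / 6.0
    center = (length - 1) / 2.0
    weights = [math.exp(-0.5 * ((i - center) / sigma) ** 2) for i in range(length)]
    total = sum(weights)
    return [w / total for w in weights]


def convolve_same(signal, kernel):
    """Centered convolution with a symmetric kernel, so peaks are not shifted."""
    half = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = t + j - half
            if 0 <= idx < len(signal):  # zero padding outside the signal
                acc += w * signal[idx]
        out.append(acc)
    return out


def local_maxima(signal):
    """Indices higher than the left neighbor and not lower than the right one."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] < signal[i] >= signal[i + 1]]


def pick_cell_peaks(v, baseline, kernel_length, mu2, sigma2, sigma_factor):
    """Return indices of smoothed peaks above t = mu2 + sigma_factor * sigma2."""
    corrected = [x - b for x, b in zip(v, baseline)]                          # step 1
    smoothed = convolve_same(corrected, gaussian_kernel(kernel_length))      # step 2
    candidates = local_maxima(smoothed)                                       # step 3
    threshold = mu2 + sigma_factor * sigma2                                   # step 5
    return [i for i in candidates if smoothed[i] > threshold]                 # step 6


# Usage: one tall peak and one small bump on a flat baseline of 1.0.
v = [1.0] * 30
v[10], v[20] = 6.0, 1.5
cells = pick_cell_peaks(v, [1.0] * 30, 5, mu2=0.2, sigma2=0.1, sigma_factor=3)
```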
Simulated data
100 Gaussian-shaped peaks (in blue) with heights from 1 to 2 and FWHM from 5 to 9, evenly distributed over 10000 samples.
Additive white Gaussian noise with SNR = 1.
Baseline increasing from 0 to 1.
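A trace of this kind could be generated as sketched below. Several details are assumptions because the slide does not define them: the peaks are evenly spaced, heights and widths are drawn uniformly, the baseline is a linear ramp, and SNR is read as the mean peak height divided by the noise standard deviation.

```python
import math
import random


def simulate_trace(n_samples=10000, n_peaks=100, snr=1.0, seed=0):
    """Return (trace, peak_centers) for a synthetic IVFC-like signal."""
    rng = random.Random(seed)
    spacing = n_samples // n_peaks
    centers = [spacing // 2 + k * spacing for k in range(n_peaks)]
    clean = [0.0] * n_samples
    for c in centers:
        height = rng.uniform(1.0, 2.0)
        fwhm = rng.uniform(5.0, 9.0)
        # For a Gaussian, FWHM = 2 * sqrt(2 * ln 2) * sigma.
        sigma = fwhm / (2.0 * math.sqrt(2.0 * math.log(2.0)))
        for t in range(max(0, c - 30), min(n_samples, c + 31)):
            clean[t] += height * math.exp(-0.5 * ((t - c) / sigma) ** 2)
    # ASSUMPTION: SNR = mean peak height (1.5) / noise standard deviation.
    noise_std = 1.5 / snr
    trace = [clean[t] + t / (n_samples - 1) + rng.gauss(0.0, noise_std)
             for t in range(n_samples)]
    return trace, centers


trace, centers = simulate_trace()
```

Returning the true peak centers alongside the trace makes it possible to score any counting method against a known answer.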
SNR Pressure Tests on Simulated data
Cell Peak Proportion Tests on Simulated data
STM on Control data
STM on Experiment data
Real-time test on Experiment data
Data used     Threshold    Cell count
[0 100]       0.04727      572
[0 200]       0.04663      590
[0 300]       0.04558      615
[0 400]       0.04552      617
[0 500]       0.04510      626
[0 600]       0.04507      626
[0 700]       0.04522      620
[0 800]       0.04473      635
[0 900]       0.04450      642
whole data    0.04450      642
Table: Real-time test
Consistency test on Experiment data
Sum the counts over 100-second segments and compare with the result of integral counting.
Data used    Summed    Integral    LSM
0-15 m1      652       641         295
15-30 m1     415       395         208
1h m1        229       225         NAN
72h m1       225       221         NAN
0-15 m2      621       614         68
45-60 m2     309       304         55
1h m2        196       200         41
0-15 m3      267       268         N.
30-45 m3     198       197         N.
1h m3        107       106         N.
Table: Consistency test
LSM, LSMsd, STM
Summary
For processing non-stationary IVFC time series data, GMM-based thresholding provides a consistent method for cell counting. Other statistical models and pattern recognition methods might also be useful.
Acknowledgements
Collaborators, for hard work and inspiration:
Jin Guo, IPS
Guangda Liu, IPS
Xiaoying Tan, IPS
Prof. Xunbin Wei, IPS
Visitors, for guidance on signal processing and statistics:
David Damm, formerly at Bonn University
Keli Huang, formerly at Bonn University
Prof. Axel Mosig and all members of the group, for all kinds of support.
Bibliography I
[1] C. M. Bishop. Pattern Recognition and Machine Learning, volume 4. Springer, New York, 2006.
[2] Cliburn Chan, Feng Feng, Janet Ottinger, David Foster, Mike West, and Thomas B. Kepler. Statistical mixture modeling for cell subtype identification in flow cytometry. Cytometry, 73A(8):693–701, 2008.
Bibliography II
[3] E. S. Costa, M. E. Arroyo, C. E. Pedreira, M. A. Garcia-Marcos, M. D. Tabernero, J. Almeida, and A. Orfao. A new automated flow cytometry data analysis approach for the diagnostic screening of neoplastic B-cell disorders in peripheral blood samples with absolute lymphocytosis. Leukemia, 20(7):1221–1230, 2006.
[4] D. Damm, C. Wang, X. Wei, and A. Mosig. Cell counting for in vivo flow cytometer signals using wavelet-based dynamic peak picking. In 2nd International Conference on Biomedical Engineering and Informatics (BMEI'09), pages 1–4. IEEE, 2009.
Bibliography III
[5] D. L. Donoho. De-noising by soft-thresholding. IEEE Trans. Inform. Theory, 41(3):613–627, May 1995.
[6] David L. Donoho and Iain M. Johnstone. Ideal spatial adaptation by wavelet shrinkage. Biometrika, 81(3):425–455, 1994.
[7] Irene Georgakoudi, Nicolas Solban, John Novak, William L. Rice, Xunbin Wei, Tayyaba Hasan, and Charles P. Lin. In vivo flow cytometry. Cancer Research, 64(15):5044–5047, 2004.
Bibliography IV
[8] Ho Lee, Clemens Alt, Costas M. Pitsillides, Mehron Puoris'haag, and Charles P. Lin. In vivo imaging flow cytometer. Opt. Express, 14(17):7789–7800, Aug 2006.
[9] J. Novak, I. Georgakoudi, X. Wei, A. Prossin, and C. P. Lin. In vivo flow cytometer for real-time detection and quantification of circulating cells. Optics Letters, 29(1):77–79, 2004.
[10] John Novak. Development of the in vivo flow cytometer. PhD thesis, Massachusetts Institute of Technology, Boston, MA, 2004.
Bibliography V
[11] C. E. Pedreira, E. S. Costa, M. E. Arroyo, J. Almeida, and A. Orfao. A multidimensional classification approach for the automated analysis of flow cytometry data. IEEE Transactions on Biomedical Engineering, 55(3):1155–1162, 2008.
[12] Saumyadipta Pyne, Xinli Hu, Kui Wang, Elizabeth Rossin, Tsung-I Lin, Lisa M. Maier, Clare Baecher-Allan, Geoffrey J. McLachlan, Pablo Tamayo, David A. Hafler, Philip L. De Jager, and Jill P. Mesirov. Automated high-dimensional flow cytometric data analysis. Proceedings of the National Academy of Sciences, 106(21):8519–8524, May 2009.
Bibliography VI
[13] X. Wei, D. A. Sipkins, C. M. Pitsillides, J. Novak, I. Georgakoudi, and C. P. Lin. Real-time detection of circulating apoptotic cells by in vivo flow cytometry. Molecular Imaging: official journal of the Society for Molecular Imaging, 4(4):415, 2005.