This document describes using Change Point Analysis (CPA) to detect subtle changes in disease trends in the BioSense public health surveillance system. It details Taylor's cumulative sum (CUSUM) CPA method, which uses bootstrapping to identify significant changes in mean values of time series data and split the data into segments. An example of applying CUSUM CPA to detect changes in the percentage of clinic visits is provided.
Change Point Analysis (CPA)
BioSense 2.0 Analytics
Change Point Analysis
June 22, 2012
By Taha Kass-Hout and Zhiheng Xu
BioSense is a national public health surveillance system for early detection and rapid assessment of
potential bioterrorism-related illness. It integrates current health data shared by health departments from
a variety of sources to provide insight on the health of communities and the country. Using statistical
aberration detection methods, public health officials are able to identify and investigate both temporal
and spatial anomalies. In the first iteration of statistical tools for the BioSense 2.0 redesign, we
introduced the Early Aberration Reporting System (EARS), which has been used extensively in BioSense
for disease anomaly detection. As a complementary tool to EARS, Change Point Analysis (CPA) has been
implemented in BioSense to address the limitations of EARS in detecting subtle changes and
characterizing disease trends. In this paper, we describe how to implement Taylor’s cumulative sum
(CUSUM) CPA method.
CUSUM CPA
Taylor [1] developed a change point analysis method that iteratively applies cumulative sum (CUSUM)
charts and bootstrapping to detect changes in a time series and make inferences about them. This
approach is based on the mean-shift model and assumes that the residuals are independent and
identically distributed (iid) with a mean of zero. For time-series data $Y_i$ with $i = 1, \ldots, N$, the
mean-shift model is written as $Y_i = \mu + \varepsilon_i$, where $\mu$ is the sample average,
$\mu = \frac{1}{N} \sum_{i=1}^{N} Y_i$, and $\varepsilon_i$ is the residual term, defined as
$\varepsilon_i = Y_i - \mu$ for the $i$th observation. The cumulative sums of residuals are calculated as
$S_i = S_{i-1} + \varepsilon_i$ for $i = 1, \ldots, N$, where $S_0 = 0$. The change point at location $m$ is
detected by searching for the maximum absolute CUSUM of residuals, where
$|S_m| = \max_{1 \le i \le N} |S_i|$. The time-series data is split into two segments on each side of the
change point, and the analysis is repeated for each segment. 1,000 bootstrap samples are generated to
calculate the significance level and 95% confidence interval (CI) of the change points. The following
steps summarize how to implement Taylor’s CUSUM CPA to detect change points:
1. Prepare the initial time series data.
2. Calculate the cumulative sums of residuals $S_i$.
3. Find the location $m$ with the maximum absolute CUSUM of residuals $|S_m|$, which is defined as the
change point.
4. Calculate the difference between the maximum and minimum CUSUM of residuals as
$S_{diff} = S_{max} - S_{min}$, where $S_{max} = \max_i S_i$ and $S_{min} = \min_i S_i$.
5. Determine whether this change point is significant via bootstrapping:
a. Generate a bootstrap sample of size N, denoted as $Y^0_1, \ldots, Y^0_N$, by reshuffling
the original N values.
b. Calculate the CUSUM of residuals from the bootstrap sample, denoted as $S^0_i$.
c. Calculate the maximum, minimum and difference of the CUSUM of residuals, denoted as
$S^0_{max}$, $S^0_{min}$ and $S^0_{diff}$, where $S^0_{diff} = S^0_{max} - S^0_{min}$.
d. Determine whether the difference of CUSUM from the bootstrap sample, $S^0_{diff}$, is less than
the original difference $S_{diff}$.
e. Repeat steps a-d 1,000 times and record the number of bootstrap samples for which
$S^0_{diff} < S_{diff}$, denoted as X.
f. The significance level is defined as X/1000.
6. If the significance level is ≥95%, the detected change point is statistically significant and
the dataset is split into two subsets at this change point; if the significance level is <95%,
the detected change point is not statistically significant and the splitting stops.
7. Repeat steps 2-6 in each of the two subsets until no more significant change points are
detected.
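The steps above can be sketched in Python roughly as follows. This is a minimal illustration, not the BioSense implementation: the function names, the use of NumPy, the `min_len` guard on very short segments, and the choice to exclude the final cumulative sum (which is always zero) from the search are all assumptions made for the example.

```python
import numpy as np


def cusum(y):
    """Cumulative sums of residuals S_1..S_N (S_0 = 0 is implicit)."""
    y = np.asarray(y, dtype=float)
    return np.cumsum(y - y.mean())


def detect_change_point(y, n_boot=1000, rng=None):
    """Steps 2-5 for one segment with at least two observations.

    Returns (m, level): m is the 0-based index of the last point before
    the change, and level is X / n_boot as defined in step 5f.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    s = cusum(y)
    # Step 3. S_N is always 0, so it is left out of the search
    # (assumption); this keeps both resulting segments non-empty.
    m = int(np.argmax(np.abs(s[:-1])))
    s_diff = s.max() - s.min()                 # step 4
    x = 0
    for _ in range(n_boot):                    # step 5e
        s0 = cusum(rng.permutation(y))         # 5a-5b: reshuffle, no replacement
        if s0.max() - s0.min() < s_diff:       # 5c-5d
            x += 1
    return m, x / n_boot                       # step 5f


def find_change_points(y, threshold=0.95, min_len=5, offset=0,
                       n_boot=1000, rng=None):
    """Steps 6-7: split recursively and return sorted 0-based change points.

    min_len is a hypothetical guard, not part of the steps above: segments
    shorter than it are not tested.
    """
    rng = np.random.default_rng() if rng is None else rng
    y = np.asarray(y, dtype=float)
    if len(y) < max(min_len, 2):
        return []
    m, level = detect_change_point(y, n_boot, rng)
    if level < threshold:                      # step 6: stop splitting
        return []
    left = find_change_points(y[:m + 1], threshold, min_len,
                              offset, n_boot, rng)
    right = find_change_points(y[m + 1:], threshold, min_len,
                               offset + m + 1, n_boot, rng)
    return left + [offset + m] + right
```

Calling `find_change_points(series)` on a one-dimensional series returns 0-based positions, so a returned value of 18 corresponds to the 19th observation. Because the bootstrap is random, passing a seeded `rng` makes a run repeatable.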
Data Example
The following data were created to illustrate the detection of change points using CUSUM CPA method.
MMWR week (i) Percent of visit (Yi) µ εi Si |Si|
1 0.001 0.036 -0.03483 -0.03483 0.034827
2 0.002 0.036 -0.03383 -0.06865 0.068654
3 0.003 0.036 -0.03283 -0.10148 0.101481
4 0.002 0.036 -0.03383 -0.13531 0.135308
5 0.008 0.036 -0.02783 -0.16313 0.163135
6 0.009 0.036 -0.02683 -0.18996 0.189962
7 0.012 0.036 -0.02383 -0.21379 0.213788
8 0.011 0.036 -0.02483 -0.23862 0.238615
9 0.009 0.036 -0.02683 -0.26544 0.265442
10 0.011 0.036 -0.02483 -0.29027 0.290269
11 0.021 0.036 -0.01483 -0.3051 0.305096
12 0.012 0.036 -0.02383 -0.32892 0.328923
13 0.01 0.036 -0.02583 -0.35475 0.35475
14 0.008 0.036 -0.02783 -0.38258 0.382577
15 0.01 0.036 -0.02583 -0.4084 0.408404
16 0.028 0.036 -0.00783 -0.41623 0.416231
17 0.023 0.036 -0.01283 -0.42906 0.429058
18 0.015 0.036 -0.02083 -0.44988 0.449885
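The εi, Si and |Si| columns can be reproduced from the Yi column with a few lines of Python. This is only a sketch of the arithmetic. The table prints µ rounded to 0.036, and µ is the average of the whole series, which cannot be recomputed from the 18 rows shown, so the example assumes the unrounded value implied by the first row, µ = Y1 + |S1| = 0.001 + 0.034827.

```python
from itertools import accumulate

# Percent of visit (Yi) for weeks 1-18, copied from the table above.
y = [0.001, 0.002, 0.003, 0.002, 0.008, 0.009, 0.012, 0.011, 0.009,
     0.011, 0.021, 0.012, 0.010, 0.008, 0.010, 0.028, 0.023, 0.015]

# Assumption: unrounded series mean, backed out of row 1 as Y_1 + |S_1|.
mu = 0.001 + 0.034827

residuals = [yi - mu for yi in y]        # eps_i = Y_i - mu
cusums = list(accumulate(residuals))     # S_i = S_{i-1} + eps_i, with S_0 = 0

for week, (e, s) in enumerate(zip(residuals, cusums), start=1):
    print(f"{week:2d}  {e: .5f}  {s: .5f}  {abs(s):.6f}")
```

Up to rounding in the last printed digit, the output should match the table rows. |Si| keeps growing through week 18 because every value so far lies below the series mean.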
[Figure: Percent of visit (Yi), plotted by week; y-axis from 0 to 0.12]
The significance level and 95% CI of the change points are calculated from 1,000 bootstrap samples.
After the first significant change point is detected (highlighted in yellow in the above table), the
time-series data for MMWR weeks 1-52 is split into two segments: MMWR weeks 1-19 and weeks
20-52. The analysis is then repeated on each of the two segments to determine their change points.
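The split convention can be illustrated with a short Python sketch. It assumes the change point is week 19, as the segments in the text imply, and that the change point is the last observation of the first segment. The helper name `split_at` is hypothetical.

```python
def split_at(values, m):
    """Split a series at 1-based change point m: items 1..m and m+1..N."""
    return values[:m], values[m:]


weeks = list(range(1, 53))            # MMWR weeks 1-52
first, second = split_at(weeks, 19)   # assumed change point: week 19
print(first[0], first[-1], second[0], second[-1])
```

Each segment then gets its own mean and its own CUSUM of residuals when the analysis is repeated.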
Reference
1. Taylor, W. Change-Point Analysis: A Powerful New Tool For Detecting Changes. 2010;
Available from: http://www.variation.com/anonftp/pub/changepoint.pdf.
2. Barker, N. A Practical Introduction to the Bootstrap Using the SAS System. 2010; Available
from: http://www.lexjansen.com/phuse/2005/pk/pk02.pdf.
3. Efron, B. and Tibshirani, R. An Introduction to the Bootstrap. 1993, New York: Chapman & Hall.
4. Kass-Hout TA, Xu Z, McMurray P, Park S, Buckeridge DL, Brownstein JS, Finelli L, Groseclose
SL. Application of change point analysis to daily influenza-like-illness (ILI) emergency
department visits. Journal of the American Medical Informatics Association (2012), in press.