IJCAI-16, New York, conference presentation of paper http://www.ijcai.org/Proceedings/16/Papers/367.pdf
Researchers have used anywhere from 30 days to several
years of daily returns as source data for clustering
financial time series based on their correlations.
This paper sets up a statistical framework to study
the validity of such practices. We first show that
clustering correlated random variables from their
observed values is statistically consistent. We then
give a first empirical answer to the much-debated
question: How long should the time series
be? If too short, the clusters found can be spurious;
if too long, dynamics can be smoothed out.
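
The trade-off between sample length and cluster recovery can be illustrated with a small simulation. The sketch below is not the paper's framework: the one-factor-per-cluster model, the `1 - correlation` distance, the average-linkage merging rule, and all names and parameter values (`simulate_returns`, `average_linkage`, `recovery_rate`, `rho = 0.7`, three clusters of four series) are assumptions chosen for illustration only. It draws correlated series of length `T`, clusters them from their sample correlations, and reports how often the planted clusters are recovered exactly.

```python
import math
import random


def simulate_returns(n_clusters, n_per_cluster, T, rho, rng):
    """Assumed toy model: each series is sqrt(rho)*factor + sqrt(1-rho)*noise,
    so series sharing a factor have correlation rho and others have 0."""
    series = []
    for _ in range(n_clusters):
        factor = [rng.gauss(0, 1) for _ in range(T)]
        for _ in range(n_per_cluster):
            series.append([math.sqrt(rho) * f + math.sqrt(1 - rho) * rng.gauss(0, 1)
                           for f in factor])
    return series


def correlation(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def average_linkage(series, n_clusters):
    """Agglomerative clustering on the distance 1 - correlation,
    merging the pair of clusters with the smallest average distance."""
    n = len(series)
    dist = [[1 - correlation(series[i], series[j]) for j in range(n)]
            for i in range(n)]
    clusters = [[i] for i in range(n)]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                d /= len(clusters[a]) * len(clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return sorted(sorted(c) for c in clusters)


def recovery_rate(T, trials=20, seed=0):
    """Fraction of trials in which the planted 3 x 4 partition is recovered."""
    rng = random.Random(seed)
    truth = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
    hits = 0
    for _ in range(trials):
        series = simulate_returns(3, 4, T, 0.7, rng)
        hits += average_linkage(series, 3) == truth
    return hits / trials


if __name__ == "__main__":
    for T in (5, 30, 250):
        print(T, recovery_rate(T))
```

Running the script prints the recovery rate for a few sample lengths. Because the toy model is stationary, it only illustrates the "too short" side of the question; the smoothing of dynamics by overly long windows would need a model whose correlation structure changes over time.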