Time Series Clustering Methods With Applications in Environmental Studies
Time series datasets are increasingly common in environmental studies. Among the various machine learning techniques, clustering is a comparatively effective and accessible way to discover hidden patterns in temporal datasets (Liao, 2005).
Two types of multiple time series are considered in this talk:
- 1) measurements are carried out uniformly at a limited number of time points and the number of variates is much larger than the number of observations;
- 2) the number of variates is moderate but each series is very long and often non-uniformly sampled.
For the first type of data, a new clustering procedure based on factor modeling is proposed (Zhu, 2013). The essence is to extract the hidden stochastic structures in the original series, using Lam and Yao's (2012) procedure of factor modeling for high-dimensional time series.
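To make the factor-modeling idea concrete, here is a minimal Python sketch. It is not the procedure of Zhu (2013), whose details are not given here. It only illustrates the eigenanalysis step usually associated with Lam and Yao (2012): sum the products of lagged sample autocovariance matrices, then take the leading eigenvectors as an estimate of the factor loading space. The function names, the default `k0=2`, and the final step of grouping variates by running a simple k-means on the rows of the estimated loading matrix are all assumptions made for illustration.

```python
import numpy as np


def estimate_loadings(Y, k0=2, r=None):
    """Estimate a factor loading space from a (n_time, p) array Y.

    Illustrative sketch: M = sum_{k=1..k0} S_k S_k^T, where S_k is the
    lag-k sample autocovariance matrix. The top-r eigenvectors of M are
    returned as a (p, r) loading estimate, together with all eigenvalues.
    """
    Y = np.asarray(Y, dtype=float)
    n, p = Y.shape
    Yc = Y - Y.mean(axis=0)
    M = np.zeros((p, p))
    for k in range(1, k0 + 1):
        S = Yc[k:].T @ Yc[:-k] / n  # lag-k sample autocovariance (p x p)
        M += S @ S.T
    eigvals, eigvecs = np.linalg.eigh(M)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    if r is None:
        # Eigenvalue-ratio rule over the first R ratios (R is an assumption).
        R = max(1, min(p, n) // 2)
        r = int(np.argmin(eigvals[1:R + 1] / eigvals[:R])) + 1
    return eigvecs[:, :r], eigvals


def kmeans_rows(Z, n_clusters, n_iter=50):
    """Plain Lloyd k-means on the rows of Z with farthest-point seeding."""
    Z = np.asarray(Z, dtype=float)
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        d = np.min([np.sum((Z - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Z[int(np.argmax(d))])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(n_clusters):
            if np.any(labels == j):  # keep the old center if a cluster empties
                centers[j] = Z[labels == j].mean(axis=0)
    return labels
```

With a data matrix `Y` whose rows are time points and whose columns are variates, a call such as `kmeans_rows(estimate_loadings(Y, r=2)[0], 2)` would assign each variate to one of two groups. Clustering on loading rows is unaffected by the sign or rotation ambiguity of the eigenvectors, because distances between rows are preserved under orthogonal transformations.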
The factor-based approach is validated by numerical experiments and illustrated with hyperspectral datasets.
For the second type of data, a shape-based clustering method (Zakaria et al., 2012) is discussed. This algorithm intentionally ignores part of the raw data and focuses on local patterns instead: shape information is first extracted from the data and then used for further analysis such as clustering. As an example, the method is applied to extract latent features from an environmental time series dataset.
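The idea of ignoring most of a series and keeping only a local pattern can be sketched in Python as follows. This is a simplified illustration, not the algorithm of Zakaria et al. (2012): it takes a candidate shape as given (the search for good shapes is omitted), measures how closely that shape appears anywhere in each series using a z-normalized, length-normalized Euclidean distance over sliding windows, and splits the dataset at the largest gap in those distances. The function names are hypothetical, and the sketch assumes each series has already been resampled onto a regular grid.

```python
import numpy as np


def znorm(x, eps=1e-8):
    """Z-normalize a 1-D array; constant inputs map to zeros."""
    x = np.asarray(x, dtype=float)
    sd = x.std()
    return (x - x.mean()) / sd if sd > eps else np.zeros_like(x)


def subsequence_distance(shape, series):
    """Smallest distance between `shape` and any same-length window of `series`."""
    m = len(shape)
    s = znorm(shape)
    series = np.asarray(series, dtype=float)
    best = np.inf
    for start in range(len(series) - m + 1):
        w = znorm(series[start:start + m])
        best = min(best, float(np.sqrt(np.mean((s - w) ** 2))))
    return best


def split_by_shape(shape, dataset):
    """Split series into 'contains the shape' / 'does not' at the largest gap.

    Expects at least two series. Returns a boolean membership array and
    the per-series distances.
    """
    d = np.array([subsequence_distance(shape, ts) for ts in dataset])
    ordered = np.sort(d)
    i = int(np.argmax(np.diff(ordered)))
    threshold = 0.5 * (ordered[i] + ordered[i + 1])
    return d <= threshold, d
```

Because only the best-matching window of each series contributes to the distance, the rest of the series, however long, has no influence on the grouping. The series in `dataset` may have different lengths, provided each is at least as long as the shape.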
Xiang Zhu is a research aide working in the MCS division. He is a first-year Ph.D. student in the Statistics Department at the University of Chicago. Before that, he obtained his Bachelor's degree in Mathematics in 2012 from Nankai University, China.
His general research interests are:
- 1) how to modify existing recipes and then develop new methods for statistical inference and data analysis when solving practical problems;
- 2) how to introduce theoretical advancements in mathematics and statistics to real applications in other fields.