Clustering Time Series

This is a GitHub repository to host the files from the presentation at the Meetup Applied Machine Learning Rhine-Main #1.

We provide two notebooks:

1. How to do Time Series clustering and how to prepare data for it. 
2. How to calculate features from time series.
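
As a rough illustration of what "calculating features from a time series" can mean, here is a minimal sketch that uses only the Python standard library. The function name `extract_features` and the four features chosen (mean, standard deviation, lag-1 autocorrelation, linear trend slope) are our assumptions for this example; the notebook itself may compute a different and larger set.

```python
import statistics


def extract_features(series):
    """Summarise one time series as a small dictionary of numbers.

    Hypothetical helper for illustration; not taken from the notebooks.
    """
    n = len(series)
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)  # population standard deviation

    # Deviations from the mean are reused by both features below.
    dev = [x - mean for x in series]
    var_sum = sum(d * d for d in dev)

    # Lag-1 autocorrelation: how strongly each value resembles its predecessor.
    autocorr = (
        sum(dev[i] * dev[i + 1] for i in range(n - 1)) / var_sum
        if var_sum else 0.0
    )

    # Slope of the least-squares line fitted against the time index 0..n-1.
    t_mean = (n - 1) / 2
    t_var_sum = sum((t - t_mean) ** 2 for t in range(n))
    slope = (
        sum((t - t_mean) * d for t, d in enumerate(dev)) / t_var_sum
        if t_var_sum else 0.0
    )

    return {"mean": mean, "std": std, "autocorr_lag1": autocorr, "slope": slope}


features = extract_features([1, 2, 3, 4, 5])
print(features)
```

Each series is reduced to a short, fixed-length vector in this way, so series of different lengths become comparable with an ordinary distance measure.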

Both notebooks use Google Trends data, which is included in this repository. Running the notebooks requires the following packages to be installed:

pandas, numpy, matplotlib, IPython, fbprophet, stldecompose, scipy, pyclustering, scikit-learn (imported as sklearn), fastdtw, statsmodels and nolds. The notebooks also use warnings and re, which are part of the Python standard library and need no installation.

Clustering

Initially, we clustered data in 3 ways:

1. On Time Series features, using Euclidean Distance
2. On Time Series data, using Euclidean Distance
3. On Time Series data, using DTW Distance
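
To show why the choice between Euclidean and DTW distance matters, here is a minimal sketch in plain Python. It uses the exact quadratic-time DTW recurrence with an absolute-difference step cost, which is our assumption for illustration; the notebooks list fastdtw, an approximate and faster variant, as a dependency. The two sample series are made up: they have the same shape, shifted by one time step.

```python
import math


def euclidean_distance(a, b):
    """Point-by-point distance; both series must have the same length."""
    if len(a) != len(b):
        raise ValueError("Euclidean distance needs series of equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(a, b):
    """Exact dynamic time warping distance with |x - y| as the step cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i points of a
    # with the first j points of b.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(
                cost[i - 1][j],      # a[i-1] is matched to an already used point of b
                cost[i][j - 1],      # b[j-1] is matched to an already used point of a
                cost[i - 1][j - 1],  # both series advance together
            )
    return cost[n][m]


peak_late = [0, 0, 1, 2, 1, 0, 0]
peak_early = [0, 1, 2, 1, 0, 0, 0]

print(euclidean_distance(peak_late, peak_early))  # 2.0
print(dtw_distance(peak_late, peak_early))        # 0.0
```

Euclidean distance penalizes the shift, while DTW warps the time axis and treats the two series as identical in shape. Which behavior is preferable depends on whether the timing of a pattern matters for the clusters you want.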

Based on the feedback and conversations at the Meetup, we decided to use PCA instead of feature extraction to reduce the dimensionality of the data. This gives us a fourth kind of clustering:

4. On PCA-modified Time Series, using Euclidean Distance

The notebook has been updated to show the results of this approach.
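
The idea behind this fourth approach can be sketched as follows, using only numpy. Everything here is an assumption for illustration: the synthetic sine-wave data, the helper names `pca_reduce` and `kmeans`, linear PCA computed via SVD, and a plain k-means with a deterministic farthest-point start. The notebook may use library implementations and different settings.

```python
import numpy as np


def pca_reduce(series, n_components):
    """Project each row (one time series) onto the top principal components."""
    X = np.asarray(series, dtype=float)
    centered = X - X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means using Euclidean distance."""
    # Deterministic start: first point, then repeatedly the point
    # farthest from all centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        dists = np.min(
            [np.linalg.norm(points - c, axis=1) for c in centers], axis=0
        )
        centers.append(points[np.argmax(dists)])
    centers = np.array(centers)

    for _ in range(n_iter):
        # Assign every point to its nearest center.
        dist_to_centers = np.linalg.norm(
            points[:, None] - centers[None], axis=2
        )
        labels = np.argmin(dist_to_centers, axis=1)
        # Move each center to the mean of its points (keep it if empty).
        new_centers = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


# Synthetic data: five noisy sine waves and five noisy inverted sine waves.
rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 60)
group_a = np.sin(t) + rng.normal(scale=0.1, size=(5, 60))
group_b = -np.sin(t) + rng.normal(scale=0.1, size=(5, 60))
data = np.vstack([group_a, group_b])

reduced = pca_reduce(data, n_components=2)  # 60 dimensions -> 2
labels = kmeans(reduced, k=2)
print(reduced.shape, labels)
```

Because clustering runs on the reduced coordinates, the number of PCA dimensions becomes a parameter worth experimenting with alongside the number of clusters.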

Coda

We invite you to run this notebook yourself and study the code to understand how we implemented our ideas. But most importantly, we want you to experiment with the parameters and ideas in this notebook. For example, you could experiment with the number of clusters, number of PCA dimensions, type of PCA kernel, adding more time-series features and so on.

If you have suggestions for optimizing this code, ideas for generating better clusters, or simply questions on this topic, feel free to write to us at [email protected].


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Clustering Time Series

Clustering

Coda

About

Releases

Packages

Languages

Rocketloop/appliedml-talk-clustering-time-series

Folders and files

Latest commit

History

Repository files navigation

Clustering Time Series

Clustering

Coda

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages