By Julianne Murphy, Data Science Consultant
This is part of a series of posts on online learning resources for data science and programming.
Cluster analysis finds groupings of observations that are similar to each other, with similarity based on one or more distance measurements. The technique sorts objects into discrete clusters so that objects within a cluster are more similar to each other than to objects in other clusters. Because clustering is an unsupervised machine learning method, these algorithms have no dependent variable. Clustering techniques can provide a useful typology to help graphically or verbally describe segments of your data for purposes of exploratory analysis.
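To make the idea of distance-based grouping concrete, here is a minimal sketch of k-means, one of many clustering methods, written in plain Python. It is an illustration under stated assumptions rather than a recommended implementation: the function name kmeans, the toy data, and the choice to seed the centroids with the first k points are all made up for this example, and real libraries use more careful initialization. Note that the function receives only the observations themselves, with no dependent variable.

```python
import math


def kmeans(points, k, n_iter=100):
    """Minimal k-means sketch: returns (labels, centroids)."""
    # Assumption for illustration: seed the centroids with the first k points.
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                # Keep the old centroid if no points were assigned to it.
                new_centroids.append(centroids[j])
        # Stop once neither the assignments nor the centroids change.
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


# Hypothetical toy data: two visibly separate groups of 2-D observations.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.1, 4.9), (0.9, 1.1), (4.8, 5.2)]
labels, centroids = kmeans(points, k=2)
print(labels)  # → [0, 1, 0, 1, 0, 1]
```

The labels are the "typology" the algorithm produces: observations near (1, 1) land in one cluster and observations near (5, 5) in the other. In practice you would likely reach for a library implementation such as the ones covered in the resources below.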
Resources
Clustering: a high-level overview of a variety of clustering methods from the scikit-learn (Python) documentation. Focuses on highlighting the advantages and disadvantages of each method. If you want to learn about different methods for clustering, start here (even if you aren’t working in Python, the explanations are useful).
K-Means Clustering in Python: A Practical Guide, by Kevin Arvai: this guide covers the basics of clustering techniques and how to use scikit-learn to implement k-means clustering in Python. A great place to get started if you are a Python user.
K-Means Clustering, by Julia Silge: this tutorial explains how to implement k-means clustering in R. For some additional practice using k-means in R, see another tutorial by Julia Silge here. A great place to get started if you are an R user.
Cluster Analysis, by Brian S. Everitt, Sabine Landau, Morven Leese, and Daniel Stahl: this book provides a practical text on cluster analysis and its potential applications in a wide range of disciplines (e.g., medicine, psychology, market research, and bioinformatics).
The Elements of Statistical Learning: Data Mining, Inference, and Prediction, by Trevor Hastie, Robert Tibshirani, and Jerome Friedman: chapter 14, “Unsupervised Learning,” provides a more advanced overview of unsupervised machine learning. The chapter concludes with a series of exercises to test your understanding of the material. Note: you can download a PDF of this book for free.
Need More Help?
If you have a question about clustering your data, request a free consultation with our data science consultants.