The hdbscan Clustering Library
The hdbscan library is a suite of unsupervised learning tools for finding clusters, or dense regions, in a dataset. The primary algorithm is HDBSCAN*, as proposed by Campello, Moulavi, and Sander. The library provides a high-performance implementation of this algorithm, along with tools for analysing the resulting clustering.
User Guide / Tutorial
- Basic Usage of HDBSCAN* for Clustering
- Getting More Information About a Clustering
- Parameter Selection for HDBSCAN*
- Outlier Detection
- Predicting clusters for new points
- Soft Clustering for HDBSCAN*
- Combining HDBSCAN* with DBSCAN
- Frequently Asked Questions
  - Q: Most of my data is classified as noise; why?
  - Q: I mostly just get one large cluster; I want smaller clusters.
  - Q: HDBSCAN is failing to separate the clusters I think it should.
  - Q: I am not getting the claimed performance. Why not?
  - Q: I want to predict the cluster of a new unseen point. How do I do this?
  - Q: Haversine metric is not clustering my Lat-Lon data correctly.
  - Q: I want to cite this software in my journal publication. How do I do that?
Background on Clustering with HDBSCAN
- How HDBSCAN Works
- Comparing Python Clustering Algorithms
- Benchmarking Performance and Scaling of Python Clustering Algorithms
- How Soft Clustering for HDBSCAN Works