A new density-based clustering algorithm, <italic>RNN-DBSCAN</italic>, is presented which uses reverse nearest neighbor counts as an estimate of observation density. Clustering is performed using a <italic>DBSCAN</italic>-like approach based on <inline-formula><tex-math notation="LaTeX">$k$</tex-math><alternatives> <inline-graphic xlink:href="bryant-ieq1-2787640.gif"/></alternatives></inline-formula> nearest neighbor graph traversals through dense observations. <italic>RNN-DBSCAN</italic> is preferable to the popular density-based clustering algorithm <italic>DBSCAN</italic> in two aspects. First, problem complexity is reduced to the use of a single parameter (choice of <inline-formula><tex-math notation="LaTeX">$k$</tex-math><alternatives> <inline-graphic xlink:href="bryant-ieq2-2787640.gif"/></alternatives></inline-formula> nearest neighbors), and second, an improved ability for handling large variations in cluster density (heterogeneous density). The superiority of <italic>RNN-DBSCAN</italic> is demonstrated on several artificial and real-world datasets with respect to prior work on reverse nearest neighbor based clustering approaches (<italic>RECORD</italic>, <italic>IS-DBSCAN</italic>, and <italic> ISB-DBSCAN</italic>) along with <italic>DBSCAN</italic> and <italic>OPTICS</italic>. Each of these clustering approaches is described by a common graph-based interpretation wherein clusters of dense observations are defined as connected components, along with a discussion on their computational complexity. Heuristics for <italic>RNN-DBSCAN </italic> parameter selection are presented, and the effects of <inline-formula><tex-math notation="LaTeX">$k$ </tex-math><alternatives><inline-graphic xlink:href="bryant-ieq3-2787640.gif"/></alternatives></inline-formula> on <italic>RNN-DBSCAN</italic> clusterings discussed. Additionally, with respect to scalability, an approximate version of <italic>RNN-DBSCAN</italic> is presented leveraging an existing approximate <inline-formula><tex-math notation="LaTeX"> $k$</tex-math><alternatives><inline-graphic xlink:href="bryant-ieq4-2787640.gif"/></alternatives></inline-formula> nearest neighbor technique.
Avory C. Bryant
1 papers
Krzysztof J. Cios
1 papers