Scaling analysis of affinity propagation

Cyril Furtlehner*, Michèle Sebag, Xiangliang Zhang

*Corresponding author for this work

Research output: Contribution to journalArticlepeer-review

5 Scopus citations

Abstract

We analyze and exploit some scaling properties of the affinity propagation (AP) clustering algorithm proposed by Frey and Dueck [Science 315, 972 (2007)]. Following a divide and conquer strategy we setup an exact renormalization-based approach to address the question of clustering consistency, in particular, how many cluster are present in a given data set. We first observe that the divide and conquer strategy, used on a large data set hierarchically reduces the complexity O (N2) to O (N ( h+2 ) / ( h+1 )), for a data set of size N and a depth h of the hierarchical strategy. For a data set embedded in a d -dimensional space, we show that this is obtained without notably damaging the precision except in dimension d=2. In fact, for d larger than 2 the relative loss in precision scales such as N ( 2-d ) / ( h+1 ) d. Finally, under some conditions we observe that there is a value s- of the penalty coefficient, a free parameter used to fix the number of clusters, which separates a fragmentation phase (for s< s-) from a coalescent one (for s> s-) of the underlying hidden cluster structure. At this precise point holds a self-similarity property which can be exploited by the hierarchical strategy to actually locate its position, as a result of an exact decimation procedure. From this observation, a strategy based on AP can be defined to find out how many clusters are present in a given data set.

Original languageEnglish (US)
Article number066102
JournalPhysical Review E - Statistical, Nonlinear, and Soft Matter Physics
Volume81
Issue number6
DOIs
StatePublished - Jun 1 2010

ASJC Scopus subject areas

  • Statistical and Nonlinear Physics
  • Statistics and Probability
  • Condensed Matter Physics

Fingerprint Dive into the research topics of 'Scaling analysis of affinity propagation'. Together they form a unique fingerprint.

Cite this