MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs

Pinghui Wang, Junzhou Zhao, Xiangliang Zhang, Zhenguo Li, Jiefeng Cheng, John C.S. Lui, Don Towsley, Jing Tao, Xiaohong Guan

Research output: Contribution to journalArticlepeer-review

21 Scopus citations

Abstract

Counting 3-, 4-, and 5-node graphlets in graphs is important for graph mining applications such as discovering abnormal/evolution patterns in social and biology networks. In addition, it is recently widely used for computing similarities between graphs and graph classification applications such as protein function prediction and malware detection. However, it is challenging to compute these metrics for a large graph or a large set of graphs due to the combinatorial nature of the problem. Despite recent efforts in counting triangles (a 3-node graphlet) and 4-node graphlets, little attention has been paid to characterizing 5-node graphlets. In this paper, we develop a computationally efficient sampling method to estimate 5-node graphlet counts. We not only provide fast sampling methods and unbiased estimators of graphlet counts, but also derive simple yet exact formulas for the variances of the estimators which is of great value in practice-the variances can be used to bound the estimates' errors and determine the smallest necessary sampling budget for a desired accuracy. We conduct experiments on a variety of real-world datasets, and the results show that our method is several orders of magnitude faster than the state-of-the-art methods with the same accuracy.
Original languageEnglish (US)
Pages (from-to)73-86
Number of pages14
JournalIEEE Transactions on Knowledge and Data Engineering
Volume30
Issue number1
DOIs
StatePublished - Sep 26 2017

Fingerprint

Dive into the research topics of 'MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs'. Together they form a unique fingerprint.

Cite this