Mining frequent subgraphs is an important operation on graphs. Most existing work assumes
a database of many small graphs, but modern applications, such as social networks,
citation graphs or proteinprotein interaction in bioinformatics, are modeled as a single
large graph. Interesting interactions in such applications may be transitive (e.g., friend of
a friend). Existing methods, however, search for frequent isomorphic (i.e., exact match)
subgraphs and cannot discover many useful patterns.
In this paper we propose GRAMI, a framework that generalizes frequent subgraph mining
in a large single graph. GRAMI discovers frequent patterns. A pattern is a graph where
edges are generalized to distanceconstrained paths. Depending on the definition of the
distance function, many instantiations of the framework are possible. Both directed and
undirected graphs, as well as multiple labels per vertex, are supported. We developed an
efficient implementation of the framework that models the frequency resolution phase as a
constraint satisfaction problem, in order to avoid the costly enumeration of all instances of
each pattern in the graph. We also implemented CGRAMI, a version that supports structural
and semantic constraints; and AGRAMI, an approximate version that supports very large graphs. Our experiments on real data demonstrate that our framework is up to 3 orders of magnitude faster and discovers more interesting patterns than existing approaches.
Date of Award  Jul 24 2011 

Original language  English 

Awarding Institution   Computer, Electrical and Mathematical Science and Engineering


Supervisor  Panos Kalnis (Supervisor) 
