Graph queries typically involve retrieving entities with certain properties and connectivity patterns. One popular property is betweenness centrality, which is a quantitative measure of importance used in many applications such as identifying influential users in social networks. Solving graph queries that involve retrieving important entities with userdefined connectivity patterns in large graphs requires efficient com putation of betweenness centrality and efficient graph query engines. The first part of this thesis studies the betweenness centrality problem, while the second part presents a framework for building efficient graph query engines.
Computing betweenness centrality entails computing allpairs shortest paths; thus, exact computation is costly. The performance of existing approximation algorithms is not well understood due to the lack of an established benchmark. Since graphs in many applications are inherently evolving, several incremental algorithms were proposed. However, they cannot scale to large graphs: they either require excessive memory or perform unnecessary computations rendering them prohibitively slow. Existing graph query engines rely on exhaustive indices for accelerating query evaluation. The time and memory required to build these indices can be prohibitively high for large graphs. This thesis attempts to solve the aforementioned limitations in the graph analytics literature as follows.
First, we present a benchmark for evaluating betweenness centrality approximation algorithms. Our benchmark includes groundtruth data for large graphs in addition to a systematic evaluation methodology. This benchmark is the first attempt to standardize evaluating betweenness centrality approximation algorithms and it is
currently being used by several research groups working on approximate between ness in large graphs. Then, we present a linearspace parallel incremental algorithm for updating betweenness centrality in large evolving graphs. Our algorithm uses biconnected components decomposition to localize processing graph updates, and it performs incremental computation even within affected components. Our algorithm is up to an order of magnitude faster than the stateoftheart parallel incremental algorithm. Finally, we present a framework for building low memory footprint graph query engines. Our framework avoids building exhaustive indices and uses highly optimized matrix algebra operations instead. Our framework loads datasets, and evaluates dataintensive queries up to an order of magnitude faster than existing engines.
Date of Award  Feb 28 2019 

Original language  English (US) 

Awarding Institution   Computer, Electrical and Mathematical Science and Engineering


Supervisor  Panos Kalnis (Supervisor) 

 rdf
 SPARQL
 graph query engine
 graph analytics
 centrality
 betweenness