Algorithms and Frameworks for Graph Analytics at Scale

  • Fuad Tarek Jamour

Student thesis: Doctoral Thesis

Abstract

Graph queries typically involve retrieving entities with certain properties and connectivity patterns. One popular property is betweenness centrality, which is a quantitative measure of importance used in many applications such as identifying influential users in social networks. Solving graph queries that involve retrieving important entities with user-defined connectivity patterns in large graphs requires efficient com- putation of betweenness centrality and efficient graph query engines. The first part of this thesis studies the betweenness centrality problem, while the second part presents a framework for building efficient graph query engines. Computing betweenness centrality entails computing all-pairs shortest paths; thus, exact computation is costly. The performance of existing approximation algorithms is not well understood due to the lack of an established benchmark. Since graphs in many applications are inherently evolving, several incremental algorithms were proposed. However, they cannot scale to large graphs: they either require excessive memory or perform unnecessary computations rendering them prohibitively slow. Existing graph query engines rely on exhaustive indices for accelerating query evaluation. The time and memory required to build these indices can be prohibitively high for large graphs. This thesis attempts to solve the aforementioned limitations in the graph analytics literature as follows. First, we present a benchmark for evaluating betweenness centrality approximation algorithms. Our benchmark includes ground-truth data for large graphs in addition to a systematic evaluation methodology. This benchmark is the first attempt to standardize evaluating betweenness centrality approximation algorithms and it is currently being used by several research groups working on approximate between- ness in large graphs. Then, we present a linear-space parallel incremental algorithm for updating betweenness centrality in large evolving graphs. Our algorithm uses biconnected components decomposition to localize processing graph updates, and it performs incremental computation even within affected components. Our algorithm is up to an order of magnitude faster than the state-of-the-art parallel incremental algorithm. Finally, we present a framework for building low memory footprint graph query engines. Our framework avoids building exhaustive indices and uses highly optimized matrix algebra operations instead. Our framework loads datasets, and evaluates data-intensive queries up to an order of magnitude faster than existing engines.
Date of AwardFeb 28 2019
Original languageEnglish (US)
Awarding Institution
  • Computer, Electrical and Mathematical Science and Engineering
SupervisorPanos Kalnis (Supervisor)

Keywords

  • rdf
  • SPARQL
  • graph query engine
  • graph analytics
  • centrality
  • betweenness

Cite this

'