Rob McColl presented work comparing a performance evaluation of a variety of open source graph database technologies today at the 1st Workshop on Parallel Programming for Analytics Applications (PPAA 2014) held in conjunction with the 19th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP 2014). Slides and the full paper are available on the publications page.
With the proliferation of large, irregular, and sparse relational datasets, new storage and analysis platforms have arisen to fill gaps in performance and capability left by conventional approaches built on traditional database technologies and query languages. Many of these platforms apply graph structures and analysis techniques to enable users to ingest, update, query, and compute on the topological structure of the network represented as sets of edges relating sets of vertices. To store and process Facebook-scale datasets, software and algorithms must be able to support data sources with billions of edges, update rates of millions of updates per second, and complex analysis kernels. These platforms must provide intuitive interfaces that enable graph experts and novice programmers to write implementations of common graph algorithms. In this paper, we conduct a qualitative study and a performance comparison of 12 open source graph databases using four fundamental graph algorithms on networks containing up to 256 million edges.