S-Graffito is a Streaming Graph Management System that addresses the processing of OLTP and OLAP queries on high streaming rate, very large graphs. These graphs are increasingly being deployed to capture relationships between entities (e.g., customers and catalog items in an online retail environment) both for transactional processing and for analytics (e.g., recommendation systems)
Existing work on streaming graph systems, by and large,focuses on either:
In a large number of applications, the unbounded nature of streaming graphs and the need for real-time answers on recent data make it impractical to employ snapshot-based techniques. Specialized systems, on the other hand, provide satisfactory performance for the task in hand but they lack the flexibility to support a wide range of real-world scenarios.
The primary focus of this component is the efficient processing of persistent graph queries over large streaming graphs with very high edge arrival rates. We investigate query execution techniques and robust system architectures for an efficient and scalable treaming Graph Management System (SGMS). In particular, we tackle following problems for efficient persistent query evaluation over streaming graphs:
Graph analytics is concerned with estimating properties of the graph or finding patterns within a graph (e.g. finding cliques or densely connected clusters, subgraph matching, and finding frequent patterns/motifs). Running analytics tasks over streaming graphs is particularly challenging because of the unboundedness of the graph (i.e. sequential access to the unbounded structural events in the graph) as well as the potentially bursty and high velocity arrivals. The growing need to process streaming graphs, with their ever-changing nature, has brought about a resurgence of interest in prediction-based analytics over streaming graph (e.g. link prediction, node prediction, event time prediction, and pattern prediction).
The primary focus of this component is creating an analytics engine that ingests streaming records, batches them using sliding window semantics, and performs (several) machine learning-aided analytics tasks on each batch before retiring the corresponding window and ingesting the next batch. To this end, we design efficient algorithms for a generic analytics engine that is based on time-based windows (as the computation methodology) and low dimensional vertex embeddings (as the analytics primitives). In particular, we tackle the following problems for efficient analytics over streaming graphs.
Keynote at 14th International Conference on Distributed and Event-Based Systems, 2020
A. Sheshbolouki and M. T. Özsu, sGrapp: Butterfly Approximation in Streaming Graphs, ACM Transactions on Knowledge Discovery From Data, 2022 (pp. 1-43)
A. Sheshbolouki and M. T. Özsu, sGrow: Explaining the Scale-Invariant Strength Assortativity of Streaming Butterflies, ACM Transactions on the Web, 2022 (Accepted for publications)
A. Pacaci, A. Bonifati and M.T. Özsu, Evaluating Complex Queries on Streaming Graphs, In Proceedings of the 38th IEEE International Conference on Data Engineering, pages 272-285, 2022
A. Pacaci, A. Bonifati and M.T. Özsu, Regular Path Query Evaluation on Streaming Graphs, In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1415-1430, 2020.
A. Pacaci and M.T. Özsu,Experimental Analysis of Streaming Algorithms for Graph Partitining, In Proceedings of the ACM SIGMOD International Conference on Management of Data, pages 1375-1392, 2019.
sGrapp: Butterfly Approximation in Streaming Graphs
Transient Concepts in Streaming Graphs
Evaluating Complex Queries on Streaming Graphs
Angela Bonifati (Collaborator at Lyon 1 University)
Anil Pacaci (Former PhD Student)
Aida Sheshbolouki (PhD Student)
Kerem Akillioglu (MMath Student)