Improving Efficiency of Finding Frequent Subgraphs in Graph Stream Using gMatrix Summarization
Subject Areas :masoud kazemi 1 , Seyed Hossein Khasteh 2 , hamidreza rokhsati 3
1 - K.N. Toosi University of Technology
2 - University student
3 - K.N. Toosi University of Technology
Keywords: Graph Stream, Sketch Based Summarization, gMatrix, Hash Functions, Cosine Similarity,
Abstract :
In many real-world frameworks, dealing with huge domains of nodes and online streaming edges are unavoidable. Transportation systems, IP networks and developed social medias are quintessential examples of such scenarios. One of the most important open problems while dealing with massive graph streams are finding frequent sub-graph. There are some approaches such as count-min for storing the frequent nodes, however performing these methods will result in inaccurate modelling of structures based on the main graph. Having said that, gMatrix is one of the recently developed approaches which can fairly save the important properties of the main graph. In this approach, different hash functions are utilized to store the basis of streams in the main graph. As a result, having the reverse of the hash functions will be extremely useful in calculation of the frequent subgraph. Though gMatrix mainly suffer from two problems. First, they are not really accurate due to high compression rate of the main graph and second, the complexity of returning a query is high. In this thesis, we have presented a new approach based on gMatrix which can reduce the amount of memory usage as well as returning the queries in less amount of time. The main contribution of the introduced approach is to reduce the dependency among the hash functions. This will result in less conflicts while creating the gMatrix later. In this study we have used Cosine Similarity in order to estimate the amount of dependency and similarity among hash functions. Our experimental results prove the higher performance in terms of algorithm and time complexity.
[1] G. Cormode and S. Muthukrishnan, "An Improved Data-Stream Summary: The Count-min Sketch and its Applications", J. of Algorithms, 55(1), 2005.
[2] Khan, Arijit, and Charu Aggarwal. "Query-friendly compression of graph streams", Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE Press, 2016.
[3] M.R.Henzinger, P.Raghavan, and S.Rajagopalan, "Computing on data streams", External memory algorithms, pages 107–118, 1999.
[4] J.Feigenbaum, S.Kannan, A.McGregor, S.Suri, and J.Zhang, "On graph problems in a semi-streaming model", Theoretical Computer Science, 348(2-3):207–216, 2005.
[5] A. Gilbert, S. Guha, P. Indyk, Y. Kotidis, S. Muthukrishnan, M. Strauss, "Fast, small-space algorithms for approximate histogram maintenance", in: Proceedings of the 34th ACM Symposium on Theory of Computing, pp. 389–398, 2002.
[6] B.Bolloba ́s. "Extremal Graph Theory", AcademicPress, New York, 1978.
[7] S.Baswana, "Streaming algorithm for graph spanners single pass and constant processing time per edge", Inf. Process. Lett, 106(3):110–114, 2008.
[8] M. Elkin. "Streaming and fully dynamic centralized algorithms for constructing and maintaining sparse spanners", ACM Transactions on Algorithms, 7(2):20, 2011.
[9] M. Elkin and J. Zhang, "Efficient algorithms for constructing (1 +ϵ,β)-spanners in the distributed and streaming models", Distributed Computing, 18(5):375–385, 2006.
[10] R.Tarjan, "Data Structures and Network Algorithms", SIAM, Philadelphia, 1983.
[11] A.A.Benczu ́rand D.R.Karger, "Approximating s-t minimum cuts in O(n2) time", In ACM Symposium on Theory of Computing, pages 47–55, 1996.
[12] D.A.Spielmanand S.-H.Teng, "Spectral sparsificationof graphs", SIAM J. Comput., 40(4):981–1025, 2011.
[13] J.D.Batson, D.A.Spielman, and N.Srivastava, "Twice-ramanujan sparsifiers", SIAM J. Comput., 41(6):1704–1721, 2012.
[14] K.J.Ahnand S.Guha, "Graph sparsificationin the semi-streaming model", In International Colloquium on Automata, Languages and Programming, pages 328–338, 2009.
[15] J.A.Kelnerand A.Levin. "Spectral sparsification in the semi-streaming setting", Theory Comput. Syst., 53(2):243–262, 2013.
[16] N. Manerikar and T. Palpanas, "Frequent Items in Streaming Data: An Experimental Evaluation of the State-of-the-art", Data Knowl. Eng., 68(4):415–430, 2009.
[17] P. Roy, A. Khan, and G. Alonso, "Augmented Sketch: Faster and More Accurate Stream Processing", In SIGMOD, 2016.
[18] L.S.Buriol, G.Frahling, S.Leonardi, A. Marchetti-Spaccamela, and C. Sohler, "Counting triangles in data streams", In ACM Symposium on Principles of Database Systems, pages 253–262, 2006.
[19] A.Pavan, K.Tangwongsan, S.Tirthapura, and K.-L.Wu. "Counting and sampling triangles from a graph stream", In International Conference on Very Large Data Bases, 2013.
[20] K. J. Ahn, S. Guha, and A. McGregor, "Graph sketches: sparsification, spanners, and subgraphs", In ACM Symposium on Principles of Database Systems, pages 5–14, 2012.
[21] H.Jowhari, M.Saglam, and G.Tardos, "Tight bounds for lp samplers, finding duplicates in streams, and related problems", In ACM Symposium on Principles of Database Systems, pages 49–58, 2011.
[22] R.Paghand C.E.Tsourakakis, "Colorful triangle counting and a map-reduce implementation", Inf. Process. Lett., 112(7):277–281, 2012.
[23] L.Becchetti, P.Boldi, C.Castillo, and A.Gionis, "Efficient algorithms for large-scale local triangle counting", TKDD, 4(3), 2010.
[24] H.Jowhari and M.Ghodsi, "New streaming algorithms for counting triangles in graphs", In COCOON, pages 710–716, 2005.
[25] D.M.Kane, K.Mehlhorn, T.Sauerwald, and H.Sun, "Counting arbitrary subgraphs in data streams", In International Colloquium on Automata, Languages and Programming, pages 598–609, 2012.
[26] M.Manjunath, K.Mehlhorn, K.Panagiotou, and H.Sun, "Approximate counting of cycles in streams" In European Symposium on Algorithms, pages 677–688, 2011.
[27] Dahlgaard, Søren, Mathias Knudsen, and Mikkel Thorup, "Practical Hash Functions for Similarity Estimation and Dimensionality Reduction", Advances in Neural Information Processing Systems, 2017.