معیاری جدید برای بخشبندی سیستمهای پردازش گراف مبتنی بر بلوک
محورهای موضوعی : مهندسی برق و کامپیوترمسعود ساغریچیان 1 , مرتضی علیپور لنگوری 2
1 - دانشگاه الزهرا
2 - دانشگاه مکمستر
کلید واژه: بخشبندی, مبتنی بر بلوک, گراف, قطر,
چکیده مقاله :
به واسطه قدرت و سادگی، سیستمهای پردازش گراف مبتنی بر بلوک در سالهای اخیر مورد توجه ویژهای قرار گرفتهاند. اغلب این سیستمها از روشهای بخشبندی عمومی و همهمنظوره جهت تولید پارتیشنهای مورد نیاز خود استفاده میکنند. همین امر منجر شده که کارایی این سیستمها محدود شود. برای رفع این مشکل الگوریتمهای خاصمنظورهای برای بخشبندی این دسته از سیستمها ارائه شده است، اما مشکل این دسته از روشها آن است که همچنان معیارهای سنتی نظیر تعداد یال برشی و تعادل بار به عنوان تابع هدف این روشها مد نظر قرار گرفته است. این در حالی است که قدرت سیستمهای پردازش گراف مبتنی بر بلوک به واسطه ویژگیهای منحصر به فردی است که در طراحی این دسته از سیستمها مد نظر قرار گرفته است. به همین جهت در این مقاله، ویژگیهای ذاتی و اساسی این دسته از سیستمها مورد توجه قرار گرفته و با توجه به این خواص، دو معیار جدید به عنوان معیار تابع هدف بخشبندی، معرفی شده است. بر اساس تحقیقات انجامگرفته، روش پیشنهادی اولین الگوریتم بخشبندی است که قطر گراف سطح بالا و اندازه گرههای گراف سطح بالای حاصل از بخشبندی را به عنوان تابع هدف در نظر گرفته میگیرد. ارزیابی روش پیشنهادی بر روی مجموعه دادههای واقعی نشان داد که روش پیشنهادی به طور مؤثری قادر به کاهش قطر گراف سطح بالای حاصل از بخشبندی نسبت به سایر الگوریتمهای بخشبندی متداول میباشد. به علاوه، یال برشی حاصل از روش پیشنهادی بسیار نزدیک به یکی از معروفترین روشهای بخشبندی متمرکز، متیس میباشد. از آنجا که قطر گراف سطح بالا رابطه مستقیمی با تعداد سوپراستپهای مورد نیاز در سیستمهای پردازش گراف بلوکی دارد، روش پیشنهادی با کاهش آن قادر به افزایش کارایی این دسته از روشها خواهد شد.
Block-centric graph processing systems have received significant attention in recent years. To produce the required partitions, most of these systems use general-purpose partitioning methods. As a result, the performance of them has been limited. To face this problem, special partitioning algorithms have been proposed by researchers. However, these methods focused on traditional partitioning measures like the number of cutting-edges and the load-balance. In return, the power of block-centric graph processing systems is due to unique characteristics that are focused on the design of them. According to basic and important characteristics of these systems, in this paper two new measures are proposed as partitioning goals. To the best of our knowledge, the proposed method is the first work that considers the diameter and size of the high-level graph as optimization factors for partitioning purposes. The evaluation of the proposed method over real graphs showed that we could significantly reduce the diameter of the high-level graph. Moreover, the number of cutting-edges of the proposed method are very close to Metis, one of most popular centralized partitioning methods. Since the number of required supersteps in block-centric graph processing systems mainly depends on the diameter of the high-level graph, the proposed method can significantly improve the performance of these systems.
[1] M. Ulizko, E. Antonov, A. Artamonov, et al., "Graph visualization of the characteristics of complex objects on the example of the analysis of politicians," in Proc. 30th Int. Conf. on Computer Graphics and Machine Vision, 9 pp., St. Petersburg, Russia Federation, 22-25 Sept. 2020.
[2] J. S. Yeom, et al., "Overcoming the scalability challenges of epidemic simulations on blue waters," in Proc. IEEE 28th Int. Parallel and Distributed Processing Symp., pp. 755-764, Phoenix, AZ, USA, 19-23 May 2014.
[3] R. Hess, V. H. Visschers, and M. Siegrist, "Risk communication with pictographs: the role of numeracy and graph processing," Judgment and Decision Making, vol. 6, no. 3, pp. 263-274, Apr. 2011.
[4] V. Agarwal, F. Petrini, D. Pasetto, D. A. Bader, "Scalable graph exploration on multicore processors," in Proc. of the ACM/IEEE International Conf. for High Performance Computing, Networking, Storage and Analysis, 11 pp., New Orleans, LA, USA, 13-19 Nov. 2010.
[5] Y. Wang, Y. Shangdi, L. Dhulipala, Y. Gu, and J. Shun, "GeoGraph: a framework for graph processing on geometric data," ACM SIGOPS Operating Systems Review, vol. 55, no. 1, pp. 38-46, 2021.
[6] G. Malewicz, et al., "Pregel: a system for large-scale graph processing," in Proc. of the ACM SIGMOD Int. Conf. on Management of Datapp. 135-146, Indianapolis, IN, USA, 6-10 Jun. 2010.
[7] S. Salihoglu and J. Widom, "GPS: a graph processing system," in Proc. of the 25th Int. Conf. on Scientific and Statistical Database Management, Article No.: 22, 12 pp., Baltimore, MA, USA, 29-31 Jul. 2013.
[8] J. E. Gonzalez, et al., "Graphx: Graph processing in a distributed dataflow framework," in Proc. 11th USENIX Symp. on Operating Systems Design and Implementation, OSDI'14, pp. 599-613, Broomfield, CO, USA, 6-8 Oct. 2014.
[9] S. Hong, et al., "PGX.D: a fast distributed graph processing engine," in Proc. of the Int. Conf. for High Performance Computing, Networking, Storage and Analysis, 12 pp., Austin, TX, USA, 15-20 Nov. 2015.
[10] R. R. McCune, T. Weninger, and G. Madey, "Thinking like a vertex: a survey of vertex-centric frameworks for large-scale distributed graph processing," ACM Computing Surveys, vol. 48, no. 2, Article No.: 25, 39 pp., Nov. 2015.
[11] D. Yan, J. Cheng, Y. Lu, and W. Ng, "Blogel: a block-centric framework for distributed computation on real-world graphs," Proc. of the VLDB Endowment, vol. 7, no. 14, pp. 1981-1992, Oct. 2014.
[12] M. Sagharichian, H. Naderi, and M. Haghjoo, "ExPregel: a new computational model for large‐scale graph processing," Concurrency and Computation: Practice and Experience, vol. 27, no. 17, pp. 4954-4969, Dec. 2015.
[13] S. Aridhi, A. Montresor, and Y. Velegrakis, "BLADYG: a novel block-centric framework for the analysis of large dynamic graphs," in Proc. of the ACM Workshop on High Performance Graph Processing, pp. 39-42, Kyoto, Japan, 31-31 May 2016.
[14] R. Dindokar, N. Choudhury, and Y. Simmhan, "A meta-graph approach to analyze subgraph-centric distributed programming models," in Proc. IEEE Int. Conf. on Big Data, pp. 37-47, Washington, DC, USA, 5-8 Dec. 2016.
[15] A. Quamar and A. Deshpande, "NScaleSpark: subgraph-centric graph analytics on Apache Spark," in Proc. of the 1st ACM SIGMOD Workshop on Network Data Analytics, Article No.: 5, 8 pp., San Francisco, CA, USA, 1-1 Jul. 2016.
[16] M. Sagharichian and H. Naderi, "Intelligent and independent processes for overcoming big graphs," The J. of Supercomputing, vol. 73, no. 4, pp. 1438-1466, Apr. 2017.
[17] X. Sui, D. Nguyen, M. Burtscher, and K. Pingali, "Parallel graph partitioning on multicore architectures," in Proc. Int. Workshop on Languages and Compilers for Parallel Computing, pp. 246-260, Houston, TX, USA, 7-9 Oct. 2010.
[18] D. LaSalle and G. Karypis, "Multi-threaded graph partitioning," IEEE 27th Int. Symp. on Parallel and Distributed Processing, pp. 225-236, Cambridge, MA, USA,20-24 May 2013.
[19] M. Naumov and T. Moon, Parallel Spectral Graph Partitioning, Tech. Rep., NVIDIA Tech. Rep, 2016.
[20] H. Meyerhenke, P. Sanders, and C. Schulz, "Parallel graph partitioning for complex networks," IEEE Trans. on Parallel and Distributed Systems, vol. 28, no. 9, pp. 2625-2638, Sept. 2017.
[21] F. Rahimian, A. H. Payberah, S. Girdzijauskas, M. Jelasity, and S. Haridi, "JA-BE-JA: A distributed algorithm for balanced graph partitioning," in Proc. IEEE 7th Int Conf. on Self-Adaptive and Self-Organizing Systems, pp. 51-60, Philadelphia, PA, USA, 9-13 Sept. 2013.
[22] C. Martella, D. Logothetis, A. Loukas, and G. Siganos, "Spinner: scalable graph partitioning in the cloud," in Proc.IEEE 33rd International Conf. on Data Engineering, ICDE'17, pp. 1083-1094, San Diego, CA, USA, 19-22 Apr. 2017.
[23] M. Onizuka, T. Fujimori, and H. Shiokawa, "Graph partitioning for distributed graph processing," Data Science and Engineering, vol. 2, no. 1, pp. 94-105, Mar. 2017.
[24] J. Sun, H. Vandierendonck, and D. S. Nikolopoulos, "Graphgrind: addressing load imbalance of graph partitioning," in Proc. of the Int. Conf. on Supercomputing, Article No.: 16, 10 pp., Chicago, IL, USA, 14-16 Jun. 2017.
[25] C. J. Alpert and A. B. Kahng, "Recent directions in netlist partitioning: a survey," Integration, vol. 19, no. 1-2, pp. 1-81, 1995.
[26] B. Hendrickson and T. G. Kolda, "Graph partitioning models for parallel computing," Parallel Computing, vol. 26, no. 12, pp. 1519-1534, 2000.
[27] T. N. Bui and C. Jones, "Finding good approximate vertex and edge partitions is NP-hard," Information Processing Letters, vol. 42, no. 3, pp. 153-159, May 1992.
[28] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," The Bell System Technical J., vol. 49, no. 2, pp. 291-307, Feb. 1970.
[29] Y. G. Saab, "An effective multilevel algorithm for bisecting graphs and hypergraphs," IEEE Trans. on Computers, vol. 53, no. 6, pp. 641-652, Jun. 2004.
[30] L. Sun and M. Leng, "An effective multi-level algorithm based on simulated annealing for bisecting graph," in Proc. Int. Workshop on Energy Minimization Methods in Computer Vision and Pattern Recognition, 12 pp., Ezhou, China, 27-29 Aug. 2007.
[31] C. Aykanat, B. B. Cambazoglu, and B. Uçar, "Multi-level direct k-way hypergraph partitioning with multiple constraints and fixed vertices," J. of Parallel and Distributed Computing, vol. 68, no. 5, pp. 609-625, May 2008.
[32] R. Khandekar, S. Rao, and U. Vazirani, "Graph partitioning using single commodity flows," J. of the ACM, vol. 56, no. 4, pp. 385-390, 2009.
[33] H. Meyerhenke, B. Monien, and S. Schamberger, "Graph partitioning and disturbed diffusion," Parallel Computing, vol. 35, no. 10-11, pp. 544-569, Oct. 2009.
[34] P. Chardaire, M. Barake, and G. P. McKeown, "A probe-based heuristic for graph partitioning," IEEE Trans. on Computers, vol. 56, no. 12, pp. 1707-1720, Dec. 2007.
[35] R. Zamprogno and A. R. Amaral, "An efficient approach for large scale graph partitioning," J. of Combinatorial Optimization, vol. 13, no. 4, pp. 289-320, May 2007.
[36] A. Trifunovic and W. J. Knottenbelt, "Towards a parallel disk-based algorithm for multilevel k-way hypergraph partitioning," in Proc. IEEE 18th Int. Parallel and Distributed Processing Symp., pp. 236-243, Santa Fe, NM, USA, 26-30 Apr. 2004.
[37] I. Stanton and G. Kliot, "Streaming graph partitioning for large distributed graphs," in Proc. of the 18th ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining, pp. 1222-1230, Beijing, China, ?12-16 Aug. 2012.
[38] C. Tsourakakis, C. Gkantsidis, B. Radunovic, M. Vojnovic, "Fennel: streaming graph partitioning for massive scale graphs," in Proc. of the 7th ACM Int. Conf. on Web Search and Data Mining, pp. 333-342, New York, NY, USA, 24-26 Feb. 2014.
[39] Y. Tian, A. Balmin, S. A. Corsten, S. Tatikonda, and J. McPherson, "From "think like a vertex" to "think like a graph"," Proc. of the VLDB Endowment, vol. 7, no. 3, pp. 193-204, Nov. 2013.
[40] W. Fan, J. Xu, Y. Wu, W. Yu, and J. Jiang, "GRAPE: parallelizing sequential graph computations," Proc. of the VLDB Endowment, vol. 10, no. 12, pp. 1889-1892, Aug. 2017.
[41] S. Verma, L. M. Leslie, Y. Shin, and I. Gupta, "An experimental comparison of partitioning strategies in distributed graph processing," Proc. of the VLDB Endowment, vol. 10, pp. 493-504, Jan. 2017.
[42] Y. Kim, M. Bae, and S. Oh, "Dynamic block reassignment for load balancing of block centric graph processing systems," KIPS Trans. on Software and Data Engineering, vol. 7, no. 5, pp. 177-188, 2018.
[43] X. Wen, S. Zhang, and H. You, DRONE: A Distributed Subgraph-Centric Framework for Processing Large Scale Power-law Graphs, arXiv preprint arXiv:1812.04380, 2018.
[44] M. Sagharichian, M. A. Langouri, and H. Naderi, "A fast method to exactly calculate the diameter of incremental disconnected graphs," World Wide Web, vol. 20, no. 2, pp. 399-416, Mar. 2017.
[45] Stanford University, Stanford Large Network Dataset Collection, Available from: http://snap.stanford.edu/data.