Video Summarization Using a Hybrid Method of Graph Networks and Clustering
Authors: Mahsa Rahimi Resketi 1, Homayun Motameni 2, Ebrahim Akbari 3, Hossein Nematzadeh 4
1 - Islamic Azad University, Sari Branch
2 - Islamic Azad University, Sari Branch
3 - Islamic Azad University, Sari Branch
4 - Islamic Azad University, Sari Branch
Keywords: video mining, video summarization, clustering, K-Medoids, graph convolutional attention network
Abstract:
We live in a world where home cameras and the reach of media leave us dealing with a staggering volume of video data. A method that lets this footage be accessed and processed quickly and efficiently is therefore of particular importance. Video summarization achieves this by condensing a video into a set of short but meaningful frames or clips. In this study, the data are first clustered with the K-Medoids algorithm. A graph convolutional attention network then performs temporal and graph-based separation, and in the next step a skip-connection method removes noise and duplicate items. Finally, the summary is produced by merging the results of the two separate steps, graph and temporal. The results were evaluated both qualitatively and quantitatively on three datasets: SumMe, TVSum, and OpenCv. In the qualitative evaluation, the method achieved on average an accuracy rate of 88% and an error rate of 31% in summarization, which places it among the highest accuracy rates compared with other methods. In the quantitative evaluation, the proposed method also performs better than existing methods.
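The clustering step mentioned in the abstract can be illustrated with a small sketch. The abstract does not state which frame features, distance measure, number of clusters, or initialization the authors use, so everything below is an assumption made for illustration: toy 2-D feature vectors, Euclidean distance, a deterministic farthest-first initialization, and the hypothetical function names `euclidean` and `k_medoids`. It covers only K-Medoids keyframe selection, not the graph convolutional attention network or the skip-connection step.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_medoids(features, k, max_iter=100):
    """Alternating K-Medoids (assumed variant, not taken from the paper).

    Returns (medoids, labels): medoids is a list of frame indices, and
    labels[i] is the position in medoids of the cluster of frame i.
    """
    n = len(features)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of frames")
    dist = [[euclidean(features[i], features[j]) for j in range(n)]
            for i in range(n)]

    def assign(meds):
        # Each frame joins the cluster of its nearest medoid.
        return [min(range(k), key=lambda c: dist[i][meds[c]]) for i in range(n)]

    # Assumed initialization: farthest-first traversal starting at frame 0.
    medoids = [0]
    while len(medoids) < k:
        candidates = (i for i in range(n) if i not in medoids)
        medoids.append(max(candidates,
                           key=lambda i: min(dist[i][m] for m in medoids)))

    for _ in range(max_iter):
        labels = assign(medoids)
        new_medoids = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            if not members:
                # Possible with duplicate frames; keep the old medoid.
                new_medoids.append(medoids[c])
                continue
            # New medoid: the member with the smallest total distance
            # to the other members of its cluster.
            new_medoids.append(
                min(members, key=lambda m: sum(dist[m][j] for j in members)))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(medoids)


# Toy stand-in for per-frame features: three visually distinct "scenes".
frames = [
    [0.0, 0.1], [0.1, 0.0], [0.05, 0.05],
    [5.0, 5.1], [5.1, 5.0], [5.05, 5.05],
    [9.0, 0.0], [9.1, 0.1], [9.05, 0.05],
]
keyframes, labels = k_medoids(frames, k=3)
print(sorted(keyframes))  # indices of the frames chosen as keyframes
```

Because a medoid is always an actual data point, each cluster representative is a real frame that can be shown directly as a keyframe, which is one plausible reason to prefer K-Medoids over K-Means for this task.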