Video Summarization with a Hybrid Graph Network and Clustering Method
Subject areas: Electrical and Computer Engineering
Mahsa Rahimi Resketi 1, Homayun Motameni 2, Ebrahim Akbari 3, Hossein Nematzadeh 4
1 - Islamic Azad University, Sari Branch
2 - Islamic Azad University, Sari Branch
3 - Islamic Azad University, Sari Branch
4 - Islamic Azad University, Sari Branch
Keywords: video mining, video summarization, clustering, K-Medoids, convolutional graph attention network
Abstract:
The spread of home cameras and the reach of the media have produced a staggering volume of video data, so methods that can access and process this volume quickly and efficiently have become especially important. Video summarization addresses this need by condensing a video into a short but meaningful set of frames or clips. In this study, the data are first clustered with the K-Medoids algorithm. A convolutional graph attention network then performs temporal and graph separation, and in the next step the connection rejection method removes noise and duplicates. Finally, the summary is produced by merging the results of the two separate steps, graph-based and temporal. The results were evaluated both qualitatively and quantitatively on three datasets: SumMe, TVSum, and OpenCV. In the qualitative evaluation, the method achieved on average an 88% accuracy rate in summarization and a 31% error rate, which is among the highest accuracy rates compared with other methods. In the quantitative evaluation, the proposed method also performs better than existing methods.
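As an illustration of the first stage only, the sketch below clusters per-frame descriptors with a plain alternating K-Medoids procedure and returns the medoid frames as candidate keyframes. It is a minimal sketch under stated assumptions, not the authors' implementation: the function name `k_medoids`, its parameters, the Euclidean distance, and the synthetic 2-D descriptors are all hypothetical choices, since the abstract does not specify how frames are represented or compared. The graph attention stage and the merging step are not covered.

```python
import numpy as np


def k_medoids(features, k, n_iter=100, seed=0):
    """Alternating K-Medoids on row vectors (hypothetical helper).

    Returns (medoid_indices, labels), where medoid_indices are row
    indices into `features`, sorted in ascending (temporal) order.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    # Pairwise Euclidean distances between all frame descriptors.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Start from k distinct frames chosen at random.
    medoids = rng.choice(n, size=k, replace=False)
    for _ in range(n_iter):
        # Assignment step: each frame joins its nearest medoid.
        labels = dist[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size == 0:
                continue  # keep the old medoid for an empty cluster
            # Update step: pick the member with the smallest total
            # distance to the other members of its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break  # converged
        medoids = new_medoids
    medoids = np.sort(medoids)
    labels = dist[:, medoids].argmin(axis=1)
    return medoids, labels


# Synthetic stand-in for frame descriptors: 60 "frames" in 3 loose groups.
demo_rng = np.random.default_rng(42)
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
frames = np.vstack(
    [c + demo_rng.normal(scale=0.5, size=(20, 2)) for c in centers]
)
keyframes, labels = k_medoids(frames, k=3)
print("candidate keyframe indices:", keyframes)
```

Unlike K-Means centroids, medoids are always actual data points, so each cluster representative is a real frame that can be shown in the summary directly. The alternating scheme can settle in a local optimum, so in practice one might run it from several seeds and keep the lowest-cost result.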