Sentiment Analysis of Persian Documents by Designing an Optimal Transform Domain
Subject areas: Electrical and Computer Engineering
Asef Pourmasoumi 1, Hadi Sadoghi Yazdi 2, Hadi Ghaemi 3, Zahra Delkhasteh 4
1 - Ferdowsi University of Mashhad
2 - Ferdowsi University of Mashhad
3 - Ferdowsi University of Mashhad
4 - Ferdowsi University of Mashhad
Keywords: sentiment analysis, transform domain, spectral energy maximization
Abstract:
With the growth of web-based interactions such as surveys, personal blogs, and social networks, sentiment analysis, also called opinion mining, has become an important research area in computer science. Many methods based on machine learning and natural language processing have been proposed for sentiment analysis. In this paper, the distribution of words across the collected documents is used as a new criterion for detecting the sentiment of a sentence. The proposed method designs an optimal transform domain over the word distribution with two goals: maximizing the spectral energy of class 1 at low frequencies and maximizing the spectral energy of class 2 at high frequencies. With this transform, the data are mapped from the word-frequency (count) domain to the Fourier domain, where the two-class patterns of optimism and pessimism can be separated easily. To make the mathematical model tractable, a strategy is presented that uses the profile of the samples, taken over all signal samples representing class 1, and the problem is solved on that basis. The spectrum of this profile has low-frequency components, so under the assumption that the spectra of classes 1 and 2 are in contrast, maximizing the spectral energy of class 2 is also satisfied. The method has been applied to Persian and English texts.
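The idea of separating two classes by where their spectral energy lies can be illustrated with a minimal sketch. It does not reproduce the optimized transform designed in the paper: it applies a plain discrete Fourier transform to a word-distribution signal and measures the share of energy in the lower half of the spectrum. The function names, the `cutoff_ratio` and `threshold` parameters, and the toy signals are all assumptions made for illustration.

```python
import cmath
from typing import List


def dft_energy(signal: List[float]) -> List[float]:
    """Return |X[k]|^2 for k = 0..N//2 of the plain DFT of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    energies = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        energies.append(abs(coeff) ** 2)
    return energies


def low_band_fraction(signal: List[float], cutoff_ratio: float = 0.5) -> float:
    """Share of non-DC spectral energy below the cutoff (cutoff_ratio is an assumption)."""
    energies = dft_energy(signal)[1:]  # drop the DC term, which is zero after centering
    total = sum(energies)
    if total == 0:
        return 0.5  # flat signal: no evidence for either class
    cutoff = max(1, int(len(energies) * cutoff_ratio))
    return sum(energies[:cutoff]) / total


def classify(signal: List[float], threshold: float = 0.5) -> int:
    """Assign class 1 to low-frequency-dominated signals, class 2 otherwise."""
    return 1 if low_band_fraction(signal) >= threshold else 2


# Hypothetical word-distribution signals: a slowly varying one and a rapidly alternating one.
smooth = [1, 2, 3, 4, 4, 3, 2, 1]
alternating = [1, 0, 1, 0, 1, 0, 1, 0]
print(classify(smooth), classify(alternating))
```

In this sketch the slowly varying signal concentrates its energy in the lowest frequency bins and the alternating one at the highest bin, which mirrors the spectral contrast between class 1 and class 2 that the abstract assumes. The fixed threshold stands in for the optimization the paper performs.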