Feature Extraction and Lexicon Expansion in Opinion Mining through Persian Reviews
Subject Areas: Electrical and Computer Engineering
E. Golpar-Rabooki 1, S. Zarghamifar 2, S. Zarghamifar 3
Keywords: opinion mining, feature extraction, opinion-mining lexicon, corpus, part-of-speech tagging, syntactic dependency parsing
Abstract:
Opinion mining analyzes user reviews to extract users' opinions, sentiments, and demands in a specific domain, and it plays an important role in making major decisions in that domain. In general, opinion mining analyzes user reviews at three levels: document, sentence, and feature. Feature-level opinion mining has received more attention than the other two levels because it analyzes the orientation of reviews toward the individual aspects of a domain. This paper introduces a feature extraction method consisting of four main stages. First, an opinion-mining lexicon for Persian is created; this lexicon is used to determine the orientation of users' reviews. Second, in the preprocessing stage, the documents undergo orthographic normalization, tokenization, part-of-speech tagging, and syntactic dependency parsing. Third, features are extracted with a dependency-grammar-based method. Fourth, the features and word polarities extracted in the previous stage are refined, and the final polarity of each feature is determined. To evaluate the proposed method, user reviews were collected in two domains, universities and cell phones, and the results were compared with those of a frequency-based feature extraction method.
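The four stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The following are all illustrative assumptions: the tiny English-glossed lexicon, the hand-written tags and dependency arcs (standing in for a real Persian tagger and parser), the UD-style relation labels `amod`, `nsubj`, and `cop`, the negator list, the two extraction rules, and the "sum the polarities" rule for the final stage. A frequency-based baseline is included for contrast.

```python
from collections import Counter, defaultdict
from typing import NamedTuple

# Hypothetical toy opinion lexicon (stage 1): word -> polarity (+1 or -1).
# Words are English glosses of Persian, chosen only for readability.
LEXICON = {"good": 1, "excellent": 1, "bad": -1, "weak": -1}
# Assumed negators; in Persian, negation often sits on the copula (e.g. "nist").
NEGATORS = {"not", "isn't"}


class Arc(NamedTuple):
    head: int  # index of the head token
    dep: int   # index of the dependent token
    rel: str   # relation label (UD-style, an assumption)


# Stage 2 output, hand-written instead of running a real tagger/parser:
# each sentence is (list of (token, POS), list of dependency arcs).
SENTENCES = [
    ([("camera", "N"), ("excellent", "ADJ"), ("is", "V")],
     [Arc(1, 0, "nsubj"), Arc(1, 2, "cop")]),
    ([("battery", "N"), ("good", "ADJ"), ("isn't", "V")],
     [Arc(1, 0, "nsubj"), Arc(1, 2, "cop")]),
    ([("battery", "N"), ("weak", "ADJ")],
     [Arc(0, 1, "amod")]),
    ([("screen", "N"), ("good", "ADJ"), ("is", "V")],
     [Arc(1, 0, "nsubj"), Arc(1, 2, "cop")]),
    ([("phone", "N"), ("bought", "V")],
     [Arc(1, 0, "obj")]),
    ([("phone", "N"), ("arrived", "V")],
     [Arc(1, 0, "nsubj")]),
]


def extract_opinions(sentence):
    """Stage 3 (assumed rules): return (feature, polarity) pairs."""
    tokens, arcs = sentence
    pairs = []
    for arc in arcs:
        head_word, head_pos = tokens[arc.head]
        dep_word, dep_pos = tokens[arc.dep]
        if arc.rel == "amod" and head_pos == "N" and dep_word in LEXICON:
            # Rule 1: an opinion adjective modifies a noun ("weak battery").
            feature, opinion_idx = head_word, arc.dep
        elif arc.rel == "nsubj" and dep_pos == "N" and head_word in LEXICON:
            # Rule 2: a noun is the subject of an opinion predicate ("screen is good").
            feature, opinion_idx = dep_word, arc.head
        else:
            continue
        polarity = LEXICON[tokens[opinion_idx][0]]
        # Flip the polarity if any dependent of the opinion word is a negator.
        if any(a.head == opinion_idx and tokens[a.dep][0] in NEGATORS for a in arcs):
            polarity = -polarity
        pairs.append((feature, polarity))
    return pairs


def summarize(sentences):
    """Stage 4 (assumed rule): sum polarities per feature and label the sign."""
    scores = defaultdict(int)
    for sentence in sentences:
        for feature, polarity in extract_opinions(sentence):
            scores[feature] += polarity
    return {f: "positive" if s > 0 else "negative" if s < 0 else "neutral"
            for f, s in scores.items()}


def frequent_nouns(sentences, min_support):
    """Frequency-based baseline: nouns occurring at least min_support times."""
    counts = Counter(w for tokens, _ in sentences for w, pos in tokens if pos == "N")
    return {w for w, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    print(summarize(SENTENCES))
    # {'camera': 'positive', 'battery': 'negative', 'screen': 'positive'}
    print(sorted(frequent_nouns(SENTENCES, 2)))
    # ['battery', 'phone']
```

In this toy data, the frequency baseline also returns `phone`, a noun that carries no opinion. The dependency rules keep only nouns that are syntactically linked to a lexicon word. This difference illustrates the idea only and does not reproduce the paper's results.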