Using web analytics in forecasting the stock price of chemical products group in the stock exchange
Subject Areas : General
Amir Daee 1 , Omid Mahdi Ebadati E. 2 , Keyvan Borna 3
1 -
2 - University faculty member
3 -
Keywords: Text mining, web content mining, web crawler, stock market forecasting, support vector machine
Abstract :
Forecasting financial markets, including the stock market, attracts researchers and investors because of the high volume of transactions and liquidity. The ability to predict prices makes it possible to achieve higher returns by reducing risk and avoiding financial losses. News plays an important role in how current stock prices are assessed. Advances in data mining, computational intelligence, and machine learning algorithms have led to new prediction models. This study collects news published by news agencies and applies text mining methods and the support vector machine algorithm to predict the next day's stock price. To this end, news published by 17 news agencies was stored and categorized by topic using the Python language. Then, using text mining methods and the support vector machine algorithm with different kernels, the stock prices of the chemical products group on the stock exchange were forecast. The study uses 300,000 news items in the political and economic categories and the stock prices of 25 selected companies over 122 trading days in the period from November to March 1997. The results show that the support vector machine model with a linear kernel predicts prices with an average accuracy of 83%. With a nonlinear quadratic kernel, the average prediction accuracy rises to 85%, while other kernels give poorer results.
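The difference between the linear and quadratic kernels mentioned in the abstract can be illustrated with a minimal sketch. It is not the study's pipeline: the headlines, the whitespace tokenizer, and the parameter values (degree, coef0, gamma) are all assumptions made for illustration, and the abstract does not state which features or kernel settings were used. The sketch only shows how each kernel scores the similarity of two news texts represented as term counts.

```python
from collections import Counter
import math


def term_frequencies(text):
    """Lowercase, split on whitespace, and count tokens (a toy bag-of-words)."""
    return Counter(text.lower().split())


def linear_kernel(a, b):
    """Dot product of two sparse term-count vectors."""
    return sum(count * b[term] for term, count in a.items() if term in b)


def polynomial_kernel(a, b, degree=2, coef0=1.0):
    """Polynomial kernel (<a, b> + coef0) ** degree; degree=2 is the quadratic case."""
    return (linear_kernel(a, b) + coef0) ** degree


def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel exp(-gamma * ||a - b||^2)."""
    terms = set(a) | set(b)
    squared_distance = sum((a[t] - b[t]) ** 2 for t in terms)
    return math.exp(-gamma * squared_distance)


# Hypothetical headlines, invented for illustration only.
doc_a = term_frequencies("petrochemical exports rise as prices rise")
doc_b = term_frequencies("petrochemical prices fall")

print("linear:   ", linear_kernel(doc_a, doc_b))
print("quadratic:", polynomial_kernel(doc_a, doc_b))
print("rbf:      ", rbf_kernel(doc_a, doc_b))
```

A support vector machine never sees the documents directly, only these pairwise kernel values, so changing the kernel changes the shape of the decision boundary it can learn. A linear kernel yields a separating hyperplane in term space, while the quadratic kernel also captures pairwise interactions between terms.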