• List of Articles: Data Mining

      • Open Access Article

        1 - Applying data mining techniques to region segmentation for entrance exams to governmental universities
        Narjes Serati Ashtiani Somayyeh Alizadeh Ali Mobasseri
        A large number of Iranian high school graduates want to enter governmental and popular universities and compete for admission. These graduates, however, come from regions with very different levels of access to facilities. In the view of the directors of the relevant agencies, quota allocation addresses this problem, and they are looking to exploit the knowledge hidden in the data available in this area. In this way, applicants from each region can be compared with one another, and managers are helped to allocate an appropriate quota to students in the regions of each segment. In recent years, quota allocation has been determined by taxonomy, whose result is a ranking that allows neither group analysis nor a principled choice of the number of regions. Clustering is a good strategy for solving this problem. This study applies data mining techniques and the CRISP methodology, for the first time, to related datasets from the Ministry of Education, the Ministry of Interior, the Ministry of Health, the Statistics Center, and the evaluation organization. After extracting the effective attributes in this area, data preparation, data reduction, and attribute combination were carried out using factor analysis. In the next step, the K-means algorithm assigns similar items to the cluster whose centroid is closest, and then neural networks and decision trees are used to assign new items to the clusters. Finally, to assess the created models, the accuracy of the outputs was compared with other methods. The outcomes of this research include determining the optimal number of segments, segmenting the regions, analyzing each segment, extracting decision rules, predicting class labels for new regions faster and more accurately, and allowing appropriate strategies to be formulated for each segment.
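        As an illustration of the pipeline described above (attribute reduction with factor analysis, K-means segmentation of regions, and a decision tree that assigns new regions to segments), the following minimal sketch uses scikit-learn on a hypothetical regions table; the column names, the number of factors, and the choice of four clusters are illustrative assumptions, not values from the study.

```python
# Minimal sketch: factor-analysis reduction -> K-means segmentation -> decision tree
# that labels new regions. Column names and k=4 are assumptions, not paper values.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import FactorAnalysis
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Hypothetical per-region indicators of access to educational facilities.
regions = pd.DataFrame(
    rng.random((200, 6)),
    columns=["teacher_ratio", "school_density", "lab_access",
             "library_access", "internet_access", "income_index"],
)

X = StandardScaler().fit_transform(regions)                                 # preparation
factors = FactorAnalysis(n_components=3, random_state=0).fit_transform(X)  # reduction

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(factors)      # segmentation
regions["segment"] = kmeans.labels_

# A decision tree learns to assign new regions to the discovered segments and
# exposes human-readable decision rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(factors, kmeans.labels_)
new_region = factors[:1]                      # pretend this is an unseen region
print("predicted segment:", tree.predict(new_region)[0])
```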
      • Open Access Article

        2 - A method for clustering customers using the RFM model and grey numbers under uncertainty
        Azime Mozafari
        The purpose of this study is to present a method for clustering bank customers based on the RFM model under uncertainty. In the proposed framework, after determining the values of the RFM parameters, namely recency of exchange (R), frequency of exchange (F), and monetary value of exchange (M), grey theory is used to handle the uncertainty and the customers are segmented with a different approach. In this way, bank customers are clustered into three main segments labeled good, ordinary, and bad customers. After cluster validation using the Dunn index and the Davies-Bouldin index, the characteristics of the customers in each segment are identified. Finally, recommendations are offered to improve the customer relationship management system.
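        A minimal sketch of the kind of workflow the abstract describes: computing R, F, and M from transactions, clustering customers into three segments, and validating the clusters with the Davies-Bouldin index and a simple Dunn index. The transaction fields are hypothetical, and plain K-means is used here as a simplification; the paper's grey-number treatment is not reproduced.

```python
# Sketch: RFM features -> 3-cluster K-means -> Davies-Bouldin and Dunn validation.
# Field names are hypothetical; the grey-number step of the paper is not reproduced.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import davies_bouldin_score, pairwise_distances

rng = np.random.default_rng(0)
tx = pd.DataFrame({
    "customer": rng.integers(0, 100, 2000),
    "days_ago": rng.integers(1, 365, 2000),
    "amount":   rng.gamma(2.0, 50.0, 2000),
})

rfm = tx.groupby("customer").agg(
    R=("days_ago", "min"),       # recency: days since last exchange
    F=("days_ago", "count"),     # frequency: number of exchanges
    M=("amount", "sum"),         # monetary value of exchanges
)

X = StandardScaler().fit_transform(rfm)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

def dunn_index(X, labels):
    """Smallest between-cluster distance divided by the largest cluster diameter."""
    d = pairwise_distances(X)
    clusters = [np.where(labels == c)[0] for c in np.unique(labels)]
    diam = max(d[np.ix_(c, c)].max() for c in clusters)
    sep = min(d[np.ix_(a, b)].min()
              for i, a in enumerate(clusters) for b in clusters[i + 1:])
    return sep / diam

print("Davies-Bouldin:", davies_bouldin_score(X, labels))
print("Dunn:", dunn_index(X, labels))
```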
      • Open Access Article

        3 - Proposing a Model for Extracting Information from Textual Documents, Based on Text Mining in E-learning
        Somayeh Ahari
        As computer networks become the backbone of science and the economy, enormous quantities of documents become available, and text mining techniques are used to extract useful information from this textual data. Text mining has become an important research area that discovers unknown information, facts, or new hypotheses by automatically extracting information from different written documents. Text mining aims at disclosing concealed information by means of methods that, on the one hand, are able to cope with the large number of words and structures in natural language and, on the other hand, allow handling vagueness, uncertainty, and fuzziness. Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text, where this information is typically obtained through the discovery of patterns and trends. It is also known as knowledge discovery from textual databases, i.e., the process of extracting patterns or knowledge from text documents. In this research, a survey of text mining techniques and their applications in e-learning is presented. In the course of this study, relevant research in the field of e-learning was classified; after classification, the related problems and solutions were extracted. In this paper, first, a definition of text mining is presented. Then, the process of text mining and its applications in the e-learning domain are described. Furthermore, text mining techniques are introduced, and the use of each of these methods in the field of e-learning is considered. Finally, a model for information extraction with text mining techniques in the e-learning domain is proposed.
      • Open Access Article

        4 - A survey of different aspects of the phishing website detection problem and a review of existing methods
        Nafise Langari
        One of the latest security threats in cyberspace, aimed at stealing personal and financial information, is created by phishers. Because there are various methods for detecting phishing and no up-to-date comprehensive study on the issue, the authors were motivated to review and analyze the proposed phishing detection methods in five categories: anti-phishing-tool-based, data-mining-based, heuristic-based, meta-heuristic-based, and machine-learning-based methods. The advantages and disadvantages of each method are extracted from this review and comparison. The outline of this study can help identify the probable gaps in phishing detection problems for future research.
      • Open Access Article

        5 - Integrating data envelopment analysis and decision tree models in order to evaluate information technology-based units
        Amir Amini
        In order to evaluate the performance and desirability of the activities of its units, each organization needs an evaluation system, and this is even more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). Data mining techniques, on the other hand, make it possible to explore and discover meaningful information that was previously hidden in large databases. This paper presents a general framework that combines DEA and regression trees for evaluating the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that policy makers can use to discover the reasons behind efficient and inefficient DMUs. To examine the factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. The input-oriented LVM model with weak disposability and an undesirable output was solved for these branches, and a decision tree technique was then used to extract and discover the rules explaining increased and decreased productivity.
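        The combination described above, computing a DEA efficiency score per DMU and then fitting a tree on the results to surface rules, can be sketched as follows. For brevity this uses a basic input-oriented CCR envelopment model solved with scipy's linprog rather than the paper's LVM model with weak disposability and undesirable outputs, and the branch data are synthetic.

```python
# Sketch: input-oriented CCR DEA efficiency per branch, then a regression tree on
# the scores to surface rules. A simplified stand-in for the paper's LVM model;
# the data are synthetic.
import numpy as np
from scipy.optimize import linprog
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(1)
n = 18                                   # e.g. 18 branches
X = rng.uniform(1, 10, (n, 2))           # inputs: staff, operating cost
Y = rng.uniform(1, 10, (n, 1))           # output: settled claims

def ccr_efficiency(o):
    """min theta s.t. X'lam <= theta*x_o, Y'lam >= y_o, lam >= 0."""
    c = np.r_[1.0, np.zeros(n)]                       # variables: theta, lam_1..lam_n
    A_in = np.c_[-X[o], X.T]                          # inputs:  X'lam - theta*x_o <= 0
    A_out = np.c_[np.zeros((Y.shape[1], 1)), -Y.T]    # outputs: -Y'lam <= -y_o
    A = np.vstack([A_in, A_out])
    b = np.r_[np.zeros(X.shape[1]), -Y[o]]
    res = linprog(c, A_ub=A, b_ub=b, bounds=[(None, None)] + [(0, None)] * n)
    return res.x[0]

eff = np.array([ccr_efficiency(o) for o in range(n)])

# A shallow regression tree over the inputs/outputs explains the efficiency scores.
tree = DecisionTreeRegressor(max_depth=2, random_state=0).fit(np.hstack([X, Y]), eff)
print(export_text(tree, feature_names=["staff", "cost", "claims"]))
```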
      • Open Access Article

        6 - Discovering spam in the Facebook social network using data mining
        Amin Nazari
        In recent years, the development of new technologies and communication facilities such as the Internet has created new spaces known as virtual social networks. The rapid growth of social networks and the huge number of anonymous users in them have created a suitable environment for scammers, who most of the time try to spread several types of spam in these high-potential places. Hence, an effective method is required to detect spam in order to increase the level of information security of people in social networks. In this paper, a new method for discovering spammers in the Facebook social network is proposed; the findings show 99.96% accuracy. In previous papers, users were divided into two groups, ordinary users and spammers, and the classification methods in those papers also label as spammers the users who were attacked by spammers. In this paper, by dividing users into three types, ordinary users, spammers, and users attacked by spammers, the accuracy of spam detection has been increased.
      • Open Access Article

        7 - Developing A Suitable Data Model For Data Mining Application In Banking
        Shahideh Ahmadi
        Banking domains such as credit assessment, branch efficiency, and electronic banking are fertile contexts for the broad application of business intelligence concepts and methods, including data mining, data warehouses, and decision support systems. There is a great deal of research on the application of data mining in particular banking domains, each analyzing a different entity of the banking sector such as customers, facilities, or accounts, but no research comprehensively addresses all data mining applications in a bank, integrates them, extracts and categorizes all banking entities for the various analytical applications, and ultimately provides an appropriate data model based on the attributes required by the banking domains. At present, the information systems of Iranian banks are being developed to respond to new information needs. In this research, the content of valid studies in the field of banking carried out with a data mining approach was investigated using the content analysis method, and by extracting the entities and attributes used in these studies, an appropriate data model for data analysis applications in banking is presented. Using this model, information technology managers can assess the status of a bank in terms of the richness of the data needed for data analysis and consider the identified deficiencies in the future development plans of their information systems. After analyzing and evaluating previous research, 28 entities and 423 attributes were identified and the final entity-relationship model was created. Based on the presented model, a measurement tool was provided in the form of a checklist so that banks can measure their status in terms of the richness of their existing data and their readiness, from a data perspective, to perform the analyses. To confirm the final data model, the opinions of ten experts were collected through questionnaires and interviews in different areas of the bank, such as customer and public banking, finance and support, e-banking, credit and corporate affairs, the IT domain, and international affairs. In addition, using the data collected from the studies, frequency diagrams of the algorithms, techniques, sampling methods, performance indexes, and data mining software used in them are presented, for example to decide which data mining algorithms are most used in different domains.
      • Open Access Article

        8 - The study of the accuracy of real estate experts' evaluations using a data mining model (Case study of Mellat Bank)
        Fatemeh Davar
        As the main part of the financial system, banks always face various risks, the most important of which are credit scoring risk and property valuation risk. One of the issues faced by property valuation experts is how to evaluate property prices; in general, court experts assess real estate based on price indices. In this research, the researcher aimed to verify the accuracy of valuation experts by using data mining models. This was done to help bank managers and audit reporters make better decisions about experts and their valuations. Using property valuation indexes and data mining, a predictive model has been developed to predict property prices, and a combination of the FCM and K-NN algorithms has been used to achieve a high-performance prediction model. This measure greatly increased the predictive accuracy and the efficiency of the proposed model. The accuracy in predicting the valuated prices was 84.21% and the RMSE of the forecast was 0.43. The proposed approach was tested on real estate valuation data of Mellat Bank.
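        A minimal sketch of a cluster-then-predict scheme of the kind the abstract combines: group comparable properties, then predict prices with a K-NN regressor inside each group and report RMSE. KMeans is used here as a simple stand-in for the paper's FCM, and the property features and price formula are synthetic placeholders.

```python
# Sketch: cluster comparable properties (KMeans as a stand-in for FCM), then run a
# K-NN regressor inside each cluster and report RMSE. Features are hypothetical.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.random((500, 4))                                           # area, age, rooms, location index
y = 100 + 300 * X[:, 0] - 50 * X[:, 1] + rng.normal(0, 10, 500)   # synthetic price

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X_tr)
models = {c: KNeighborsRegressor(n_neighbors=5).fit(X_tr[km.labels_ == c],
                                                    y_tr[km.labels_ == c])
          for c in range(3)}

pred = np.array([models[c].predict(x.reshape(1, -1))[0]
                 for x, c in zip(X_te, km.predict(X_te))])
rmse = np.sqrt(mean_squared_error(y_te, pred))
print(f"RMSE: {rmse:.2f}")
```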
      • Open Access Article

        9 - Analyzing the impact of macroeconomic variables on customer churn in the banking industry with a data mining approach
        Mehrnaz Motahari Nia
        Today, knowing the customer and understanding customer needs have become a business imperative. Organizations need customer satisfaction to sustain their business and succeed in a competitive market, and knowing customers through customer behavior analysis has become possible for organizations with new technologies such as data mining techniques. The purpose of this research is to investigate the factors affecting customer churn in the banking industry. For this purpose, the transaction data of the sales terminals of a payment service provider (PSP) company in Iran has been analyzed. In the proposed model, the WRFM method combined with the K-means clustering algorithm is used to segment the sales terminals and measure their loyalty in each month. Then, using a plus-l take-away-r sequential feature selection method together with the multivariate linear regression algorithm, the features that affect the monthly customer churn percentage are selected from the monthly economic indicators. Based on the results of the implementation, three variables, the stock market value index, inflation, and the gold coin price, are the most effective variables among the economic indicators under study.
      • Open Access Article

        10 - Integrating Data Envelopment Analysis and Decision Tree Models in Order to Evaluate Information Technology-Based Units
        Amir Amini Ali Alinezhad Somaye Shafaghizade
        In order to evaluate the performance and desirability of the activities of its units, each organization needs an evaluation system, and this is even more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). Data mining techniques, on the other hand, make it possible to explore and discover meaningful information that was previously hidden in large databases. This paper presents a general framework that combines DEA and regression trees for evaluating the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that policy makers can use to discover the reasons behind efficient and inefficient DMUs. To examine the factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. The input-oriented LVM model with weak disposability and an undesirable output was solved for these branches, and a decision tree technique was then used to extract and discover the rules explaining increased and decreased productivity.
      • Open Access Article

        11 - Providing a method for customer segmentation using the RFM model under conditions of uncertainty
        Mohammadreza Gholamian Azime Mozafari
        The purpose of this study is to provide a method for segmenting the customers of a private bank in Shiraz based on the RFM model in the face of uncertainty about customer data. In the proposed framework, the values of the RFM model indicators, recency of exchange (R), number of exchanges (F), and monetary value of exchanges (M), were first extracted from the customer database and preprocessed. Given the breadth of the data, it is not possible to set an exact number that determines whether a customer is good or bad; therefore, to handle this uncertainty, grey number theory was used, which represents the customer's situation as an interval. In this way, the bank's customers were segmented using a different method, and according to the results they were divided into three main segments, or clusters, of good, ordinary, and bad customers. After validating the clusters using the Dunn and Davies-Bouldin indices, the characteristics of the customers in each segment were identified, and finally suggestions were made to improve the customer relationship management system.
      • Open Access Article

        12 - An Improved Method for Detecting Phishing Websites Using Data Mining on Web Pages
        Mahdiye Baharloo Alireza Yari
        Phishing plays a negative role by reducing trust among users in business networks based on the e-commerce framework; therefore, in this research, we tried to detect phishing websites using data mining. Detecting the salient features of phishing is regarded as one of the important prerequisites for designing an accurate detection system. Therefore, a list of 30 features suggested for phishing websites was first prepared. Then, a two-stage feature reduction method based on feature selection and feature extraction was proposed to enhance the efficiency of phishing detection systems, and it was able to reduce the number of features significantly. Finally, the performance of the J48 decision tree, random forest, and naïve Bayes methods was evaluated on the reduced features. The results indicated that the accuracy of the model built to identify phishing websites, using two-stage feature reduction based on a Wrapper method and Principal Component Analysis (PCA) with the random forest classifier, was 96.58%, which is a desirable outcome compared to the other methods.
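        A minimal sketch of the two-stage reduction described above, a wrapper-style selection stage followed by PCA extraction, feeding a random forest. Recursive feature elimination (RFE) is used here as a generic wrapper stand-in, and the 30 features are synthetic placeholders for the paper's phishing indicators.

```python
# Sketch: two-stage feature reduction (wrapper-style RFE, then PCA) before a random
# forest. The 30 synthetic features stand in for the phishing indicators of the paper.
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=30, n_informative=10,
                           random_state=0)

pipe = Pipeline([
    ("select", RFE(RandomForestClassifier(n_estimators=50, random_state=0),
                   n_features_to_select=15)),     # stage 1: wrapper-style selection
    ("extract", PCA(n_components=5)),             # stage 2: feature extraction
    ("clf", RandomForestClassifier(n_estimators=200, random_state=0)),
])

print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean())
```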
      • Open Access Article

        13 - Referral Traffic Analysis: A Case Study of the Iranian Students' News Agency (ISNA)
        Roya Hassanian Esfahani Mohammad Javad Kargar
        Web traffic analysis is a well-known e-marketing activity. Today most news agencies have entered the web, providing a variety of online services to their customers, and the number of online news consumers is increasing dramatically all over the world. A news website usually benefits from different acquisition channels, including organic search services, paid search services, referral links, direct hits, links from online social media, and e-mails. This article presents the results of an empirical study that analyzes the referral traffic of a news website through data mining techniques. The main methods include correlation analysis, outlier detection, clustering, and model performance evaluation. The results reject any significant relationship between the amount of referral traffic coming from a referrer website and that website's popularity. Furthermore, the referrer websites of the study fit into three clusters when the K-means algorithm with squared Euclidean distance is applied, and the performance evaluations confirm the significance of the model. Among the detected clusters, the most populated one was labeled "Automatic News Aggregator Websites" by the experts. The findings of the study help to better understand the different referring behaviors, which form around 15% of the overall traffic of the Iranian Students' News Agency (ISNA) website. They are also helpful for developing more efficient online marketing plans, business alliances, and corporate strategies.
      • Open Access Article

        14 - Privacy Preserving Big Data Mining: Association Rule Hiding
        Golnar Assadat Afzali Shahriyar Mohammadi
        Data repositories contain sensitive information that must be protected from unauthorized access. Existing data mining techniques can be considered a privacy threat to sensitive data. Association rule mining is one of the foremost data mining techniques, which tries to discover relationships between seemingly unrelated data in a database. Association rule hiding is a research area in privacy-preserving data mining (PPDM) that addresses the problem of hiding sensitive rules within data. Much research has been done in this area, but most of it focuses on reducing the undesired side effects of deleting sensitive association rules in static databases. However, in the age of big data, we are confronted with dynamic databases into which new data can arrive at any time, so most existing techniques are not practical and must be updated to be appropriate for these huge-volume databases. In this paper, a data anonymization technique is used for association rule hiding, while parallelization and scalability features are also embedded in the proposed model in order to speed up the big data mining process. In this way, instead of removing some instances of an existing important association rule, generalization is used to anonymize items at an appropriate level, so that, if necessary, important association rules can be updated based on the newly arrived data. We conducted experiments on three datasets to evaluate the performance of the proposed model in comparison with Max-Min2 and HSCRIL. The experimental results show that the information loss of the proposed model is less than that of existing research in this area and that the model can be executed in a parallel manner for a shorter execution time.
      • Open Access Article

        15 - Privacy-Preserving Data Clustering with the Expectation Maximization Algorithm
        Leila Jafar Tafreshi Farzin Yaghmaee
        Data mining and knowledge discovery are important technologies for business and research. Despite their benefits in various areas such as marketing, business, and medical analysis, the use of data mining techniques can also result in new threats to privacy and information security. Therefore, a new class of data mining methods called privacy-preserving data mining (PPDM) has been developed. The aim of research in this field is to develop techniques that can be applied to databases without violating the privacy of individuals. In this work, we introduce a new approach, based on fuzzy logic, to preserving sensitive information in databases with both numerical and categorical attributes. We map a database into a new one that conceals private information while preserving the benefits of mining. In our proposed method, we use fuzzy membership functions (MFs) such as Gaussian, P-shaped, Sigmoid, S-shaped, and Z-shaped functions for the private data, and we then cluster the modified datasets with the Expectation Maximization (EM) algorithm. Our experimental results show that using fuzzy logic to preserve data privacy guarantees valid data clustering results while protecting sensitive information. The accuracy of the clustering algorithm on the fuzzified data is approximately equivalent to that on the original data and is better than state-of-the-art methods in this field.
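        The following sketch illustrates the general idea described above: numeric attributes are passed through a fuzzy (Gaussian) membership function as a simple perturbation, both the original and the transformed data are clustered with an EM-based Gaussian mixture, and the agreement between the two clusterings is measured. The membership parameters and the use of the Iris dataset are illustrative choices, not the paper's setup.

```python
# Sketch: perturb numeric attributes with a Gaussian fuzzy membership function, then
# cluster original vs. transformed data with EM (GaussianMixture) and compare.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture
from sklearn.metrics import adjusted_rand_score

def gaussian_mf(x, c, sigma):
    """Gaussian membership degree of x with respect to centre c and width sigma."""
    return np.exp(-((x - c) ** 2) / (2 * sigma ** 2))

X = load_iris().data
# Replace each value by its membership to the column mean (values end up in [0, 1]).
X_fuzzy = gaussian_mf(X, X.mean(axis=0), X.std(axis=0))

labels_orig = GaussianMixture(n_components=3, random_state=0).fit_predict(X)
labels_fuzzy = GaussianMixture(n_components=3, random_state=0).fit_predict(X_fuzzy)

# High agreement suggests the transformed data still supports valid clustering.
print("cluster agreement (ARI):", adjusted_rand_score(labels_orig, labels_fuzzy))
```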
      • Open Access Article

        16 - An RFMV Model and Customer Segmentation Based on Variety of Products
        Saman Qadaki Moghaddam Neda Abdolvand Saeedeh Rajaee Harandi
        Today, increased competition between organizations has led them to seek a better understanding of customer behavior through innovative ways of storing and analyzing their information. Moreover, the emergence of new computing technologies has brought about major changes in the ability of organizations to collect, store, and analyze large volumes of data, so thousands of records can be stored for each customer. Customer satisfaction is one of the most important organizational goals, and since not all customers are equally profitable to an organization, understanding and identifying the valuable customers has become the most important organizational challenge. Thus, understanding customers' behavioral variables and categorizing customers based on these characteristics can provide better insight that helps business owners and industries adopt appropriate marketing strategies such as up-selling and cross-selling. The use of these strategies is based on a fundamental variable, the variety of products. Diversity in individual consumption may lead to increased demand for a variety of products; therefore, the variety of products can be used, along with other behavioral variables, to better understand and categorize customer behavior. Given the importance of the variety of products as one of the main parameters for assessing customer behavior, studying this factor in the field of business-to-business (B2B) communication represents a vital new approach. Hence, this study clusters customers based on a developed RFM model, namely RFMV, created by adding a variety-of-products variable (V). The CRISP-DM methodology and the K-means algorithm were used for clustering. The results of the study indicated that the variable V, variety of products, is effective in calculating customer value, and that the RFMV model yields better customer clustering and valuation. Overall, the modeling results indicate that the variety of products, along with the other behavioral variables, provides more accurate clustering than the RFM model.
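        A minimal sketch of the RFMV idea described above: the usual R, F, and M features are extended with V, the number of distinct products a customer has bought, and the resulting table is clustered with K-means. The transaction fields and the number of clusters are hypothetical.

```python
# Sketch: RFMV features (RFM plus Variety = number of distinct products bought)
# followed by K-means, as the abstract proposes. Transaction fields are hypothetical.
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
tx = pd.DataFrame({
    "customer": rng.integers(0, 200, 5000),
    "days_ago": rng.integers(1, 365, 5000),
    "amount":   rng.gamma(2.0, 40.0, 5000),
    "product":  rng.integers(0, 50, 5000),
})

rfmv = tx.groupby("customer").agg(
    R=("days_ago", "min"),        # recency
    F=("product", "count"),       # frequency
    M=("amount", "sum"),          # monetary value
    V=("product", "nunique"),     # variety of products
)

labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(rfmv))
rfmv["cluster"] = labels
print(rfmv.groupby("cluster").mean())    # average R, F, M, V per segment
```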
      • Open Access Article

        17 - Identification of a Nonlinear System by Determining of Fuzzy Rules
        Hojatallah Hamidi Atefeh Daraei
        In this article, a hybrid optimization algorithm combining differential evolution and particle swarm optimization is introduced for designing the fuzzy rule base of a fuzzy controller. For a specific number of rules, the hybrid algorithm optimizes all open parameters to reach maximum training accuracy. The hybrid computational approach consists of an opposition-based differential evolution algorithm and a particle swarm optimization algorithm. When used to train a fuzzy system employed for identification of a nonlinear system, the results show that the proposed hybrid approach achieves better identification accuracy than other training approaches in identifying the nonlinear system model. The example used in this article, to which the proposed method is finally applied, is the Mackey-Glass chaotic system.
      • Open Access Article

        18 - Analysis of Business Customers’ Value Network Using Data Mining Techniques
        Forough Farazzmanesh (Isvand) Monireh Hosseini
        In today's competitive environment, customers are the most important asset of any company. Therefore, companies should understand what the retention and value drivers are for each customer. An approach that can help consider customers' different value dimensions is the value network. This paper introduces a new approach that uses data mining techniques for mapping and analyzing customers' value networks, and the approach is applied in a real case study. This research contributes a methodology for identifying and defining the network entities of a value network in the context of B2B relationships. To conduct this work, we use a combination of methods and techniques designed to analyze customer datasets (e.g., RFM and customer migration) and to analyze the value network. As a result, this paper develops a new strategic network view of customers and discusses how a company can add value for its customers. The proposed approach provides an opportunity for marketing managers to gain a deep understanding of their business customers and of the characteristics and structure of their customers' value network. This paper is the first contribution of its kind to focus exclusively on large-dataset analytics for analyzing a value network, and it indicates that future research on value networks can benefit further from data mining tools. In this case study, we identify the value entities of the network and its value flows in a telecommunication organization using the available data, in order to show that the value in the network can be improved by continuous monitoring.
      • Open Access Article

        19 - DBCACF: A Multidimensional Method for Tourist Recommendation Based on Users’ Demographic, Context and Feedback
        Maral Kolahkaj Ali Harounabadi Alireza Nikravan Shalmani Rahim Chinipardaz
        With the advent of Web 2.0 applications such as social networks, which allow users to share media, many opportunities have been provided for tourists to recognize and visit attractive and unfamiliar Areas-of-Interest (AOIs). However, finding appropriate areas based on a user's preferences is very difficult because of issues such as the huge number of tourist areas, the limited visiting time, and so on. In addition, the available methods have so far failed to provide accurate tourist recommendations based on geo-tagged media because of problems such as data sparsity, the cold-start problem, treating two users with different habits as the same (symmetric similarity), and ignoring users' personal and context information. Therefore, in this paper, a method called Demographic-Based Context-Aware Collaborative Filtering (DBCACF) is proposed to investigate these problems and to develop the Collaborative Filtering (CF) method so that it provides personalized tourist recommendations without explicit user requests. DBCACF considers demographic and contextual information in combination with the users' historical visits to overcome the limitations of CF methods in dealing with multi-dimensional data. In addition, a new asymmetric similarity measure is proposed in order to overcome the limitations of symmetric similarity methods. The experimental results on a Flickr dataset indicated that the use of demographic and contextual information and the addition of the proposed asymmetric scheme to the similarity measure significantly improve the obtained results compared to other methods that use only user-item ratings and symmetric measures.
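        A small sketch of what an asymmetric similarity over visited areas-of-interest can look like: the similarity of user u to user v is the share of u's visits that v also made, so in general sim(u, v) differs from sim(v, u). This only illustrates the idea of asymmetry; the paper's actual measure may differ.

```python
# Sketch of an asymmetric user similarity over visited areas-of-interest:
# sim(u, v) = |AOI_u ∩ AOI_v| / |AOI_u|, so sim(u, v) != sim(v, u) in general.
def asym_similarity(visited_u: set, visited_v: set) -> float:
    if not visited_u:
        return 0.0
    return len(visited_u & visited_v) / len(visited_u)

u = {"museum_a", "park_b", "bridge_c"}
v = {"park_b", "bridge_c", "tower_d", "beach_e", "old_town_f"}

print(asym_similarity(u, v))   # 2/3: most of u's visits are shared with v
print(asym_similarity(v, u))   # 2/5: only a small part of v's visits are shared with u
```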
      • Open Access Article

        20 - The Development of a Hybrid Error Feedback Model for Sales Forecasting
        Mehdi Farrokhbakht Foumani Sajad Moazami Goudarzi
        Sales forecasting is one of the significant issues in the industrial and service sectors which, if dealt with properly, can facilitate management decisions and reduce losses. Sales forecasting is also one of the complicated problems in time series analysis and data mining because of the number of intervening parameters. Various models have been presented for this problem, each achieving acceptable results; however, further development of these methods is still of interest to researchers. In this regard, the present study provides a hybrid model with error feedback for sales forecasting. Forecasting was first conducted using a supervised learning method; then the residuals (model error) were computed and forecasted using another learning method. Finally, the two trained models were combined and used consecutively for sales forecasting: first the forecast is produced, then the error is estimated by the second model, and the sum of the forecast and the estimated error gives the final forecast. The computational results obtained from numerical experiments indicated that the proposed hybrid method outperforms the common models in the available literature and reduces the forecasting error indicators.
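        A minimal sketch of the error-feedback scheme described above: a first learner forecasts sales, a second learner is trained on the first learner's residuals, and the final forecast is the sum of the two predictions. The learners, the lag features, and the synthetic sales series are illustrative choices rather than the paper's setup.

```python
# Sketch of the error-feedback scheme: model 1 forecasts sales, model 2 is trained on
# model 1's residuals, and the final forecast is the sum of both predictions.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
t = np.arange(300)
sales = 100 + 0.5 * t + 20 * np.sin(2 * np.pi * t / 12) + rng.normal(0, 5, 300)

# Lagged sales as features (simple autoregressive setup: 12 lags -> next value).
X = np.column_stack([sales[i:i - 12] for i in range(12)])[: len(sales) - 12]
y = sales[12:]
split = 250
X_tr, X_te, y_tr, y_te = X[:split], X[split:], y[:split], y[split:]

base = LinearRegression().fit(X_tr, y_tr)                  # first-stage forecaster
resid = y_tr - base.predict(X_tr)                          # training residuals
err_model = GradientBoostingRegressor(random_state=0).fit(X_tr, resid)  # error model

final = base.predict(X_te) + err_model.predict(X_te)       # error-corrected forecast
print("MAE base  :", mean_absolute_error(y_te, base.predict(X_te)))
print("MAE hybrid:", mean_absolute_error(y_te, final))
```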
      • Open Access Article

        21 - Presenting a model for opinion mining at the document feature level for hotel users' reviews
        Elham Khalajj Shahriyar Mohammadi
        Nowadays, online reviews of users' sentiments and opinions on the Internet are an important part of how people decide whether to choose a product or use a service. With the Internet platform and easy access to opinion blogs in the tourism and hotel industry, there are huge and rich sources of opinions in the form of text, from which people can discover opinions using text mining methods. Because of the importance of users' sentiments and opinions in industry, especially in the tourism and hotel industry, the topics of opinion mining, sentiment analysis, and the exploration of user-written texts have received considerable attention. In this research, a new hybrid method is presented based on a common approach in sentiment analysis: the use of sentiment lexicons to produce features for classifying reviews. Two lexicon construction methods are developed, one using statistical methods and the other using a genetic algorithm. These lexicons are combined with a general sentiment lexicon and Bing Liu's standard list of opinion words to increase the accuracy of classification.
      • Open Access Article

        22 - Strategic Human Resource Management in Digital Era Based on Big Data
        Gholamreza Malekzadeh Sedigheh Sadeghi
        Nowadays, intelligent devices, virtual environments, and technological innovations are part of everyday human life. While technological innovation can easily represent one of the biggest current business threats, the executive leaders who succeed are those who turn it into business opportunities and create a new competitive space out of this threat. On the other hand, the influence of information technology in organizations, together with the spread of various kinds of social media, is a good opportunity to gather a massive amount of information and data about people. Given these facts, creative thinking and alignment with the facilities, requirements, and needs of the digital age, and respecting the value of knowledge management along with making use of information management, deserve more attention, especially in the field of human capital management. We discuss HR digital literacy and the effect of HR's understanding of the organization's mission and values on flexibility in digital transformation. In this article, we discuss the use of information systems, especially big data, in human resource management in the digital era, drawing on surveys of reputable organizations such as McKinsey. It is concluded that in the new age, given that a new generation of the workforce with different attitudes and expectations is ready to enter the labor market, a transformation from traditional structures to structures driven by the analytical results of big data would lead to more effective management.
      • Open Access Article

        23 - Decision support system design using data mining tools (case study: Cultural Assistance of the University of Science and Technology)
        Rouzbeh Ghousi Emad Chizari Hani Vahdani
        Decision-making is the most important duty of managers. In the new era, the decision-making process involves many difficulties and subtleties, so that without the use of new technologies and information analysis, objectives will not be achieved as desired. Efficient management, in addition to knowledge and experience, requires knowing how to use information systems. A decision support system (DSS) is one of the systems that supports the decision-making process for managers. In this paper, we first review the literature on decision support systems; then data mining is introduced as a tool to extract information and knowledge from raw organizational data. This extracted knowledge may contain concepts and information that have been neglected in the organization until now, so it can help managers in the decision-making process. Finally, the findings of this study have been used to help managers and deputies at Iran University of Science and Technology (IUST) in their decisions.
      • Open Access Article

        24 - Proposing a Density-Based Clustering Algorithm with the Ability to Discover Multi-Density Clusters in Spatial Databases
        A. Zadedehbalaei A. Bagheri H. Afshar
        Clustering is one of the important techniques for knowledge discovery in spatial databases, and density-based clustering algorithms are among the main clustering methods in data mining. DBSCAN, the basis of density-based clustering algorithms, suffers, despite its benefits, from issues such as the difficulty of determining appropriate values for its input parameters and its inability to detect clusters with different densities. In this paper, we introduce a new clustering algorithm which, unlike the DBSCAN algorithm, can detect clusters with different densities; it also detects nested clusters and clusters sticking together. The idea of the proposed algorithm is as follows: first, the different densities of the dataset are detected using a dedicated technique and an Eps parameter is computed for each density; then the DBSCAN algorithm is adapted with the computed parameters and applied to the dataset. The experimental results, obtained by running the suggested algorithm on standard and synthetic datasets and using well-known clustering assessment criteria, are compared to the results of the DBSCAN algorithm and some of its variants, including VDBSCAN, VMDBSCAN, LDBSCAN, DVBSCAN, and MDDBSCAN, all of which were introduced to solve the problem of multi-density datasets. The results show that the suggested algorithm has higher accuracy and a lower error rate in comparison to the other algorithms.
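        The core idea above, deriving one Eps per density level and running DBSCAN level by level, can be sketched as follows. The density levels are estimated from k-nearest-neighbor distances and each level's Eps is a simple multiple of the mean k-distance; this is a simplified illustration, not the paper's exact algorithm, and the data are synthetic blobs of mixed densities.

```python
# Sketch of multi-density DBSCAN: estimate density levels from k-NN distances, derive
# an eps per level, and run DBSCAN level by level on still-unlabeled points.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors
from sklearn.cluster import DBSCAN, KMeans

X, _ = make_blobs(n_samples=[300, 300, 60], centers=[(0, 0), (6, 0), (3, 6)],
                  cluster_std=[0.3, 0.3, 1.2], random_state=0)   # mixed densities
min_pts = 5

# The k-distance of each point reflects its local density.
kdist = NearestNeighbors(n_neighbors=min_pts).fit(X).kneighbors(X)[0][:, -1]

# Group points into density levels and take a per-level eps.
levels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(kdist.reshape(-1, 1))
eps_per_level = sorted(kdist[levels == l].mean() * 1.5 for l in np.unique(levels))

labels = np.full(len(X), -1)
next_id = 0
for eps in eps_per_level:                      # densest level first (smallest eps)
    mask = labels == -1                        # only points not yet clustered
    sub = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(X[mask])
    sub[sub >= 0] += next_id
    labels[mask] = np.where(sub >= 0, sub, -1)
    next_id = labels.max() + 1

print("clusters found:", len(set(labels)) - (1 if -1 in labels else 0))
```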
      • Open Access Article

        25 - Attribute Reduction Based on Rough Set Theory by Soccer League Competition Algorithm
        M. Abdolrazzagh-Nezhad Ali Adibiyan
        The increasing dimensionality of databases has made attribute reduction a critical issue in data mining, which searches for a subset of attributes with the greatest effect on the hidden patterns. In recent years, rough set theory has been regarded by researchers as one of the most effective and efficient tools for this reduction. In this paper, the soccer league competition algorithm is modified and adapted to solve the attribute reduction problem for the first time. The ability to escape local optima, the ability to use the information distributed by players in the search space, rapid convergence to optimal solutions, and the small number of algorithm parameters were the motivations for considering this algorithm in the current research. The proposed modifications to the algorithm consist of utilizing the total power of the fixed and substitute players in calculating the power of each team, considering a combination of continuous and discrete structures for each player, proposing a novel discretization method, providing a heuristic analysis appropriate to the research problem for evaluating each player, and redesigning the Imitation and Provocation operators based on the challenges in their original versions. The proposed ideas are tested on small, medium, and large datasets from UCI, and the experimental results are compared with state-of-the-art algorithms. This comparison shows the competitive advantages of the proposed algorithm over the investigated algorithms.
      • Open Access Article

        26 - Assessment of Demand Side Resources Potential in Presence of Cooling and Heating Equipment Using Data Mining Method Based Upon K-Means Clustering Algorithm
        Fatemeh Sheibani M. Mollahassani-pour Hengameh Keshavarz
        Under smart power systems, determining the potential of Demand Response Resources (DRRs) is a crucial issue, since it affects all energy policy decisions. In this paper, the potential of DRRs in the presence of cooling and heating equipment is identified using the k-means clustering algorithm as a data mining technique. In this regard, the energy consumption dataset is categorized into different clusters by the k-means algorithm based on the variations of energy price and ambient temperature during the peak hours of the hot (spring and summer) and cold (autumn and winter) periods. Then, the clusters in which cooling and heating equipment are likely to be committed are selected. After that, a confidence interval diagram of energy consumption in the selected clusters is constructed based on energy price variations. The nominal potential of DRRs, i.e., the flexible load, is obtained from the maximum and minimum differences between the average energy consumption at the upper and middle thresholds of the confidence interval diagram. Energy consumption, ambient temperature, and energy price data for the BOSTON electricity network over a six-year horizon are utilized to evaluate the proposed model.
      • Open Access Article

        27 - A New Data Clustering Method Using 4-Gray Wolf Algorithm
        Laleh Ajami Bakhtiarvand Zahra Beheshti
        Nowadays, clustering methods have received much attention because the volume and variety of data are increasing considerably. The main problem of classical clustering methods is that they easily fall into local optima. Meta-heuristic algorithms have shown good results in data clustering; they can search the problem space to find appropriate cluster centers. One of these algorithms is the grey wolf optimization (GWO) algorithm. The GWO algorithm shows good exploitation and obtains good solutions in some problems, but its disadvantage is poor exploration; as a result, the algorithm converges to local optima in some problems. In this study, an improved version of the grey wolf optimization algorithm called 4-grey wolf optimization (4GWO) is proposed for data clustering. In 4GWO, the exploration capability of GWO is improved using the best position of a fourth group of wolves called scout omega wolves. The movement of each wolf is calculated based on its score; a better score is closer to the best solution and vice versa. The performance of the 4GWO algorithm for data clustering (4GWO-C) is compared with GWO, particle swarm optimization (PSO), artificial bee colony (ABC), symbiotic organisms search (SOS), and the salp swarm algorithm (SSA) on fourteen datasets. The efficiency of 4GWO-C is also compared with several GWO variants on these datasets. The results show a significant improvement of the proposed algorithm compared with the other algorithms; EGWO, as an improved GWO, has the second rank among the different versions of the GWO algorithms. The average F-measure obtained by 4GWO-C is 82.172%, while PSO-C, as the second-best algorithm, provides 78.284% over all datasets.
      • Open Access Article

        28 - Construction of Scalable Decision Tree Based on Fast Data Partitioning and Pre-Pruning
        Somayeh Lotfi Mohammad Ghasemzadeh Mehran Mohsenzadeh Mitra Mirzarezaee
        Classification is one of the most important tasks in data mining and machine learning, and the decision tree, as one of the most widely used classification algorithms, has the advantages of simplicity and easier interpretation of results. But when dealing with huge amounts of data, the resulting decision tree grows in size and complexity and therefore requires excessive running time. Almost all tree-construction algorithms need to store all or part of the training dataset, and those algorithms that avoid memory shortages by selecting a subset of the data must spend extra time on data selection. In addition, a great deal of computation is required to select the best feature for creating a branch in the tree. This paper presents an incremental, scalable approach based on fast partitioning and pruning; the proposed algorithm builds the decision tree using the entire training dataset, but it does not require storing the whole data in main memory. A pre-pruning method has also been used to reduce the complexity of the tree. The experimental results on UCI datasets show that the proposed algorithm, in addition to preserving competitive accuracy and construction time, overcomes the mentioned disadvantages of former methods.
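        Pre-pruning, as opposed to pruning a fully grown tree afterwards, constrains the tree while it is being grown. The sketch below shows this with scikit-learn's growth-limiting hyperparameters and compares the pre-pruned tree with an unconstrained one in size and accuracy; it illustrates the pruning idea only, not the paper's incremental partitioning algorithm, and the dataset is synthetic.

```python
# Sketch: "pre-pruning" by constraining the tree while it grows (depth, minimum
# samples per split, minimum impurity decrease) versus an unconstrained tree.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(max_depth=6, min_samples_split=50,
                                min_impurity_decrease=1e-3,
                                random_state=0).fit(X_tr, y_tr)

for name, model in [("full", full), ("pre-pruned", pruned)]:
    print(f"{name:10s} nodes={model.tree_.node_count:4d} "
          f"test accuracy={model.score(X_te, y_te):.3f}")
```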
      • Open Access Article

        29 - Combination of Instance Selection and Data Augmentation Techniques for Imbalanced Data Classification
        Parastoo Mohaghegh Samira Noferesti Mehri Rajaei
        In the era of big data, automatic data analysis techniques such as data mining have been widely used for decision-making and have become very effective. Among data mining techniques, classification is a common method for decision-making and prediction. Classification algorithms usually work well on balanced datasets; however, one of their challenges is how to correctly predict the label of new samples when learning from imbalanced datasets. In this type of dataset, the skewed distribution of data across classes causes examples of the minority class to be ignored in the learning process, even though this class is the more important one in some prediction problems. To deal with this issue, this paper presents an efficient method for balancing imbalanced datasets, which improves the accuracy of machine learning algorithms in predicting the class label of new samples. According to the evaluations, the proposed method performs better than other methods with respect to two common criteria for evaluating the classification of imbalanced datasets, namely balanced accuracy and specificity.
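        The sketch below illustrates the two evaluation criteria named in the abstract, balanced accuracy and specificity, and the effect of a simple rebalancing step. Random oversampling of the minority class is used as a stand-in for the paper's combination of instance selection and data augmentation, and the dataset is synthetic.

```python
# Sketch: effect of a simple rebalancing step on balanced accuracy and specificity.
# Random oversampling stands in for the paper's instance-selection/augmentation mix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import balanced_accuracy_score, confusion_matrix
from sklearn.utils import resample

X, y = make_classification(n_samples=3000, weights=[0.95, 0.05], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

def specificity(y_true, y_pred):
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
    return tn / (tn + fp)

def report(name, model):
    pred = model.predict(X_te)
    print(f"{name:10s} balanced acc={balanced_accuracy_score(y_te, pred):.3f} "
          f"specificity={specificity(y_te, pred):.3f}")

report("raw", LogisticRegression(max_iter=1000).fit(X_tr, y_tr))

# Oversample the minority class in the training set only.
minority = X_tr[y_tr == 1]
extra = resample(minority, n_samples=(y_tr == 0).sum() - len(minority), random_state=0)
X_bal = np.vstack([X_tr, extra])
y_bal = np.r_[y_tr, np.ones(len(extra), dtype=int)]
report("balanced", LogisticRegression(max_iter=1000).fit(X_bal, y_bal))
```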
      • Open Access Article

        30 - Design and implementation of a survival model for patients with melanoma based on data mining algorithms
        Farinaz Sanaei Seyed Abdollah Amin Mousavi Abbas Toloie Eshlaghy Ali Rajabzadeh Ghotri
        Background/Purpose: Among the most commonly diagnosed cancers, melanoma is the second leading cause of cancer-related death. A growing number of people are becoming victims of melanoma, which is also the most malignant and rare form of skin cancer; advanced cases of the disease may cause death due to its spread to internal organs. The National Cancer Institute reported that approximately 99,780 people were diagnosed with melanoma in 2022, and approximately 7,650 died. Therefore, this study aims to develop an optimized algorithm for predicting the survival of melanoma patients. Methodology: This applied research was a descriptive-analytical and retrospective study. The study population included patients with melanoma identified from the National Cancer Research Center at Shahid Beheshti University between 2008 and 2013, with a follow-up period of five years. An optimal model for melanoma survival prognosis was selected based on the evaluation metrics of the data mining algorithms. Findings: A neural network, naïve Bayes, a Bayesian network, a combination of decision tree and naïve Bayes, logistic regression, J48, and ID3 were the models applied to the national database. Statistically, the studied neural network outperformed the other selected algorithms in all evaluation metrics. Conclusion: The results of the present study showed that the neural network, with a value of 0.97, has optimal performance in terms of reliability. The predictive model of melanoma survival therefore performed better both in terms of discrimination power and reliability, and this algorithm was proposed as the melanoma survival prediction model.
      • Open Access Article

        31 - Presenting a web recommender system for predicting users' favorite pages using the DBSCAN clustering algorithm and the SVM machine learning method
        Reza Molaee Fard Mohammad Mosleh
        Recommender systems can predict future user requests and then generate a list of the user's favorite pages. In other words, recommender systems can build an accurate profile of users' behavior and predict the page that the user will choose in the next move, which can solve the cold-start problem of the system and improve search quality. In this research, a new method is presented to improve web recommender systems. It uses the DBSCAN clustering algorithm to cluster the data, and this algorithm obtained an efficiency score of 99%. Then the user's favorite pages are weighted using the PageRank algorithm, the data are classified using the SVM method, and a hybrid recommender system generates predictions for the user; finally, this recommender system provides the user with a list of pages that may be of interest. The evaluation of the results indicated that the proposed method achieves a score of 95% for recall and 99% for precision, which shows that this recommender system detects the user's intended pages correctly in more than 90% of cases and largely overcomes the weaknesses of previous systems.
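        A compact sketch of the chain described above: DBSCAN clusters user page-visit profiles, an SVM learns to place new sessions into a cluster, and that cluster's most visited pages become the recommendation list. The page-visit matrix is synthetic, and the PageRank weighting step of the paper is omitted here.

```python
# Sketch of the recommendation chain: DBSCAN clusters user page-visit profiles, an SVM
# places new sessions into a cluster, and the cluster's top pages are recommended.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n_users, n_pages = 300, 21
visits = np.zeros((n_users, n_pages))
for u in range(n_users):                       # three synthetic interest groups
    group = u % 3
    pages = rng.choice(np.arange(group * 7, group * 7 + 6), size=15)
    np.add.at(visits[u], pages, 1)

labels = DBSCAN(eps=6.0, min_samples=5).fit_predict(visits)
mask = labels != -1                            # keep users that fell into a cluster

svm = SVC(kernel="rbf").fit(visits[mask], labels[mask])   # assigns new sessions

new_session = visits[0] + rng.integers(0, 2, n_pages)     # a fresh user profile
cluster = svm.predict(new_session.reshape(1, -1))[0]
top_pages = visits[labels == cluster].sum(axis=0).argsort()[::-1][:5]
print("recommended pages:", top_pages)
```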
      • Open Access Article

        32 - Predicting Generalized Anxiety Disorder Among Female Students Using Random Forest Approach
        Zahra Gholami Habibeh Zare
        Mental health is considered one of the major challenges for today's generations. Generalized anxiety disorder (GAD) is one of many mental health complications; individuals with the disorder experience exaggerated worry and tension about everyday events. It is reported that approximately 5% of the population of developed countries suffer from GAD. Additionally, women are affected by this disorder twice as often as men, and it is increasingly common among women, particularly female students. This paper aims to predict generalized anxiety disorder among female students using the random decision forest algorithm. A data mining method was utilized for prediction. The research population consisted of female students of Shiraz Azad University; 150 female students were selected by simple random sampling and assessed with a DSM-IV questionnaire. Accordingly, a random forest algorithm is proposed to generate a prediction model. NetBeans IDE was used for implementation, Java was the programming language for the prototype, and the WEKA library was employed. The results showed that the prediction accuracy of the random forest algorithm exceeds 0.9, which indicates that the algorithm is likely to predict GAD accurately; the random decision forest algorithm consistently identifies individuals not suffering from GAD. The results are relatively consistent with the baseline implemented in R. The random decision forest algorithm thus produces high predictive performance and may reveal significant relationships between the proposed parameters and the dependent variable.
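        The study above implements its model in Java with the WEKA library; as a language-neutral illustration of the same modeling step, a random forest trained on questionnaire item scores, the following Python sketch uses synthetic DSM-IV-style item scores. The items, labels, and sizes are hypothetical and carry no relation to the study's data.

```python
# Sketch: a random forest predicting GAD from questionnaire item scores. The study
# used Java/WEKA; this Python version with synthetic items only illustrates the step.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n = 150                                        # e.g. 150 students
items = rng.integers(0, 4, (n, 12))            # 12 questionnaire items scored 0-3
risk = items[:, :6].sum(axis=1)                # pretend the first items carry signal
gad = (risk + rng.normal(0, 2, n) > 9).astype(int)

forest = RandomForestClassifier(n_estimators=300, random_state=0)
print("CV accuracy:", cross_val_score(forest, items, gad, cv=5).mean().round(3))

forest.fit(items, gad)
print("most informative items:", np.argsort(forest.feature_importances_)[::-1][:3])
```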
      • Open Access Article

        33 - Anomaly and Intrusion Detection Through Data Mining and Feature Selection using PSO Algorithm
        Fereidoon Rezaei Mohamad Ali Afshar Kazemi Mohammad Ali Keramati
        Today, considering technological development, the increased use of the Internet in business, and the movement of businesses from physical to virtual and Internet-based forms, attacks and anomalies have also changed from physical to virtual: instead of robbing a store or market, individuals intrude into websites and virtual markets through cyberattacks and disrupt them. Detection of attacks and anomalies is one of the new challenges in promoting e-commerce technologies. Detecting network anomalies and destructive activities in e-commerce can be done by analyzing the behavior of network traffic. Data mining systems and techniques are used extensively in intrusion detection systems (IDS) in order to detect anomalies. Reducing the number of features plays an important role in intrusion detection, since detecting anomalies in high-dimensional network traffic is a time-consuming process; choosing suitable and accurate features influences the speed of the analysis and thus improves detection speed. In this article, by using data mining algorithms such as Bayesian classifiers, the multilayer perceptron, CFS, Best First search, J48, and PSO, we were able to increase the accuracy of detecting anomalies and attacks to 0.996 and reduce the error rate to 0.004.
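        As an illustration of swarm-based feature selection of the kind mentioned above, the following sketch runs a small binary PSO that searches for a feature subset maximizing the cross-validated accuracy of a Gaussian naive Bayes detector. The dataset is synthetic and all PSO parameters (swarm size, inertia, acceleration coefficients) are illustrative choices, not values from the paper.

```python
# Sketch: binary PSO selecting a feature subset for an intrusion detector, with
# Gaussian naive Bayes cross-validation accuracy as the fitness. Synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=30, n_informative=8,
                           random_state=0)

def fitness(bits):
    mask = bits.astype(bool)
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=3).mean()

n_particles, n_feat, iters = 12, X.shape[1], 15
pos = (rng.random((n_particles, n_feat)) < 0.5).astype(float)   # binary positions
vel = rng.normal(0, 0.1, (n_particles, n_feat))
pbest, pbest_fit = pos.copy(), np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, n_feat))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random((n_particles, n_feat)) < 1 / (1 + np.exp(-vel))).astype(float)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", np.flatnonzero(gbest))
print("accuracy with selected subset:", round(fitness(gbest), 3))
```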