• List of Articles Data Mining

      • Open Access Article

        1 - A method for clustering customers using RFM model and grey numbers in terms of uncertainty
        azime mozafari
        The purpose of this study is to present a method for clustering bank customers based on the RFM model under uncertainty. In the proposed framework, after the values of the RFM model parameters are determined, namely recency of exchange (R), frequency of exchange (F), and monetary value of exchange (M), grey theory is used to eliminate the uncertainty and customers are segmented using a different approach. Bank customers are thus clustered into three main segments: good, ordinary, and bad customers. After the clusters are validated with the Dunn index and the Davies–Bouldin index, the characteristics of the customers in each segment are identified. Finally, recommendations are offered to improve the customer relationship management system.
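The abstract does not spell out how the grey numbers are built or combined, so the following is only a minimal sketch of the general idea under stated assumptions: each customer's R, F, and M are observed in several sub-periods, each dimension becomes a grey number [lower, upper], the interval is whitened with an equal weight (alpha = 0.5), and the normalised values are summed into a score that is cut into good, ordinary, and bad segments. The whitening weight, the cut points, and the function names are hypothetical.

```python
from collections import defaultdict

def rfm(transactions, today):
    """Return customer -> (R, F, M) from (customer, day, amount) records.

    R = days since the last exchange, F = number of exchanges, M = total amount.
    """
    last, freq, money = {}, defaultdict(int), defaultdict(float)
    for cid, day, amount in transactions:
        last[cid] = max(day, last.get(cid, day))
        freq[cid] += 1
        money[cid] += amount
    return {cid: (today - last[cid], freq[cid], money[cid]) for cid in last}

def grey_segments(measurements, alpha=0.5, cuts=(1.0, 2.0)):
    """measurements: customer -> list of (R, F, M) tuples, e.g. one per sub-period.

    Each dimension becomes a grey number [lower, upper]; it is whitened to the
    crisp value alpha*lower + (1 - alpha)*upper, min-max normalised across
    customers, and summed into a score in [0, 3] (low recency counts as good).
    The cut points are hypothetical.
    """
    crisp = {}
    for cid, rows in measurements.items():
        grey = [(min(col), max(col)) for col in zip(*rows)]
        crisp[cid] = [alpha * lo + (1 - alpha) * hi for lo, hi in grey]
    dims = list(zip(*crisp.values()))
    lows, highs = [min(d) for d in dims], [max(d) for d in dims]
    labels = {}
    for cid, vals in crisp.items():
        r, f, m = [(v - lo) / (hi - lo) if hi > lo else 0.0
                   for v, lo, hi in zip(vals, lows, highs)]
        score = (1 - r) + f + m
        labels[cid] = "good" if score >= cuts[1] else "ordinary" if score >= cuts[0] else "bad"
    return labels
```

Whitening with alpha = 0.5 simply takes the interval midpoint; a bank that wants to judge customers pessimistically could lower alpha toward the worse end of each interval.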
      • Open Access Article

        2 - Survey different aspects of the problem phishing website detection and Review to existing Methods
        nafise langari
        Phishing, in which attackers steal personal and financial information, is one of the latest security threats in cyberspace. Because there are various methods for detecting phishing and no up-to-date comprehensive study of the issue exists, the authors were motivated to review and analyze the proposed phishing detection methods in five categories: anti-phishing tool-based, data mining-based, heuristic-based, meta-heuristic-based, and machine learning-based methods. The advantages and disadvantages of each method are extracted from the review and comparison. The outcomes of this study can help identify possible gaps in phishing detection for future research.
      • Open Access Article

        3 - Integrating data envelopment analysis and decision tree models In order to evaluate information technology-based units
        Amir Amini
        Every organization needs an evaluation system to assess the performance and desirability of the activities of its units, and such a system is even more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). Data mining techniques, on the other hand, allow DMUs to explore and discover meaningful information that was previously hidden in large databases. This paper presents a general framework that combines DEA and a regression tree to evaluate the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that policy makers can use to discover the reasons behind efficient and inefficient DMUs. To examine the factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. Efficiency was computed with the input-oriented LVM model with weak disposability and undesirable outputs in data envelopment analysis, and a decision tree technique was then used to extract rules explaining increases and decreases in productivity.
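The paper computes efficiency with an input-oriented LVM model with weak disposability and undesirable outputs, which needs a linear-programming solver and is not reproduced here. As a much simpler illustration of what a DEA efficiency score means, the sketch below covers the special case of one input and one output under constant returns to scale (the CCR model), where the linear program reduces to comparing output/input ratios. The branch names and figures are hypothetical.

```python
def ccr_efficiency(units):
    """units: name -> (input, output), one input and one output per DMU.

    Under constant returns to scale the CCR efficiency of a DMU is its
    output/input ratio divided by the best ratio among all DMUs, so units on
    the efficient frontier score 1.0. In the input-oriented reading, a score
    theta means the DMU could produce its output with theta times its input.
    """
    ratios = {name: out / inp for name, (inp, out) in units.items()}
    best = max(ratios.values())
    return {name: ratio / best for name, ratio in ratios.items()}
```

For example, `ccr_efficiency({"A": (10, 20), "B": (5, 15), "C": (8, 8)})` rates branch B as efficient and C at one third of B's productivity. With several inputs and outputs the weights must be chosen per DMU by solving a linear program, which is where a solver becomes necessary.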
      • Open Access Article

        4 - The study of the accuracy of real estate experts' evaluations using a data mining model (Case study of Mellat Bank)
        fatemeh davar
        As the main part of the financial system, banks always face various risks, the most important of which are credit scoring risk and property valuation risk. One of the issues faced by property valuation experts is how to evaluate property prices. In general, court experts assess real estate based on price indices. This research aims to verify the accuracy of valuation experts using data mining models, in order to help bank managers and audit reporters make better decisions about experts and their valuations. Using property valuation indices and data mining, a predictive model has been developed to predict property prices, and a combination of the FCM and K-NN algorithms has been used to achieve a high-performance prediction model. This combination greatly increased the predictive accuracy and the efficiency of the proposed model. The accuracy in predicting valuated prices was 84.21%, and the RMSE of its forecasts was 0.43. The proposed approach was tested on real estate valuation data from Mellat Bank.
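The K-NN half of the FCM and K-NN combination, together with the RMSE measure the abstract reports, can be sketched as follows. This is a minimal illustration, assuming properties are already described by numeric feature vectors (for example normalised valuation indices); the FCM step the authors pair it with is not shown, and the function names are hypothetical.

```python
import math

def knn_price(train, query, k=3):
    """train: list of (feature_vector, price) pairs; returns the mean price of
    the k training properties closest to `query` in Euclidean distance."""
    nearest = sorted(train, key=lambda row: math.dist(row[0], query))[:k]
    return sum(price for _, price in nearest) / len(nearest)

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

Because K-NN averages raw distances, features on very different scales should be normalised first; otherwise the largest-valued index (often price per square metre) dominates the neighbour search.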
      • Open Access Article

        5 - Analyzing the impact of macroeconomic variables on customer churn banking industry With data mining approach
        Mehrnaz Motahari nia
        Today, knowing customers and understanding their needs has become a business imperative. Organizations need customer satisfaction to sustain their business and succeed in a competitive market, and new technologies such as data mining techniques allow organizations to know their customers through customer behavior analysis. The purpose of this research is to investigate the factors affecting customer churn in the banking industry. To this end, the transaction data of the sales terminals of a payment service provider (PSP) company in Iran were analyzed. In the proposed model, the WRFM method is combined with the K-Means clustering algorithm to segment the sales terminals by loyalty each month. Then, using the plus-L take-away-R feature selection method and a multivariate linear regression algorithm, the monthly economic indicators that most affect the percentage of churned customers are selected. The results show that, among the economic indicators studied, three variables, namely the stock market value index, inflation, and the price of the full gold coin, are the most effective.
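The plus-L take-away-R search named above alternates L greedy forward steps with R greedy backward steps. A minimal sketch follows; `score` stands in for the multivariate linear regression the abstract mentions (in practice it might return a cross-validated R² of the churn percentage regressed on the chosen indicators, which is an assumption, since the paper's criterion is not stated), and the indicator names and weights in the usage are hypothetical.

```python
def plus_l_take_away_r(features, score, d, l=2, r=1):
    """Select d features by repeatedly adding the l best and removing the r worst.

    features: candidate names (ties are broken by list order);
    score: callable(frozenset) -> float, higher is better;
    assumes l > r, so the selected set grows by l - r per round.
    """
    if l <= r:
        raise ValueError("this sketch assumes l > r")
    d = min(d, len(features))
    selected = []
    while len(selected) < d:
        for _ in range(l):  # plus L: greedy forward steps
            remaining = [f for f in features if f not in selected]
            if not remaining:
                break
            selected.append(max(remaining, key=lambda f: score(frozenset(selected + [f]))))
        if len(selected) >= d:
            break
        for _ in range(r):  # take away R: drop the feature whose removal hurts least
            worst = max(selected, key=lambda f: score(frozenset(x for x in selected if x != f)))
            selected.remove(worst)
    while len(selected) > d:  # trim any overshoot from the last forward phase
        worst = max(selected, key=lambda f: score(frozenset(x for x in selected if x != f)))
        selected.remove(worst)
    return selected
```

Usage with a toy additive score (hypothetical weights):

```python
weights = {"inflation": 3.0, "stock_index": 2.0, "coin_price": 1.5, "noise1": 0.1, "noise2": -0.5}
plus_l_take_away_r(list(weights), lambda s: sum(weights[f] for f in s), d=3)
```

The backward steps let the search undo an early greedy choice once better companions are in the set, which plain forward selection cannot do.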
      • Open Access Article

        6 - Integrating Data Envelopment Analysis and Decision Tree Models in Order to Evaluate Information Technology-Based Units
        Amir Amini ali alinezhad somaye shafaghizade
        Every organization needs an evaluation system to assess the performance and desirability of the activities of its units, and such a system is even more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). Data mining techniques, on the other hand, allow DMUs to explore and discover meaningful information that was previously hidden in large databases. This paper presents a general framework that combines DEA and a regression tree to evaluate the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that policy makers can use to discover the reasons behind efficient and inefficient DMUs. To examine the factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. Efficiency was computed with the input-oriented LVM model with weak disposability and undesirable outputs in data envelopment analysis, and a decision tree technique was then used to extract rules explaining increases and decreases in productivity.
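The rule-extraction step can be illustrated with a one-level regression tree (a "stump") that predicts DEA efficiency from a single explanatory factor. This is a minimal sketch, not the authors' tree: the factor name `staff`, the data, and the helper names are hypothetical, and a full regression tree would apply the same split search recursively over all factors.

```python
def best_split(rows, feature, target):
    """Find the threshold on `feature` that best splits `target` (regression stump).

    rows: list of dicts. Candidate thresholds are midpoints between consecutive
    distinct feature values; the split minimising the total squared error of
    the two leaves wins. Returns (threshold, left_mean, right_mean) or None.
    """
    def sse(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    values = sorted({row[feature] for row in rows})
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [row[target] for row in rows if row[feature] <= t]
        right = [row[target] for row in rows if row[feature] > t]
        cost = sse(left) + sse(right)
        if best is None or cost < best[0]:
            best = (cost, t, sum(left) / len(left), sum(right) / len(right))
    return None if best is None else best[1:]

def as_rule(feature, split):
    """Render a split as a human-readable IF/ELSE rule for policy makers."""
    t, left_mean, right_mean = split
    return (f"IF {feature} <= {t} THEN efficiency ~ {left_mean:.2f} "
            f"ELSE efficiency ~ {right_mean:.2f}")
```

Each leaf of a deeper tree yields one such rule, which is what makes the hybrid DEA and tree output readable for non-specialists.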
      • Open Access Article

        7 - Provide a method for customer segmentation using the RFM model in conditions of uncertainty
        mohammadreza gholamian azime mozafari
        The purpose of this study is to provide a method for segmenting the customers of a private bank in Shiraz based on the RFM model in the face of uncertainty in customer data. In the proposed framework, the values of the RFM model indicators, namely recency of exchange (R), number of exchanges (F), and monetary value of exchange (M), were first extracted from the customer database and preprocessed. Given the breadth of the data, it is not possible to set an exact number that determines whether a customer is good or bad; therefore, to eliminate this uncertainty, grey number theory was used, which treats the customer's situation as a range. Using a different method in this way, the bank's customers were segmented into three main sections or clusters: good, normal, and bad customers. After the clusters were validated with the Dunn and Davies–Bouldin indices, the characteristics of the customers in each segment were identified, and finally, suggestions were made to improve the customer relationship management system.
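The two validation indices mentioned above can be computed directly from the clustered points. The sketch below uses the common textbook definitions (Dunn: smallest between-cluster point distance over largest within-cluster distance; Davies–Bouldin: the mean over clusters of the worst scatter-to-separation ratio); the paper may use a variant, and the function names are illustrative.

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """Smallest distance between points of different clusters divided by the
    largest distance within any cluster (higher is better)."""
    between = min(math.dist(p, q)
                  for a, b in combinations(clusters, 2) for p in a for q in b)
    diameter = max((math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
                   default=0.0)
    return math.inf if diameter == 0 else between / diameter

def davies_bouldin_index(clusters):
    """Average, over clusters, of the worst (s_i + s_j) / d(c_i, c_j) ratio, where
    s_i is the mean distance of cluster i's points to its centroid c_i (lower is better)."""
    cents = [[sum(col) / len(c) for col in zip(*c)] for c in clusters]
    spread = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    return sum(max((spread[i] + spread[j]) / math.dist(cents[i], cents[j])
                   for j in range(k) if j != i) for i in range(k)) / k
```

For two tight, well-separated clusters such as `[[(0, 0), (0, 1)], [(10, 0), (10, 1)]]`, the Dunn index is 10 and the Davies–Bouldin index is 0.1, which is the pattern (high Dunn, low Davies–Bouldin) that signals a good segmentation.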
      • Open Access Article

        8 - An Improved Method for Detecting Phishing Websites Using Data Mining on Web Pages
        mahdiye baharloo Alireza Yari
        Phishing undermines users' trust in business networks built on the e-commerce framework. Therefore, this research attempts to detect phishing websites using data mining. Identifying the salient features of phishing is an important prerequisite for designing an accurate detection system. To this end, a list of 30 features proposed for identifying phishing websites was first prepared. Then, a two-stage feature reduction method based on feature selection and feature extraction was proposed to improve the efficiency of phishing detection systems; it reduced the number of features significantly. Finally, the performance of the J48 decision tree, random forest, and naïve Bayes methods was evaluated on the reduced features. The results indicate that the random forest model built on the features reduced by the two-stage Wrapper and Principal Component Analysis (PCA) method detects phishing websites with an accuracy of 96.58%, a desirable outcome compared with the other methods.
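The second (extraction) stage, PCA, can be sketched without external libraries by running power iteration with deflation on the covariance matrix of the features that survived the wrapper stage. This is a minimal illustration of standard PCA under that assumption, not the authors' code; the wrapper stage is not shown, and the function names are hypothetical.

```python
import math

def pca(rows, k=1, iters=500):
    """Top-k principal directions of `rows` via power iteration with deflation.

    Returns (means, components); each component is a unit vector.
    """
    n, d = len(rows), len(rows[0])
    means = [sum(col) / n for col in zip(*rows)]
    centered = [[x - m for x, m in zip(row, means)] for row in rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        v = [1.0 + i for i in range(d)]  # fixed, deliberately uneven start vector
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break  # no variance left in this direction
            v = [x / norm for x in w]
        eigenvalue = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
        components.append(v)
        # deflation: remove the variance explained by v before finding the next one
        cov = [[cov[i][j] - eigenvalue * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, components

def project(row, means, components):
    """Coordinates of `row` in the reduced PCA space."""
    centered = [x - m for x, m in zip(row, means)]
    return [sum(c * x for c, x in zip(comp, centered)) for comp in components]
```

The projected coordinates, rather than the original features, are then what the random forest, J48, or naïve Bayes classifier is trained on.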
      • Open Access Article

        9 - Preserving Data Clustering with Expectation Maximization Algorithm
        Leila Jafar Tafreshi Farzin Yaghmaee
        Data mining and knowledge discovery are important technologies for business and research. Despite their benefits in areas such as marketing, business, and medical analysis, data mining techniques can also create new threats to privacy and information security. Therefore, a new class of data mining methods called privacy-preserving data mining (PPDM) has been developed. Research in this field aims to develop techniques that can be applied to databases without violating the privacy of individuals. In this work, we introduce a new approach that uses fuzzy logic to protect sensitive information in databases with both numerical and categorical attributes. We map a database into a new one that conceals private information while preserving the benefits of mining. In the proposed method, fuzzy membership functions (MFs) such as Gaussian, Pi-shaped, sigmoid, S-shaped, and Z-shaped functions are applied to the private data. The modified datasets are then clustered with the Expectation Maximization (EM) algorithm. Our experimental results show that using fuzzy logic to preserve data privacy guarantees valid clustering results while protecting sensitive information. The accuracy of the clustering algorithm on the fuzzified data is approximately equal to that on the original data and better than that of state-of-the-art methods in this field.
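The membership functions listed above can be written out explicitly. The shapes below follow the usual definitions (parameter order modelled on the MATLAB Fuzzy Logic Toolbox functions of the same names, which is an assumption about the paper's conventions); the `conceal` helper, which replaces a private numeric column by its membership degrees, is a hypothetical illustration, since the paper's parameter choices are not given. The EM clustering step is not shown.

```python
import math

def gaussmf(x, sigma, c):
    """Gaussian membership centred at c with width sigma."""
    return math.exp(-((x - c) ** 2) / (2 * sigma ** 2))

def sigmf(x, a, c):
    """Sigmoid membership with slope a and crossover point c."""
    return 1 / (1 + math.exp(-a * (x - c)))

def smf(x, a, b):
    """S-shaped membership: 0 up to a, 1 from b on, quadratic spline between (a < b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    if x <= (a + b) / 2:
        return 2 * ((x - a) / (b - a)) ** 2
    return 1 - 2 * ((x - b) / (b - a)) ** 2

def zmf(x, a, b):
    """Z-shaped membership: the mirror image of smf."""
    return 1 - smf(x, a, b)

def pimf(x, a, b, c, d):
    """Pi-shaped membership: rises like smf on [a, b], falls like zmf on [c, d]."""
    return smf(x, a, b) * zmf(x, c, d)

def conceal(column, mf=smf):
    """Hypothetical helper: replace a private numeric column by membership degrees
    in [0, 1], using the column's own min and max as breakpoints.
    `mf` must take (x, a, b), like smf or zmf."""
    lo, hi = min(column), max(column)
    return [mf(v, lo, hi) for v in column]
```

For example, `conceal([20, 35, 50])` maps three ages to `[0.0, 0.5, 1.0]`: the exact values are hidden, but their order and relative position, which is what a clustering algorithm relies on, are kept.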
      • Open Access Article

        10 - A RFMV Model and Customer Segmentation Based on Variety of Products
        Saman Qadaki Moghaddam Neda Abdolvand Saeedeh Rajaee Harandi
        Today, increased competition between organizations has led them to seek a better understanding of customer behavior through innovative ways of storing and analyzing customer information. Moreover, the emergence of new computing technologies has brought about major changes in the ability of organizations to collect, store, and analyze big data, so that thousands of data points can be stored for each customer. Customer satisfaction is one of the most important organizational goals, and since not all customers are equally profitable for an organization, understanding and identifying valuable customers has become the most important organizational challenge. Understanding customers' behavioral variables and categorizing customers based on these characteristics can therefore provide insight that helps business owners and industries adopt appropriate marketing strategies such as up-selling and cross-selling. These strategies rest on a fundamental variable: variety of products. Diversity in individual consumption may increase demand for a variety of products; therefore, variety of products can be used, along with other behavioral variables, to better understand and categorize customer behavior. Given the importance of product variety as one of the main parameters for assessing customer behavior, studying this factor in business-to-business (B2B) communication represents a vital new approach. This study therefore clusters customers based on an extended RFM model, namely RFMV, obtained by adding a variety-of-products variable (V). The CRISP-DM methodology and the K-means algorithm were used for clustering. The results indicate that the variable V, variety of products, is effective in calculating customer value and that the RFMV model yields better customer clustering and valuation. Overall, the modeling results indicate that variety of products, together with other behavioral variables, provides more accurate clustering than the RFM model.
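The RFMV idea can be sketched in three steps: compute R, F, M, and V (the number of distinct products a customer bought) from transactions, min-max normalise the four columns, and cluster with k-means. This is a minimal sketch assuming transactions arrive as (customer, day, amount, product) records; the record layout and function names are hypothetical, and k-means is plain Lloyd's algorithm with a seeded random start.

```python
import math
import random

def rfmv(transactions, today):
    """transactions: (customer, day, amount, product) records.
    Returns customer -> [R, F, M, V], where V counts distinct products bought."""
    stats = {}
    for cid, day, amount, product in transactions:
        s = stats.setdefault(cid, {"last": day, "f": 0, "m": 0.0, "products": set()})
        s["last"] = max(s["last"], day)
        s["f"] += 1
        s["m"] += amount
        s["products"].add(product)
    return {cid: [today - s["last"], s["f"], s["m"], len(s["products"])]
            for cid, s in stats.items()}

def min_max(vectors):
    """Scale every column to [0, 1] so no single variable dominates the distance."""
    cols = list(zip(*vectors))
    lo, hi = [min(c) for c in cols], [max(c) for c in cols]
    return [[(x - l) / (h - l) if h > l else 0.0 for x, l, h in zip(v, lo, hi)]
            for v in vectors]

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centres)."""
    centres = random.Random(seed).sample(points, k)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centres[j])) for p in points]
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centres[j])
        if new == centres:
            break
        centres = new
    return labels, centres
```

Normalising before clustering matters here: M is measured in currency units and would otherwise swamp the small integer range of V, hiding exactly the variable the RFMV model adds.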
      • Open Access Article

        11 - The Development of a Hybrid Error Feedback Model for Sales Forecasting
        Mehdi Farrokhbakht Foumani Sajad Moazami Goudarzi
        Sales forecasting is a significant issue in the industrial and service sectors; handled properly, it facilitates management decisions and reduces lost value. Sales forecasting is also a complicated problem in time series analysis and data mining because of the number of parameters involved. Various models have been presented for this problem, each with acceptable results, yet improving these methods remains an active research topic. In this regard, the present study proposes a hybrid model with error feedback for sales forecasting. First, forecasting is performed with a supervised learning method. Then, the residuals (model errors) are computed, and the error values are forecast with another learning method. Finally, the two trained models are combined and applied in sequence: the first model produces the forecast, the second model estimates its error, and the sum of the forecast and the estimated error gives the final forecast. The computational results of numerical experiments show that the proposed hybrid method outperforms the common models in the literature and reduces the forecasting error indicators.
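The error-feedback structure (forecast, model the residuals, add the two) can be sketched with deliberately simple stand-ins for the two learners. The paper does not name its models here, so the least-squares trend (model 1) and the mean residual per seasonal slot (model 2) are hypothetical substitutes chosen only to make the structure visible.

```python
def fit_trend(ys):
    """Model 1 (base forecaster): least-squares line y = a + b*t, t = 0..n-1."""
    n = len(ys)
    t_mean, y_mean = (n - 1) / 2, sum(ys) / n
    b = (sum((t - t_mean) * (y - y_mean) for t, y in enumerate(ys))
         / sum((t - t_mean) ** 2 for t in range(n)))
    return y_mean - b * t_mean, b

def fit_hybrid(ys, season):
    """Fit model 1, then model 2 on its residuals; the final forecast is their sum."""
    a, b = fit_trend(ys)
    residuals = [y - (a + b * t) for t, y in enumerate(ys)]
    slots = {}
    for t, r in enumerate(residuals):  # model 2: mean residual per seasonal slot
        slots.setdefault(t % season, []).append(r)
    correction = {s: sum(v) / len(v) for s, v in slots.items()}

    def forecast(t):
        return a + b * t + correction.get(t % season, 0.0)
    return forecast
```

Because model 2 only has to learn what model 1 systematically gets wrong, even a weak error model can reduce the total error; on a trending series with an alternating bump, such as `[10 + t + 3 * (t % 2) for t in range(8)]`, the hybrid's squared error is strictly smaller than the trend's alone.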
      • Open Access Article

        12 - Presenting the model for opinion mining at the document feature level for hotel users' reviews
        ELHAM KHALAJJ shahriyar mohammadi
        Nowadays, reading other users' sentiments and opinions online is an important part of how people decide whether to choose a product or use a service. Thanks to the Internet and easy access to opinion blogs in tourism and the hotel industry, there are huge, rich sources of opinions in text form that can be analyzed with text mining methods. Because users' sentiments and opinions matter to the industry, especially the tourism and hotel industry, opinion mining, emotion analysis, and the exploration of user-written texts have drawn the attention of industry decision-makers. This research presents a new hybrid method based on a common approach in sentiment analysis: using words to produce features for classifying reviews. Two methods of lexicon construction are developed, one using statistical methods and the other using a genetic algorithm. The resulting words are combined with a general sentiment lexicon and Bing Liu's standard opinion lexicon to increase classification accuracy.
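Turning lexicon words into classification features can be sketched as counting positive and negative hits from the merged lexicons in each review. This is a minimal sketch under stated assumptions: the word lists are hypothetical placeholders for the statistically and genetically constructed lexicons and for Bing Liu's lexicon, the merge rule (domain polarity overrides general polarity) is our choice, and the resulting feature dictionary would be fed to a classifier that is not shown.

```python
import re

def lexicon_features(review, domain_lexicon, general_lexicon):
    """Turn one review into numeric features from word polarity lexicons.

    Lexicons map a lowercase word to +1 (positive) or -1 (negative); the
    domain lexicon wins when both contain the same word.
    """
    merged = {**general_lexicon, **domain_lexicon}
    tokens = re.findall(r"[a-z']+", review.lower())
    pos = sum(1 for tok in tokens if merged.get(tok, 0) > 0)
    neg = sum(1 for tok in tokens if merged.get(tok, 0) < 0)
    n = len(tokens) or 1  # avoid dividing by zero on empty reviews
    return {"pos": pos, "neg": neg, "net": pos - neg,
            "pos_ratio": pos / n, "neg_ratio": neg / n}
```

Letting the domain lexicon override the general one matters in hotel reviews, where a word such as "cheap" can be praise for a room rate even if a general-purpose lexicon marks it negative.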
      • Open Access Article

        13 - decision support system design by using data mining tools (CASE STUDY Cultural Assistance of University of Science and Technology)
        Rouzbeh Ghousi emad chizari hani vahdani
        Decision-making is the most important duty of managers. In the modern era, the decision-making process involves many difficulties and subtleties, so that without new technologies and information analysis, objectives will not be achieved as desired. In addition to knowledge and experience, efficient management requires knowing how to use information systems. A decision support system (DSS) is one such system that supports managers' decision-making. In this paper, we first review the literature on decision support systems; then data mining is introduced as a tool for extracting information and knowledge from raw organizational data. This extracted knowledge may contain concepts and information that the organization has so far neglected, so it can help managers in the decision-making process. Finally, the findings of this study were used to help managers and deputies make decisions at Iran University of Science and Technology (IUST).
      • Open Access Article

        14 - Proposing a Density-Based Clustering Algorithm with Ability to Discover Multi-Density Clusters in Spatial Databases
        A. Zadedehbalaei A. Bagheri H. Afshar
        Clustering is one of the important techniques for knowledge discovery in spatial databases, and density-based clustering algorithms are one of the main families of clustering methods in data mining. DBSCAN, the foundation of density-based clustering, has benefits but also suffers from issues such as the difficulty of choosing appropriate values for its input parameters and the inability to detect clusters with different densities. In this paper, we introduce a new clustering algorithm that, unlike DBSCAN, can detect clusters with different densities; it also detects nested clusters and clusters that touch each other. The idea of the proposed algorithm is as follows. First, the different densities in the dataset are detected, and an Eps parameter is computed for each density. Then DBSCAN is adapted to run on the dataset with the computed parameters. The results of running the proposed algorithm on standard and synthetic datasets, measured with well-known clustering assessment criteria, are compared with those of DBSCAN and several of its variants, including VDBSCAN, VMDBSCAN, LDBSCAN, DVBSCAN, and MDDBSCAN, all of which were introduced to handle multi-density datasets. The results show that the proposed algorithm has higher accuracy and a lower error rate than the other algorithms.
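The general "one Eps per density level" idea can be sketched by running DBSCAN once per Eps value, densest (smallest Eps) first, with each pass seeing only points that no earlier pass clustered, in the spirit of VDBSCAN. The abstract does not specify how the authors detect the density levels, so here the caller supplies the Eps values (for example read from the steps of the sorted k-distance curve that `k_distances` produces); this is not the paper's algorithm, and the names are hypothetical.

```python
import math

def k_distances(points, k):
    """Distance from each point to its k-th nearest neighbour; plotting these
    sorted values shows one 'step' per density level."""
    return [sorted(math.dist(p, q) for q in points)[k] for p in points]

def multi_density_dbscan(points, eps_list, min_pts):
    """Run DBSCAN once per Eps value, densest (smallest Eps) first; each pass only
    sees points no earlier pass clustered. min_pts counts the point itself.
    Returns one label per point, -1 meaning noise."""
    labels = [-1] * len(points)
    next_label = 0
    for eps in sorted(eps_list):
        free = [i for i, lab in enumerate(labels) if lab == -1]

        def neighbours(i):
            return [j for j in free if math.dist(points[i], points[j]) <= eps]

        visited = set()
        for i in free:
            if i in visited:
                continue
            visited.add(i)
            seeds = neighbours(i)
            if len(seeds) < min_pts:
                continue  # not a core point; may still join a cluster as a border point
            labels[i] = next_label
            queue = list(seeds)
            while queue:
                j = queue.pop()
                if labels[j] == -1:
                    labels[j] = next_label  # core or border point joins this cluster
                if j in visited:
                    continue
                visited.add(j)
                more = neighbours(j)
                if len(more) >= min_pts:  # only core points keep expanding
                    queue.extend(more)
            next_label += 1
    return labels
```

With a tight group near the origin, a looser group around (10, 10), and an isolated point, a single small Eps labels the loose group as noise, while the list `[0.2, 1.5]` recovers both groups and still leaves the isolated point as noise.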
      • Open Access Article

        15 - Assessment of Demand Side Resources Potential in Presence of Cooling and Heating Equipment Using Data Mining Method Based Upon K-Means Clustering Algorithm
        fatemeh sheibani M. Mollahassani-pour هنگامه کشاورز
        In smart power systems, determining the potential of Demand Response Resources (DRRs) is a crucial issue because it affects all energy policy decisions. In this paper, the potential of DRRs in the presence of cooling and heating equipment is identified using the k-means clustering algorithm as a data mining technique. The energy consumption dataset is grouped into clusters by the k-means algorithm based on variations in energy price and ambient temperature during the peak hours of hot (spring and summer) and cold (autumn and winter) periods. The clusters in which cooling and heating equipment is likely to be committed are then selected. After that, a confidence interval diagram of energy consumption in the selected clusters is constructed based on energy price variations. The nominal potential of DRRs, i.e. the flexible load, is obtained from the maximum and minimum differences between the average energy consumption at the upper and middle thresholds of the confidence interval diagram. Energy consumption, ambient temperature, and energy price data from the Boston electricity network over a six-year horizon are used to evaluate the proposed model.
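The confidence-interval step can be sketched by grouping a selected cluster's consumption readings by price level and computing a (lower, mean, upper) interval for each group. This is a minimal sketch assuming records arrive as (price_level, consumption) pairs; it uses a normal approximation because the standard library has no Student-t quantile, and the paper's final formula for the nominal DRR potential is not reproduced.

```python
from statistics import NormalDist, mean, stdev

def mean_ci(values, level=0.95):
    """(lower, mean, upper) normal-approximation confidence interval for the mean.
    Needs at least two values."""
    m = mean(values)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    half = z * stdev(values) / len(values) ** 0.5
    return m - half, m, m + half

def ci_by_price(records, level=0.95):
    """records: (price_level, consumption) pairs from one selected cluster.
    Returns price_level -> (lower, mean, upper), ordered by price level."""
    groups = {}
    for price, load in records:
        groups.setdefault(price, []).append(load)
    return {price: mean_ci(loads, level) for price, loads in sorted(groups.items())}
```

Plotting the three values against price gives the upper, middle, and lower curves of a confidence interval diagram; for the small per-group samples typical of peak hours, a t-based interval would be somewhat wider than this normal approximation.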
      • Open Access Article

        16 - Construction of Scalable Decision Tree Based on Fast Data Partitioning and Pre-Pruning
        سميه لطفي Mohammad Ghasemzadeh Mehran Mohsenzadeh Mitra Mirzarezaee
        Classification is one of the most important tasks in data mining and machine learning, and the decision tree, one of the most widely used classification algorithms, has the advantages of simplicity and easily interpretable results. With huge amounts of data, however, the resulting decision tree grows in size and complexity and therefore requires excessive running time. Almost all tree-construction algorithms need to store all or part of the training dataset, and algorithms that avoid memory shortages by selecting a subset of the data must spend extra time on that selection. Choosing the best feature for each branch of the tree also requires many calculations. In this paper, we present an incremental, scalable approach based on fast partitioning and pruning. The proposed algorithm builds the decision tree from the entire training dataset without storing the whole dataset in main memory. Pre-pruning is also used to reduce the complexity of the tree. Experimental results on UCI datasets show that the proposed algorithm preserves competitive accuracy and construction time while overcoming the disadvantages of earlier methods mentioned above.
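Why a tree can be grown without holding the training set in memory is easiest to see at a single node: choosing a split needs only class counts per (feature, value), which can be accumulated in one pass over a data stream. The sketch below assumes categorical features and uses information gain with two hypothetical pre-pruning thresholds (`min_gain`, `min_samples`); it illustrates the principle and is not the paper's fast partitioning method.

```python
import math
from collections import Counter, defaultdict

def entropy(counts):
    """Shannon entropy (bits) of a class-count mapping."""
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values() if c)

def best_feature(stream, features, min_gain=0.01, min_samples=5):
    """One pass over (record_dict, label) pairs; only counts are kept in memory.
    Returns the feature to split on, or None when pre-pruning says 'make a leaf'."""
    class_counts = Counter()
    value_counts = defaultdict(Counter)  # (feature, value) -> class counts
    n = 0
    for record, label in stream:
        n += 1
        class_counts[label] += 1
        for f in features:
            value_counts[(f, record[f])][label] += 1
    if n < min_samples or len(class_counts) < 2:
        return None  # pre-prune: too few samples, or the node is already pure
    base = entropy(class_counts)
    best, best_gain = None, min_gain
    for f in features:
        remainder = sum(sum(cnt.values()) / n * entropy(cnt)
                        for (g, _), cnt in value_counts.items() if g == f)
        gain = base - remainder
        if gain > best_gain:  # pre-prune: splits gaining at most min_gain are rejected
            best, best_gain = f, gain
    return best
```

Memory use grows with the number of distinct (feature, value, class) combinations rather than with the number of records, which is what lets the whole training set be streamed instead of loaded.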
      • Open Access Article

        17 - Combination of Instance Selection and Data Augmentation Techniques for Imbalanced Data Classification
        Parastoo Mohaghegh Samira Noferesti Mehri Rajaei
        In the era of big data, automatic data analysis techniques such as data mining have been widely and effectively used for decision-making. Among data mining techniques, classification is a common method for decision-making and prediction. Classification algorithms usually work well on balanced datasets; however, one of their challenges is correctly predicting the labels of new samples after learning from imbalanced datasets. In such datasets, the uneven distribution of data across classes causes minority-class examples to be ignored during learning, even though this class is more important in some prediction problems. To address this issue, this paper presents an efficient method for balancing imbalanced datasets, which improves the ability of machine learning algorithms to correctly predict the class labels of new samples. According to the evaluations, the proposed method performs better than other methods on two common criteria for evaluating classification on imbalanced datasets, namely "Balanced Accuracy" and "Specificity".
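The abstract does not describe the authors' specific instance-selection or augmentation rules, so the sketch below swaps in two well-known stand-ins: random undersampling of the majority class (instance selection) and SMOTE-style interpolation between minority neighbours (augmentation). It also computes the two reported criteria, where balanced accuracy is the mean of sensitivity and specificity. Function names are hypothetical.

```python
import math
import random

def smote_like(minority, n_new, k=3, seed=0):
    """SMOTE-style augmentation: each synthetic point lies on the segment between a
    random minority point and one of its k nearest minority neighbours.
    Needs at least two minority points."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        p = minority[i]
        others = [q for j, q in enumerate(minority) if j != i]
        q = rng.choice(sorted(others, key=lambda o: math.dist(p, o))[:k])
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(p, q)])
    return synthetic

def undersample(majority, n_keep, seed=0):
    """Random instance selection: keep n_keep majority-class examples."""
    return random.Random(seed).sample(majority, n_keep)

def balanced_accuracy_and_specificity(y_true, y_pred, positive=1):
    """Balanced accuracy = (sensitivity + specificity) / 2."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2, specificity
```

Plain accuracy can look excellent on an imbalanced test set even when every minority example is missed; balanced accuracy weights both classes equally, which is why it is the headline metric here.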
      • Open Access Article

        18 - Presenting a web recommender system for user nose pages using DBSCAN clustering algorithm and machine learning SVM method.
        reza molaee fard Mohammad mosleh
        Recommender systems can predict future user requests and then generate a list of the user's favorite pages. In other words, recommender systems can build an accurate profile of users' behavior and predict the page a user will choose next, which can solve the system's cold-start problem and improve search quality. In this research, a new method is presented to improve web recommender systems. It uses the DBSCAN clustering algorithm to cluster the data; this algorithm obtained an efficiency score of 99%. The user's favorite pages are then weighted with the PageRank algorithm. Next, the SVM method classifies the data, forming a combined recommender system that generates predictions and finally provides the user with a list of pages that may be of interest. The evaluation shows that the proposed method achieves 95% recall and 99% accuracy, which indicates that the recommender system correctly detects the user's intended pages more than 90% of the time and largely overcomes the weaknesses of previous systems.
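The page-weighting step uses PageRank, which can be computed by power iteration over a link (or navigation) graph. A minimal sketch, assuming the graph is given as a dict from page to the pages it links to; how the paper builds that graph from user sessions is not stated, so the input format is an assumption.

```python
def pagerank(links, damping=0.85, iters=200, tol=1e-12):
    """links: page -> list of pages it links to. Returns page -> rank; ranks sum to 1."""
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iters):
        new = {p: (1 - damping) / n for p in pages}  # random-jump share
        for p in pages:
            targets = links.get(p, [])
            if targets:
                for q in targets:
                    new[q] += damping * rank[p] / len(targets)
            else:  # dangling page: spread its rank evenly over every page
                for q in pages:
                    new[q] += damping * rank[p] / n
        converged = sum(abs(new[p] - rank[p]) for p in pages) < tol
        rank = new
        if converged:
            break
    return rank
```

For instance, in `{"a": ["b"], "b": ["a"], "c": ["a"]}` page `a` ends up with the highest weight because two pages point to it, while `c`, which nothing points to, keeps only the random-jump share.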
      • Open Access Article

        19 - Anomaly and Intrusion Detection Through Data Mining and Feature Selection using PSO Algorithm
        Fereidoon Rezaei Mohamad Ali Afshar Kazemi Mohammad Ali Keramati
        Today, with technological development, the increased use of the Internet in business, and the shift of businesses from physical to virtual and online forms, attacks and anomalies have also moved from the physical to the virtual world. That is, instead of robbing a store or market, individuals intrude into websites and virtual markets through cyberattacks and disrupt them. Detecting attacks and anomalies is one of the new challenges in advancing e-commerce technologies. Network anomalies and destructive activities in e-commerce can be detected by analyzing the behavior of network traffic. Data mining systems and techniques are used extensively in intrusion detection systems (IDS) to detect anomalies. Reducing the dimensionality of features plays an important role in intrusion detection, because detecting anomalies in high-dimensional network traffic features is time-consuming. Choosing suitable and accurate features speeds up the analysis and therefore improves detection speed. In this article, using data mining algorithms such as Bayesian, Multilayer Perceptron, CFS, Best First, J48, and PSO, we increased the accuracy of detecting anomalies and attacks to 0.996 and reduced the error rate to 0.004.
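PSO-based feature selection is commonly implemented as binary PSO: each particle is a 0/1 mask over features, velocities are updated as in continuous PSO, and a sigmoid of the velocity gives the probability that a bit is 1. The sketch below follows that standard scheme; in the paper the fitness would come from a classifier such as J48 evaluated on the selected features, whereas here `fitness` is any caller-supplied function, and all parameter values are illustrative defaults, not the authors' settings.

```python
import math
import random

def binary_pso(n_features, fitness, n_particles=20, iters=50,
               w=0.7, c1=1.5, c2=1.5, vmax=4.0, seed=0):
    """Binary PSO with a sigmoid transfer function.
    fitness(mask) -> float (higher is better); mask is a tuple of 0/1 flags.
    Returns (best_mask, best_fitness)."""
    rng = random.Random(seed)
    pos = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_particles)]
    vel = [[0.0] * n_features for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(tuple(p)) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(n_features):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                     + c2 * rng.random() * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))  # clamp so the sigmoid does not saturate
                pos[i][d] = 1 if rng.random() < 1 / (1 + math.exp(-vel[i][d])) else 0
            f = fitness(tuple(pos[i]))
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit = pos[i][:], f
    return tuple(gbest), gbest_fit
```

A fitness that rewards detection accuracy and subtracts a small penalty per selected feature pushes the swarm toward compact feature sets, which is the speed benefit the abstract describes.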