    • List of Articles: Data Mining

      • Open Access Article

        1 - Applying data mining techniques to region segmentation for entrance exams to governmental universities
        نرجس سرعتی آَشتیانی, Somayyeh Alizadeh, علی مبصّـری
        A large number of Iranian high school graduates wish to enter governmental (public) universities and compete for admission. These graduates, however, come from regions with different levels of access to educational facilities. In the view of the directors of the relevant agencies, quota allocation addresses this problem, and they seek to use the knowledge hidden in the data available in this area. In this way, applicants from each region are compared with one another, and managers are helped to allocate appropriate quotas to the students of the regions in each segment. In recent years, quotas were determined by taxonomy, whose result is a ranking that does not allow group analysis and identifies the number of regions only theoretically. Clustering is a good strategy for solving this problem. This study applies data mining techniques and the CRISP-DM methodology, for the first time, to a dataset drawn from the Ministry of Education, the Ministry of Interior, the Ministry of Health, the Statistics Center, and the Evaluation Organization. After the effective attributes were extracted, data preparation, data reduction, and attribute combination using factor analysis were performed. Next, the K-means algorithm assigned each item to the cluster whose centroid (mean) is at minimum distance, and neural networks and decision trees were then used to assign new items to the clusters. Finally, to assess the resulting models, the accuracy of their outputs was compared with that of other methods. The outcomes of this research are: determining the optimal number of segments, segmenting the regions, analyzing each segment, extracting decision rules, predicting class labels for new regions faster and more accurately, and enabling appropriate strategies to be formulated for each segment.
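The K-means step in this abstract (each item joins the cluster whose centroid is nearest, then each centroid is recomputed as the mean of its members) can be sketched as plain Lloyd iterations. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `kmeans`, initialisation from randomly chosen input points, and Euclidean distance are choices made here, and the factor-analysis, neural-network, and decision-tree stages are not reproduced.

```python
import math
import random


def kmeans(points, k, iterations=100, seed=0):
    """Lloyd's k-means on a list of numeric tuples (illustrative sketch)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumption: start from k distinct input points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's old centroid
        if new_centroids == centroids:
            break  # converged: assignments can no longer change
        centroids = new_centroids
    return labels, centroids
```

For example, `kmeans([(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)], 2)` separates the three points near the origin from the three near (10, 10). Choosing the number of clusters, which the paper reports as an outcome, is a separate step not shown here.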
      • Open Access Article

        2 - A method for clustering customers using RFM model and grey numbers in terms of uncertainty
        Azime Mozafari
        The purpose of this study is to present a method for clustering bank customers based on the RFM model under uncertainty. In the proposed framework, after the parameter values of the RFM model, namely the recency (R), frequency (F), and monetary value (M) of transactions, are determined, grey theory is used to deal with the uncertainty, and customers are segmented using a different approach. Bank customers are thus clustered into three main segments: good, ordinary, and bad customers. After the clusters are validated using the Dunn index and the Davies-Bouldin index, the properties of the customers in each segment are identified. Finally, recommendations are offered to improve the customer relationship management system.
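The Dunn index used for validation rewards clusterings whose closest pair of points from different clusters is far apart relative to the widest cluster. A minimal sketch of the textbook definition follows, assuming Euclidean distance, single-linkage separation, and complete diameter; the function name `dunn_index` is illustrative, and the abstract does not state which distance variants the authors used.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn index: smallest inter-cluster distance / largest cluster diameter.

    `clusters` is a list of clusters, each a list of numeric tuples.
    Higher values indicate compact, well-separated clusters.
    """
    # Largest distance between two points of the same cluster (cluster diameter).
    max_diameter = max(
        (math.dist(a, b) for cluster in clusters for a, b in combinations(cluster, 2)),
        default=0.0,
    )
    # Smallest distance between two points that belong to different clusters.
    min_separation = min(
        math.dist(a, b)
        for ci, cj in combinations(clusters, 2)
        for a in ci
        for b in cj
    )
    if max_diameter == 0.0:
        return math.inf  # every cluster collapses to a single location
    return min_separation / max_diameter
```

Two tight pairs four units apart, `[[(0.0, 0.0), (0.0, 1.0)], [(4.0, 0.0), (4.0, 1.0)]]`, score 4.0, while splitting the same points across the gap scores far lower.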
      • Open Access Article

        3 - Proposing a Model for Extracting Information from Textual Documents, Based on Text Mining in E-learning
        Somayeh Ahari
        As computer networks become the backbone of science and the economy, enormous quantities of documents become available, and text mining techniques are used to extract useful information from this textual data. Text mining has become an important research area that discovers previously unknown information, facts, or new hypotheses by automatically extracting information from different written documents. It aims to reveal hidden information through methods that can, on the one hand, cope with the large number of words and structures in natural language and, on the other hand, handle vagueness, uncertainty, and fuzziness. Text mining, also referred to as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text, typically through patterns and processes. Also known as knowledge discovery from textual databases, it is the process of extracting patterns or knowledge from text documents. In this research, a survey of text mining techniques and their applications in e-learning is presented. Relevant studies in the field of e-learning were classified, and the related problems and solutions were then extracted. The paper first defines text mining and then describes the text mining process and its applications in the e-learning domain. Next, text mining techniques are introduced, and each is examined in the context of e-learning. Finally, a model for information extraction using text mining techniques in the e-learning domain is proposed.
      • Open Access Article

        4 - Discovering spam in the Facebook social network using data mining
        Amin Nazari
        In recent years, the development of new technologies and communication facilities such as the Internet has given rise to virtual social networks. The rapid growth of social networks and the huge number of anonymous users on them have created a suitable environment for scammers, who often try to spread various types of spam in these high-potential places. Hence, an effective method is required to detect spam in order to increase the information security of people on social networks. In this paper, a new method for discovering spammers on the Facebook social network is proposed; the findings show 99.96% accuracy. In previous papers, users were divided into two groups, ordinary users and spammers, and the classification methods in those papers also labeled as spammers the users who had been attached by spammers. In this paper, by dividing users into three types (ordinary users, spammers, and users attached by spammers), the accuracy of spam detection has been increased.
      • Open Access Article

        5 - The study of the accuracy of real estate experts' evaluations using a data mining model (Case study of Mellat Bank)
        Fatemeh Davar
        As the main part of the financial system, banks always face various risks, the most important of which are credit scoring risk and property valuation risk. One of the issues faced by property valuation experts is how to evaluate property prices; in general, court experts assess real estate based on price indices. In this research, the researcher aimed to verify the accuracy of valuation experts' assessments using data mining models, in order to help bank managers and audit reporters make better decisions about the experts and their valuations. Using property valuation indices and data mining, a predictive model was developed to predict property prices, and a combination of the FCM (fuzzy c-means) and K-NN algorithms was used to achieve a high-performance prediction model. This combination greatly increased the predictive accuracy and the efficiency of the proposed model. The accuracy in predicting valuated prices was 84.21%, and the RMSE of its forecasts was 0.43. The proposed approach was tested on real estate valuation data from Mellat Bank.
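The K-NN prediction and the RMSE figure reported above can be illustrated with a small sketch: a property's price is predicted as the mean price of its k most similar properties, and RMSE summarises the prediction error. This is an assumption-laden illustration, not the authors' model: the FCM (fuzzy c-means) stage is omitted, the feature vectors are hypothetical numeric tuples, and `knn_predict` and `rmse` are names chosen here.

```python
import math


def knn_predict(train_X, train_y, query, k=3):
    """Predict a value as the mean target of the k nearest training samples."""
    # Rank training samples by Euclidean distance to the query point.
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], query))[:k]
    return sum(train_y[i] for i in nearest) / k


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))
```

One way the paper's combination might be approximated (an untested hypothesis) is to run `knn_predict` only over the training properties that share the query's dominant FCM cluster.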
      • Open Access Article

        6 - Analyzing the impact of macroeconomic variables on customer churn in the banking industry with a data mining approach
        Mehrnaz Motahari nia
        Today, knowing customers and understanding their needs has become a business imperative. Organizations need customer satisfaction to sustain their business and succeed in a competitive market, and new technologies such as data mining make it possible for them to know their customers by analyzing customer behavior. The purpose of this research is to investigate the factors that affect customer churn in the banking industry. To this end, transaction data from the sales terminals of a payment service provider (PSP) company in Iran were analyzed. In the proposed model, the WRFM method combined with the K-means clustering algorithm is used to segment the sales terminals and determine their loyalty each month. Then, using the plus-L minus-R sequential selection method and a multivariate linear regression algorithm, the features that most affect the monthly percentage of churned customers are selected from the monthly economic indicators. The results show that three variables, namely the stock market value index, inflation, and the price of the full gold coin, are the most effective among the economic indicators studied.
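WRFM is generally described as RFM with a weight on each of the three indicators. The abstract gives neither weights nor a normalisation scheme, so the sketch below is purely illustrative: min-max scaling, inverted recency (more recent activity scores higher), the weights `(0.2, 0.3, 0.5)`, and the function names are all assumptions made here. The resulting per-terminal scores could then be clustered month by month.

```python
def min_max(values):
    """Scale values to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def wrfm_scores(recency_days, frequency, monetary, weights=(0.2, 0.3, 0.5)):
    """Weighted RFM score per terminal or customer (hypothetical weights).

    Recency is measured in days since the last transaction, so it is
    inverted after scaling: fewer days means a higher recency score.
    """
    r = [1.0 - x for x in min_max(recency_days)]
    f = min_max(frequency)
    m = min_max(monetary)
    w_r, w_f, w_m = weights
    return [w_r * ri + w_f * fi + w_m * mi for ri, fi, mi in zip(r, f, m)]
```

With weights summing to 1, every score lies in [0, 1]; a terminal that is the most recent, most frequent, and highest-value in the batch scores 1.0.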
      • Open Access Article

        7 - Provide a method for customer segmentation using the RFM model in conditions of uncertainty
        Mohammadreza Gholamian, Azime Mozafari
        The purpose of this study is to provide a method for segmenting the customers of a private bank in Shiraz based on the RFM model under uncertainty about customer data. In the proposed framework, the values of the RFM indicators, namely transaction recency (R), number of transactions (F), and monetary value of transactions (M), were first extracted from the customer database and preprocessed. Given the breadth of the data, it is not possible to fix an exact value that determines whether a customer is good or bad; therefore, to eliminate this uncertainty, grey number theory was used, which represents a customer's situation as a range. The bank's customers were then segmented using a different method, and the results divided them into three main segments (clusters): good, normal, and bad customers. After the clusters were validated using the Dunn and Davies-Bouldin indices, the characteristics of the customers in each segment were identified, and finally, suggestions were made to improve the customer relationship management system.
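Representing a customer's RFM value "as a range" corresponds to an interval grey number, which can later be whitenised into a single crisp value. The sketch below uses textbook interval grey-number operations (addition, scaling by a non-negative weight, and whitenisation `alpha * lower + (1 - alpha) * upper`); the abstract does not say which grey operations or which `alpha` the authors used, so the class name, the weights in the example, and `alpha = 0.5` are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreyNumber:
    """Interval grey number: the true value lies somewhere in [lower, upper]."""
    lower: float
    upper: float

    def __post_init__(self):
        if self.lower > self.upper:
            raise ValueError("lower bound must not exceed upper bound")

    def __add__(self, other):
        # Interval addition: bounds add component-wise.
        return GreyNumber(self.lower + other.lower, self.upper + other.upper)

    def scale(self, factor):
        # Multiplying by a non-negative crisp weight preserves the bound order.
        if factor < 0:
            raise ValueError("factor must be non-negative")
        return GreyNumber(self.lower * factor, self.upper * factor)

    def whiten(self, alpha=0.5):
        """Whitenisation: crisp value alpha * lower + (1 - alpha) * upper."""
        return alpha * self.lower + (1 - alpha) * self.upper
```

For instance, an uncertain recency score in [2, 4] and frequency score in [6, 8], each weighted 0.5, combine into the grey number [4, 6], which whitenises to 5.0.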
      • Open Access Article

        8 - An Improved Method for Detecting Phishing Websites Using Data Mining on Web Pages
        Mahdiye Baharloo, Alireza Yari
        Phishing undermines users' trust in business networks built on e-commerce. Therefore, in this research, we tried to detect phishing websites using data mining. Identifying the salient features of phishing is an important prerequisite for designing an accurate detection system. To this end, a list of 30 features proposed for phishing websites was first prepared. Then, a two-stage feature reduction method based on feature selection and feature extraction was proposed to enhance the efficiency of phishing detection systems; it was able to reduce the number of features significantly. Finally, the performance of the J48 decision tree, random forest, and naïve Bayes methods was evaluated on the reduced features. The results indicated that the model built with two-stage feature reduction based on a wrapper method and Principal Component Analysis (PCA), combined with random forest, identified phishing websites with an accuracy of 96.58%, which is a desirable outcome compared with other methods.
      • Open Access Article

        9 - Strategic Human Resource Management in Digital Era Based on Big Data
        Gholamreza Malekzadeh, Sedigheh Sadeghi
        Nowadays, intelligent devices, virtual environments, and technological innovations are part of everyday human life. While technological innovation could easily represent the greatest current threat to businesses, the executive leaders who succeed are those who transform it into business opportunities and create a new competitive space out of this threat. Moreover, the influence of information technology in organizations, together with the spread of various kinds of social media, is a good opportunity to gather a massive amount of information and data about people. Given these facts, creative thinking, alignment with the facilities, requirements, and needs of the digital age, and respect for the value of knowledge management alongside the use of information management deserve closer attention, especially in the field of human capital management. We discuss the effects of HR digital literacy and of HR's understanding of the organization's mission and values on flexibility in digital transformation. In this article, we discuss the use of information systems, especially big data, in human resource management in the digital era, drawing on surveys by reputable organizations such as McKinsey. We conclude that in the new age, as a new generation of workers with different attitudes and expectations is ready to enter the labor market, a transformation from traditional structures to structures driven by the analytical results of big data would lead to more effective management.
      • Open Access Article

        10 - Decision support system design using data mining tools (case study: Cultural Affairs Deputy of the University of Science and Technology)
        Rouzbeh Ghousi, Emad Chizari, Hani Vahdani
        Decision-making is the most important duty of managers. In the new era, the decision-making process involves many difficulties and subtleties, so that without the use of new technologies and information analysis, objectives will not be achieved as desired. Efficient management, in addition to knowledge and experience management, requires knowing how to use information systems. A decision support system (DSS) is one such system, supporting managers' decision-making process. In this paper, we first review the literature on decision support systems; then data mining is introduced as a tool for extracting information and knowledge from raw organizational data. This extracted knowledge may contain concepts and information that have so far been neglected in the organization, so it can help managers in the decision-making process. Finally, the findings of this study have been used to help managers and deputies in their decisions at Iran University of Science and Technology (IUST).
      • Open Access Article

        11 - Proposing a Density-Based Clustering Algorithm with Ability to Discover Multi-Density Clusters in Spatial Databases
        A. Zadedehbalaei, A. Bagheri, H. Afshar
        Clustering is one of the important techniques for knowledge discovery in spatial databases, and density-based clustering algorithms are one of the main clustering approaches in data mining. DBSCAN, the basis of density-based clustering algorithms, has benefits but also suffers from issues such as the difficulty of determining appropriate values for its input parameters and its inability to detect clusters with different densities. In this paper, we introduce a new clustering algorithm that, unlike DBSCAN, can detect clusters with different densities. The algorithm also detects nested clusters and clusters that touch each other. The idea of the proposed algorithm is as follows: first, the different densities present in the dataset are detected using a dedicated technique, and an Eps parameter is computed for each density; then DBSCAN is adapted to run on the dataset with the computed parameters. The experimental results, obtained by running the proposed algorithm on standard and synthetic datasets and evaluated with well-known clustering assessment criteria, are compared with those of DBSCAN and several of its variants, including VDBSCAN, VMDBSCAN, LDBSCAN, DVBSCAN, and MDDBSCAN, all of which were introduced to handle multi-density datasets. The results show that the proposed algorithm achieves higher accuracy and a lower error rate than the other algorithms.
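Since the proposed algorithm runs DBSCAN with Eps values computed per density level, it helps to see the base algorithm that is being adapted. The sketch below is classic single-Eps DBSCAN with a brute-force O(n²) neighbourhood search; it does not reproduce the paper's density-detection step, and the function name and label conventions (`-1` for noise) are choices made here.

```python
import math


def dbscan(points, eps, min_pts):
    """Classic DBSCAN. Returns one label per point; -1 marks noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # Indices of all points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    cluster_id = 0
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be relabelled as a border point
            continue
        labels[i] = cluster_id  # i is a core point: start a new cluster
        queue = list(seeds)
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster_id  # border point reached from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster_id
            j_neighbours = neighbours(j)
            if len(j_neighbours) >= min_pts:
                queue.extend(j_neighbours)  # j is also a core point: keep expanding
        cluster_id += 1
    return labels
```

A single global `eps` is exactly the limitation the abstract describes: a value small enough to separate a dense cluster tends to mark a sparse cluster as noise, which is why the paper computes a separate Eps for each detected density.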
      • Open Access Article

        12 - Attribute Reduction Based on Rough Set Theory by Soccer League Competition Algorithm
        M. Abdolrazzagh-Nezhad, Ali Adibiyan
        The growing dimensionality of databases has made attribute reduction a critical issue in data mining; it seeks the subset of attributes that is most effective on the hidden patterns. In recent years, researchers have regarded rough set theory as one of the most effective and efficient tools for this reduction. In this paper, the soccer league competition algorithm is modified and adapted to solve the attribute reduction problem for the first time. The algorithm was chosen for its ability to escape local optima, its ability to use the information distributed by players across the search space, its rapid convergence to optimal solutions, and its small number of parameters. The proposed modifications are: using the total power of fixed and substitute players to calculate each team's power; combining continuous and discrete structures for each player; a novel discretization method; a hydraulic analysis suited to the research problem for evaluating each player; and corrections to the Imitation and Provocation operators that address the shortcomings of their original versions. The proposed ideas are tested on small, medium, and large datasets from UCI, and the experimental results are compared with state-of-the-art algorithms. This comparison shows the competitive advantages of the proposed algorithm over the algorithms investigated.
      • Open Access Article

        13 - Feature Selection and Cancer Classification Based on Microarray Data Using Multi-Objective Cuckoo Search Algorithm
        Kh. Kamari, F. Rashidi, A. Khalili
        Microarray datasets play an important role in identifying and classifying cancer tissues. In cancer research, the small number of microarray samples is one of the main concerns, and it causes problems in designing classifiers. Moreover, because microarrays have a large number of features, feature selection and classification are even more challenging for such datasets. Not all of these numerous features contribute to the classification task, and some even impede performance; hence, an appropriate gene selection method can significantly improve the performance of cancer classification. In this paper, a modified multi-objective cuckoo search algorithm is used for feature selection and sample selection to find the best available solutions. To accelerate the optimization process and avoid being trapped in local optima, new heuristic approaches are added to the original algorithm. The proposed algorithm is applied to six cancer datasets, and its results are compared with other existing methods. The results show that the proposed method achieves higher accuracy and validity than other existing approaches and is able to select a small subset of informative genes that increases classification accuracy.
      • Open Access Article

        14 - Assessment of Demand Side Resources Potential in Presence of Cooling and Heating Equipment Using Data Mining Method Based Upon K-Means Clustering Algorithm
        Fatemeh Sheibani, M. Mollahassani-pour, هنگامه کشاورز
        In smart power systems, determining the potential of Demand Response Resources (DRRs) is a crucial issue because it affects all energy policy decisions. In this paper, the potential of DRRs in the presence of cooling and heating equipment is identified using the k-means clustering algorithm as a data mining technique. To this end, the energy consumption dataset is divided into clusters by the k-means algorithm based on variations in energy price and ambient temperature during the peak hours of the hot (spring and summer) and cold (autumn and winter) periods. Then, the clusters in which cooling and heating equipment is likely to be committed are selected. After that, a confidence interval diagram of energy consumption in the selected clusters is produced based on energy price variations. The nominal potential of DRRs, i.e., the flexible load, is obtained from the maximum and minimum differences between the average energy consumption at the upper and middle thresholds of the confidence interval diagram. Energy consumption, ambient temperature, and energy price data for the Boston electricity network over a six-year horizon are used to evaluate the proposed model.
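The flexible-load step can be read as: for each energy price level, build a confidence interval for mean consumption, take the gap between its upper threshold and its middle (the mean), and report the smallest and largest gap. The sketch below follows that reading under explicit assumptions: a normal-approximation interval (not a t-interval), the mean as the "middle threshold", and a hypothetical input dict mapping price levels to consumption samples. The abstract may define the thresholds differently.

```python
import math
from statistics import NormalDist, mean, stdev


def confidence_interval(samples, level=0.95):
    """Normal-approximation CI for the mean: mean +/- z * s / sqrt(n)."""
    centre = mean(samples)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    half_width = z * stdev(samples) / math.sqrt(len(samples))
    return centre - half_width, centre, centre + half_width


def flexible_load_range(consumption_by_price):
    """Min and max gap between the upper CI bound and the mean across price levels.

    `consumption_by_price` is a hypothetical mapping {price_level: [samples]}
    taken from one selected cluster; each list needs at least two samples.
    """
    gaps = []
    for samples in consumption_by_price.values():
        _, centre, upper = confidence_interval(samples)
        gaps.append(upper - centre)
    return min(gaps), max(gaps)
```

For samples `[10.0, 12.0, 14.0]` the 95% interval is centred on 12.0 with a half-width of roughly 2.26.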
      • Open Access Article

        15 - Construction of Scalable Decision Tree Based on Fast Data Partitioning and Pre-Pruning
        سميه لطفي, Mohammad Ghasemzadeh, Mehran Mohsenzadeh, Mitra Mirzarezaee
        Classification is one of the most important tasks in data mining and machine learning, and the decision tree, as one of the most widely used classification algorithms, has the advantages of simplicity and easily interpretable results. However, when dealing with huge amounts of data, the resulting decision tree grows in size and complexity and therefore requires excessive running time. Almost all tree-construction algorithms need to store all or part of the training dataset; algorithms that avoid memory shortages by selecting a subset of the data must spend extra time on that selection. Selecting the best feature on which to create a branch in the tree also requires a lot of computation. In this paper, we present an incremental, scalable approach based on fast partitioning and pruning. The proposed algorithm builds the decision tree from the entire training dataset but does not need to store all of the data in main memory. A pre-pruning method is also used to reduce the complexity of the tree. Experimental results on UCI datasets show that the proposed algorithm, while maintaining competitive accuracy and construction time, overcomes the disadvantages of earlier methods mentioned above.
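The costly step of "selecting the best feature to create a branch" means scoring every candidate feature with a split criterion at every node. The abstract does not name the criterion, so the sketch below assumes information gain (the ID3/C4.5-style entropy reduction) on categorical features; the function names are illustrative, and the paper's fast-partitioning and pre-pruning logic is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction from splitting `rows` on categorical column `feature`."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    weighted = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


def best_feature(rows, labels):
    """Index of the feature with the highest information gain."""
    return max(range(len(rows[0])), key=lambda f: information_gain(rows, labels, f))
```

Every call to `information_gain` scans all rows that reach the node, which is why split selection dominates running time on large datasets and why memory-bounded, incremental construction matters.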
      • Open Access Article

        16 - Combination of Instance Selection and Data Augmentation Techniques for Imbalanced Data Classification
        Parastoo Mohaghegh, Samira Noferesti, Mehri Rajaei
        In the era of big data, automatic data analysis techniques such as data mining have been widely used for decision-making and have become very effective. Among data mining techniques, classification is a common method for decision-making and prediction. Classification algorithms usually work well on balanced datasets. However, one of the challenges for classification algorithms is correctly predicting the labels of new samples after learning from imbalanced datasets. In this type of dataset, the uneven distribution of data across classes causes minority-class examples to be ignored during learning, even though this class is more important in some prediction problems. To deal with this issue, this paper presents an efficient method for balancing imbalanced datasets that improves the ability of machine learning algorithms to correctly predict the class labels of new samples. According to the evaluations, the proposed method performs better than other methods on two common criteria for evaluating classification on imbalanced datasets, namely "Balanced Accuracy" and "Specificity".
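The two evaluation criteria named above have standard binary definitions: specificity is TN / (TN + FP), and balanced accuracy is the mean of sensitivity (TP / (TP + FN)) and specificity. A minimal sketch follows; treating the minority class as the positive label `1` is an assumption made here, since the abstract does not state the convention, and each function assumes both classes occur in `y_true`.

```python
def binary_confusion(y_true, y_pred, positive=1):
    """Count (tp, fn, tn, fp) for a binary problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    return tp, fn, tn, fp


def specificity(y_true, y_pred, positive=1):
    """True negative rate: TN / (TN + FP)."""
    _, _, tn, fp = binary_confusion(y_true, y_pred, positive)
    return tn / (tn + fp)


def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity (TP / (TP + FN)) and specificity."""
    tp, fn, tn, fp = binary_confusion(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn)
    return (sensitivity + tn / (tn + fp)) / 2
```

The example shows why plain accuracy misleads on imbalanced data: with 8 majority and 2 minority samples, a classifier that always predicts the majority class scores 80% accuracy but only 0.5 balanced accuracy.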
      • Open Access Article

        17 - Presenting a web recommender system for users' favorite pages using the DBSCAN clustering algorithm and the SVM machine learning method
        Reza Molaee Fard, Mohammad Mosleh
        Recommender systems can predict future user requests and then generate a list of the user's favorite pages. In other words, they can build an accurate profile of users' behavior and predict the page the user will choose next, which can solve the system's cold-start problem and improve search quality. In this research, a new method is presented to improve web recommender systems. It uses the DBSCAN clustering algorithm to cluster the data, and this algorithm achieved an efficiency score of 99%. Then, the user's favorite pages are weighted using the PageRank algorithm. Next, the data are classified with the SVM method, and a hybrid recommender system generates predictions for the user; finally, this recommender system provides the user with a list of pages that may be of interest. The evaluation showed that the proposed method achieves a recall of 95% and an accuracy of 99%, indicating that the recommender system correctly detects the user's intended pages more than 90% of the time and largely overcomes the weaknesses of previous systems.
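The PageRank weighting step can be illustrated with the standard power-iteration formulation: each page repeatedly shares its rank equally among the pages it links to, with a damping factor mixing in a uniform jump. The sketch below assumes the common damping value 0.85 and spreads the rank of pages without outgoing links evenly; the abstract does not say how the authors built the link graph from user sessions or which parameters they used.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Power-iteration PageRank.

    `links` maps each page to the list of pages it links to. Pages with no
    outgoing links (dangling pages) spread their rank evenly over all pages.
    """
    pages = sorted(set(links) | {q for targets in links.values() for q in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page gets the random-jump share plus its share of dangling rank.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p, targets in links.items():
            if targets:
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
        converged = sum(abs(new_rank[p] - rank[p]) for p in pages) < tol
        rank = new_rank
        if converged:
            break
    return rank
```

For the graph `{"a": ["b", "c"], "b": ["c"], "c": ["a"]}`, page `c` receives links from both other pages and ends up with the highest weight, while the ranks still sum to 1.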
      • Open Access Article

        18 - Anomaly and Intrusion Detection Through Data Mining and Feature Selection using PSO Algorithm
        Fereidoon Rezaei, Mohamad Ali Afshar Kazemi, Mohammad Ali Keramati
        Today, with the development of technology, the increased use of the Internet in business, and the shift of many types of business from physical to virtual and online forms, attacks and anomalies have also moved from the physical to the virtual world: instead of robbing a store or market, individuals intrude into websites and virtual markets through cyberattacks and disrupt them. Detecting attacks and anomalies is one of the new challenges in advancing e-commerce technologies. Network anomalies and destructive activities in e-commerce can be detected by analyzing the behavior of network traffic, and data mining techniques are used extensively in intrusion detection systems (IDS) to detect such anomalies. Reducing the number of features plays an important role in intrusion detection, because detecting anomalies in high-dimensional network traffic features is time-consuming; choosing suitable and accurate features speeds up the analysis and therefore detection. In this article, using data mining algorithms such as Bayesian classifiers, Multilayer Perceptron, CFS, Best First, J48, and PSO, we were able to increase the accuracy of detecting anomalies and attacks to 0.996 and reduce the error rate to 0.004.