List of Articles: decision tree

      • Open Access Article

        1 - Integrating Data Envelopment Analysis and Decision Tree Models in Order to Evaluate Information Technology-Based Units
        Amir Amini, Ali Alinezhad, Somaye Shafaghizade
        In order to evaluate the performance and desirability of its units' activities, every organization needs an evaluation system to assess this desirability; this is all the more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). Data mining techniques, on the other hand, make it possible to explore and discover meaningful information previously hidden in large databases. This paper presents a general framework that combines DEA with regression trees to evaluate the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that policy makers can use to discover the reasons behind efficient and inefficient DMUs. To examine factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. After modeling, the input-oriented LVM model with weak disposability in data envelopment analysis was calculated using the undesirable output, and the decision tree technique was used to extract and discover the rules behind increased and decreased productivity.
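A minimal sketch of the kind of DEA-plus-tree pipeline this abstract describes, using the classic input-oriented CCR model as a simpler stand-in for the paper's LVM model with weak disposability and undesirable output. The branch data is synthetic; only the overall flow (LP-based efficiency scores fed to a regression tree) follows the abstract.

```python
import numpy as np
from scipy.optimize import linprog
from sklearn.tree import DecisionTreeRegressor, export_text

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(18, 2))   # inputs of 18 hypothetical branches
Y = rng.uniform(1, 10, size=(18, 1))   # outputs

def ccr_efficiency(o, X, Y):
    """Input-oriented CCR efficiency of DMU o. Variables: [theta, lam_1..lam_n]."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.zeros(n + 1); c[0] = 1.0                    # minimize theta
    # sum_j lam_j * x_ij - theta * x_io <= 0  (one row per input i)
    A_in = np.hstack([-X[o].reshape(m, 1), X.T])
    # -sum_j lam_j * y_rj <= -y_ro             (one row per output r)
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([np.zeros(m), -Y[o]]),
                  bounds=[(0, None)] * (n + 1))
    return res.fun

scores = np.array([ccr_efficiency(o, X, Y) for o in range(len(X))])
tree = DecisionTreeRegressor(max_depth=3).fit(np.hstack([X, Y]), scores)
print(export_text(tree))
```

`export_text` renders the fitted tree as the kind of if-then rules the abstract says policy makers can inspect.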
      • Open Access Article

        2 - Membrane Cholesterol Prediction from Human Receptors Using a Rough Set-Based Mean-Shift Approach
        Rudra Kalyan Nayak, Ramamani Tripathy, Hitesh Mohapatra, Amiya Kumar Rath, Debahuti Mishra
        In human physiology, cholesterol plays an important part in membrane cells, where it regulates the function of the G-protein-coupled receptor (GPCR) family. Cholesterol is a distinct type of lipid structure, and about 90 percent of cellular cholesterol is present in the plasma membrane region. The Cholesterol Recognition/interaction Amino acid Consensus (CRAC) sequence is generally written as (L/V)-X1−5-(Y)-X1−5-(K/R); a newer cholesterol-binding domain is similar to the CRAC sequence but exhibits the inverse orientation along the polypeptide chain, i.e. CARC, (K/R)-X1−5-(Y/F)-X1−5-(L/V). GPCR is the largest superfamily in human physiology, with probably more than 900 protein genes included in this family. Among all membrane proteins, GPCRs are the main targets for novel drug discovery across the pharmaceutical industry. Earlier research did not find the required number of valid motifs in terms of helices and motif types, and therefore lacked clinical relevance; the research gap is that motifs belonging to multiple motif types could not be predicted effectively. To find better motif sequences in human GPCRs, we explored a hybrid computational model that combines rough sets with the Mean-Shift algorithm. In this paper we compare our results with other techniques, such as fuzzy C-means (FCM) and FCM with spectral clustering, and conclude that the proposed method targets the CRAC region better than the CARC region, which has higher biological relevance in the medicine industry and drug discovery.
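The clustering half of the hybrid model can be sketched roughly as follows. Only the Mean-Shift step is shown; the rough-set reduct and the real CRAC/CARC scoring are omitted, and the sequence, window size, and four hand-made features are assumptions for illustration, not the paper's feature set.

```python
import numpy as np
from sklearn.cluster import MeanShift

HYDRO = {"L": 3.8, "V": 4.2, "Y": -1.3, "F": 2.8, "K": -3.9, "R": -4.5}

def window_features(seq, size=9):
    """Slide a window over the sequence and score residues relevant to
    CRAC (L/V ... Y ... K/R) and CARC (K/R ... Y/F ... L/V) patterns."""
    feats = []
    for i in range(len(seq) - size + 1):
        w = seq[i:i + size]
        feats.append([
            sum(HYDRO.get(a, 0.0) for a in w),   # crude hydrophobicity
            w.count("Y") + w.count("F"),         # central aromatic residues
            sum(w.count(a) for a in "KR"),       # basic anchor residues
            sum(w.count(a) for a in "LV"),       # branched apolar residues
        ])
    return np.array(feats)

seq = "MKTLLVAYRKLVNYFKRGLVAYKRML"   # hypothetical receptor fragment
X = window_features(seq)
labels = MeanShift().fit_predict(X)  # bandwidth is estimated automatically
print(labels)
```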
      • Open Access Article

        3 - Construction of a Scalable Decision Tree Based on Fast Data Partitioning and Pre-Pruning
        Somayeh Lotfi, Mohammad Ghasemzadeh, Mehran Mohsenzadeh, Mitra Mirzarezaee
        Classification is one of the most important tasks in data mining and machine learning, and the decision tree, as one of the most widely used classification algorithms, has the advantages of simplicity and easily interpretable results. But when dealing with huge amounts of data, the resulting decision tree grows in size and complexity, and therefore requires excessive running time. Almost all tree-construction algorithms need to store all or part of the training data set in memory; those that avoid memory shortages by selecting a subset of the data pay extra time for that selection. In addition, many calculations are required to select the best feature for creating a branch in the tree. In this paper we present an incremental, scalable approach based on fast partitioning and pruning. The proposed algorithm builds the decision tree using the entire training data set but does not require storing the whole data set in main memory. A pre-pruning method is also used to reduce the complexity of the tree. Experimental results on UCI data sets show that the proposed algorithm, in addition to preserving competitive accuracy and construction time, overcomes the mentioned disadvantages of earlier methods.
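The incremental partitioning itself is tied to the paper's data structures, but the effect of pre-pruning is easy to demonstrate. A small sketch, using sklearn's stopping criteria (depth, minimum split size, minimum impurity decrease) as stand-ins for the thresholds such an algorithm would tune:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr)
pruned = DecisionTreeClassifier(            # pre-pruning: stop growth early
    max_depth=8, min_samples_split=20, min_impurity_decrease=1e-3,
    random_state=0).fit(Xtr, ytr)

for name, t in [("full", full), ("pre-pruned", pruned)]:
    print(name, "nodes:", t.tree_.node_count,
          "test acc: %.3f" % t.score(Xte, yte))
```

The pre-pruned tree typically has far fewer nodes at a near-identical test accuracy, which is the trade-off the abstract describes.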
      • Open Access Article

        4 - An Approximate Binary Tree-Based Solution to Speed Up the Search for the Nearest Neighbor in Big Data
        Hosein Kalateh, M. D.
        Due to the increasing speed of information production and the need to convert information into knowledge, old machine learning methods are no longer responsive. When classifying with these methods, especially with inherently lazy classifiers such as the k-nearest neighbor (KNN) method, classifying large data sets is very slow. Nevertheless, the nearest neighbor rule remains a popular classification method due to its simplicity and practical accuracy. The proposed method speeds up nearest-neighbor classification of big data by organizing the training feature vectors in a binary search tree. This is done by finding two approximately farthest local data points in each tree node; these two points serve as the criterion for dividing the data in the current node into two groups, and the data in each node are assigned to the left or right child of the current node based on their similarity to the two points. The results of several experiments performed on different data sets from the UCI repository show a good degree of accuracy together with the low execution time of the proposed method.
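A compact sketch of the tree this abstract outlines, under some guessed details (leaf size, tie handling): two roughly farthest points are found per node by twice hopping to the farthest point from a random start, the node's data is split by which pivot is closer, and a query descends a single branch before a brute-force search in the leaf.

```python
import numpy as np

def two_far_points(X, rng):
    a = X[rng.integers(len(X))]
    b = X[np.argmax(np.linalg.norm(X - a, axis=1))]   # farthest from a
    a = X[np.argmax(np.linalg.norm(X - b, axis=1))]   # farthest from b
    return a, b

def build(X, y, rng, leaf_size=16):
    if len(X) <= leaf_size:
        return ("leaf", X, y)
    a, b = two_far_points(X, rng)
    left = np.linalg.norm(X - a, axis=1) <= np.linalg.norm(X - b, axis=1)
    if left.all() or (~left).all():                   # degenerate split
        return ("leaf", X, y)
    return ("node", a, b, build(X[left], y[left], rng),
            build(X[~left], y[~left], rng))

def query(tree, q):
    if tree[0] == "leaf":                             # brute force in the leaf
        _, X, y = tree
        return y[np.argmin(np.linalg.norm(X - q, axis=1))]
    _, a, b, lt, rt = tree
    nearer = lt if np.linalg.norm(q - a) <= np.linalg.norm(q - b) else rt
    return query(nearer, q)

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8)); y = (X[:, 0] > 0).astype(int)
tree = build(X, y, rng)
print(query(tree, rng.normal(size=8)))                # approximate 1-NN label
```

Descending one branch per level makes a query roughly logarithmic in the training size, at the cost of the approximation the title announces.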
      • Open Access Article

        5 - Determining Geological, Environmental and Economical Impact Weight for Oil Field Prioritization to Implement Smart Well Technology
        Touraj Behrouz, Seyed Mahdia Motahari, Mehdi Nadri Pari
        Deep oil reservoirs with high heterogeneity need thorough management to maximize production and recovery while minimizing OPEX and CAPEX. This management is an integration of technology, human resources, and processes. Smart well technology helps oil companies meet the aforementioned goals. Since smart well technology imposes high initial expenditure, applying it everywhere is a risky and costly decision for oil companies; this fact dictates prioritizing oil fields on several parameters to decide where the technology should be implemented first. In this paper we present a novel screening technique built on the Analytic Hierarchy Process (AHP). The technique needs criteria and sub-criteria that affect the smart well potential of fields, namely geological, geographical, environmental, and economical parameters. In this study, the main components of the four mentioned parameters have been extracted, and all of them are weighted according to our objective function. The result of this research is the impact weight of each parameter with respect to the others, which can be used as an engineering toolbox for deciding among several fields when implementing smart well technology.
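A minimal AHP sketch of the weighting step: a pairwise comparison matrix over the four main criteria is reduced to priority weights via its principal eigenvector, with Saaty's consistency ratio as a sanity check. The judgments in the matrix are made-up placeholders, not the values elicited in the study.

```python
import numpy as np

criteria = ["Geological", "Geographical", "Environmental", "Economical"]
A = np.array([[1,   3,   5,   1/2],   # hypothetical Saaty-scale judgments
              [1/3, 1,   2,   1/5],
              [1/5, 1/2, 1,   1/7],
              [2,   5,   7,   1  ]])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)
w = np.abs(eigvecs[:, k].real)
w /= w.sum()                          # priority weights, summing to 1

n = len(A)
ci = (eigvals.real[k] - n) / (n - 1)  # consistency index
cr = ci / 0.90                        # random index RI = 0.90 for n = 4
for c, wi in zip(criteria, w):
    print(f"{c:13s} {wi:.3f}")
print("consistency ratio: %.3f (acceptable if < 0.10)" % cr)
```

The same eigenvector step is applied per sub-criterion level, and the field scores are then the weight-propagated sums down the hierarchy.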
      • Open Access Article

        6 - Wide-Area Out-of-Step Prediction of an Interconnected Power System Using Decision Tree C5.0 Based on WAMS Data
        Soheil Ranjbar
        This paper presents a new method for out-of-step detection in synchronous generators based on decision tree theory. To distinguish between power swing and out-of-step conditions, a series of input features is introduced and used for decision tree training. To generate the training samples, a series of measurements is taken under various faults, including operational and topological disturbances. The proposed method is simulated on the IEEE 10-machine, 39-bus test system, and the simulation results are prepared as input-output pairs for decision tree induction and deduction. The merit of the proposed out-of-step protection scheme lies in the adaptivity and robustness of its input features under different input scenarios.
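C5.0 is a proprietary successor of C4.5 with no sklearn implementation, so this sketch substitutes sklearn's CART classifier to separate stable swings from out-of-step cases. The two features (an angle-spread proxy and its rate of change) and the synthetic samples are purely illustrative, not the WAMS feature set used in the paper.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 500
# stable swings: bounded angle spread; out-of-step: spread keeps growing
spread = np.concatenate([rng.uniform(10, 90, n), rng.uniform(90, 180, n)])
dspread = np.concatenate([rng.normal(0, 5, n), rng.normal(25, 5, n)])
X = np.column_stack([spread, dspread])
y = np.concatenate([np.zeros(n), np.ones(n)])   # 1 = out-of-step

clf = DecisionTreeClassifier(max_depth=4, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```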
      • Open Access Article

        7 - Automatic Lung Disease Identification Using Discrete Cosine Transform-Based Features in Radiography Images
        Shamim Yousefi, Samad Najjar-Ghabel
        Using raw radiographs for lung disease identification does not achieve acceptable performance; machine learning can help identify diseases more accurately. Extensive studies have addressed classical and deep learning-based disease identification, but these methods either lack acceptable accuracy and efficiency or require large amounts of training data. In this paper, a new method for automatic interstitial lung disease identification on radiography images is presented to address these challenges. In the first step, patient information is removed from the images and the remaining pixels are standardized for more precise processing. In the second step, the reliability of the proposed method is improved with the Radon transform, extraneous data is removed with a top-hat filter, and the detection rate is increased with the Discrete Wavelet Transform and the Discrete Cosine Transform; the number of final features is then reduced with Locality Sensitive Discriminant Analysis. In the third step, the processed images are divided into training and test sets, and different models are built from the training data. Finally, the best model is selected on the test data. Simulation results on the NIH dataset show that the decision tree provides the most accurate model, improving the harmonic mean of sensitivity and accuracy by up to 1.09 times compared to similar approaches.
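A sketch of the DCT feature backbone only: each (synthetic) image is standardized, a 2-D DCT is taken, and the low-frequency block becomes the feature vector for a decision tree. The Radon, top-hat, wavelet, and LSDA steps from the pipeline above are omitted, and the image data is generated, not the NIH radiographs.

```python
import numpy as np
from scipy.fft import dctn
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

def dct_features(img, k=8):
    img = (img - img.mean()) / (img.std() + 1e-8)   # standardize pixels
    coeffs = dctn(img, norm="ortho")                # 2-D DCT
    return coeffs[:k, :k].ravel()                   # keep low-frequency block

# synthetic stand-ins for radiographs: class 1 has extra low-freq structure
imgs = rng.normal(size=(200, 64, 64))
y = np.repeat([0, 1], 100)
xx, yy = np.meshgrid(np.linspace(0, np.pi, 64), np.linspace(0, np.pi, 64))
imgs[100:] += 2 * np.sin(xx) * np.sin(yy)

X = np.array([dct_features(im) for im in imgs])
clf = DecisionTreeClassifier(max_depth=5, random_state=0)
print("CV accuracy:", cross_val_score(clf, X, y, cv=5).mean())
```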
      • Open Access Article

        8 - Design and Implementation of a Survival Model for Patients with Melanoma Based on Data Mining Algorithms
        Farinaz Sanaei, Seyed Abdollah Amin Mousavi, Abbas Toloie Eshlaghy, Ali Rajabzadeh Ghotri
        Background/Purpose: Among the most commonly diagnosed cancers, melanoma is the second leading cause of cancer-related death. A growing number of people are becoming victims of melanoma, which is also the most malignant and a rare form of skin cancer. Advanced cases of the disease may cause death due to its spread to internal organs. The National Cancer Institute reported that approximately 99,780 people were diagnosed with melanoma in 2022, and approximately 7,650 died of it. This study therefore aims to develop an optimized algorithm for predicting melanoma patients' survival. Methodology: This applied research was a descriptive-analytical, retrospective study. The study population included patients with melanoma identified from the National Cancer Research Center at Shahid Beheshti University between 2008 and 2013, with a follow-up period of five years. An optimal model for melanoma survival prognosis was selected based on the evaluation metrics of the data mining algorithms. Findings: A neural network, a Naïve Bayes network, a Bayesian network, a combination of decision tree and Naïve Bayes network, logistic regression, J48, and ID3 were the models applied to the national database. Statistically, the studied neural network outperformed the other selected algorithms on all evaluation metrics. Conclusion: The neural network, with a value of 0.97, showed optimal performance in terms of reliability, performing better both in discrimination power and reliability. This algorithm was therefore proposed as the melanoma survival prediction model.
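A minimal sketch of the model-comparison step: several of the algorithm families named in the findings are scored with cross-validated AUC on a synthetic stand-in for the registry data (which is not public). J48 and ID3 are approximated here by sklearn's CART tree.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# synthetic binary "five-year survival" labels, purely illustrative
X, y = make_classification(n_samples=600, n_features=12, random_state=0)

models = {
    "neural network": MLPClassifier(max_iter=2000, random_state=0),
    "naive Bayes": GaussianNB(),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}
for name, m in models.items():
    auc = cross_val_score(m, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name:20s} AUC = {auc:.3f}")
```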