• List of Articles: Decision Tree

      • Open Access Article

        1 - Classifying Two-Class Data Using Hyper Rectangles Parallel to the Coordinate Axes
        Zahra Moslehi, Palhang Palhang
        One of the machine learning tasks is supervised learning, in which we infer a function from labeled training data. The goal of supervised learning algorithms is to learn a good hypothesis that minimizes the sum of the errors. A wide range of supervised algorithms is available, such as decision trees, SVM, and KNN methods. In this paper we focus on decision tree algorithms. When decision tree algorithms are used, the data is partitioned by axis-aligned hyperplanes. The geometric view of decision tree algorithms is related to separability problems in computational geometry. One of the well-known problems in separability is computing the maximum bichromatic discrepancy, for which efficient algorithms are known in d dimensions. This problem is closely related to decision trees in machine learning. We implement this problem in 1, 2, 3 and d dimensions. We also implement the C4.5 algorithm. The experiments show that the results of this algorithm and the C4.5 algorithm are comparable.
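        As a concrete illustration of the separability connection (a sketch of the general technique, not the authors' implementation): in one dimension the maximum bichromatic discrepancy reduces to finding an interval that maximizes the number of positive points minus the number of negative points it contains, which after sorting is a maximum-subarray computation over the ±1 labels. All names and data below are hypothetical.

```python
# Sketch: 1-D maximum bichromatic discrepancy.
# Each training point is (x, label) with label +1 (positive) or -1 (negative).
# We look for an interval [lo, hi] maximizing  #positives - #negatives  inside it;
# after sorting by x this is a maximum-subarray (Kadane) scan over the labels.

def max_discrepancy_1d(points):
    """points: iterable of (x, label) with label in {+1, -1}.
    Returns (best_score, (lo, hi)) for a maximizing interval."""
    pts = sorted(points)                      # sort by coordinate
    best_score, best_lo, best_hi = 0, None, None
    run_score, run_start = 0, 0
    for i, (x, label) in enumerate(pts):
        if run_score <= 0:                    # start a new candidate interval here
            run_score, run_start = 0, i
        run_score += label
        if run_score > best_score:
            best_score = run_score
            best_lo, best_hi = pts[run_start][0], x
    return best_score, (best_lo, best_hi)

if __name__ == "__main__":
    data = [(0.1, -1), (0.4, +1), (0.5, +1), (0.7, -1), (0.9, +1), (1.2, +1)]
    print(max_discrepancy_1d(data))           # (3, (0.4, 1.2))
```

        Flipping the label signs and rerunning gives the interval that favors the negative class; the larger of the two scores is the discrepancy of the point set.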
      • Open Access Article

        2 - Integrating Data Envelopment Analysis and Decision Tree Models in Order to Evaluate Information Technology-Based Units
        Amir Amini
        In order to evaluate the performance and desirability of the activities of its units, each organization needs an evaluation system, and this is even more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). On the other hand, data mining techniques allow DMUs to explore and discover meaningful information that had previously been hidden in large databases. This paper presents a general framework for combining DEA and regression trees for evaluating the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that can be used by policy makers to discover the reasons behind efficient and inefficient DMUs. To examine factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. The input-oriented LVM model with weak disposability and undesirable output was solved in the data envelopment analysis stage, and the decision tree technique was then used to extract and discover rules explaining increased and reduced productivity.
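        As a rough illustration of the DEA-plus-decision-tree idea (not the paper's model: a basic input-oriented CCR envelopment program stands in for the LVM model with weak disposability and undesirable output), the sketch below scores hypothetical DMUs with a linear program, labels them efficient or inefficient, and asks a decision tree for separating rules. All data, thresholds, and feature names are placeholders.

```python
# Sketch of a DEA + decision-tree hybrid (basic CCR model standing in for the
# paper's LVM model with weak disposability and undesirable outputs).
import numpy as np
from scipy.optimize import linprog
from sklearn.tree import DecisionTreeClassifier, export_text

def ccr_input_oriented(X, Y, k):
    """Efficiency score of DMU k. X: (n, m) inputs, Y: (n, s) outputs."""
    n, m = X.shape
    s = Y.shape[1]
    # variables z = [theta, lambda_1..lambda_n]; minimize theta
    c = np.r_[1.0, np.zeros(n)]
    # inputs:  sum_j lambda_j * x_ij - theta * x_ik <= 0
    A_in = np.c_[-X[k].reshape(m, 1), X.T]
    # outputs: -sum_j lambda_j * y_rj <= -y_rk
    A_out = np.c_[np.zeros((s, 1)), -Y.T]
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[k]],
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]

rng = np.random.default_rng(0)
X = rng.uniform(1, 10, size=(18, 2))          # e.g. staff, operating cost
Y = rng.uniform(1, 10, size=(18, 1))          # e.g. premiums written
scores = np.array([ccr_input_oriented(X, Y, k) for k in range(len(X))])
labels = (scores >= 0.999).astype(int)        # 1 = efficient, 0 = inefficient

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
print(export_text(tree, feature_names=["staff", "cost"]))  # rule set for policy makers
```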
      • Open Access Article

        3 - Integrating Data Envelopment Analysis and Decision Tree Models in Order to Evaluate Information Technology-Based Units
        Amir Amini, Ali Alinezhad, Somaye Shafaghizade
        In order to evaluate the performance and desirability of the activities of its units, each organization needs an evaluation system, and this is even more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). On the other hand, data mining techniques allow DMUs to explore and discover meaningful information that had previously been hidden in large databases. This paper presents a general framework for combining DEA and regression trees for evaluating the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that can be used by policy makers to discover the reasons behind efficient and inefficient DMUs. To examine factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. The input-oriented LVM model with weak disposability and undesirable output was solved in the data envelopment analysis stage, and the decision tree technique was then used to extract and discover rules explaining increased and reduced productivity.
      • Open Access Article

        4 - Integration of Data Envelopment Analysis Model and Decision Tree in Order to Evaluate Units Based on Information Technology
        Amir Amini, Alireza Alinezhad, Somayeh Shafaghizadeh
        Every organization needs an evaluation system to measure the performance and usefulness of its units, and this issue is more important for financial institutions, including companies based on information technology. Data envelopment analysis is a non-parametric method for measuring the efficiency and productivity of decision-making units (DMUs). On the other hand, data mining techniques allow DMUs to explore and discover meaningful information that was previously hidden in large databases. This paper proposes a general framework combining data envelopment analysis with regression trees to evaluate the efficiency and productivity of DMUs. The result of the hybrid model is a set of rules that can be used by policy makers to discover the reasons for efficient and inefficient DMUs. As a case study of using the proposed method to investigate the factors related to productivity, a sample including 18 branches of Iran Insurance in Tehran was selected; the input-oriented LVM model with weak disposability and undesirable output was solved in the data envelopment analysis stage, and rules were then extracted with the decision tree technique to discover the reasons for productivity increase and decline.
      • Open Access Article

        5 - Classification of Two-Class Data with Hyperrectangles Parallel to the Coordinate Axes
        Zahra Moslehi, Palhang Palhang
        One of the learning settings in machine learning and pattern recognition is supervised learning. In supervised learning, and in two-class problems, the labels of the available training data are positive and negative. The goal of a supervised learning algorithm is to compute a hypothesis that separates the positive and negative data with the least error. In this article, among all supervised learning algorithms, we focus on the performance of decision trees. The geometric view of the decision tree brings us close to the concept of separability in computational geometry. Among the separability problems related to decision trees, we consider the problem of computing the rectangle with the maximum bichromatic discrepancy and implement the algorithm in one, two, three and m dimensions, where m is the number of data features. The implementation results show that this algorithm is competitive with the well-known C4.5 algorithm.
      • Open Access Article

        6 - Analysis of Supervised Learners to Extract Knowledge of Lighting Angles in Face Images
        S. Naderi, N. Moghadam Charkari, E. Kabir
        Variations in light intensity and direction are among the main challenges in many face recognition systems, as they lead to different normal and abnormal shadows. Various methods have been presented for face recognition under different lighting conditions, but they require prior knowledge about the light source and the angle of radiation. In this paper, a new approach is proposed to extract knowledge about the lighting angle/direction in face images based on learning techniques. First, some coefficients that capture lighting variation are extracted in the DCT domain. After normalization, they are used to determine lighting classes. Then, three different learning algorithms, decision tree, SVM, and WAODE (Weightily Averaged One-Dependence Estimators), are used to learn the lighting classes. The algorithms have been tested on the well-known YaleB and Extended Yale face databases. The comparative results indicate that the SVM achieves the best average classification accuracy. On the other hand, the WAODE Bayesian approach attains better accuracy on classes with large lighting angles because of its robustness to data loss.
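        A rough sketch of the pipeline described above, under several assumptions: low-frequency 2-D DCT coefficients serve as the lighting features, and scikit-learn's decision tree and SVM stand in for the learners (WAODE has no standard Python implementation and is omitted). The images, labels, and block size below are placeholders.

```python
# Sketch: low-frequency DCT coefficients as lighting-direction features,
# classified with a decision tree and an SVM (stand-ins; WAODE omitted).
import numpy as np
from scipy.fft import dctn
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def lighting_features(image, k=8):
    """Top-left k x k block of the 2-D DCT (low frequencies), DC term dropped."""
    coeffs = dctn(image.astype(float), norm="ortho")
    block = coeffs[:k, :k].flatten()
    return block[1:]                      # drop DC (overall brightness)

# Placeholder data: random "face" images grouped into 5 lighting-angle classes.
rng = np.random.default_rng(0)
images = rng.random((100, 64, 64))
y = rng.integers(0, 5, size=100)
X = np.array([lighting_features(img) for img in images])

for name, clf in [("decision tree", DecisionTreeClassifier(random_state=0)),
                  ("SVM", SVC(kernel="rbf"))]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean CV accuracy = {acc:.2f}")
```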
      • Open Access Article

        7 - Construction of Scalable Decision Tree Based on Fast Data Partitioning and Pre-Pruning
        Somayeh Lotfi, Mohammad Ghasemzadeh, Mehran Mohsenzadeh, Mitra Mirzarezaee
        Classification is one of the most important tasks in data mining and machine learning, and the decision tree, as one of the most widely used classification algorithms, has the advantages of simplicity and easily interpretable results. But when dealing with huge amounts of data, the obtained decision tree grows in size and complexity and therefore requires excessive running time. Almost all tree-construction algorithms need to store all or part of the training data set in memory; those algorithms that avoid memory shortages by selecting a subset of the data pay extra time for data selection. In addition, selecting the best feature to create a branch in the tree requires a lot of computation. In this paper we present an incremental, scalable approach based on fast partitioning and pre-pruning. The proposed algorithm builds the decision tree using the entire training data set but does not require storing the whole data set in main memory. A pre-pruning method is also used to reduce the complexity of the tree. The experimental results on UCI data sets show that the proposed algorithm, in addition to preserving competitive accuracy and construction time, overcomes the mentioned disadvantages of former methods.
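        The scalable construction itself is not reproduced here, but pre-pruning as a general idea can be illustrated with the growth-time stopping criteria exposed by scikit-learn's decision tree: the limits are enforced while the tree is built, so its size is capped without a separate post-pruning pass. A hedged sketch on synthetic data follows; the thresholds are arbitrary and would be tuned in practice.

```python
# Sketch: pre-pruning via growth-time stopping criteria (not the paper's
# incremental partitioning algorithm, just the general pre-pruning idea).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=20_000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
pruned = DecisionTreeClassifier(
    max_depth=8,                 # stop splitting beyond this depth
    min_samples_leaf=50,         # require a minimum leaf size
    min_impurity_decrease=1e-4,  # ignore splits with negligible gain
    random_state=0,
).fit(X_tr, y_tr)

for name, model in [("unpruned", full), ("pre-pruned", pruned)]:
    print(name, "nodes:", model.tree_.node_count,
          "test accuracy: %.3f" % model.score(X_te, y_te))
```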
      • Open Access Article

        8 - An Approximate Binary Tree-Based Solution to Speed Up the Search for the Nearest Neighbor in Big Data
        Hosein Kalateh M. D.
        Due to the increasing speed of information production and the need to convert information into knowledge, traditional machine learning methods no longer scale. When classifying with these methods, especially with inherently lazy classifiers such as the k-nearest neighbor (KNN) method, classifying large data sets is very slow. Nearest neighbor classification is nevertheless popular because of its simplicity and practical accuracy. The proposed method speeds up nearest neighbor classification of big data by organizing the training feature vectors in a binary search tree. This is done by finding two approximately farthest local data points in each tree node. These two points are used as a criterion for dividing the data in the current node into two groups: each data point in the node is assigned to the left or right child of the current node based on its similarity to the two points. The results of several experiments performed on different data sets from the UCI repository show a good degree of accuracy together with a low execution time for the proposed method.
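        The tree construction described above can be sketched directly: each node picks two approximately farthest points (here with a simple two-step farthest-point heuristic), splits its data by which pivot each vector is closer to, and recurses; a query descends to one leaf and does an exact scan there. The leaf size, the pivot heuristic, and the data below are assumptions for illustration, not the authors' exact procedure.

```python
# Sketch: approximate nearest-neighbor search with a binary tree whose nodes
# split the data around two (approximately) farthest pivot points.
import numpy as np

class Node:
    def __init__(self, points, labels, leaf_size=16):
        self.points, self.labels = points, labels
        self.left = self.right = self.pivots = None
        if len(points) > leaf_size:
            self._split(leaf_size)

    def _split(self, leaf_size):
        # Heuristic for two far-apart pivots: random start -> farthest -> farthest.
        a = self.points[np.random.randint(len(self.points))]
        b = self.points[np.argmax(np.linalg.norm(self.points - a, axis=1))]
        a = self.points[np.argmax(np.linalg.norm(self.points - b, axis=1))]
        self.pivots = (a, b)
        to_a = (np.linalg.norm(self.points - a, axis=1)
                <= np.linalg.norm(self.points - b, axis=1))
        if to_a.all() or (~to_a).all():       # degenerate split: stay a leaf
            self.pivots = None
            return
        self.left = Node(self.points[to_a], self.labels[to_a], leaf_size)
        self.right = Node(self.points[~to_a], self.labels[~to_a], leaf_size)

    def query(self, x):
        if self.pivots is None:               # leaf: exact scan over few points
            i = np.argmin(np.linalg.norm(self.points - x, axis=1))
            return self.labels[i]
        a, b = self.pivots
        child = self.left if np.linalg.norm(x - a) <= np.linalg.norm(x - b) else self.right
        return child.query(x)

X = np.random.rand(5000, 8)                   # placeholder training vectors
y = np.random.randint(0, 3, size=5000)        # placeholder class labels
tree = Node(X, y)
print("predicted label:", tree.query(np.random.rand(8)))
```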
      • Open Access Article

        9 - Wide-Area Out-of-Step Prediction of Interconnected Power System Using Decision Tree C5.0 Based on WAMS Data
        Soheil Ranjbar
        This paper presents a new method for out-of-step detection in synchronous generators based on decision tree theory. To distinguish between power-swing and out-of-step conditions, a series of input features is introduced and used for decision tree training. To generate the training samples, a series of measurements is taken under various faults, including operational and topological disturbances. The proposed method is simulated on the 10-machine, 39-bus IEEE test system, and the simulation results are prepared as input-output pairs for decision tree induction and deduction. The merit of the proposed out-of-step protection scheme lies in the adaptivity and robustness of the input features under different input scenarios.
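        Only the training stage lends itself to a quick sketch; everything below (the WAMS-style features, the labeling rule, the simulated samples) is a fabricated placeholder, and an entropy-criterion scikit-learn tree stands in for C5.0.

```python
# Toy sketch of the training stage: a decision tree separating stable power
# swings from out-of-step events using placeholder WAMS-style features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 2000
# Hypothetical features per disturbance: max rotor-angle difference (deg),
# its rate of change (deg/s), and minimum bus-voltage magnitude (pu).
X = np.c_[rng.uniform(10, 300, n), rng.uniform(-50, 50, n), rng.uniform(0.4, 1.0, n)]
y = (X[:, 0] > 180).astype(int)        # crude placeholder labeling rule

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = DecisionTreeClassifier(criterion="entropy", random_state=0)  # C5.0 stand-in
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```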
      • Open Access Article

        10 - Automatic Lung Diseases Identification using Discrete Cosine Transform-based Features in Radiography Images
        Shamim Yousefi, Samad Najjar-Ghabel
        Using raw radiography images for lung disease identification does not achieve acceptable performance; machine learning can help identify diseases more accurately. Extensive studies have addressed classical and deep learning-based disease identification, but these methods either lack acceptable accuracy and efficiency or require large amounts of training data. In this paper, a new method is presented for automatic interstitial lung disease identification on radiography images to address these challenges. In the first step, patient information is removed from the images and the remaining pixels are standardized for more precise processing. In the second step, the reliability of the proposed method is improved with the Radon transform, extra structures are removed using the top-hat filter, and the detection rate is increased with the Discrete Wavelet Transform and the Discrete Cosine Transform; the number of final features is then reduced with Locality Sensitive Discriminant Analysis. In the third step, the processed images are divided into training and test sets, and different models are created using the training data. Finally, the best model is selected using the test data. Simulation results on the NIH dataset show that the decision tree provides the most accurate model, improving the harmonic mean of sensitivity and accuracy by up to 1.09 times compared to similar approaches.
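        The second-step feature pipeline can be approximated with standard libraries: the Radon transform and top-hat filter from scikit-image, a 2-D DWT from PyWavelets, and a 2-D DCT from SciPy. Locality Sensitive Discriminant Analysis has no standard Python implementation, so PCA appears below purely as a placeholder for the dimensionality-reduction step; the images and all parameter values are likewise hypothetical.

```python
# Rough sketch of the feature-extraction steps (Radon, top-hat, DWT, DCT),
# with PCA standing in for Locality Sensitive Discriminant Analysis.
import numpy as np
import pywt
from scipy.fft import dctn
from skimage.morphology import white_tophat, disk
from skimage.transform import radon
from sklearn.decomposition import PCA

def extract_features(image):
    image = (image - image.min()) / (np.ptp(image) + 1e-9)   # standardize pixels
    sinogram = radon(image, theta=np.arange(0., 180., 10.), circle=False)
    cleaned = white_tophat(sinogram, footprint=disk(3))      # remove extra structures
    approx, _ = pywt.dwt2(cleaned, "haar")                   # 2-D DWT approximation
    coeffs = dctn(approx, norm="ortho")[:8, :8]              # low-frequency DCT block
    return coeffs.flatten()

# Placeholder "radiographs"; real use would load the NIH images instead.
images = np.random.rand(40, 128, 128)
features = np.array([extract_features(img) for img in images])
reduced = PCA(n_components=10).fit_transform(features)       # LSDA placeholder
print(reduced.shape)                                          # (40, 10)
```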
      • Open Access Article

        11 - Design and implementation of a survival model for patients with melanoma based on data mining algorithms
        Farinaz Sanaei, Seyed Abdollah Amin Mousavi, Abbas Toloie Eshlaghy, Ali Rajabzadeh Ghotri
        Background/Purpose: Among the most commonly diagnosed cancers, melanoma is the second leading cause of cancer-related death, and a growing number of people are becoming its victims. Melanoma is also the most malignant and a rare form of skin cancer. Advanced cases of the disease may cause death due to the spread of the disease to internal organs. The National Cancer Institute reported that approximately 99,780 people were diagnosed with melanoma in 2022, and approximately 7,650 died. Therefore, this study aims to develop an optimized algorithm for predicting melanoma patients' survival. Methodology: This applied research was a descriptive-analytical, retrospective study. The study population included patients with melanoma identified from the National Cancer Research Center at Shahid Beheshti University between 2008 and 2013, with a follow-up period of five years. An optimal model for melanoma survival prognosis was selected based on the evaluation metrics of data mining algorithms. Findings: A neural network, a Naïve Bayes network, a Bayesian network, a combination of decision tree and Naïve Bayes network, logistic regression, J48, and ID3 were the models applied to the national database. Statistically, the studied neural network outperformed the other selected algorithms on all evaluation metrics. Conclusion: The results of the present study showed that the neural network, with a value of 0.97, has optimal performance in terms of reliability. The predictive model of melanoma survival thus performed better both in terms of discriminative power and reliability, and this algorithm was proposed as the melanoma survival prediction model.
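        At a sketch level, the model-selection step amounts to scoring several candidate classifiers with the same evaluation metrics and keeping the best. The snippet below does this with cross-validated accuracy and AUC on synthetic placeholder data; scikit-learn models stand in for the J48, ID3, and Bayesian learners, and none of the registry data is reproduced.

```python
# Sketch: comparing candidate survival classifiers with cross-validation
# (placeholder data; sklearn models stand in for J48/ID3/WEKA-style learners).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=600, n_features=15, weights=[0.7],
                           random_state=0)   # stand-in for 5-year survival labels

models = {
    "neural network": MLPClassifier(max_iter=1000, random_state=0),
    "naive Bayes": GaussianNB(),
    "decision tree (J48/ID3 stand-in)": DecisionTreeClassifier(random_state=0),
    "logistic regression": LogisticRegression(max_iter=1000),
}

for name, model in models.items():
    cv = cross_validate(model, X, y, cv=5, scoring=["accuracy", "roc_auc"])
    print(f"{name:35s} acc={cv['test_accuracy'].mean():.3f} "
          f"auc={cv['test_roc_auc'].mean():.3f}")
```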