    • List of Articles: Data Mining (داده کاوی)

      • Open Access Article

        1 - A proper method for the advertising email classification based on user’s profiles
        rahim hazratgholizadeh Mohammad Fathian
        In general, whether an email is spam depends on whether it satisfies the recipient, not on the content of the email. Under this definition, problems arise in marketing and advertising: for example, an advertising email may be spam for some users and not for others. To deal with this problem, many researchers design anti-spam systems based on personal profiles. Machine learning methods that classify spam with good accuracy are normally used. However, there is no single successful method based on an electronic commerce approach. In this paper, we first prepared a new profile that can better simulate user behavior. We then gave this profile, together with advertising emails, to students and collected their answers. Next, we examined well-known methods for email classification. Finally, a comparison of the methods using standard data mining criteria shows that the neural network method has the best accuracy across the various data sets.
      • Open Access Article

        2 - Survey different aspects of the problem phishing website detection and Review to existing Methods
        nafise langari
        Phishing is one of the latest security threats in cyberspace and is used to steal personal and financial information. Because there are various methods for detecting phishing and no up-to-date comprehensive study of the issue, the authors were motivated to review and analyze the proposed phishing detection methods in five categories: anti-phishing tool based, data mining based, heuristic based, meta-heuristic based, and machine learning based methods. The advantages and disadvantages of each method are extracted from the review and comparison. The outline of this study can help identify probable gaps in phishing detection problems for future research.
      • Open Access Article

        3 - Integrating data envelopment analysis and decision tree models In order to evaluate information technology-based units
        Amir Amini
        To evaluate the performance and desirability of its units' activities, every organization needs an evaluation system, and this is even more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). Data mining techniques, in turn, allow DMUs to explore and discover meaningful information that was previously hidden in large databases. This paper presents a general framework that combines DEA and regression trees to evaluate the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that policy makers can use to discover the reasons behind efficient and inefficient DMUs. To examine factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. After modeling based on the advanced model, the input-oriented LVM model with weak disposability in data envelopment analysis was calculated using undesirable outputs, and the decision tree technique was then used to extract and discover the rules behind increased and reduced productivity.
      • Open Access Article

        4 - Developing A Suitable Data Model For Data Mining Application In Banking
        Shahideh Ahmadi
        Banking domains such as credit assessment, branch efficiency, and electronic banking are rich contexts for the broad application of business intelligence concepts and methods, including data mining, data warehouses, and decision support systems. Many studies apply data mining to particular banking domains, each analyzing a different entity of the banking sector, such as customers, facilities, or accounts. However, no study comprehensively addresses all data mining applications in a bank, integrates them, extracts and categorizes all banking entities for a variety of analytical applications, and ultimately provides an appropriate data model with the attributes required for the banking domains. Currently, the information systems of Iranian banks are being developed to respond to new information needs. In this research, the content analysis method was used to investigate valid banking research carried out with a data mining approach, and, by extracting the entities and attributes used in these studies, an appropriate data model for data analysis applications in banking is presented. Using this model, information technology managers can assess the status of the bank in terms of the richness of the data needed to conduct data analysis and address the identified deficiencies in future development plans for the information systems. After previous research was analyzed and evaluated, 28 entities and 423 attributes were identified and the final entity-relationship model was created. Based on the presented model, a measuring tool was provided as a checklist so that banks can measure the richness of their existing data and their readiness, from a data perspective, to perform the analysis.
        To confirm the final data model, the opinions of ten experts were gathered through questionnaires and interviews in different sections of the bank, such as customers and public banking, finance and support, e-banking, credit and corporate affairs, IT, and international affairs. In addition, the data collected from the studies were used to present frequency diagrams of the algorithms, techniques, sampling methods, performance indexes, and data mining software used in them, for example to determine which data mining algorithms are most used in different domains.
      • Open Access Article

        5 - Integrating Data Envelopment Analysis and Decision Tree Models in Order to Evaluate Information Technology-Based Units
        Amir Amini ali alinezhad somaye shafaghizade
        To evaluate the performance and desirability of its units' activities, every organization needs an evaluation system, and this is even more important for financial institutions, including information technology-based companies. Data envelopment analysis (DEA) is a non-parametric method for measuring the effectiveness and efficiency of decision-making units (DMUs). Data mining techniques, in turn, allow DMUs to explore and discover meaningful information that was previously hidden in large databases. This paper presents a general framework that combines DEA and regression trees to evaluate the effectiveness and efficiency of DMUs. The resulting hybrid model is a set of rules that policy makers can use to discover the reasons behind efficient and inefficient DMUs. To examine factors related to productivity with the proposed method, a sample of 18 branches of Iran Insurance in Tehran was selected as a case study. After modeling based on the advanced model, the input-oriented LVM model with weak disposability in data envelopment analysis was calculated using undesirable outputs, and the decision tree technique was then used to extract and discover the rules behind increased and reduced productivity.
      • Open Access Article

        6 - Presenting the model for opinion mining at the document feature level for hotel users' reviews
        ELHAM KHALAJJ shahriyar mohammadi
        Nowadays, online reviews expressing users' sentiments and opinions are an important part of how people decide whether to choose a product or use a service. Given the Internet platform and easy access to opinion blogs in the field of tourism and the hotel industry, there are huge and rich sources of ideas in text form, from which opinions can be discovered with text mining methods. Because of the importance of users' sentiments and opinions in industry, especially in the tourism and hotel industry, opinion mining, sentiment analysis, and the exploration of user-written texts have received attention from those in charge. This research presents a new combined method based on a common approach in sentiment analysis: the use of words to produce features for classifying reviews. To this end, two methods of vocabulary construction are developed, one using statistical methods and the other using a genetic algorithm. The resulting words are combined with the vocabulary of public feeling and the standard Liu Bing classification of prominent words to increase classification accuracy.
      • Open Access Article

        7 - “Technology Watch” via “Information Technology”
        Kiyarash Jahanpour
        Information is power, but knowledge is more powerful. The information in patents and papers is a good source of codified knowledge. Every day, more businesses use information from patents (as a main indicator of technology) and papers (as a principal indicator of science) to see what products and systems are appearing around the globe. In an era of rapidly expanding digital content, the overwhelming amount of data available on the web and the high speed of S&T progress make it difficult for experts to extract useful knowledge without powerful tools, and they need new ways of reviewing and managing vast quantities of textual information. “Technology watch” is a collective, voluntary process by which companies actively work with information. The purpose of “technology watch” is to gather, process, and integrate technical information. TW has at least three objectives: facilitating the innovation process; providing easy and cost-effective access to information; and answering technological questions and problems. “Technology Watch” maintains awareness at all levels of global S&T through a combination of human-based overt and IT-based approaches for analyzing and tracking the myriad S&T outputs. Powerful IT-based techniques, such as text mining, now exist to identify and extract relevant data from the S&T literature and are especially useful in making sense of disjointed and disparate data. Regarded by many as the next wave of knowledge discovery, text mining has very high commercial value.
      • Open Access Article

        8 - Using Hyperion hyperspectral data and field spectrometry for identification of hydrocarbon leakages via VISA-SCM combined methodology and spatial data mining
        Mohammad حمزه علی درویش بلورانی سید کاظم  علوی پناه فروغ  بیک حسین نصیری
        The hydrocarbon seepage theory puts forward a cause-and-effect relationship between oil and gas reservoirs and specific surface anomalies, which are basically related to hydrocarbon leakages and their associated alterations. Hence, the spectral reflectance of hydrocarbons and their linked mineral alterations provides credible evidence for oil and gas exploration. Hyperion images from the EO-1 satellite were used in this study to identify oil seepages and their relevant alterations. After the required data were collected, the images underwent the necessary preprocessing. To recognize the oil seepages, these corrected data were used along with field-sampled spectrometric data. Then, a combined VISA and SCM model was applied to identify the hydrocarbon seepages indirectly. Moreover, two hydrocarbon indexes were developed for direct recognition of hydrocarbon seeps using Hyperion images. The findings indicate that both techniques are effective for the purpose of the study.
      • Open Access Article

        9 - Anomaly and Intrusion Detection through Datamining and Feature Selection using PSO Algorithm
        Fereidoon Rezaei Mohamad Ali Afshar Kazemi Mohammad Ali Keramati
        Today, with the development of technology, the increased use of the Internet in business, and the shift of businesses from physical to virtual and online forms, attacks and anomalies have also moved from the physical to the virtual world. That is, instead of robbing a store or market, individuals break into websites and virtual markets through cyberattacks and disrupt them. Detecting attacks and anomalies is one of the new challenges in promoting e-commerce technologies. Network anomalies and destructive activities in e-commerce can be detected by analyzing the behavior of network traffic. Data mining techniques are used extensively in intrusion detection systems (IDS) to detect anomalies. Reducing the dimensionality of the features plays an important role in intrusion detection, because detecting anomalies in high-dimensional network traffic features is a time-consuming process. Choosing suitable and accurate features affects the speed of the analysis and thus improves the speed of detection. In this article, by using data mining algorithms such as J48 and PSO, we were able to significantly improve the accuracy of detecting anomalies and attacks.
      • Open Access Article

        10 - Risk Parity Portfolio Optimization Based on CVaR
        Seyed javad  Pourhoseini sayyed mohammad reza davoodi Mansour Momeni
        Risk parity is a stock portfolio selection model that has received much attention since the American financial crisis of 2008. The philosophy of this model is to allocate the portfolio's risk equally among its constituent assets. Conditional value at risk (CVaR) is a popular and common risk measure in finance; it measures the expected loss of a stock portfolio for values beyond a threshold, at a known confidence level and time horizon. The aim of the current research is to design and optimize the performance of the risk parity stock portfolio model with conditional value at risk as the risk criterion. There are different approaches to modeling optimal portfolio selection, which use different criteria and methods to calculate and estimate returns and risks. Various criteria have been proposed to measure risk in finance, each with its own advantages and disadvantages. One criterion, introduced to reduce the disadvantages of the common and popular value at risk measure, is conditional value at risk, or expected shortfall, which is used as the risk measure in the present study. Conditional value at risk measures the average loss of the portfolio in cases where the loss exceeds value at risk.
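The definition of conditional value at risk in this abstract can be illustrated with a small historical (empirical) estimate. This is only a sketch under stated assumptions, not the authors' model: the function names, the sample losses, and the choice of an empirical quantile for value at risk are all hypothetical, and the tail is taken as losses at or beyond value at risk so that it is never empty.

```python
import math


def value_at_risk(losses, confidence=0.95):
    """Empirical VaR: the loss at the given confidence quantile.

    Assumption: losses are positive numbers for money lost, and the
    quantile is the simple order statistic (no interpolation).
    """
    if not losses:
        raise ValueError("losses must not be empty")
    ordered = sorted(losses)
    # Index of the smallest loss that covers `confidence` of the sample.
    index = max(math.ceil(confidence * len(ordered)) - 1, 0)
    return ordered[index]


def conditional_value_at_risk(losses, confidence=0.95):
    """Empirical CVaR: average of the losses at or beyond VaR."""
    var = value_at_risk(losses, confidence)
    tail = [loss for loss in losses if loss >= var]
    return sum(tail) / len(tail)


if __name__ == "__main__":
    # Hypothetical daily portfolio losses, for illustration only.
    sample_losses = [4, 1, 3, 2]
    print(value_at_risk(sample_losses, 0.5))
    print(conditional_value_at_risk(sample_losses, 0.5))
```

In a risk parity setting, a measure like this would be evaluated per asset contribution rather than for the whole portfolio; that step is outside what the abstract describes and is not sketched here.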
      • Open Access Article

        11 - Predicting Generalized Anxiety Disorder Among Female Students Using Random Forest Approach
        Zahra Gholami Habibeh Zare
        Mental health is considered one of the major challenges across generations. Generalized anxiety disorder (GAD) is one of many mental health conditions. Individuals with the disorder experience exaggerated worry and tension about daily events. It is reported that approximately 5% of the population of developed countries suffers from GAD. Women are affected twice as often as men, and the disorder is increasing among women, particularly female students. This paper aims to predict generalized anxiety disorder among female students using the random decision forest algorithm, a data mining method. The research population consisted of female students of Shiraz Azad University; 150 female students were selected by simple random sampling and tested with a DSM-IV questionnaire. A random forest algorithm is proposed to generate the prediction model. The prototype was coded in Java using the NetBeans IDE and the WEKA library. The results showed that the prediction accuracy of the random forest algorithm exceeds 0.9, which indicates that the algorithm is likely to predict GAD accurately. The random decision forest algorithm consistently predicts an individual not suffering from GAD. The results are relatively consistent with the baseline implemented in R. The random decision forest algorithm produces high predictive performance and may reveal significant relationships between the proposed and dependent parameters.