• List of Articles: Big Data

      • Open Access Article

        1 - A Database Selection Method and Migration Model for Big Data
        Mohammad Reza Ahmadi
        With the development of infrastructure and public services, especially in cloud computing applications, traditional database and storage techniques face serious limitations and challenges. The growing number of development tools and data services has created a widespread need to store the output of large-scale data processing activities from public, private, and social networks, making migration to new databases with different characteristics inevitable. In traditional models, digital data were stored in storage systems or in separate databases; but with the growth in the size and composition of data and the emergence of big data structures, traditional practices and patterns no longer meet new needs, and storage systems in new formats and models are essential. In this paper, we study the structural dimensions and functions of traditional and new storage systems, and present technical solutions for migrating from traditional structured databases to unstructured data stores. Finally, the main features and performance of distributed storage systems are compared with traditional models.
      • Open Access Article

        2 - Privacy Preserving Big Data Mining: Association Rule Hiding
        Golnar Assadat Afzali, Shahriyar Mohammadi
        Data repositories contain sensitive information that must be protected from unauthorized access, and existing data mining techniques can be considered a privacy threat to such data. Association rule mining is one of the foremost data mining techniques, aiming to uncover relationships between seemingly unrelated data in a database. Association rule hiding is a research area in privacy-preserving data mining (PPDM) that addresses the problem of hiding sensitive rules within the data. Much research has been done in this area, but most of it focuses on reducing the undesired side effects of deleting sensitive association rules in static databases. In the age of big data, however, we are confronted with dynamic databases to which new data may be added at any time, so most existing techniques are impractical and must be updated to suit these huge databases. In this paper, a data anonymization technique is used for association rule hiding, while parallelization and scalability features are embedded in the proposed model to speed up the big data mining process. Instead of removing some instances of an existing important association rule, generalization is used to anonymize items at an appropriate level, so that important association rules can be updated, if necessary, based on new data entries. We conducted experiments on three datasets to evaluate the performance of the proposed model in comparison with Max-Min2 and HSCRIL. Experimental results show that the information loss of the proposed model is lower than that of existing work in this area and that the model can be executed in parallel for shorter execution times.
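        The generalization idea described in this abstract, replacing sensitive items by ancestors in an item taxonomy rather than deleting transactions, can be sketched as follows. The taxonomy, item names, and generalization depth are hypothetical illustrations, not taken from the paper.

```python
# Sketch of hiding sensitive associations by generalizing items up a
# (hypothetical) taxonomy instead of deleting transactions.
TAXONOMY = {            # child -> parent (hypothetical item hierarchy)
    "cola": "soft_drink",
    "lemonade": "soft_drink",
    "soft_drink": "beverage",
}

def generalize(item, levels=1):
    """Replace an item by its ancestor `levels` steps up the taxonomy."""
    for _ in range(levels):
        item = TAXONOMY.get(item, item)
    return item

def hide_rule(transactions, sensitive_items, levels=1):
    """Anonymize sensitive items in every transaction by generalization,
    so rules over the raw items can no longer be mined directly."""
    return [
        {generalize(i, levels) if i in sensitive_items else i for i in t}
        for t in transactions
    ]

transactions = [{"cola", "chips"}, {"lemonade", "chips"}, {"cola", "bread"}]
hidden = hide_rule(transactions, {"cola", "lemonade"})
# "cola" and "lemonade" are now both reported as "soft_drink", so a rule
# such as {cola} -> {chips} is no longer directly minable, while the
# generalized rule {soft_drink} -> {chips} can still be updated later.
```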
      • Open Access Article

        3 - A Novel Approach for Cluster Self-Optimization Using Big Data Analytics
        Abbas Mirzaei, Amir Rahimi
        One of the current challenges in providing high-bitrate services in next-generation mobile networks is the limitation of available resources. The goal of the proposed self-optimization model is to maximize network efficiency and increase the quality of service provided to femtocell users, given the limited resources in radio access networks. The basis of our proposed scheme is a self-optimization model built on neighbouring relations. Using this model, resources and neighbouring parameters can be controlled without human manipulation, based only on the network's intelligence. To increase the model's efficiency, we applied big data techniques to analyze data and improve the accuracy of the decision-making process, such that on the uplink, the data sent by users is analyzed in the self-optimization engine. The experimental results show that despite the tremendous volume of the analyzed data, hundreds of times larger than in usual methods, KPIs such as throughput can be improved by up to 30 percent through optimal resource allocation and reduced signaling load. Moreover, the feature extraction and parameter selection modules reduce the response time of the self-optimization model by up to 25 percent when the number of parameters is very high. Numerical results also indicate the superiority of the support vector machine (SVM) learning algorithm, which improves decision-making accuracy over the rule-based expert system. Finally, uplink quality improvement and a 15-percent increase in the coverage area under satisfactory SINR conditions are further outcomes of the proposed scheme.
      • Open Access Article

        4 - Strategic Human Resource Management in Digital Era Based on Big Data
        Gholamreza Malekzadeh, Sedigheh Sadeghi
        Nowadays, intelligent devices, virtual environments, and technological innovations are part of everyday human life. While technological innovation can easily represent one of the greatest current business threats, the executive leaders who succeed are those who turn this threat into business opportunities and create a new competitive space out of it. At the same time, the influence of information technology in organizations, along with the expansion of various kinds of social media, offers a good opportunity to gather massive amounts of data about people. Given these facts, creative thinking, alignment with the facilities, requirements, and needs of the digital age, and respect for the value of knowledge management, together with the use of information management, deserve closer attention, especially in the field of human capital management. We discuss the effects of HR digital literacy and HR understanding of the organization's mission and values on flexibility in digital transformation. In this article, we discuss the use of information systems, especially big data, in human resource management in the digital era, drawing on surveys of recognized organizations such as McKinsey. We conclude that in the new age, as a new generation of the workforce with different attitudes and expectations is ready to enter the labor market, a transformation from traditional structures to structures driven by the analytical results of big data would lead to more effective management.
      • Open Access Article

        5 - Comparative Study, Applications and Challenges of Big Data Analysis Technologies
        Yaser Ghasemi Nejad, Abbass Ketabchi
        Today, receiving and sharing information is easier and cheaper than before, enabling organizations to handle large volumes of data at high speed and variety, known as big data. Big data technology provides many opportunities when its problems are resolved correctly. Legacy data processing technologies are not suitable for dealing with the large quantities of data now generated, whereas the frameworks proposed for big data applications help store, analyze, and process data. In this study, we first review and summarize definitions of big data and the challenges of using it, and then study and compare a number of important big data frameworks (Hadoop, Flink, Storm, Spark, and Samza). The studied frameworks are generally classified into two categories: (1) batch mode and (2) stream mode. The Hadoop framework processes data in batch mode, while the other frameworks allow stream or real-time processing. Finally, the most important applications of big data technology are described, among them healthcare, recommender systems, smart cities, and social network analysis. Due to the growth of Internet-connected devices, social networking data is growing rapidly and requires more big data technology. The main challenges of big data applications include confidentiality in storage systems, software deficiencies, the limitations of existing hardware and equipment, the need for large initial investment, and the lack of technical skills and an expert workforce.
      • Open Access Article

        6 - Big IoT Data from the Perspective of Smart Agriculture
        Bahareh Jamshidi, Hossein Dehghanisani
        The Internet of Things (IoT), an emerging technology in the field of Information and Communication Technology, is the next revolution in Internet applications. Rather than focusing on communication between people, IoT focuses on communication among things such as sensors, actuators, and devices, with data collection capabilities and remote-control communication. The development of smart solutions and new IoT technologies in agriculture can pave the way to a new farming paradigm called "Smart Agriculture" by fundamentally changing all aspects of current practice. IoT-based Smart Agriculture can improve agricultural productivity, producing more food through optimal use of basic resources, minimizing environmental impacts, reducing costs, and increasing incomes through links to the business market, all of which supports sustainable agricultural development goals. IoT-based data form large collections called "Big Data" that cannot be processed and managed by traditional databases and conventional management tools. IoT and Big Data technologies are interconnected, and it can be predicted that, without these technologies and Smart Agriculture, optimal agriculture worldwide will not be able to meet food demand and production sustainability. This article introduces IoT and Big Data technologies, as well as the relationship between them, from the vision of Smart Agriculture. Moreover, by assessing the technology life cycle and trends, the article aims to support strategic decision-making from the pre-production stage to business marketing in the country. Some big IoT data applications in the Smart Agriculture cycle are also introduced.
      • Open Access Article

        7 - The Role of Big Data Management in Improving the Decision-Making of Banking Organizations (Case Study of Sepah Bank)
        Yaser Ghasemi Nezhad, Peyman Hajizadeh, Hamed Kordi
        In recent years, the volume of data in the world has increased dramatically. Banks also generate large amounts of data in their processes and sometimes spend exorbitant sums to collect and maintain it; some banking industry experts estimated a sevenfold increase in existing data by 2020. Today, big data technology is considered a solution for exploiting this volume of information. However, reviewing and processing big data, as well as examining the effectiveness of its use in banking, remains a challenge. Therefore, this study investigated the role of big data management in improving the decision-making of banking organizations (a case study of Sepah Bank). The statistical population comprises 130 experts from all units of the information technology department of Sepah Bank; no sampling was performed because the population was limited. The standard 20-item big data management questionnaire, the 22-item decision-making empowerment questionnaire, and the 10-item decision-making quality questionnaire, based on Shamim et al. (2019), formed the basis of the research after localization. Descriptive and inferential results were analyzed using SPSS 19 and PLS software. The results showed that organizational culture, with a coefficient of 0.446, has the strongest positive and significant relationship with empowerment. Empowerment also has a positive and significant relationship with decision effectiveness (coefficient 0.645) and decision efficiency (coefficient 0.884).
      • Open Access Article

        8 - A Distributed Solution for Mixed Big Data Clustering
        M. Mahmoudi, Negin Daneshpour
        Due to the high speed of information generation and the need to convert data into knowledge, there is an increasing need for data mining algorithms. Clustering is one of the data mining techniques, and its development leads to a better understanding of the surrounding environment. In this paper, a dynamic and scalable solution for clustering mixed big data with missing values is presented. The solution integrates common distance metrics with the concept of the nearest neighborhood, along with a kind of geometric coding, and includes a method for recovering missing data in the dataset. By utilizing parallelization and distribution techniques across multiple nodes, it can be scaled and accelerated. The solution is evaluated against alternatives in terms of speed, precision, and memory usage.
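        One common way to combine distance metrics over mixed (numeric plus categorical) records, in the spirit of the integration this abstract describes, is a Gower-style distance that also tolerates missing values by skipping them. The field layout below is a hypothetical illustration, not the paper's actual metric.

```python
def mixed_distance(a, b, numeric_ranges):
    """Gower-style distance over mixed records represented as dicts.
    Numeric fields contribute |a - b| scaled by the field's range;
    categorical fields contribute 0 on match, 1 on mismatch.
    Fields that are missing (None) in either record are skipped,
    which is a simple way to tolerate incomplete data."""
    total, used = 0.0, 0
    for key in a.keys() & b.keys():
        if a[key] is None or b[key] is None:
            continue                      # skip missing values
        if key in numeric_ranges:
            lo, hi = numeric_ranges[key]
            total += abs(a[key] - b[key]) / (hi - lo)
        else:
            total += 0.0 if a[key] == b[key] else 1.0
        used += 1
    return total / used if used else 0.0

x = {"age": 30, "job": "nurse", "income": None}
y = {"age": 40, "job": "nurse", "income": 50_000}
d = mixed_distance(x, y, {"age": (20, 70), "income": (0, 100_000)})
# age contributes 10/50 = 0.2, job matches (0), income is skipped,
# so d = 0.2 / 2 = 0.1
```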
      • Open Access Article

        9 - High Performance Computing via Improvement of Random Forest Algorithm Using Compression and Parallelization Techniques
        Naeimeh Mohammad Karimi, Mohammad Ghasemzadeh, Mahdi Yazdian Dehkordi, Amin Nezarat
        This research seeks to improve one of the most widely used algorithms in machine learning, the random forest algorithm, using compression and parallelization techniques. The main challenge we address is the application of the random forest algorithm to processing and analyzing big data, where the algorithm does not show its usual, required performance because of the large number of memory accesses needed. This research demonstrates how the desired goal can be achieved with an innovative compression method combined with parallelization techniques: identical components of the trees in the random forest are merged and shared, and in the processing phase a vectorization-based parallelization approach is used together with a shared-memory-based parallelization method. To evaluate its performance, we ran it on Kaggle benchmarks, which are widely used in machine learning competitions. The experimental results show that the proposed compression method alone reduces the required processing time by 61%, while compression combined with the named parallelization methods leads to an improvement of about 95%. Overall, this research shows that the proposed solution can provide an effective step toward high-performance computing.
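        The idea of merging identical tree components across a forest can be illustrated by hash-consing decision-tree nodes: structurally equal subtrees are stored once and shared by every tree that contains them. The node representation below is a hypothetical sketch, not the authors' actual encoding.

```python
# Hash-consing for decision-tree nodes: structurally identical subtrees
# across a random forest are interned in a pool and stored only once.
_pool = {}

def make_leaf(label):
    """Intern a leaf node; equal labels yield the same object."""
    key = ("leaf", label)
    return _pool.setdefault(key, key)

def make_node(feature, threshold, left, right):
    """Intern an internal node; children are already interned, so
    object identity (id) is a valid part of the key."""
    key = ("node", feature, threshold, id(left), id(right))
    return _pool.setdefault(key, (feature, threshold, left, right))

# Two trees that happen to contain an identical subtree:
shared = make_node(2, 0.5, make_leaf(0), make_leaf(1))
t1 = make_node(0, 1.0, make_leaf(0), shared)
t2 = make_node(1, 3.0, make_leaf(1), shared)
# The shared subtree exists exactly once in memory:
assert t1[3] is t2[3]
```

        Sharing subtrees this way reduces memory footprint and, as a side effect, improves cache behavior, which is consistent with the memory-access bottleneck the abstract identifies.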
      • Open Access Article

        10 - An Approximate Binary Tree-Based Solution to Speed Up the Search for the Nearest Neighbor in Big Data
        Hosein Kalateh, M. D.
        Due to the increasing speed of information production and the need to convert information into knowledge, old machine learning methods are no longer adequate. When classifying with older machine learning methods, especially inherently lazy classifiers such as the k-nearest neighbor (KNN) method, classifying large data sets is very slow. Nearest-neighbor classification is popular due to its simplicity and practical accuracy. The proposed method sorts the training data feature vectors into a binary search tree to speed up nearest-neighbor classification of big data. This is done by finding the approximate two farthest local data points in each tree node; these two points serve as the criterion for dividing the data in the current node into two groups, and each data point in the node is assigned to the left or right child based on its similarity to the two points. The results of several experiments performed on different data sets from the UCI repository show a good degree of accuracy given the low execution time of the proposed method.
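        The splitting rule described in this abstract can be sketched as follows: each node picks two approximately farthest points as pivots and sends every point to the side of the nearer pivot, after which queries descend greedily. This is a simplified illustration of the general idea under stated assumptions (Euclidean distance, greedy descent), not the authors' implementation.

```python
import math

def dist(a, b):
    return math.dist(a, b)  # Euclidean distance (Python 3.8+)

def build(points, leaf_size=4):
    """Recursively split points by two approximately farthest pivots:
    each point goes to the side of the nearer pivot."""
    if len(points) <= leaf_size:
        return ("leaf", points)
    # approximate farthest pair: farthest from points[0], then farthest from that
    p = max(points, key=lambda r: dist(points[0], r))
    q = max(points, key=lambda r: dist(p, r))
    left = [x for x in points if dist(x, p) <= dist(x, q)]
    right = [x for x in points if dist(x, p) > dist(x, q)]
    if not left or not right:            # degenerate split: stop here
        return ("leaf", points)
    return ("node", p, q, build(left, leaf_size), build(right, leaf_size))

def nearest(tree, x):
    """Greedy (approximate) nearest-neighbor search: descend to the
    side whose pivot is closer to the query, then scan the leaf."""
    if tree[0] == "leaf":
        return min(tree[1], key=lambda p: dist(p, x))
    _, p, q, left, right = tree
    return nearest(left if dist(x, p) <= dist(x, q) else right, x)

pts = [(0, 0), (1, 0), (10, 10), (11, 10), (0, 1), (10, 11)]
tree = build(pts, leaf_size=2)
res = nearest(tree, (10.4, 10.2))  # → (10, 10)
```

        Because the descent is greedy (no backtracking), the answer is approximate, which matches the trade-off of accuracy for low execution time described above.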
      • Open Access Article

        11 - Clustering Iranian Gas Industry Managers and Ranking Their Competencies via the EFQM Excellence Model-based Evaluation with an Artificial Intelligence Approach
        Ali Reza Zamanian, Majid Jahangirfard, Farshad Hajalian
        This study attempted to lay the groundwork for linking human resources data to the results of the organizational excellence model for about 51 parent and subsidiary companies of the National Iranian Gas Company, using artificial intelligence (AI) and machine learning methods. The goal was to present a model for clustering chief organizational managers based on the companies' evaluation under the European Foundation for Quality Management (EFQM) excellence model. The unique characteristic of this method is that it is built on the actual performance and output of successful organizations headed by successful managers and leaders; accordingly, a performance-based excellence model can be achieved in the future. The model evaluation outcomes for 2017, 2018, and 2019 for the 51 companies affiliated with the National Iranian Gas Company were first clustered. Clustering was performed on 3776 data records using AI-based methods, with the coding done in Python. This applied study aimed to design and develop a novel method for discovering experts and scientifically classifying the organization's human resources based on credible data, and to integrate novel scientific domains of AI, including clustering, to pave the way for human resources research. In the applied dimension, the results were used in organizational planning and decision-making to build a tool with which the future managerial performance of the organization and its staff can be predicted from appropriate human resources data. Finally, a ranking based on the competency gap is presented using the Fisher discriminant ratio (FDR).
      • Open Access Article

        12 - Presenting a Novel Solution to Choose a Proper Database for Storing Big Data in National Network Services
        Mohammad Reza Ahmadi, Davood Maleki, Ehsan Arianyan
        The increasing development of data-producing tools in different services, together with the need to store the results of large-scale processing from various activities in the national information network services as well as data produced by the private sector and social networks, has made migration to new database solutions with appropriate features inevitable. With the expansion and change in the size and composition of data and the formation of big data, traditional practices and patterns do not meet the new requirements, making information storage systems in new, scalable formats and models necessary. In this paper, the basic structural dimensions and different functions of both traditional databases and modern storage systems are reviewed, and a new technical solution for migrating from traditional to modern databases is presented. The basic features of connecting traditional and modern databases for storing and processing data obtained from the comprehensive services of the national information network are also presented, and the parameters and capabilities of databases in standard and Hadoop contexts are examined. In addition, as a practical example, a solution combining traditional and modern databases is presented, evaluated, and compared using the BSC method. It is shown that across data sets of different volumes, a combined use of traditional and modern databases can be the most efficient solution.
      • Open Access Article

        13 - Data-driven Marketing in Digital Businesses from Dynamic Capabilities View
        Maede Amini Vlashani, Ayoub Mohamadian, Seyed Mohammadbagher Jafari
        Despite the enormous volume of data and the benefits it can bring to marketing activities, the literature is unclear on how to use it, and very few studies have been conducted in this field. This study therefore uses the dynamic capabilities view to identify the dynamic capabilities of data-driven marketing, in order to focus on data in the development of marketing strategies, make effective decisions, and improve efficiency in marketing processes and operations. The research was carried out with a qualitative method, using a content analysis strategy and interviews with specialists. The subjects were 18 professionals in the fields of data analytics and marketing, selected by purposeful sampling. The study identifies the dynamic capabilities of data-driven marketing as: the ability to absorb marketing data; the ability to aggregate and analyze marketing data; data-driven decision-making; improving the data-driven customer experience; and data-driven innovation, networking, agility, and transformation. The results can be a step toward developing the theory of dynamic capabilities in marketing with a data-driven approach. They can therefore be used in training and in creating new organizational capabilities for using big data in marketing activities, developing and improving data-driven products and services, and improving the customer experience.
      • Open Access Article

        14 - Fuzzy Multicore Clustering of Big Data in the Hadoop Map Reduce Framework
        Seyed Omid Azarkasb, Seyed Hossein Khasteh, Mostafa Amiri
        A logical way to account for the overlap of clusters is to assign a set of membership degrees to each data point. Fuzzy clustering, due to its reduced partitions and decreased search space, generally incurs lower computational overhead and easily handles ambiguous, noisy, and outlier data; it is thus considered an advanced clustering method. However, fuzzy clustering methods often struggle with non-linear data relationships. This paper proposes a method, based on feasible ideas, that uses multicore learning within the Hadoop MapReduce framework to identify linearly inseparable clusters in complex big data structures. The multicore learning model can capture complex relationships among data, while Hadoop lets us interact with a logical cluster of processing and data storage nodes instead of individual operating systems and processors. In summary, the paper presents the modeling of non-linear data relationships using multicore learning, the determination of appropriate values for the fuzzification and feasibility parameters, and an algorithm within the Hadoop MapReduce model. The experiments were conducted on one of the commonly used datasets from the UCI Machine Learning Repository, as well as on the implemented CloudSim dataset simulator, and satisfactory results were obtained. According to published studies, the UCI Machine Learning Repository is suitable for regression and clustering purposes when analyzing large-scale datasets, while CloudSim is specifically designed for simulating cloud computing scenarios, calculating time delays, and task scheduling.
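        The division of labor in a MapReduce-style fuzzy clustering pass can be sketched as follows: mappers compute membership-weighted sums over their data partitions, and a reducer combines the partial results to update the cluster centers. This is a generic fuzzy c-means sketch (with fuzzifier m) meant only to illustrate the map/reduce split, not the paper's multicore-learning algorithm.

```python
import numpy as np

def map_partition(X, centers, m=2.0):
    """Mapper: for one data partition, compute fuzzy c-means memberships
    and emit per-cluster membership-weighted sums and membership mass."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + 1e-12
    # u[i, c] = 1 / sum_j (d[i, c] / d[i, j]) ** (2 / (m - 1))
    u = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2 / (m - 1)), axis=2)
    w = u ** m
    return w.T @ X, w.sum(axis=0)        # (k, dim) sums and (k,) masses

def reduce_partitions(partials):
    """Reducer: combine partial sums from all mappers into new centers."""
    sums = sum(p[0] for p in partials)
    mass = sum(p[1] for p in partials)
    return sums / mass[:, None]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (50, 2)), rng.normal(5, 0.1, (50, 2))])
centers = np.array([[0.5, 0.5], [4.5, 4.5]])
for _ in range(5):                        # a few map/reduce rounds
    partials = [map_partition(part, centers) for part in np.array_split(X, 4)]
    centers = reduce_partitions(partials)
# centers converge near the two cluster means, (0, 0) and (5, 5)
```

        Because each mapper only emits fixed-size sums and masses, the shuffle traffic is independent of partition size, which is what makes this decomposition attractive for big data.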
      • Open Access Article

        15 - The Main Components of Evaluating the Credibility of Users According to Organizational Goals in the Life Cycle of Big Data
        Sogand Dehghan, Shahriyar Mohammadi, Rojiar Pirmohamadiani
        Social networks have become one of the most important factors in organizational decision-making due to the speed with which they publish events and the large amount of information they carry. For this reason, they are central to assessing the validity of information: the accuracy, reliability, and value of information are clarified through these networks. Information credibility can be checked using the features of these networks at three levels: user, content, and event. The user level is the most reliable of the three, because a valid user usually publishes valid content. Despite the importance of this topic and the various studies conducted in this field, important components in the process of evaluating the credibility of social network information have received little attention. Hence, this research identifies, collects, and examines the related components with a narrative method applied to 30 important and original articles in the field. Articles in this field can usually be compared along three dimensions: credibility analysis approaches, content topic detection, and feature selection methods; these dimensions have therefore been investigated and categorized. Finally, an initial framework is presented that focuses on evaluating the credibility of users as information sources. This article is a suitable guide for calculating user credibility in the decision-making process.
      • Open Access Article

        16 - Providing a New Solution in Selecting Suitable Databases for Storing Big Data in the National Information Network
        Mohammad Reza Ahmadi, Davood Maleki, Ehsan Arianyan
        With the development of infrastructure and applications, especially public services in the form of cloud computing, traditional models of database services and their storage methods have faced severe limitations and challenges. The increasing development of data-producing tools and the need to store the results of large-scale processing from various activities in the national information network, together with data produced by the private sector and pervasive social networks, have made the migration to new databases with appropriate features inevitable. With the expansion and change in the size and composition of data and the formation of big data, traditional practices and patterns no longer meet the new needs; it is therefore necessary to use data storage systems in new, scalable formats and models. This paper reviews the essential structural dimensions and different functions of traditional databases and modern storage systems, along with technical solutions for migrating from traditional databases to modern ones suitable for big data. The basic features of connecting traditional and modern databases for storing and processing data obtained from the national information network are also presented, and the parameters and capabilities of databases in the standard platform and Hadoop contexts are examined. As a practical example, a combination of traditional and modern databases using the balanced scorecard method is presented, evaluated, and compared.