• Home
  • Word Embedding
    • List of Articles Word Embedding

      • Open Access Article

        1 - Utilizing Gated Recurrent Units to Retain Long Term Dependencies with Recurrent Neural Network in Text Classification
        Nidhi Chandra Laxmi  Ahuja Sunil Kumar Khatri Himanshu Monga
        The classification of text is one of the key areas of research for natural language processing. Most of the organizations get customer reviews and feedbacks for their products for which they want quick reviews to action on them. Manual reviews would take a lot of time a More
        The classification of text is one of the key areas of research for natural language processing. Most of the organizations get customer reviews and feedbacks for their products for which they want quick reviews to action on them. Manual reviews would take a lot of time and effort and may impact their product sales, so to make it quick these organizations have asked their IT to leverage machine learning algorithms to process such text on a real-time basis. Gated recurrent units (GRUs) algorithms which is an extension of the Recurrent Neural Network and referred to as gating mechanism in the network helps provides such mechanism. Recurrent Neural Networks (RNN) has demonstrated to be the main alternative to deal with sequence classification and have demonstrated satisfactory to keep up the information from past outcomes and influence those outcomes for performance adjustment. The GRU model helps in rectifying gradient problems which can help benefit multiple use cases by making this model learn long-term dependencies in text data structures. A few of the use cases that follow are – sentiment analysis for NLP. GRU with RNN is being used as it would need to retain long-term dependencies. This paper presents a text classification technique using a sequential word embedding processed using gated recurrent unit sigmoid function in a Recurrent neural network. This paper focuses on classifying text using the Gated Recurrent Units method that makes use of the framework for embedding fixed size, matrix text. It helps specifically inform the network of long-term dependencies. We leveraged the GRU model on the movie review dataset with a classification accuracy of 87%. Manuscript profile
      • Open Access Article

        2 - Using Sentiment Analysis and Combining Classifiers for Spam Detection in Twitter
        mehdi salkhordeh haghighi Aminolah Kermani
        The welcoming of social networks, especially Twitter, has posed a new challenge to researchers, and it is nothing but spam. Numerous different approaches to deal with spam are presented. In this study, we attempt to enhance the accuracy of spam detection by applying one More
        The welcoming of social networks, especially Twitter, has posed a new challenge to researchers, and it is nothing but spam. Numerous different approaches to deal with spam are presented. In this study, we attempt to enhance the accuracy of spam detection by applying one of the latest spam detection techniques and its combination with sentiment analysis. Using the word embedding technique, we give the tweet text as input to a convolutional neural network (CNN) architecture, and the output will detect spam text or normal text. Simultaneously, by extracting the suitable features in the Twitter network and applying machine learning methods to them, we separately calculate the Tweeter spam detection. Eventually, we enter the output of both approaches into a Meta Classifier so that its output specifies the final spam detection or the normality of the tweet text. In this study, we employ both balanced and unbalanced datasets to examine the impact of the proposed model on two types of data. The results indicate an increase in the accuracy of the proposed method in both datasets. Manuscript profile
      • Open Access Article

        3 - Semantic Word Embedding Using BERT on the Persian Web
        shekoofe bostan Ali-Mohammad Zare-Bidoki mohamad reza pajohan
        Using the context and order of words in sentence can lead to its better understanding and comprehension. Pre-trained language models have recently achieved great success in natural language processing. Among these models, The BERT algorithm has been increasingly popular More
        Using the context and order of words in sentence can lead to its better understanding and comprehension. Pre-trained language models have recently achieved great success in natural language processing. Among these models, The BERT algorithm has been increasingly popular. This problem has not been investigated in Persian language and considered as a challenge in Persian web domain. In this article, the embedding of Persian words forming a sentence was investigated using the BERT algorithm. In the proposed approach, a model was trained based on the Persian web dataset, and the final model was produced with two stages of fine-tuning the model with different architectures. Finally, the features of the model were extracted and evaluated in document ranking. The results obtained from this model are improved compared to results obtained from other investigated models in terms of accuracy compared to the multilingual BERT model by at least one percent. Also, applying the fine-tuning process with our proposed structure on other existing models has resulted in the improvement of the model and embedding accuracy after each fine-tuning process. This process will improve result in around 5% accuracy of the Persian web ranking. Manuscript profile