    • List of Articles: Speech Recognition

      • Open Access Article

        1 - Long-Term Spectral Pseudo-Entropy (LTSPE): A New Robust Feature for Speech Activity Detection
        Mohammad Rasoul Kahrizi, Seyed Jahanshah Kabudian
        Speech detection systems are audio classifiers that recognize, detect, or mark the parts of an audio signal that contain human speech. Their applications include speech enhancement, noise cancellation, identification, reducing the size of audio signals for communication and storage, and many others. Here, a novel robust feature named Long-Term Spectral Pseudo-Entropy (LTSPE) is proposed for speech detection; its purpose is to improve performance in combination with other features, increase accuracy, and achieve acceptable performance. To this end, the proposed method is compared with other recent and well-known methods in two conditions: with a well-known speech enhancement algorithm applied to improve the quality of the audio signals, and without speech enhancement. The experiments use the MUSAN dataset, which includes a large number of audio signals in the form of music, speech, and noise, together with various well-known machine learning methods. Accuracy and error are measured by F-score and Equal Error Rate (EER), respectively. Experimental results on MUSAN show that combining the proposed LTSPE feature with other features improves the detector's performance. Moreover, this feature achieves higher accuracy and lower error than similar features.
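The two evaluation criteria named in this abstract, F-score and Equal Error Rate, can be illustrated with a minimal sketch. It assumes binary labels (1 = speech, 0 = non-speech) and a detector that outputs one score per frame; the function names, the threshold sweep, and the toy data are illustrative assumptions, not the authors' evaluation code.

```python
def f_score(y_true, y_pred):
    """F1 score for binary labels (1 = speech, 0 = non-speech)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def equal_error_rate(y_true, scores):
    """Approximate EER: sweep thresholds over the observed scores and return
    the mean of the false-acceptance and false-rejection rates at the
    threshold where the two rates are closest."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_gap, eer = float("inf"), None
    for thr in sorted(set(scores)):
        # Non-speech frames scored at or above the threshold are false acceptances.
        far = sum(1 for t, s in zip(y_true, scores) if t == 0 and s >= thr) / n_neg
        # Speech frames scored below the threshold are false rejections.
        frr = sum(1 for t, s in zip(y_true, scores) if t == 1 and s < thr) / n_pos
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer


# Toy data (hypothetical): six frames, three of them speech.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
decisions = [1 if s >= 0.5 else 0 for s in scores]
print(f_score(labels, decisions), equal_error_rate(labels, scores))
```

A higher F-score and a lower EER both indicate a better detector, which is why the abstract reports them as accuracy and error measures, respectively.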
      • Open Access Article

        2 - Extraction and Modeling Context Dependent Phone Units for Improvement of Continuous Speech Recognition Accuracy by Phonemes Clustering
        Mohammad Bahrani, H. Sameti
        This paper proposes a context-dependent modeling method for improving the accuracy of a Persian continuous speech recognition system. Because of certain constraints in the speech recognition system, the multiple phone units approach is used to extract context-dependent phone units. In this approach, each phoneme is clustered into several phoneme variations, and each variation is then modeled separately. Unsupervised phoneme clustering is performed with the k-means algorithm, and a new, effective method is proposed for calculating the cluster centroids. The appropriate number of clusters for each phoneme is determined from the amount of training data for that phoneme and its recognition accuracy under context-independent models; the number of clusters is then tuned by trial and error. Each cluster is then modeled as a context-dependent phone unit. Using these models reduces the word error rate by about 22%.
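The clustering step described in this abstract can be sketched with plain k-means. The abstract does not describe the authors' own centroid calculation, so this sketch assumes the standard arithmetic-mean centroid; the function name, the 2-D toy vectors, and the choice of k = 2 are likewise hypothetical stand-ins for real acoustic feature vectors of one phoneme.

```python
import random


def kmeans(vectors, k, iters=20, seed=0):
    """Plain k-means with squared Euclidean distance and mean centroids.
    The centroid rule is the textbook one, assumed here for illustration;
    it is not the centroid method proposed in the paper."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: attach each vector to its nearest centroid.
        for i, v in enumerate(vectors):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical realizations of one phoneme forming two well-separated groups.
realizations = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels, centroids = kmeans(realizations, k=2)
print(labels, centroids)
```

Each resulting cluster would then be treated as a separate phone unit and given its own acoustic model, with k tuned per phoneme as the abstract describes.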
      • Open Access Article

        3 - Robust Recognition of Direct and Telephony Speech Using Proper Extraction of Feature Vectors and Their Modification by Neural Networks Inversion
        M. Vali, S. A. Seyed Salehi
        A vast amount of research is under way on designing robust speech recognition systems that alleviate the effects of speech variability. One aspect of this variability is the difference between telephony speech and direct speech (recorded in noise-free conditions). In this paper, a set of experiments shows that LHCB parameters are superior to traditional MFCCs when used in a neural-network-based speech recognition system for both direct and telephony speech. LHCBs are then extracted from direct and telephony speech and used to train an MLP-based speech recognition model, yielding a recognition system for both speech types. Using neural network inversion based on gradient descent, the telephony speech feature vectors are modified toward the direct speech feature vectors; training a second network on the modified telephony and direct speech feature vectors yields a 1.4% improvement in speech recognition. Next, using a general inversion method for neural networks, both telephony and direct speech feature vectors are modified so that they mainly contain phonetic information rather than other speech variations. Training the second neural network on this dataset yields 2.98% and 1.68% higher recognition rates for direct and telephony speech, respectively.
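The idea of gradient-descent network inversion mentioned in this abstract can be sketched generically: the trained weights stay frozen, and the input vector itself is adjusted to reduce the output error. The single sigmoid layer, the weight values, the target, and the learning rate below are all illustrative assumptions; the abstract does not specify the authors' network or their exact inversion procedure.

```python
import math


def forward(W, b, x):
    """One sigmoid layer: y_j = sigmoid(sum_i W[j][i] * x[i] + b[j])."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * xi for w, xi in zip(row, x)) + bj)))
        for row, bj in zip(W, b)
    ]


def loss(W, b, x, target):
    """Half the squared error between the network output and the target."""
    return 0.5 * sum((y - t) ** 2 for y, t in zip(forward(W, b, x), target))


def invert(W, b, x0, target, lr=0.5, steps=200):
    """Gradient descent on the INPUT with the weights held fixed."""
    x = list(x0)
    for _ in range(steps):
        y = forward(W, b, x)
        # Error signal at the output, passed through the sigmoid derivative.
        delta = [(yj - tj) * yj * (1.0 - yj) for yj, tj in zip(y, target)]
        # Backpropagate to the input: grad_i = sum_j W[j][i] * delta_j.
        grad = [sum(W[j][i] * delta[j] for j in range(len(W))) for i in range(len(x))]
        x = [xi - lr * gi for xi, gi in zip(x, grad)]
    return x


# Hypothetical frozen network and a feature vector to be modified.
W = [[1.0, -1.0], [0.5, 2.0]]
b = [0.0, 0.0]
x0 = [0.0, 0.0]
target = [0.9, 0.1]
x_mod = invert(W, b, x0, target)
print(loss(W, b, x0, target), loss(W, b, x_mod, target))
```

In the setting of the abstract, the modified inputs would play the role of adjusted feature vectors on which a second network is trained.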
      • Open Access Article

        4 - Face recognition and Liveness Detection Based on Speech Recognition for Electronical Authentication
        Ahmad Dolatkhah, Behnam Dorostkar Yaghouti, Raheb Hashempour
        As technology develops, institutions and organizations provide many services electronically and intelligently over the Internet. The police, as an institution that provides services to people and other institutions, aims to make its services smarter, and various electronic and intelligent systems have been offered for this purpose. Because these systems lack authentication, many services that could be provided online still require a visit to +10 police stations. Given budget and equipment limitations for face-to-face responses, the limited size of the police force and its focus on essential issues, the lack of service offices in villages and their limited number in cities, and the growing demand for online services, especially in crisis situations such as the coronavirus pandemic, electronic authentication is becoming increasingly important. This article reviews electronic authentication and its necessity, along with liveness detection methods and face recognition, two of the most important technologies in this area. We then present an efficient face recognition method that uses deep learning models for face matching, as well as an interactive liveness detection method based on Persian speech recognition. The final section of the paper presents the results of testing these models on relevant data from this field.