Sharing Features and Abstractions across Data for Robust Speech Recognition
Subject Areas: Electrical and Computer Engineering

P. Zarei Eskikand 1, S. A. Seyed Salehi 2
Abstract:
In this work, to increase the capacity of a recurrent neural network, we present a model for extracting common features and sharing them across data. With this model, the extracted principal components of the data become invariant to unwanted variations. The recurrent connections of the network remove noise by means of a continuous attractor formed during the training phase. The defined speaker codes are transformed into the information needed to shift the continuous attractor in the input space. As a result, speaker variations can be compensated and recognition can be performed as if a clean signal were available. We compared the performance of this method with that of a reference network described in the paper. The results show that the proposed model is more effective at removing noise and unwanted variations, increasing phoneme recognition accuracy by about 5% at a signal-to-noise ratio of 0 dB.
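The attractor-based denoising idea can be illustrated with a toy example. The sketch below is not the authors' model; it uses a simple linear associative memory whose recurrent map projects any state onto the span of a few stored patterns, so that recurrent iteration removes the noise components lying outside that span, and the stored patterns themselves are fixed points of the dynamics.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three 20-dimensional "clean" patterns stored as attractors.
# W = X @ pinv(X) is the orthogonal projector onto their span,
# so the stored patterns are fixed points of the recurrent map.
X = rng.standard_normal((20, 3))
W = X @ np.linalg.pinv(X)

clean = X[:, 0]
noisy = clean + 0.3 * rng.standard_normal(20)

# Iterate the recurrent dynamics: noise orthogonal to the
# pattern subspace is removed (here after the first step,
# since the projection is idempotent).
state = noisy
for _ in range(5):
    state = W @ state

print(np.linalg.norm(noisy - clean), np.linalg.norm(state - clean))
```

In the paper's setting the attractor is continuous and the network nonlinear, but the same principle applies: the recurrent dynamics pull a noisy input back toward the manifold of clean speech patterns learned during training.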