Recognition Robust to Noise and Speech Variations Through Sharing Common Components

Topics: Electrical and Computer Engineering

Parvin Zarei Eskikand ¹, Seyyed Ali Seyyedsalehi ²

1 - Amirkabir University of Technology
2 - Amirkabir University of Technology

Submission date: Saturday, 16 Safar 1437. Acceptance date: Saturday, 16 Safar 1437. Publication date: Tuesday, 19 Rajab 1432.

Keywords: principal component extraction, noise-robust speech recognition, sharing common components, dynamic continuous attractor, nonlinear dimensionality reduction

Abstract:

One way to improve the performance of recognition systems against noise or unwanted variations is to extract the information shared among different input data. Networks with very low capacity cannot store patterns as separate concepts, so their recognition quality drops sharply. This paper presents a structure that extracts the subspace common to the input data and shares it among different speakers. The multi-task structure of the network allows this subspace to form as a single continuous attractor that is dynamic with respect to variations such as speaker changes in the input space. Noisy input data are therefore filtered by a nonlinear mapping onto a low-dimensional manifold, and the dynamics of this manifold make it robust to variations such as a change of speaker. During training, the recurrent connections form a continuous attractor in the input space, and the speaker codes are converted into the information needed to make this attractor dynamic. Once the noisy data have been attracted, recognition is performed on the resulting clean data. In robust phoneme recognition at a signal-to-noise ratio of 0 dB, extracting and sharing common components in this structure improved the performance of the attractors by about 5% relative to a similar model without attractor dynamics.
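The idea of a shared subspace that a speaker code shifts in input space can be illustrated with a deliberately simplified linear toy. This is not the paper's network: the abstract describes a nonlinear recurrent structure, whereas the sketch below assumes a linear, orthonormal shared basis and a linear map from a one-hot speaker code to an offset. All names (`make_shared_subspace`, `attract`, `speaker_to_offset`) and all sizes are hypothetical.

```python
import numpy as np


def make_shared_subspace(dim, rank, rng):
    """Return an orthonormal basis (dim x rank) standing in for the shared components."""
    q, _ = np.linalg.qr(rng.standard_normal((dim, rank)))
    return q


def attract(x, basis, offset, steps=50, rate=0.5):
    """Iteratively pull x toward the affine set offset + span(basis).

    Each step moves x part of the way to its projection onto that set,
    so every point of the set is a fixed point (a linear continuous attractor).
    """
    for _ in range(steps):
        target = offset + basis @ (basis.T @ (x - offset))
        x = x + rate * (target - x)
    return x


rng = np.random.default_rng(0)
dim, rank, n_speakers = 8, 2, 3  # toy sizes, chosen only for illustration

basis = make_shared_subspace(dim, rank, rng)             # shared across speakers
speaker_to_offset = rng.standard_normal((dim, n_speakers))  # assumed linear map

speaker_code = np.eye(n_speakers)[1]                     # one-hot code for speaker 1
offset = speaker_to_offset @ speaker_code                # speaker-specific shift

clean = offset + basis @ rng.standard_normal(rank)       # a point on the attractor
noisy = clean + 0.5 * rng.standard_normal(dim)           # additive noise
denoised = attract(noisy, basis, offset)

print("error before:", np.linalg.norm(noisy - clean))
print("error after: ", np.linalg.norm(denoised - clean))
```

In this toy, only the noise component orthogonal to the shared subspace is removed; noise lying inside the subspace survives. In the structure described above, a recognizer would then operate on the attracted output rather than on the noisy input.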

