Robust Joint Recognition of Direct and Telephone Speech via Suitable Extraction of Feature Vectors and Their Modification by Neural Network Inversion

Subject areas: Electrical and Computer Engineering

1 - Amirkabir University of Technology
2 - Amirkabir University of Technology

Received: 1384/03/09; Accepted: 1384/09/03; Published: 1385/03/31 (Solar Hijri dates)

Keywords: robust recognition, speech, representation, neural network, inversion

Abstract:

Extensive research is under way on designing speech recognition systems that are robust to speech variability. One source of variability is the difference between telephone speech and direct speech (recorded in conditions free of ambient noise). In this paper, a set of designed experiments with LHCB spectral parameters shows that this representation is better suited than conventional MFCCs to neural-network-based recognition of telephone speech and to joint recognition of direct and telephone speech. LHCB feature vectors are then extracted from direct and telephone speech, and an MLP-based recognition model is trained to build a joint direct and telephone speech recognition system. Next, using gradient-based neural network inversion, the telephone speech feature vectors are modified toward the direct speech feature vectors; training a second network on the modified telephone data and the unmodified direct data yields a 1.4% increase in telephone speech recognition accuracy. Finally, using a general neural network inversion method, both the direct and the telephone feature vectors are modified so that they mainly carry the phonetic information of speech, with other variations removed as far as possible. Training a further network on this modified data yields accuracy gains of 2.98% for telephone speech and 1.68% for direct speech.
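The gradient-based inversion step mentioned in the abstract can be illustrated with a minimal sketch. It assumes a tiny MLP with fixed, hypothetical weights (`W1`, `B1`, `W2`, `B2`), a squared-error objective, and a target output taken from a hypothetical "direct" vector; none of these details come from the paper, whose abstract does not state the architecture, the targets, or the step size. The weights are never updated; only the input vector is moved along the negative gradient of the output error.

```python
import math

# Hypothetical fixed weights of a tiny, already-trained MLP:
# 2 inputs, 3 tanh hidden units, 2 linear outputs (illustrative values only).
W1 = [[0.5, -0.3], [0.2, 0.8], [-0.6, 0.1]]
B1 = [0.0, 0.1, -0.1]
W2 = [[0.7, -0.4, 0.2], [-0.3, 0.5, 0.6]]
B2 = [0.05, -0.05]


def forward(x):
    """Return (hidden activations, outputs) for input vector x."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, B1)]
    y = [sum(w * hi for w, hi in zip(row, h)) + b
         for row, b in zip(W2, B2)]
    return h, y


def error(x, target):
    """Half squared distance between the network output for x and the target."""
    _, y = forward(x)
    return 0.5 * sum((yi - ti) ** 2 for yi, ti in zip(y, target))


def invert(x, target, lr=0.2, steps=200):
    """Gradient-descent inversion: weights stay fixed, only the input changes."""
    x = list(x)
    for _ in range(steps):
        h, y = forward(x)
        dy = [yi - ti for yi, ti in zip(y, target)]
        # Backpropagate the output error to the hidden layer, then to the input.
        dh = [sum(W2[k][j] * dy[k] for k in range(len(dy)))
              for j in range(len(h))]
        dz = [dhj * (1.0 - hj * hj) for dhj, hj in zip(dh, h)]
        dx = [sum(W1[j][i] * dz[j] for j in range(len(dz)))
              for i in range(len(x))]
        x = [xi - lr * gi for xi, gi in zip(x, dx)]
    return x


if __name__ == "__main__":
    direct = [0.4, -0.2]      # hypothetical "direct speech" feature vector
    telephone = [1.0, 0.5]    # hypothetical "telephone speech" feature vector
    _, target = forward(direct)
    adjusted = invert(telephone, target)
    print("error before inversion:", error(telephone, target))
    print("error after inversion: ", error(adjusted, target))
```

In this toy setting the adjusted vector need not coincide with the "direct" vector, since several inputs can produce the same output; the sketch only shows that the input is adjusted while the network stays fixed.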

