Improving the Robustness of Speech Recognition Using an Asymmetric Nonlinear Filter and Delta-Spectral Features

Subject area: Electrical and Computer Engineering

1 - University of Birjand
2 - Payame Noor University, Mashhad Branch

Received: 1394/09/08  Accepted: 1394/09/08  Published: 1392/03/31

Keywords: speech recognition, power-normalized cepstral coefficients, asymmetric nonlinear filter, delta-cepstral features

Abstract:

In this paper, we propose a noise-robust feature extraction algorithm. The algorithm uses a nonlinear filter and temporal masking for feature extraction, and it noticeably improves speech recognition accuracy by using delta-spectral features instead of delta-cepstral features. Almost all current automatic speech recognition (ASR) systems use delta-cepstral and delta-delta features to describe speech. The goal of this paper is to obtain robust features that give a larger improvement in recognition under various noise conditions. To this end, we focus on key characteristics of speech (especially its non-stationary characteristics) that differ from those of noise signals. Experimental results show that recognition accuracy improves relative to MFCC and PLP in the presence of various types of noise.
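The abstract does not give the equations of the nonlinear filter. As a rough illustration only, the sketch below shows one common form of an asymmetric nonlinear filter, assumed here rather than taken from the paper: a first-order lowpass filter whose coefficient depends on whether the input is rising or falling, so that its output follows the lower envelope (an estimate of the noise floor) of each channel's power. The coefficients `lam_a=0.999` and `lam_b=0.5`, the 0.9 initialization, and the function names `asymmetric_filter` and `suppress_floor` are hypothetical choices; temporal masking is not modeled.

```python
import numpy as np


def asymmetric_filter(q_in, lam_a=0.999, lam_b=0.5):
    """First-order lowpass filter along axis 0 (frames) whose
    coefficient depends on the direction of change.

    When the input rises above the previous output, the slow coefficient
    lam_a is used; when it falls, the faster lam_b is used. The output
    therefore hugs the lower envelope of each channel, which serves as a
    running estimate of the noise floor. (Parameter values are assumed.)
    """
    q_in = np.asarray(q_in, dtype=float)
    q_out = np.empty_like(q_in)
    q_out[0] = 0.9 * q_in[0]  # assumed initialization
    for m in range(1, q_in.shape[0]):
        lam = np.where(q_in[m] >= q_out[m - 1], lam_a, lam_b)
        q_out[m] = lam * q_out[m - 1] + (1.0 - lam) * q_in[m]
    return q_out


def suppress_floor(power, lam_a=0.999, lam_b=0.5):
    """Subtract the tracked floor and half-wave rectify the result."""
    power = np.asarray(power, dtype=float)
    floor = asymmetric_filter(power, lam_a, lam_b)
    return np.maximum(power - floor, 0.0)
```

For example, a channel held at a constant power of 1 with a ten-frame burst at 50 comes out near zero outside the burst and close to 49 inside it, because the slow rising coefficient keeps the floor estimate from following the short burst.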

English abstract:

In this paper, we propose a new feature extraction algorithm that is robust against noise. The proposed algorithm uses a nonlinear filter with temporal masking for speech feature extraction, and by applying delta-spectral features instead of delta-cepstral features, it improves speech recognition accuracy. Almost all current automatic speech recognition (ASR) systems use delta-cepstral and delta-delta features. The aim of this paper is to obtain robust speech features that yield more accurate speech recognition under different noise conditions. This is achieved by focusing on key characteristics of speech (especially its non-stationary characteristics) that differ markedly from those of noise signals. The experimental results show that speech recognition accuracy improves in comparison with traditional methods such as PLP and MFCC.
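To make the difference between delta-cepstral and delta-spectral features concrete, the sketch below computes both from a matrix of per-channel power (frames by channels). Because the DCT and the temporal regression are both linear, the two orderings would give identical features if nothing else intervened; they differ only because the delta-spectral path takes the time derivative of the power before the compressive nonlinearity, while the delta-cepstral path differentiates cepstra that were already compressed. The signed power-law nonlinearity (exponent 1/15), the 13 coefficients, the regression window `n=2`, and all function names are assumptions for illustration; the abstract does not state the paper's actual nonlinearity or parameters.

```python
import numpy as np


def delta(x, n=2):
    """Regression-based time derivative along axis 0 (frames).

    d_t = sum_{k=1..n} k * (x_{t+k} - x_{t-k}) / (2 * sum_{k=1..n} k^2),
    with the first and last frames repeated at the edges.
    """
    x = np.asarray(x, dtype=float)
    num_frames = x.shape[0]
    padded = np.concatenate(
        [np.repeat(x[:1], n, axis=0), x, np.repeat(x[-1:], n, axis=0)]
    )
    num = sum(
        k * (padded[n + k:n + k + num_frames] - padded[n - k:n - k + num_frames])
        for k in range(1, n + 1)
    )
    return num / (2.0 * sum(k * k for k in range(1, n + 1)))


def dct_matrix(num_ceps, num_channels):
    """Orthonormal DCT-II basis of shape (num_ceps, num_channels)."""
    k = np.arange(num_ceps)[:, None]
    ch = np.arange(num_channels)[None, :]
    basis = np.cos(np.pi * k * (2 * ch + 1) / (2 * num_channels))
    basis[0] *= np.sqrt(1.0 / num_channels)
    basis[1:] *= np.sqrt(2.0 / num_channels)
    return basis


def compress(x, exponent=1.0 / 15.0):
    """Hypothetical sign-preserving power-law nonlinearity."""
    return np.sign(x) * np.abs(x) ** exponent


def delta_cepstral(power, num_ceps=13):
    """Cepstra first (nonlinearity, then DCT), time derivative last."""
    power = np.asarray(power, dtype=float)
    dct = dct_matrix(num_ceps, power.shape[1])
    return delta(compress(power) @ dct.T)


def delta_spectral(power, num_ceps=13):
    """Time derivative of the channel power first, then nonlinearity and DCT."""
    power = np.asarray(power, dtype=float)
    dct = dct_matrix(num_ceps, power.shape[1])
    return compress(delta(power)) @ dct.T
```

A stationary input (constant power in every channel) gives all-zero delta-spectral features, which is consistent with the abstract's emphasis on the non-stationary parts of speech.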
