Improving Formant and Concatenative Speech Synthesizers, Inspired by the Operation of Speech Coders

Subjects: Electrical and Computer Engineering

1 - Amirkabir University of Technology
2 - Amirkabir University of Technology

Submission date: Saturday, 16 Safar 1437. Acceptance date: Saturday, 16 Safar 1437. Publication date: Tuesday, 19 Rajab 1432.

Keywords: STRAIGHT, multiband excitation, concatenative method, formant method, speech coder

Abstract:

This paper implements and improves the speech-generation stage of a text-to-speech system. To this end, a concatenative synthesizer based on pitch-synchronous overlap-add with multiband excitation and a formant synthesizer were implemented for Persian, and speech coders were used to improve output quality. In other words, the idea proposed here is to use existing speech coders to address the shortcomings of speech synthesizers: the STRAIGHT coder is used for spectral smoothing in the concatenative synthesizer, and the mixed-excitation linear prediction coder is used in formant-based synthesis. Evaluation results show that these coders reduce discontinuities in the concatenative synthesizer and improve intelligibility and naturalness scores in the formant synthesizer.

