Design and Implementation of a Text to Speech System for Kurdish Language with Its Quality Assessment
Subject Areas : electrical and computer engineering
W. Barkhoda 1 , A. Bahrampour 2 , F. Akhlaqian 3 , H. Faili 4
1 -
2 -
3 -
4 - University of Tehran
Abstract :
This paper introduces the first text-to-speech system for the Kurdish language. Kurdish has two standard scripts, Arabic and Latin. The text analysis part handles problems common to Kurdish texts in general as well as problems specific to each of the two standard scripts. It also defines a set of standard symbols into which the system converts input text written in either script. Intonation patterns for the various sentence types have been determined for the first time for Kurdish. In the speech production part, three synthesis systems have been implemented, based on allophones, syllables, and diphones. To assess the quality of these systems and compare them with each other, four tests were used: MOS (Mean Opinion Score), Intelligibility, DRT (Diagnostic Rhyme Test), and MRT (Modified Rhyme Test). The test results show that the systems are highly intelligible, especially the diphone-based system.