Letter to Sound Conversion for Persian Language Using Multi Layer Perceptrons
Subject Areas: Electrical and Computer Engineering
M. Namnabat, M. M. Homayounpour
Keywords: letter to sound, text to phoneme, grapheme to phoneme, Farsi language, neural network
Abstract:
Constructing letter-to-sound (LTS) conversion systems for Persian is difficult. Because some vowels are omitted in Farsi orthography, such systems generally achieve low accuracy. This paper presents an LTS system with a three-layer architecture. The first layer is rule-based; the second layer consists of five multilayer perceptron (MLP) neural networks and a controller section that determines the pronunciations. The third layer uses an MLP network to detect geminated letters from the results of the previous layers. The proposed system is designed to produce a rational pronunciation for every word, where a rational pronunciation is a phonetic transcription that follows the correct Farsi syllable structure and the obvious rules of phonetics. The authors report 88% correct letters and 61% correct words, which is quite satisfactory for a Farsi LTS system. The correct-letter criterion is the percentage of letters whose pronunciations are determined correctly, and the correct-word criterion is the percentage of words in which the pronunciations of all constituent letters are determined correctly.
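The abstract does not say how letters are presented to the second-layer MLPs. A common input scheme in MLP letter-to-sound work is the NETtalk-style sliding window: each letter is classified from a fixed window of surrounding letters, each one-hot encoded, so the network can use context to restore vowels the spelling omits. The sketch below assumes a seven-letter window, a placeholder Latin alphabet (a real system would use the Persian letter inventory), one tanh hidden layer with biases omitted, and hypothetical layer sizes; the names `encode_windows` and `mlp_forward` are illustrative, not taken from the paper.

```python
import math
import random

# Hypothetical alphabet: "_" pads the window at word edges; a real Farsi
# system would list the Persian letters here instead of a-z.
ALPHABET = ["_"] + [chr(c) for c in range(ord("a"), ord("z") + 1)]
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def encode_windows(word, half_width=3):
    """One-hot encode a (2*half_width + 1)-letter window centred on each letter.

    Returns one input row per letter of `word`; each row concatenates one
    one-hot block per window slot.
    """
    padded = "_" * half_width + word + "_" * half_width
    width = 2 * half_width + 1
    rows = []
    for pos in range(len(word)):
        row = [0.0] * (width * len(ALPHABET))
        for slot, ch in enumerate(padded[pos:pos + width]):
            row[slot * len(ALPHABET) + INDEX[ch]] = 1.0
        rows.append(row)
    return rows


def mlp_forward(x, w1, w2):
    """One tanh hidden layer, then a softmax over output classes (no biases)."""
    hidden = [math.tanh(sum(xi * w for xi, w in zip(x, unit))) for unit in w1]
    logits = [sum(h * w for h, w in zip(hidden, unit)) for unit in w2]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Usage: "ktab" is a Latin stand-in for the four letters of Persian "ketab"
# (book); the vowel /e/ after k is not written, so the network must infer it.
rng = random.Random(0)
N_HIDDEN, N_CLASSES = 20, 8  # hypothetical sizes
X = encode_windows("ktab")
n_in = len(X[0])
W1 = [[rng.gauss(0.0, 0.1) for _ in range(n_in)] for _ in range(N_HIDDEN)]
W2 = [[rng.gauss(0.0, 0.1) for _ in range(N_HIDDEN)] for _ in range(N_CLASSES)]
probs = [mlp_forward(row, W1, W2) for row in X]  # untrained: one distribution per letter
```

With random weights the output distributions are meaningless; the point is the data flow from a letter window to a probability over pronunciation classes, which training (for example by backpropagation) would shape.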
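The abstract defines a rational pronunciation as one that respects Farsi syllable structure. Persian syllables follow the templates CV, CVC and CVCC: each syllable opens with exactly one consonant and contains exactly one vowel. The sketch below checks only that structural constraint, not the other phonetic rules the authors mention. It assumes a hypothetical phoneme inventory in which the six Persian vowels are written a, e, o, â, i, u, every other symbol counts as a consonant, and a word-initial glottal stop is written explicitly as ʔ; the function name `syllabify` is illustrative. Because every syllable's single onset consonant sits immediately before its vowel, the segmentation is unique and no search is needed.

```python
# Assumed vowel symbols; everything else is treated as a consonant.
VOWELS = {"a", "e", "o", "â", "i", "u"}


def syllabify(phonemes):
    """Split a phoneme list into CV, CVC or CVCC syllables.

    Returns the list of syllables, or None if the sequence does not fit the
    Persian syllable template.
    """
    vowels = [i for i, p in enumerate(phonemes) if p in VOWELS]
    if not vowels or vowels[0] != 1:
        return None  # no vowel, or the word does not open with exactly one consonant
    starts = [v - 1 for v in vowels]  # each onset is the phoneme just before a vowel
    if any(phonemes[s] in VOWELS for s in starts):
        return None  # two adjacent vowels cannot share a syllable boundary
    ends = starts[1:] + [len(phonemes)]
    syllables = []
    for s, e in zip(starts, ends):
        if e - s > 4:
            return None  # coda longer than two consonants
        syllables.append(phonemes[s:e])
    return syllables


def is_rational(phonemes):
    """True if the transcription satisfies the syllable-structure constraint."""
    return syllabify(phonemes) is not None
```

For example, `syllabify(["k", "e", "t", "â", "b"])` returns `[["k", "e"], ["t", "â", "b"]]` (ke.tâb, "book"), whereas the vowel-less `["k", "t", "â", "b"]` that a naive letter-by-letter mapping would produce returns `None`.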
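The two evaluation criteria defined in the abstract can be computed as below. This is a minimal sketch assuming the predicted and reference transcriptions are already aligned with one pronunciation label per letter; the label strings, the example data and the function name `letter_word_accuracy` are hypothetical.

```python
def letter_word_accuracy(predicted, reference):
    """Return (correct-letter %, correct-word %).

    `predicted` and `reference` are parallel lists of words; each word is a
    list holding one pronunciation label per letter.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must contain the same words")
    total_letters = correct_letters = correct_words = 0
    for pred, ref in zip(predicted, reference):
        if len(pred) != len(ref):
            raise ValueError("each word needs exactly one label per letter")
        hits = sum(p == r for p, r in zip(pred, ref))
        correct_letters += hits
        total_letters += len(ref)
        # A word counts as correct only if every one of its letters is correct.
        correct_words += int(hits == len(ref))
    return (100.0 * correct_letters / total_letters,
            100.0 * correct_words / len(reference))


# Hypothetical labels: "k+e" marks letter k followed by the unwritten vowel /e/.
reference = [["k+e", "t", "â", "b"], ["d", "u", "s", "t"]]
predicted = [["k", "t", "â", "b"], ["d", "u", "s", "t"]]
letters_pct, words_pct = letter_word_accuracy(predicted, reference)
```

Here one missed vowel costs a single letter (7 of 8 correct, 87.5%) but a whole word (1 of 2 correct, 50.0%), which is why word accuracy is always the stricter of the two figures.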