Adaptive Compression of Wideband Speech and Audio Signals Using the Wavelet Transform

Subject areas: Electrical and Computer Engineering

1 - Shahid Beheshti University

Received: 1382/09/30  Accepted: 1383/09/01  Published: 1383/06/31

Keywords: speech compression, wavelet packet, psychoacoustic model, critical band, entropy coding

Abstract:

This paper examines the design of a new coder-decoder (codec) at a bit rate of 32 kb/s for wideband speech and audio signals. The coder is a good substitute for earlier wideband coders such as the G721 standard at 32 kb/s and G722 at 64 kb/s with MOS = 4.2. Our compressor, or coder, consists of a transform coder, a psychoacoustic model, a quantizer, and a variable-length coder. The transform coder uses a wavelet packet (WP) whose output bands are close to the critical bands. This part differs from similar work in two ways: it uses a new parametric extended wavelet transform kernel, and the WP branches are expanded so that they match the critical bands of hearing more closely. The idea of using a psychoacoustic model is taken from MPEG1-Audio, but instead of using the power spectrum to compute the signal-to-mask ratio S/M, we use the wavelet packet output data directly. In this way the wavelet packet outputs are well matched to the psychoacoustic model and the amount of computation is also reduced. The quantizer quantizes the wavelet packet outputs according to the number of bits for each critical band, which the psychoacoustic model has computed beforehand. The variable-length coding (VLC) part uses entropy coding. For this purpose the code tables of the JPEG standard are used, with some changes applied to match the characteristics of speech signals as closely as possible. The coder can use the parametric wavelet kernel adaptively. By changing the S/M ratio, the coder can lower the bit rate and reduce the quality to the level that is required. Therefore, where a fixed bit rate is needed, the bit rate reaches the desired value by varying S/M around the operating point. Finally, at 32 kb/s this coder has very good quality that is not easily distinguishable from the input PCM signal with a sampling rate of 16 kHz and 16 bits per sample.
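The rates quoted in the abstract imply a fixed compression ratio relative to the PCM input. The short calculation below makes that arithmetic explicit. It is only an illustration: the constant and function names are chosen here and do not come from the paper.

```python
# Figures taken from the abstract: 16 kHz sampling, 16 bits per sample,
# and a target coded bit rate of 32 kb/s.
SAMPLE_RATE_HZ = 16_000
BITS_PER_SAMPLE = 16
TARGET_RATE_BPS = 32_000


def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of uncompressed single-channel PCM in bits per second."""
    return sample_rate_hz * bits_per_sample


pcm_rate_bps = pcm_bit_rate(SAMPLE_RATE_HZ, BITS_PER_SAMPLE)

# Ratio between the uncompressed rate and the coded rate.
compression_ratio = pcm_rate_bps / TARGET_RATE_BPS

# Average bit budget the coder can spend per input sample.
avg_bits_per_sample = TARGET_RATE_BPS / SAMPLE_RATE_HZ

print(pcm_rate_bps, compression_ratio, avg_bits_per_sample)
```

In other words, the input PCM stream runs at 256 kb/s, so coding at 32 kb/s corresponds to an 8:1 reduction, or an average of 2 bits per input sample.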

English abstract:

This paper presents the design of a new codec at 32 kb/s for audio and high-quality speech (bandwidth limited to 7 kHz and sampled at 16 kHz with 16 b/sample). The codec is a good substitute for the G721 ITU standard and its 64 kb/s variant G722, which are based on ADPCM and date from the late 1980s. The new codec comprises adaptive wavelet transform coding, psychoacoustic modeling, quantization, and variable-length entropy and run-length coding. The novelty lies in the use of a parametric wavelet kernel and in the way the wavelet packet tree (WPT) is expanded so that it matches the critical acoustic bands more closely. The explicit kernel makes it possible to control the sharpness of the basic half-band filter from which the filters used in the Fast Wavelet Transform (FWT) coding are derived. The psychoacoustic modeling of MPEG1-Audio is used, but instead of employing the power spectrum to calculate the signal-to-mask ratio (S/M), we directly use the energies of the WPT output signals. As a consequence, the computational cost is reduced. The number of quantization bits in each band is controlled by the corresponding S/M ratio. The Variable Length Coding (VLC) used here is an extension of JPEG Huffman coding, with some modifications that adapt the scheme to speech characteristics. The developed codec can reduce the bit rate and control the delivered quality by changing the S/M ratios, so it can also be used on fixed-capacity channels. It is shown that the scheme has very good quality at 32 kb/s and that the coded signal is practically indistinguishable from the PCM signal digitized at 16 kHz and 16 b/sample.
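The abstract states that the number of quantization bits in each band is controlled by its S/M ratio, computed from subband energies, and that changing S/M trades quality for bit rate. A minimal sketch of that idea follows. It is not the authors' allocation rule: the function names, the `smr_offset_db` parameter, the externally supplied masking energies, and the use of the common rule of thumb of about 6.02 dB of signal-to-noise ratio per quantizer bit are all assumptions made for illustration.

```python
import math

# Assumption: a uniform quantizer gains roughly 6.02 dB of SNR per bit.
DB_PER_BIT = 6.02


def band_energy(coeffs):
    """Mean squared value of the coefficients in one subband."""
    return sum(c * c for c in coeffs) / len(coeffs)


def signal_to_mask_db(coeffs, mask_energy):
    """S/M ratio in dB from subband energy and a given masking energy."""
    energy = band_energy(coeffs)
    if energy <= 0.0:
        return float("-inf")  # silent band: nothing to code
    return 10.0 * math.log10(energy / mask_energy)


def bits_for_band(smr_db, max_bits=16):
    """Bits needed to push quantization noise below the mask (hypothetical rule)."""
    if smr_db <= 0.0:
        return 0  # the signal is already masked
    return min(max_bits, math.ceil(smr_db / DB_PER_BIT))


def allocate_bits(subbands, mask_energies, smr_offset_db=0.0):
    """Per-band bit allocation; a positive offset lowers S/M and hence the bit rate."""
    return [
        bits_for_band(signal_to_mask_db(coeffs, mask) - smr_offset_db)
        for coeffs, mask in zip(subbands, mask_energies)
    ]


# Toy data, invented for this example: three subbands and their masking energies.
subbands = [[1.0, -1.0], [0.05, 0.05], [0.0, 0.0]]
mask_energies = [0.01, 0.01, 0.01]

full_quality = allocate_bits(subbands, mask_energies)
reduced_rate = allocate_bits(subbands, mask_energies, smr_offset_db=10.0)
```

In this toy case the first band sits 20 dB above its mask and receives 4 bits, while the other two bands fall at or below their masks and receive none. Lowering every S/M value by 10 dB cuts the first band to 2 bits, which shows how shifting S/M around an operating point could steer the total bit rate toward a fixed target.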

