Instance Based Sparse Classifier Fusion for Speaker Verification
Subject area: Speech Processing
Mohammad Hasheminejad 1, Hassan Farsi 2
1 - University of Birjand
2 -
Keywords: Speaker Recognition, Speaker Verification, Ensemble Classification, Classifier Fusion, IBSparse
Abstract:
This paper focuses on the problem of ensemble classification for text-independent speaker verification. Ensemble classification is an efficient way to improve the performance of a classification system, as it leverages the complementary strengths of a set of expert classifiers. A speaker verification system receives an input utterance and an identity claim, then verifies the claim in terms of a matching score. This score measures the resemblance between the input utterance and the pre-enrolled target speaker. Since a speech signal carries many kinds of information, state-of-the-art speaker verification systems use a set of complementary classifiers to produce a reliable verification decision. Such a system receives a set of scores as input and makes a binary decision: accept or reject the claimed identity. Most recent studies on classifier fusion for speaker verification use a weighted linear combination of the base classifiers, with the weights estimated by logistic regression. Further research has extended this ensemble classification by adding different regularization terms to the logistic regression formulation. However, this type of ensemble classification overlooks two points: the correlation among the base classifiers, and the fact that some base classifiers are better suited than others to each test instance. We address both problems with an instance-based classifier ensemble selection and weight determination method. Our extensive experiments on the NIST 2004 speaker recognition evaluation (SRE) corpus, measured in terms of EER, minDCF, and minCLLR, show the effectiveness of the proposed method.
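As a minimal sketch of the baseline scheme the abstract describes (not the proposed instance-based method), the snippet below estimates fixed linear fusion weights for the base-classifier scores with plain logistic regression trained by batch gradient descent; function names, the learning rate, and the iteration count are illustrative assumptions.

```python
import math

def logistic_fusion_weights(scores, labels, lr=0.1, iters=2000):
    """Estimate linear fusion weights (plus a bias) for base-classifier
    scores via logistic regression, trained by batch gradient descent.

    scores: list of trials, each a list of base-classifier scores
    labels: 1 for target trials, 0 for impostor trials
    """
    n, k = len(scores), len(scores[0])
    w = [0.0] * k
    b = 0.0
    for _ in range(iters):
        grad_w, grad_b = [0.0] * k, 0.0
        for x, y in zip(scores, labels):
            fused = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-fused))  # sigmoid of fused score
            for j in range(k):                  # accumulate log-loss gradient
                grad_w[j] += (p - y) * x[j]
            grad_b += p - y
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def fuse(x, w, b):
    """Fused matching score: weighted linear combination of base scores."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b
```

The fused score is then compared against a decision threshold to accept or reject the claim. The weights here are global, i.e. the same for every trial; the shortcoming the abstract points out is precisely that such global weights ignore which base classifiers are most reliable for a particular test instance.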