THE INFINITE GAUSSIAN MODELS: AN APPLICATION TO SPEAKER IDENTIFICATION

  • Souad FRIHA Institut de Génie Electrique, University of Tebessa
  • Nora MANSOURI Laboratoire d'Automatique et de Robotique, University of Mentouri, Route Ain EL Bey, Constantine
  • Abdelmalik TALEB AHMED LAMIH UMR CNRS-UVHC 8530, University of Valenciennes et du Hainaut Cambrésis, Le Mont Houy, 59313 Valenciennes Cedex 9

Abstract

When speech is modeled with traditional Gaussian Mixture Models (GMMs), a major problem is that the number of Gaussian
components must be fixed a priori. The infinite version of the GMM overcomes this problem: it places a Dirichlet process
prior on the mixture and performs Bayesian inference via Gibbs sampling instead of the traditional expectation-maximization
(EM) algorithm. This paper investigates the usefulness of infinite Gaussian modeling combined with state-of-the-art support
vector machine (SVM) classifiers. We consider the particular case of speaker identification under limited-data conditions,
that is, with very short speech sequences. Recognition rates of 100% are achieved after only 5 iterations, using training
and test samples shorter than 1 second. Experiments are carried out on the NIST SRE 2000 corpus.
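The abstract contrasts Gibbs sampling under a Dirichlet process prior with EM for a fixed number of components. The sketch below illustrates that idea with a collapsed Gibbs sampler in the style of Neal's Algorithm 3 for a one-dimensional mixture; it is not the authors' implementation. It assumes scalar features, a known observation variance `sigma2` shared by all components, and a conjugate normal prior `N(mu0, tau02)` on each component mean; the function names and default hyperparameters are hypothetical. The number of clusters is never fixed in advance: at each step a point either joins an existing cluster, with weight proportional to the cluster size times its posterior predictive density, or opens a new cluster, with weight proportional to `alpha`.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def predictive(x, n, s, sigma2, mu0, tau02):
    """Posterior predictive density of x for a cluster holding n points with sum s.

    Assumed conjugate model: x ~ N(mu_k, sigma2) and mu_k ~ N(mu0, tau02).
    With n = 0 this reduces to the prior predictive N(mu0, tau02 + sigma2).
    """
    post_var = 1.0 / (1.0 / tau02 + n / sigma2)
    post_mean = post_var * (mu0 / tau02 + s / sigma2)
    return normal_pdf(x, post_mean, post_var + sigma2)


def dp_gibbs(data, alpha=1.0, sigma2=1.0, mu0=0.0, tau02=100.0, iters=5, seed=0):
    """Collapsed Gibbs sampler for a DP mixture of 1-D Gaussians (hypothetical sketch).

    Returns one cluster label per data point after `iters` full sweeps.
    """
    rng = random.Random(seed)
    z = [0] * len(data)              # start with every point in a single cluster
    counts = {0: len(data)}
    sums = {0: sum(data)}
    next_label = 1
    for _ in range(iters):
        for i, x in enumerate(data):
            k = z[i]
            counts[k] -= 1           # remove x_i from its current cluster
            sums[k] -= x
            if counts[k] == 0:       # drop clusters that became empty
                del counts[k], sums[k]
            labels = list(counts)
            # Existing clusters: size (CRP prior) times posterior predictive likelihood.
            weights = [counts[c] * predictive(x, counts[c], sums[c], sigma2, mu0, tau02)
                       for c in labels]
            # New cluster: alpha times the prior predictive likelihood.
            weights.append(alpha * predictive(x, 0, 0.0, sigma2, mu0, tau02))
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(labels):
                k = next_label
                next_label += 1
                counts[k], sums[k] = 0, 0.0
            else:
                k = labels[choice]
            z[i] = k
            counts[k] += 1
            sums[k] += x
    return z
```

For example, `dp_gibbs([-5.1, -4.9, -5.0, -5.2, 5.0, 4.8, 5.1, 5.2], iters=20)` separates the negative and positive values into different clusters with overwhelming probability; the exact labels, and occasional extra singleton clusters, depend on the seed. Multivariate acoustic feature vectors would require a multivariate version with unknown covariances, which this sketch does not attempt.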

How to cite
FRIHA, Souad; MANSOURI, Nora; TALEB AHMED, Abdelmalik. THE INFINITE GAUSSIAN MODELS: AN APPLICATION TO SPEAKER IDENTIFICATION. Courrier du Savoir, [S.l.], v. 12, May 2014. ISSN 1112-3338. Available at: <https://revues.univ-biskra.dz./index.php/cds/article/view/458>. Accessed: 14 Nov. 2024
Section
Articles