Robust speaker identification based on neural response in clean and noisy conditions / Md. Atiqul Islam
Main Author: | Md. Atiqul Islam |
---|---|
Format: | Thesis |
Published: | 2016 |
Subjects: | |
Online Access: | http://studentsrepo.um.edu.my/7661/7/atikul_islam.pdf http://studentsrepo.um.edu.my/7661/ |
Summary: | Speaker identification (SID) is a biometric technique for determining an unknown
speaker's identity from information carried in his or her speech utterances. It is
essential for security, crime investigation, forensic testing, and telephony. Robust SID
under noisy conditions is still a challenging topic in the field of speech processing. Most
of the acoustic-feature-based methods fail to achieve robust SID scores under noisy
conditions. However, human performance is very robust in noisy environments. The
physiologically-based computational model of the auditory nerve (AN) proposed by
Zilany and colleagues (2006), which captures almost all of the nonlinearities observed
at the level of auditory periphery, was used in this study to obtain a robust SID
performance. A novel neural-response-based feature was proposed in this study for both
text-dependent and text-independent speaker identification systems. The proposed
feature, referred to as a neurogram, was computed from the output of the AN model. The
training and testing speech signals were taken from three well-known text-independent
datasets (YOHO, TIMIT, and TIDIGIT) and a text-dependent audio speech dataset,
'UNIVERSITY MALAYA', to evaluate the performance of the proposed system. The
speaker models were trained on speech signals recorded in a clean environment,
whereas testing was done in both clean and noisy conditions. The testing speech signals
were contaminated by adding white Gaussian noise, pink noise, and street noise with
signal-to-noise ratios (SNRs) ranging from -5 to 15 dB in steps of 5 dB.
To develop the speaker models, three standard classifiers were employed in this study:
the Gaussian mixture model (GMM), the support vector machine (SVM), and the Gaussian
mixture model-universal background model (GMM-UBM). The performance of the
proposed neural-feature-based speaker identification was compared with the results from
the traditional acoustic-feature-based methods, such as the Mel-frequency cepstral |
---|
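
The summary describes contaminating the test signals with noise at SNRs from -5 to 15 dB in steps of 5 dB. The sketch below shows one common way to mix a noise recording into a clean signal at a target SNR. It is an illustration only, not the thesis's own procedure: the function names `add_noise_at_snr` and `measured_snr_db`, the synthetic 440 Hz tone standing in for speech, and the 16 kHz sampling rate are all assumptions made for this example.

```python
import math
import random


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so the signal-to-noise power ratio equals snr_db, then add it."""
    if len(signal) != len(noise):
        raise ValueError("signal and noise must have the same length")
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(n * n for n in noise) / len(noise)
    # Noise power required for the requested SNR: P_s / P_n = 10^(SNR/10).
    target_noise_power = p_signal / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [x + gain * n for x, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR in dB from a clean signal and its noisy version."""
    p_signal = sum(x * x for x in clean) / len(clean)
    p_error = sum((y - x) ** 2 for x, y in zip(clean, noisy)) / len(clean)
    return 10 * math.log10(p_signal / p_error)


# Hypothetical stand-in for a speech utterance: one second of a 440 Hz tone
# sampled at an assumed 16 kHz.
rng = random.Random(0)
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
white = [rng.gauss(0.0, 1.0) for _ in clean]  # white Gaussian noise

# SNR grid from the summary: -5 to 15 dB in steps of 5 dB.
snr_levels = list(range(-5, 20, 5))
noisy_versions = {snr: add_noise_at_snr(clean, white, snr) for snr in snr_levels}
```

The same scaling applies to pink or street noise: only the `noise` sequence changes, since the gain is derived from the measured power of whatever noise is supplied.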