Result comparison of model validation techniques on audio-visual speech recognition

Bibliographic Details
Main Authors: Thum, Wei Seong; M. Z., Ibrahim; Nurul Wahidah, Arshad; D.J., Mulvaney
Format: Book Section
Language: English
Published: Springer, Singapore 2017
Online Access: http://umpir.ump.edu.my/id/eprint/20566/13/78.%20Result%20Comparison%20of%20Model%20Validation%20Techniques%20on%20Audio-Visual%20Speech%20Recognition.pdf
http://umpir.ump.edu.my/id/eprint/20566/14/78.%20A%20Comparison%20of%20Model%20Validation%20Techniques%20on%20Audio-Visual%20Speech%20Recognition.pdf
http://umpir.ump.edu.my/id/eprint/20566/
https://link.springer.com/chapter/10.1007/978-981-10-6451-7_14
id my.ump.umpir.20566
record_format eprints
citation Thum, Wei Seong and M. Z., Ibrahim and Nurul Wahidah, Arshad and D.J., Mulvaney (2017) Result comparison of model validation techniques on audio-visual speech recognition. In: IT Convergence and Security 2017. Lecture Notes in Electrical Engineering, 449. Springer, Singapore, Berlin, Germany, pp. 1-8. ISBN 978-981-10-6450-0 (Print); 978-981-10-6451-7 (online). https://link.springer.com/chapter/10.1007/978-981-10-6451-7_14 (Peer reviewed; published 2017-07; repository record updated 2018-07-18T05:01:54Z)
institution Universiti Malaysia Pahang
building UMP Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Malaysia Pahang
content_source UMP Institutional Repository
url_provider http://umpir.ump.edu.my/
topic QA75 Electronic computers. Computer science
TK Electrical engineering. Electronics Nuclear engineering
description This paper implements and compares the performance of a number of techniques proposed for improving the accuracy of Automatic Speech Recognition (ASR) systems. Because ASR that relies on the audio signal alone is vulnerable to environmental noise, employing Audio-Visual Speech Recognition (AVSR) may improve performance in some applications; AVSR draws on both the audio and the mouth movements captured in a video recording of the speaker’s face region. In this paper, three model validation techniques, namely the holdout method, leave-one-out cross validation and bootstrap validation, are implemented both to validate the performance of an AVSR system and to compare the validation techniques themselves. The experiments use a new speech corpus, the Loughborough University Audio-Visual (LUNA-V) dataset, which contains 10 speakers, each uttering five sets of samples. The database is divided into training and testing sets and processed in the way each validation technique requires. Performance is evaluated over a range of signal-to-noise ratio (SNR) values for a variety of noise types taken from the NOISEX-92 dataset.
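The three validation schemes named in the abstract differ only in how the corpus is partitioned into training and testing data. The following is a minimal Python sketch of those partitions, not the authors' code: it assumes each (speaker, set) pair of the 10-speaker, 5-set LUNA-V layout is one sample unit, and the 80/20 holdout ratio, the 100 bootstrap rounds and all function names are hypothetical choices made for illustration.

```python
import random
from itertools import product

# Hypothetical sample units: one per (speaker, set) pair, mirroring the
# 10 speakers x 5 sets described for LUNA-V. The paper's actual unit of
# partitioning is an assumption here.
UNITS = list(product(range(10), range(5)))  # 50 units


def holdout_split(units, test_fraction=0.2, seed=0):
    """Holdout: one random partition into (train, test)."""
    rng = random.Random(seed)
    shuffled = units[:]
    rng.shuffle(shuffled)
    n_test = round(len(units) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def leave_one_out_splits(units):
    """Leave-one-out: one fold per unit, trained on every other unit."""
    for i, held_out in enumerate(units):
        yield units[:i] + units[i + 1:], [held_out]


def bootstrap_splits(units, n_rounds=100, seed=0):
    """Bootstrap: train on a resample drawn with replacement,
    test on the out-of-bag units that were never drawn."""
    rng = random.Random(seed)
    n = len(units)
    for _ in range(n_rounds):
        idx = [rng.randrange(n) for _ in range(n)]
        chosen = set(idx)
        train = [units[i] for i in idx]
        test = [units[i] for i in range(n) if i not in chosen]
        if not test:
            continue  # every unit was drawn (extremely unlikely); skip round
        yield train, test
```

With the 50 assumed units, `holdout_split(UNITS)` returns 40 training and 10 test units, and `leave_one_out_splits(UNITS)` produces 50 folds. The per-fold recognition accuracies would then be averaged for each scheme; the recognizer itself is outside the scope of this sketch.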
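The abstract states that performance is measured across a range of SNR values using NOISEX-92 noise types. A common way to build such test conditions is additive mixing: scale a noise record so that the mixture reaches a target SNR, defined as 10 log10(P_clean / P_noise) with P the mean signal power. The sketch below assumes that definition; the paper's exact mixing procedure is not given here, loading NOISEX-92 files is not shown, and the function names are hypothetical.

```python
import math


def mean_power(x):
    """Mean squared amplitude of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(clean, noise, snr_db):
    """Additively mix `noise` into `clean` at a target SNR in dB.

    The noise record is repeated if it is shorter than the clean signal,
    trimmed to the same length, then scaled so that
    10*log10(P_clean / P_scaled_noise) == snr_db.
    """
    reps = -(-len(clean) // len(noise))  # ceiling division
    noise = (list(noise) * reps)[: len(clean)]
    p_clean = mean_power(clean)
    p_noise = mean_power(noise)
    if p_noise == 0:
        raise ValueError("noise record has zero power")
    scale = math.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return [c + scale * n for c, n in zip(clean, noise)]


def measured_snr_db(clean, noisy):
    """Recover the SNR of a mixture, treating (noisy - clean) as the noise."""
    residual = [y - c for c, y in zip(clean, noisy)]
    return 10 * math.log10(mean_power(clean) / mean_power(residual))
```

Calling `add_noise_at_snr` for each noise type over a list of target values (for example 20, 10, 0 and -5 dB, chosen here purely for illustration) produces a grid of noisy test conditions; `measured_snr_db` is a quick check that each mixture hits its target.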