BENCHMARKING WHISPER OPENAI ON SARAWAK LANGUAGES

Bibliographic Details
Main Author: GERALD EINSTEIN CORNELIUS
Format: Final Year Project Report
Language: English
Published: Universiti Malaysia Sarawak (UNIMAS), 2023
Online Access: http://ir.unimas.my/id/eprint/44201/2/Gerald%20Einstein%20ft.pdf
http://ir.unimas.my/id/eprint/44201/
Description
Summary: The end-to-end (E2E) model is reshaping the automatic speech recognition (ASR) landscape, supplanting traditional ASR approaches such as Hidden Markov model (HMM) and Deep Neural Network (DNN)-based hybrid models. In essence, it replaces the separate components of these traditional systems, collapsing their module-based design into a single network trained within a deep learning framework. This simplification does not compromise recognition performance, and it even yields results superior to those of traditional ASR models. Recognising this potential, OpenAI developed Whisper, a robust model built on an E2E encoder-decoder Transformer. While Whisper performs exceptionally well for English ASR, its performance on low-resource languages remains largely undetermined and is a topic of research interest. This work evaluates the Whisper model on Sarawak languages, using speech data from under-resourced Sarawak languages, namely Sarawak Malay, Iban, Melanau, and the Bidayuh dialects of Jagoi and Bukar Sadong. The key contributions are a systematic literature review (SLR) and an ASR system built on the Whisper model to measure its recognition accuracy on Sarawak languages. The experimental results from this system, measured with the Word Error Rate (WER) metric, may serve as a baseline for future work on Whisper-based ASR for under-resourced Sarawak languages.
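The summary reports results as Word Error Rate (WER). WER is the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words: WER = (S + D + I) / N, where S, D and I count substitutions, deletions and insertions and N is the reference length. The following is a minimal sketch of that computation. It is not the report's evaluation code. It assumes plain whitespace tokenisation with no text normalisation such as lowercasing or punctuation stripping, and the function name word_error_rate and the sample sentences are hypothetical. Established packages such as jiwer (jiwer.wer) offer equivalent functionality.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Hypothetical helper: WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumes whitespace tokenisation only; no normalisation is applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the minimum edits turning the first i-1 reference words
    # into the first j hypothesis words (one row of the DP table at a time).
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        curr = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref[i-1]
                curr[j - 1] + 1,     # insertion of hyp[j-1]
                prev[j - 1] + cost,  # substitution (or match when cost == 0)
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


if __name__ == "__main__":
    # Hypothetical pair, not taken from the report's dataset:
    # one substitution (mahu -> mau) and one deletion (ke) over 5 reference words.
    print(word_error_rate("saya mahu pergi ke pasar", "saya mau pergi pasar"))
```

In this example, two edits over five reference words give a WER of 0.4. Because insertions are counted, WER can exceed 1.0 when a hypothesis is much longer than its reference. It is therefore usually reported as a percentage rather than read as an accuracy.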