Voiced-unvoiced segmentation using time-domain features for improved speech recognition system / Ahmad Akmal Mohd Rosley

Bibliographic Details
Main Author: Mohd Rosley, Ahmad Akmal
Format: Thesis
Language: en
Published: 2025
Subjects:
Online Access: https://ir.uitm.edu.my/id/eprint/117651/1/117651.pdf
https://ir.uitm.edu.my/id/eprint/117651/
Description
Summary: This thesis explores a method for voiced-unvoiced (V-UV) segmentation using time-domain features to enhance speech recognition systems. V-UV segmentation is the process of identifying the voiced and unvoiced parts of a speech stream. In speech recognition it often struggles to balance accuracy and computational efficiency: simpler methods fail in complex conditions, while frequency-domain approaches are computationally demanding. The proposed V-UV segmentation was designed to increase accuracy and decrease computation time in data processing. The study uses a database of audio recordings from 51 female speakers uttering three isolated English words. Preprocessing methods, such as normalization and pre-emphasis, prepare the audio data for analysis. V-UV segmentation uses zero-crossing rate (ZCR) and short-time energy (STE) to separate voiced and unvoiced frames of the speech signal. Only the voiced frames are passed to feature extraction, which uses linear predictive coding (LPC) to extract coefficients that reflect the spectral envelope of the speech signal. Following segmentation, the findings show considerable frame reduction for each word: 24.95% for "Aluminium", 31.95% for "Better", and 40.58% for "Communication". The DA classifier using Mahalanobis distance demonstrated moderate accuracy; the highest overall accuracy recorded was 73.33% at LPC order 19, which offered the best speech signal representation for classification in the proposed system.
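
The ZCR and STE decision described in the summary can be illustrated with a minimal sketch. This is not the thesis's implementation: the frame length, the two thresholds, the rule that a voiced frame has high energy and a low zero-crossing rate, and all function names are assumptions chosen for illustration, and pre-emphasis is left out to keep the example short. "Frame reduction" is taken here to mean the percentage of frames discarded as unvoiced.

```python
import math
import random


def normalize(signal):
    """Scale the signal so its largest absolute sample is 1.0."""
    peak = max((abs(v) for v in signal), default=0.0) or 1.0
    return [v / peak for v in signal]


def split_frames(signal, size, hop):
    """Cut the signal into fixed-size frames; a trailing partial frame is dropped."""
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, hop)]


def short_time_energy(frame):
    """Mean squared amplitude of one frame (STE)."""
    return sum(v * v for v in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ (ZCR)."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def voiced_mask(signal, size=240, hop=240, ste_min=0.01, zcr_max=0.25):
    """Label each frame True (voiced) or False (unvoiced).

    Assumed rule: voiced frames have high energy AND a low zero-crossing rate.
    The defaults (30 ms frames at 8 kHz, both thresholds) are illustrative only.
    """
    x = normalize(signal)
    return [
        short_time_energy(f) >= ste_min and zero_crossing_rate(f) <= zcr_max
        for f in split_frames(x, size, hop)
    ]


def frame_reduction_percent(mask):
    """Percentage of frames that are discarded because they are not voiced."""
    if not mask:
        return 0.0
    return 100.0 * (len(mask) - sum(mask)) / len(mask)


def demo_signal(sample_rate=8000, seed=0):
    """Synthetic test input: a 150 Hz tone (voiced-like) followed by weak noise."""
    rng = random.Random(seed)
    tone = [0.8 * math.sin(2 * math.pi * 150 * n / sample_rate) for n in range(2400)]
    noise = [rng.uniform(-0.05, 0.05) for _ in range(2400)]
    return tone + noise


if __name__ == "__main__":
    mask = voiced_mask(demo_signal())
    print(mask)
    print(frame_reduction_percent(mask))
```

In a pipeline like the one summarized above, only the frames marked True would be passed on to LPC feature extraction, which is where the reduction in processing would come from. Thresholds for real recordings would need to be tuned on the data.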