A speech enhancement framework using discrete Krawtchouk-Tchebichef Transform

Speech is the primary mode of human interaction. During transmission, speech signals are subject to interference and additive noise, which produce noisy signals. Robust Speech Enhancement Algorithms (SEA) that suppress noise without di...

Bibliographic Details
Main Author: Mahmmod, Basheera M.
Format: Thesis
Language: English
Published: 2018
Subjects:
Online Access: http://psasir.upm.edu.my/id/eprint/75679/1/FK%202018%20137%20-%20IR.pdf
http://psasir.upm.edu.my/id/eprint/75679/
id my.upm.eprints.75679
record_format eprints
institution Universiti Putra Malaysia
building UPM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Putra Malaysia
content_source UPM Institutional Repository
url_provider http://psasir.upm.edu.my/
language English
topic Signal processing - Case studies
Speech processing systems
description Speech is the primary mode of human interaction. During transmission, speech signals are subject to interference and additive noise, which produce noisy signals. Robust Speech Enhancement Algorithms (SEA) that suppress noise without distorting the original signal are therefore necessary. Removing noise without causing speech distortion is a challenging task. Moreover, an annoying artifact that appears after the enhancement process, called Musical Noise (MN), should be eliminated. Recent SEA approaches aim to enhance both speech quality and intelligibility, because improving these two attributes is critical for normal-hearing listeners and for people with hearing impairments. This thesis therefore aims to restore speech from corrupted signals with minimum MN and the best trade-off between Residual Noise (RN) and Signal Distortion (SD).
First, a new transform based on new orthogonal polynomials, called the Discrete Krawtchouk–Tchebichef Transform (DKTT), is presented. DKTT exhibits superior energy compaction and localization properties, which benefit the noise extraction process. Second, a noise classification method is adopted to identify the type of additive noise, and three types of optimum parameters are then determined based on the noise type. The subsequent phase of the developed system involves the proposed non-linear speech estimator, which is based on the Minimum Mean Square Error (MMSE) and low-distortion approaches. Its analytical solution is derived from the assumption that the speech and noise components can be modeled by a combination of Gamma and Laplacian distributions. This combination of distributions is used for the first time in the developed SEA. Afterward, a second proposed estimator, which is linear, is applied mainly to reduce the effects of MN. Finally, the inverse DKTT is applied to recover the clean signal.
To demonstrate the capability of the proposed system, clean speech sentences are selected from the TIMIT dataset. In addition, eleven types of noise are chosen from the NOISEX-92 dataset, together with speech-shaped noises. These noise types are the most dominant in the real world. The comparison results confirm the improvement in quality and intelligibility measures, together with a reduced MN level. The objective measures include the Perceptual Evaluation of Speech Quality (PESQ), the Frequency-Weighted Segmental Signal-to-Noise Ratio (FWSNR), the Coherence Speech Intelligibility Index (CSII), and the Short-Time Objective Intelligibility measure (STOI), along with three composite measures, namely Signal distortion (SIG), Background intrusiveness (BAK), and Overall quality (OVL). The improved SEA shows an improvement in nearly all of these quality and intelligibility measures for different types of noise and five levels of signal-to-noise ratio (SNR), i.e., −10, −5, 0, 5, and 10 dB. In white noise, for example, the average absolute improvements (with the corresponding percentages) in PESQ, OVL, STOI, and FWSNR (in dB) over the five SNR levels are 0.37 (17.3%), 0.37 (24.7%), 0.59 (7.8%), and 0.06 (7.7%), respectively. For cockpit noise, the improvements are 0.22 (10.6%), 0.18 (10.5%), 1.5 (23.3%), and 0.07 (9.5%), respectively. For speech-shaped noise, the improvements are 0.23 (11.3%), 0.17 (9.1%), 2.05 (31.6%), and 0.05 (7.8%), respectively. Moreover, the noise classification accuracy reaches 99.44%.
The contributions of this work are a new transform, new speech and noise models, and new linear and non-linear estimators with adaptive smoothing parameters that achieve good noise reduction. In conclusion, the proposed SEA enhances noisy signals and recovers clean signals with less RN and SD and a lower MN level. Moreover, the best improvement in quality and intelligibility is obtained at high noise levels in particular.
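The five SNR conditions named in the description (−10, −5, 0, 5, and 10 dB) are typically produced by scaling a noise recording before adding it to the clean speech. The following is a minimal sketch of that step, not the thesis's own code: the function names `power`, `mix_at_snr`, and `measured_snr_db` are hypothetical, and a synthetic tone and Gaussian samples stand in for the TIMIT sentences and NOISEX-92 recordings.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so that adding it to `clean` gives the requested SNR (dB).

    Returns (noisy, scaled_noise). Hypothetical helper for illustration only.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    # SNR_dB = 10 * log10(P_clean / P_noise), so the target noise power is
    # P_clean / 10^(SNR_dB / 10).
    target_noise_power = power(clean) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    scaled = [gain * n for n in noise]
    noisy = [c + n for c, n in zip(clean, scaled)]
    return noisy, scaled


def measured_snr_db(clean, noise):
    """SNR in dB between a clean signal and the noise that was added to it."""
    return 10.0 * math.log10(power(clean) / power(noise))


if __name__ == "__main__":
    rng = random.Random(0)
    # Assumed stand-ins: a 200 Hz tone sampled at 8 kHz and white Gaussian noise.
    clean = [math.sin(2 * math.pi * 200 * t / 8000) for t in range(8000)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    for level in (-10, -5, 0, 5, 10):
        noisy, scaled = mix_at_snr(clean, noise, level)
        print(level, round(measured_snr_db(clean, scaled), 6))
```

Scaling the noise rather than the speech keeps the clean reference unchanged across conditions, which is convenient when the same reference is reused to compute intrusive measures such as PESQ or STOI.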
spelling my.upm.eprints.75679 2019-11-20T02:59:28Z http://psasir.upm.edu.my/id/eprint/75679/ 2018-05 Thesis NonPeerReviewed text en Mahmmod, Basheera M. (2018) A speech enhancement framework using discrete Krawtchouk-Tchebichef Transform. PhD thesis, Universiti Putra Malaysia.