Using the short-time Fourier transform and ResNet to diagnose depression from speech data

Bibliographic Details
Main Authors: Elfaki, Ayman, Asnawi, Ani Liza, Jusoh, Ahmad Zamani, Ismail, Ahmad Fadzil, Ibrahim, Siti Noorjannah, Mohamed Azmin, Nor Fadhillah, Nik Hashim, Nik Nur Wahidah
Format: Conference or Workshop Item
Language: English
Published: IEEE 2021
Online Access: http://irep.iium.edu.my/97108/1/97108_Using%20the%20short-time%20fourier%20transform_Scopus.pdf
http://irep.iium.edu.my/97108/2/97108_Using%20the%20short-time.pdf
http://irep.iium.edu.my/97108/
https://ieeexplore.ieee.org/document/9673562
Description
Summary: Depression is a common illness that affects many people, especially since the onset of the COVID-19 pandemic. It often arises when a person has difficulty coping with stressful life events. It can occur at any point in a person's life, and it pervades all aspects of life. Currently, depression diagnosis relies on patient interviews and self-report questionnaires, which depend heavily on the patient's honesty and the clinician's subjective experience. In this paper, we investigate the viability of using the Short-Time Fourier Transform (STFT) as a feature descriptor to objectively diagnose depression from speech data. The dataset used in this research is the Audio/Visual Emotion Challenge 2017 (AVEC2017) dataset. The model is based on a modified ResNet18 architecture and performs binary classification (i.e., depressed or non-depressed). The STFT is computed from the speech signal to generate a mel-spectrogram for training and testing the model. The experiment shows that relying solely on the STFT as an input feature resulted in an F1 score of 74.71% in classifying depression.
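
The two quantitative pieces mentioned in the summary, the STFT feature and the F1 score, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`hann`, `stft_magnitude`, `f1_score`), the frame length, and the hop size are assumptions chosen for illustration, and the mel filterbank and the ResNet18 classifier described in the paper are omitted. The sketch uses only the Python standard library and a naive DFT, so it is suitable for short signals only.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Magnitude STFT: one list of frame_len // 2 + 1 values per frame.

    frame_len and hop are illustrative defaults, not values from the paper.
    """
    window = hann(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame to reduce spectral leakage.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        # Naive DFT of the frame; keep only the non-negative frequencies.
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


def f1_score(y_true, y_pred, positive=1):
    """F1 = harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision and recall are both zero or undefined
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Synthetic test tone centred on DFT bin 8 of a 64-sample frame.
    tone = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
    spec = stft_magnitude(tone)
    print(len(spec), "frames of", len(spec[0]), "frequency bins")
    print("F1:", f1_score([1, 1, 1, 0, 0, 0], [1, 1, 0, 1, 0, 0]))
```

In a pipeline like the one described, each frame's magnitudes would typically be passed through a mel filterbank and log-scaled before being fed to the network as an image-like input; a library FFT would replace the naive DFT used here.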