Using the short-time Fourier transform and ResNet to diagnose depression from speech data
Main Authors: | , , , , , , |
---|---|
Format: | Conference or Workshop Item |
Language: | English |
Published: | IEEE 2021 |
Subjects: | |
Online Access: | http://irep.iium.edu.my/97108/1/97108_Using%20the%20short-time%20fourier%20transform_Scopus.pdf http://irep.iium.edu.my/97108/2/97108_Using%20the%20short-time.pdf http://irep.iium.edu.my/97108/ https://ieeexplore.ieee.org/document/9673562 |
Summary: | Depression is a common illness that affects many people, especially since the onset of the COVID-19 pandemic. It often arises when a person has difficulty coping with stressful life events. It can occur at any point in a person's life, and it pervades all aspects of life. Currently, depression diagnoses rely on patient interviews and self-report questionnaires, which depend heavily on the patient's honesty and the clinician's subjective experience. In this paper, we investigate the viability of using the Short-Time Fourier Transform (STFT) as a feature descriptor to objectively diagnose depression from speech data. The dataset used in this research is the Audio-Visual Emotion Challenge 2017 (AVEC2017). The model is based on a modified ResNet18 architecture and performs binary classification (i.e., depressed or non-depressed). The STFT is computed from the speech signal to generate a mel-spectrogram for training and testing the model. The experiment shows that relying solely on the STFT as an input feature yields an F1 score of 74.71% in classifying depression. |
---|
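The feature-extraction step described in the summary (STFT of the speech signal, then a mel-spectrogram) can be illustrated with a minimal, dependency-free sketch. This is not the authors' implementation: the record does not give the paper's frame length, hop size, sample rate, or number of mel bands, so every value below is an assumption chosen to keep the example small, and the mel-scale formula is the common HTK-style one. The function names are hypothetical; a real pipeline would normally use an optimized signal-processing library rather than a hand-written DFT.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_power(signal, frame_len, hop):
    """Power spectrum of each windowed frame (frame_len // 2 + 1 bins per frame)."""
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    # Precomputed DFT basis; a naive O(N^2) transform is fine for a small sketch.
    basis = [[cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len)]
             for k in range(n_bins)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(x * b for x, b in zip(frame, row))) ** 2
                       for row in basis])
    return frames


def hz_to_mel(f):
    """HTK-style mel scale (an assumed choice)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_mels, frame_len, sample_rate):
    """Triangular filters with centres evenly spaced on the mel scale."""
    max_mel = hz_to_mel(sample_rate / 2)
    edges = [mel_to_hz(max_mel * i / (n_mels + 1)) for i in range(n_mels + 2)]
    bin_freqs = [k * sample_rate / frame_len for k in range(frame_len // 2 + 1)]
    bank = []
    for m in range(1, n_mels + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        bank.append([max(0.0, min((f - lo) / (mid - lo), (hi - f) / (hi - mid)))
                     for f in bin_freqs])
    return bank


def log_mel_spectrogram(signal, sample_rate, frame_len=64, hop=32, n_mels=8):
    """Log-scaled mel-spectrogram, returned as a list of frames."""
    power = stft_power(signal, frame_len, hop)
    bank = mel_filterbank(n_mels, frame_len, sample_rate)
    return [[math.log10(sum(w * p for w, p in zip(row, frame)) + 1e-10)
             for row in bank]
            for frame in power]


# Synthetic 1 kHz tone standing in for a speech clip (assumed 8 kHz sample rate).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]
spec = log_mel_spectrogram(tone, sample_rate)
print(len(spec), len(spec[0]))  # 15 frames x 8 mel bands
```

The resulting time-by-mel array is the kind of image-like input that a convolutional classifier such as a ResNet18 variant could consume. How the paper resizes, normalizes, or segments its spectrograms is not stated in this record.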