Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning

The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in ine...

Full description

Saved in:
Bibliographic Details
Main Authors: Hock, Hung Chieng, Wahid, Noorhaniza, Ong, Pauline
Format: Article
Language:English
Published: Universiti Utara Malaysia 2021
Subjects:
Online Access:https://repo.uum.edu.my/id/eprint/28125/1/document%20%284%29.pdf
https://doi.org/10.32890/jict.20.1.2021.9267
https://repo.uum.edu.my/id/eprint/28125/
https://www.e-journal.uum.edu.my/index.php/jict/article/view/12398
id my.uum.repo.28125
record_format eprints
spelling my.uum.repo.28125 2023-05-21T15:21:24Z https://repo.uum.edu.my/id/eprint/28125/ Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning Hock, Hung Chieng Wahid, Noorhaniza Ong, Pauline QA75 Electronic computers. Computer science The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks: 1) the negative cancellation property of ReLU treats negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherent predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive and leads to a bias-shift effect in the network layers; and 4) the multi-linear structure of ReLU restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. With ReLU as the baseline, experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. In addition, PFTS achieved the highest mean rank among the compared methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks. Universiti Utara Malaysia 2021 Article PeerReviewed application/pdf en cc4_by https://repo.uum.edu.my/id/eprint/28125/1/document%20%284%29.pdf Hock, Hung Chieng and Wahid, Noorhaniza and Ong, Pauline (2021) Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning. Journal of Information and Communication Technology (JICT), 20 (1). pp. 21-39. ISSN 1675-414X https://www.e-journal.uum.edu.my/index.php/jict/article/view/12398 https://doi.org/10.32890/jict.20.1.2021.9267
institution Universiti Utara Malaysia
building UUM Library
collection Institutional Repository
continent Asia
country Malaysia
content_provider Universiti Utara Malaysia
content_source UUM Institutional Repository
url_provider http://repo.uum.edu.my/
language English
topic QA75 Electronic computers. Computer science
spellingShingle QA75 Electronic computers. Computer science
Hock, Hung Chieng
Wahid, Noorhaniza
Ong, Pauline
Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
description The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks: 1) the negative cancellation property of ReLU treats negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherent predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive and leads to a bias-shift effect in the network layers; and 4) the multi-linear structure of ReLU restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduces Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. With ReLU as the baseline, experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. In addition, PFTS achieved the highest mean rank among the compared methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
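To make the description above concrete, below is a minimal PyTorch sketch of a Flatten-T Swish-style activation with a trainable threshold T. This is not the authors' reference implementation: the functional form x * sigmoid(x) + T for non-negative inputs (and T otherwise), the initial value T = -0.20, and the class name ParametricFlattenTSwish are assumptions based on the Flatten-T Swish family the abstract builds on, not details taken from this record.

import torch
import torch.nn as nn

class ParametricFlattenTSwish(nn.Module):
    # Sketch of a Flatten-T Swish-style activation with a learnable threshold T
    # (assumed form: x * sigmoid(x) + T for x >= 0, and T for x < 0).
    def __init__(self, t_init: float = -0.20):
        super().__init__()
        # T is trainable, so the network can adapt the negative-region response
        # instead of cancelling negative inputs the way ReLU does.
        self.t = nn.Parameter(torch.tensor(t_init))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        positive = x * torch.sigmoid(x) + self.t  # Swish-like branch shifted by T
        return torch.where(x >= 0, positive, self.t.expand_as(x))

# Usage: drop-in replacement for nn.ReLU() in a fully connected layer.
layer = nn.Sequential(nn.Linear(128, 64), ParametricFlattenTSwish())
out = layer(torch.randn(4, 128))

Because T is shared across the layer and receives gradients from both branches, a negative T lowers the mean activation relative to ReLU, which is one plausible way the learnable threshold relates to the bias-shift issue the abstract raises.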
format Article
author Hock, Hung Chieng
Wahid, Noorhaniza
Ong, Pauline
author_facet Hock, Hung Chieng
Wahid, Noorhaniza
Ong, Pauline
author_sort Hock, Hung Chieng
title Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_short Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_full Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_fullStr Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_full_unstemmed Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_sort parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
publisher Universiti Utara Malaysia
publishDate 2021
url https://repo.uum.edu.my/id/eprint/28125/1/document%20%284%29.pdf
https://doi.org/10.32890/jict.20.1.2021.9267
https://repo.uum.edu.my/id/eprint/28125/
https://www.e-journal.uum.edu.my/index.php/jict/article/view/12398