Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks: 1) the negative cancellation property of ReLU tends to treat negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherent predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive, which leads to a bias-shift effect in the network layers; and 4) the multi-linear structure of ReLU restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduced Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. Taking ReLU as the baseline method, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. In addition, PFTS achieved the highest mean rank among the comparison methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
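This record reproduces only the abstract, not the PFTS definition itself. As a rough illustration of the idea described above, the sketch below implements a Flatten-T Swish-style activation with a trainable shift T in PyTorch, assuming the form f(x) = relu(x) * sigmoid(x) + T (a Swish-shaped response for positive inputs and a flat learnable value T for negative inputs), with T updated by backpropagation. The class name ParametricFlattenTSwish and the initial value -0.20 are illustrative assumptions, not the authors' reference implementation.

```python
# Hypothetical sketch of a parametric Flatten-T Swish activation in PyTorch.
# Assumed form: f(x) = relu(x) * sigmoid(x) + T, where T is a trainable scalar.
import torch
import torch.nn as nn


class ParametricFlattenTSwish(nn.Module):
    """Swish-shaped response for x >= 0, a flat learnable value T for x < 0."""

    def __init__(self, init_t: float = -0.20):
        super().__init__()
        # T is a single trainable scalar shared by the whole layer; it is
        # updated by backpropagation together with the network weights.
        self.t = nn.Parameter(torch.tensor(init_t))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # relu(x) * sigmoid(x) equals x * sigmoid(x) for positive inputs and 0
        # for negative inputs; adding T shifts the curve so negative inputs map
        # to T instead of being cancelled to zero as they are under ReLU.
        return torch.relu(x) * torch.sigmoid(x) + self.t


if __name__ == "__main__":
    # Drop-in replacement for nn.ReLU() inside an ordinary feed-forward block.
    block = nn.Sequential(nn.Linear(16, 32), ParametricFlattenTSwish())
    print(block(torch.randn(4, 16)).shape)  # torch.Size([4, 32])
```

Because T is an nn.Parameter, it is trained jointly with the network weights, which is what makes the activation adaptive rather than fixed like ReLU.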
Main Authors: Hock, Hung Chieng; Wahid, Noorhaniza; Ong, Pauline
Format: Article
Language: English
Published: Universiti Utara Malaysia, 2021
Subjects: QA75 Electronic computers. Computer science
Online Access: https://repo.uum.edu.my/id/eprint/28125/1/document%20%284%29.pdf ; https://repo.uum.edu.my/id/eprint/28125/ ; https://www.e-journal.uum.edu.my/index.php/jict/article/view/12398 ; https://doi.org/10.32890/jict.20.1.2021.9267
id: my.uum.repo.28125
record_format: eprints
spelling: my.uum.repo.28125 2023-05-21T15:21:24Z https://repo.uum.edu.my/id/eprint/28125/ Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning. Hock, Hung Chieng; Wahid, Noorhaniza; Ong, Pauline. QA75 Electronic computers. Computer science. The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks: 1) the negative cancellation property of ReLU tends to treat negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherent predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive, which leads to a bias-shift effect in the network layers; and 4) the multi-linear structure of ReLU restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduced Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. Taking ReLU as the baseline method, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. In addition, PFTS achieved the highest mean rank among the comparison methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks. Universiti Utara Malaysia 2021 Article PeerReviewed application/pdf en cc4_by https://repo.uum.edu.my/id/eprint/28125/1/document%20%284%29.pdf Hock, Hung Chieng and Wahid, Noorhaniza and Ong, Pauline (2021) Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning. Journal of Information and Communication Technology (JICT), 20 (1). pp. 21-39. ISSN 1675-414X. https://www.e-journal.uum.edu.my/index.php/jict/article/view/12398 https://doi.org/10.32890/jict.20.1.2021.9267
institution: Universiti Utara Malaysia
building: UUM Library
collection: Institutional Repository
continent: Asia
country: Malaysia
content_provider: Universiti Utara Malaysia
content_source: UUM Institutional Repository
url_provider: http://repo.uum.edu.my/
language: English
topic: QA75 Electronic computers. Computer science
spellingShingle: QA75 Electronic computers. Computer science; Hock, Hung Chieng; Wahid, Noorhaniza; Ong, Pauline; Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
description: The activation function is a key component in deep learning that performs non-linear mappings between inputs and outputs. The Rectified Linear Unit (ReLU) has been the most popular activation function across the deep learning community. However, ReLU has several shortcomings that can result in inefficient training of deep neural networks: 1) the negative cancellation property of ReLU tends to treat negative inputs as unimportant information for learning, resulting in performance degradation; 2) the inherent predefined nature of ReLU is unlikely to promote additional flexibility, expressivity, and robustness in the networks; 3) the mean activation of ReLU is highly positive, which leads to a bias-shift effect in the network layers; and 4) the multi-linear structure of ReLU restricts the non-linear approximation power of the networks. To tackle these shortcomings, this paper introduced Parametric Flatten-T Swish (PFTS) as an alternative to ReLU. Taking ReLU as the baseline method, the experiments showed that PFTS improved classification accuracy on the SVHN dataset by 0.31%, 0.98%, 2.16%, 17.72%, 1.35%, 0.97%, 39.99%, and 71.83% on DNN-3A, DNN-3B, DNN-4, DNN-5A, DNN-5B, DNN-5C, DNN-6, and DNN-7, respectively. In addition, PFTS achieved the highest mean rank among the comparison methods. The proposed PFTS manifested higher non-linear approximation power during training and thereby improved the predictive performance of the networks.
format: Article
author: Hock, Hung Chieng; Wahid, Noorhaniza; Ong, Pauline
author_facet: Hock, Hung Chieng; Wahid, Noorhaniza; Ong, Pauline
author_sort: Hock, Hung Chieng
title: Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_short: Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_full: Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_fullStr: Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_full_unstemmed: Parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
title_sort: parametric flatten-t swish: an adaptive nonlinear activation function for deep learning
publisher: Universiti Utara Malaysia
publishDate: 2021
url: https://repo.uum.edu.my/id/eprint/28125/1/document%20%284%29.pdf ; https://repo.uum.edu.my/id/eprint/28125/ ; https://www.e-journal.uum.edu.my/index.php/jict/article/view/12398 ; https://doi.org/10.32890/jict.20.1.2021.9267
_version_: 1768010679040606208
score: 13.211869