Student loss: Towards the probability assumption in inaccurate supervision
Main Authors: Shuo, Zhang; Li, Jian-Qing; Fujita, Hamido; Li, Yu-Wen; Wang, Deng-Bao; Zhu, Ting-Ting
Format: Article
Published: IEEE Computer Society, 2024
Subjects: LB Theory and practice of education
Online Access: http://eprints.utm.my/108870/ http://dx.doi.org/10.1109/TPAMI.2024.3357518
Record ID: my.utm.108870 (eprints record, last modified 2025-01-08T04:50:05Z)
Citation: Shuo, Zhang; Li, Jian-Qing; Fujita, Hamido; Li, Yu-Wen; Wang, Deng-Bao; Zhu, Ting-Ting (2024). Student loss: Towards the probability assumption in inaccurate supervision. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46 (6), pp. 4460-4475. ISSN 0162-8828. DOI: 10.1109/TPAMI.2024.3357518
Date: 2024-01-23 (Article, peer reviewed)
Institution: Universiti Teknologi Malaysia
Building: UTM Library
Collection: Institutional Repository
Location: Malaysia, Asia
Content source: UTM Institutional Repository (http://eprints.utm.my/)
Topic: LB Theory and practice of education
Abstract:
Noisy labels are common in datasets, but learning with them is challenging. Although clean and mislabeled samples within a noisy category differ naturally, most techniques in this field still gather them indiscriminately, which leaves their performance only partially robust. In this paper, we show both empirically and theoretically that learning robustness can be improved by assuming that deep features with the same label follow a Student's t-distribution, which yields a more intuitive method called student loss. By embedding the Student distribution and exploiting the sharpness of its curve, our method is naturally data-selective and offers extra resistance to mislabeled samples: clean samples aggregate tightly around the center, while mislabeled samples scatter, even when they share the same label. Additionally, we adopt a metric learning strategy and develop a large-margin student (LT) loss to further improve performance. Notably, our approach is the first work to adopt a prior probability assumption on the feature representation to decrease the contributions of mislabeled samples. This strategy can bring various losses into the student loss family, even losses that are already robust. Experiments demonstrate that our approach is more effective under inaccurate supervision. The enhanced LT losses significantly outperform various state-of-the-art methods in most cases, with improvements of over 50% under some conditions.
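The abstract does not state the loss formula, so the following is only a minimal sketch of the general idea, not the authors' student or LT loss. It assumes that the features of class k follow an isotropic Student's t-distribution around a class center `mu_k` with `nu` degrees of freedom, and uses each class's log-density as a softmax logit. The names `centers`, `nu`, and `margin`, and the way the margin is applied (scaling the true class's squared distance by `1 + margin`, by analogy with Gaussian-mixture large-margin losses), are all hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence


def sq_dist(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between a feature vector and a class center."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def student_logits(feature: Sequence[float], centers: Sequence[Sequence[float]],
                   nu: float = 1.0, margin: float = 0.0,
                   label: Optional[int] = None) -> List[float]:
    """Per-class log-density of an isotropic Student's t, up to a shared constant:
    log p_k(f) = -(nu + d) / 2 * log(1 + ||f - mu_k||^2 / nu), with d = len(f).
    Hypothetical large-margin variant: if label is given, the true class's
    squared distance is scaled by (1 + margin)."""
    d = len(feature)
    logits = []
    for k, mu in enumerate(centers):
        dist2 = sq_dist(feature, mu)
        if label is not None and k == label:
            dist2 *= 1.0 + margin
        logits.append(-(nu + d) / 2.0 * math.log1p(dist2 / nu))
    return logits


def cross_entropy(logits: Sequence[float], label: int) -> float:
    """Softmax cross-entropy, computed with the log-sum-exp trick for stability."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def student_loss(feature: Sequence[float], centers: Sequence[Sequence[float]],
                 label: int, nu: float = 1.0, margin: float = 0.0) -> float:
    """Illustrative Student-t classification loss for a single sample."""
    return cross_entropy(student_logits(feature, centers, nu, margin, label), label)


def pull_strength(dist: float, nu: float, d: int) -> float:
    """Gradient magnitude of the Student-t log-density with respect to the feature
    at Euclidean distance `dist` from the center: (nu + d) * dist / (nu + dist^2).
    A unit-variance Gaussian log-density would give `dist`, which grows without bound."""
    return (nu + d) * dist / (nu + dist ** 2)


if __name__ == "__main__":
    centers = [[0.0, 0.0], [4.0, 0.0]]
    clean = [0.2, 0.1]  # labelled 0 and close to center 0
    noisy = [3.9, 0.2]  # labelled 0 but sitting next to center 1
    print(student_loss(clean, centers, label=0))  # small
    print(student_loss(noisy, centers, label=0))  # much larger
    print(pull_strength(1.0, 1.0, 2), pull_strength(10.0, 1.0, 2))
```

The `pull_strength` helper illustrates one reading of "exploiting the sharpness of its curve": under this assumed density, the pull on a feature peaks at distance `sqrt(nu)` and then decays, so distant (possibly mislabeled) samples tug less on their class's geometry than under a Gaussian assumption. Smaller `nu` means heavier tails and stronger down-weighting.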