Staff View: Deep reinforcement learning with robust deep deterministic policy gradient

Deep reinforcement learning with robust deep deterministic policy gradient

Deep Deterministic Policy Gradient (DDPG) is a popular deep reinforcement learning algorithm applied to continuous control problems such as autonomous driving and robotics. Although DDPG can produce very good results, it has drawbacks. DDPG can become unstable and heavily dependent on se...

Bibliographic Details
Main Authors:	Teckchai Tiong, Ismail Saad, Kenneth Tze Kin Teo, Herwansyah Lago
Format:	Proceedings
Language:	English
Published:	IEEE Xplore 2020
Subjects:	T Technology (General) TK Electrical engineering. Electronics Nuclear engineering
Online Access:	https://eprints.ums.edu.my/id/eprint/27893/1/Deep%20reinforcement%20learning%20with%20robust%20deep%20deterministic%20policy%20gradient-Abstract.pdf https://eprints.ums.edu.my/id/eprint/27893/ https://ieeexplore.ieee.org/document/9309539

id	my.ums.eprints.27893
record_format	eprints
spelling	my.ums.eprints.27893 2021-07-07T07:01:59Z https://eprints.ums.edu.my/id/eprint/27893/ Deep reinforcement learning with robust deep deterministic policy gradient Teckchai Tiong Ismail Saad Kenneth Tze Kin Teo Herwansyah Lago T Technology (General) TK Electrical engineering. Electronics Nuclear engineering Deep Deterministic Policy Gradient (DDPG) is a popular deep reinforcement learning algorithm applied to continuous control problems such as autonomous driving and robotics. Although DDPG can produce very good results, it has drawbacks. DDPG can become unstable and depends heavily on finding the correct hyperparameters for the current task. The DDPG algorithm risks overestimating the Q-values in the critic (value) network. The accumulation of estimation errors over time can leave the reinforcement learning agent trapped in a local optimum or suffering from catastrophic forgetting. Twin Delayed DDPG (TD3) mitigates the overestimation bias but might not reach full performance because of underestimation bias. This paper proposes Twin Average Delayed DDPG (TAD3) as a specific adaptation of TD3 and shows that the resulting algorithm performs better than TD3 in a challenging continuous control environment. IEEE Xplore 2020-11-28 Proceedings PeerReviewed text en https://eprints.ums.edu.my/id/eprint/27893/1/Deep%20reinforcement%20learning%20with%20robust%20deep%20deterministic%20policy%20gradient-Abstract.pdf Teckchai Tiong and Ismail Saad and Kenneth Tze Kin Teo and Herwansyah Lago (2020) Deep reinforcement learning with robust deep deterministic policy gradient. https://ieeexplore.ieee.org/document/9309539
institution	Universiti Malaysia Sabah
building	UMS Library
collection	Institutional Repository
continent	Asia
country	Malaysia
content_provider	Universiti Malaysia Sabah
content_source	UMS Institutional Repository
url_provider	http://eprints.ums.edu.my/
language	English
topic	T Technology (General) TK Electrical engineering. Electronics Nuclear engineering
spellingShingle	T Technology (General) TK Electrical engineering. Electronics Nuclear engineering Teckchai Tiong Ismail Saad Kenneth Tze Kin Teo Herwansyah Lago Deep reinforcement learning with robust deep deterministic policy gradient
description	Deep Deterministic Policy Gradient (DDPG) is a popular deep reinforcement learning algorithm applied to continuous control problems such as autonomous driving and robotics. Although DDPG can produce very good results, it has drawbacks. DDPG can become unstable and depends heavily on finding the correct hyperparameters for the current task. The DDPG algorithm risks overestimating the Q-values in the critic (value) network. The accumulation of estimation errors over time can leave the reinforcement learning agent trapped in a local optimum or suffering from catastrophic forgetting. Twin Delayed DDPG (TD3) mitigates the overestimation bias but might not reach full performance because of underestimation bias. This paper proposes Twin Average Delayed DDPG (TAD3) as a specific adaptation of TD3 and shows that the resulting algorithm performs better than TD3 in a challenging continuous control environment.
format	Proceedings
author	Teckchai Tiong Ismail Saad Kenneth Tze Kin Teo Herwansyah Lago
author_facet	Teckchai Tiong Ismail Saad Kenneth Tze Kin Teo Herwansyah Lago
author_sort	Teckchai Tiong
title	Deep reinforcement learning with robust deep deterministic policy gradient
title_short	Deep reinforcement learning with robust deep deterministic policy gradient
title_full	Deep reinforcement learning with robust deep deterministic policy gradient
title_fullStr	Deep reinforcement learning with robust deep deterministic policy gradient
title_full_unstemmed	Deep reinforcement learning with robust deep deterministic policy gradient
title_sort	deep reinforcement learning with robust deep deterministic policy gradient
publisher	IEEE Xplore
publishDate	2020
url	https://eprints.ums.edu.my/id/eprint/27893/1/Deep%20reinforcement%20learning%20with%20robust%20deep%20deterministic%20policy%20gradient-Abstract.pdf https://eprints.ums.edu.my/id/eprint/27893/ https://ieeexplore.ieee.org/document/9309539
_version_	1760230648595873792
score	13.250246
