Deep reinforcement learning with robust deep deterministic policy gradient


Detailed description

Bibliographic details
Main authors: Teckchai Tiong, Ismail Saad, Kenneth Tze Kin Teo, Herwansyah Lago
Format: Proceedings
Language: English
Published: IEEE Xplore 2020
Online access: https://eprints.ums.edu.my/id/eprint/27893/1/Deep%20reinforcement%20learning%20with%20robust%20deep%20deterministic%20policy%20gradient-Abstract.pdf
https://eprints.ums.edu.my/id/eprint/27893/
https://ieeexplore.ieee.org/document/9309539
Other bibliographic details
Abstract: Recently, Deep Deterministic Policy Gradient (DDPG) has become a popular deep reinforcement learning algorithm for continuous control problems such as autonomous driving and robotics. Although DDPG can produce very good results, it has drawbacks: it can become unstable and is heavily dependent on finding the correct hyperparameters for the task at hand. The DDPG algorithm also risks overestimating the Q-values in the critic (value) network. The accumulation of these estimation errors over time can cause the reinforcement learning agent to become trapped in a local optimum or to suffer from catastrophic forgetting. Twin Delayed DDPG (TD3) mitigates the overestimation bias problem but may not reach its full performance due to underestimation bias. In this paper, Twin Average Delayed DDPG (TAD3) is proposed as a specific adaptation of TD3, and the resulting algorithm is shown to perform better than TD3 in a challenging continuous control environment.
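
The abstract does not give the TAD3 update rule, but the contrast it draws between TD3's clipped double-Q target and an averaged variant can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, the NumPy arrays q1_next and q2_next (twin critic estimates of Q(s', pi(s'))), and the averaged target are assumptions, with the average being only one plausible reading of "Twin Average Delayed DDPG".

```python
import numpy as np

def td3_target(reward, done, gamma, q1_next, q2_next):
    # TD3: take the minimum of the two critic estimates to curb
    # overestimation bias; this hard minimum can underestimate.
    return reward + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

def averaged_target(reward, done, gamma, q1_next, q2_next):
    # Assumed averaged variant: using the mean of the twin estimates
    # trades some overestimation protection for less underestimation
    # bias than the hard minimum.
    return reward + gamma * (1.0 - done) * 0.5 * (q1_next + q2_next)
```

In both cases the critic networks are regressed toward the chosen target; the only difference sketched here is how the twin next-state estimates are combined.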