wideriver.tech
Industry News

RL without TD learning

Berkeley AI Research

In this post, I’ll introduce a reinforcement learning (RL) algorithm based on an “alternative” paradigm: divide and conquer. Unlike traditional methods, this algorithm is not based on temporal difference (TD) learning, which has scalability challenges, and it scales well to long-horizon tasks.

Problem setting: off-policy RL

Our problem setting is off-policy RL. Let’s briefly revi...
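The excerpt stops before describing the algorithm itself, so the sketch below is only a hypothetical toy illustration of one intuition behind the horizon argument. It is not the post's method. On an assumed deterministic chain with unit step costs, it counts how many synchronous sweeps are needed to propagate goal-conditioned shortest-path distances d(s, g). A TD-style one-step backup, d(s, g) ≤ 1 + d(s + 1, g), extends the solved horizon by one step per sweep. A divide-and-conquer backup that splits each path at an intermediate subgoal w, d(s, g) ≤ d(s, w) + d(w, g), doubles it. The chain environment and all function names are assumptions made for illustration.

```python
import math

INF = math.inf


def chain_distances(num_states):
    """One-step distances on a hypothetical chain 0 -> 1 -> ... -> num_states - 1."""
    d = [[INF] * num_states for _ in range(num_states)]
    for s in range(num_states):
        d[s][s] = 0.0
        if s + 1 < num_states:
            d[s][s + 1] = 1.0
    return d


def td_sweeps(num_states):
    """Count synchronous one-step (TD-style) backups until distances stop changing."""
    d = chain_distances(num_states)
    sweeps = 0
    while True:
        new = [row[:] for row in d]
        for s in range(num_states - 1):
            for g in range(num_states):
                # Bootstrap from the single successor: d(s, g) <= 1 + d(s + 1, g).
                new[s][g] = min(new[s][g], 1.0 + d[s + 1][g])
        if new == d:
            return sweeps, d
        d, sweeps = new, sweeps + 1


def divide_and_conquer_sweeps(num_states):
    """Count synchronous subgoal backups d(s, g) <= d(s, w) + d(w, g) until stable."""
    d = chain_distances(num_states)
    sweeps = 0
    while True:
        new = [row[:] for row in d]
        for s in range(num_states):
            for g in range(num_states):
                # Split s -> g at every candidate midpoint w (w = s keeps the old value).
                new[s][g] = min(d[s][w] + d[w][g] for w in range(num_states))
        if new == d:
            return sweeps, d
        d, sweeps = new, sweeps + 1


horizon = 32
td_count, _ = td_sweeps(horizon + 1)
dc_count, _ = divide_and_conquer_sweeps(horizon + 1)
print(td_count, dc_count)  # 31 5
```

In this toy, the number of sweeps grows linearly with the horizon for the one-step backup and only logarithmically for the midpoint split. Both converge to the same distances. Real off-policy RL with function approximation is far messier, so treat this only as an illustration of the recursion depth, not of the algorithm's actual behavior.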
