Here is a followup from the same lead author of the paper referred to in that fi...

gwern · on May 27, 2017

Using forecasting for 'control' doesn't make much sense (why would you need to train a second ensemble to prevent overshoot if it's just supervised learning?), and the first author on that post is not Gao but Richard Evans, a DeepMind deep RL researcher (most recent publications: "Deep Reinforcement Learning in Large Discrete Action Spaces", "Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions", "Reinforcement Learning in a Neurally Controlled Robot Using Dopamine Modulated STDP").