Here is a follow-up from the lead author of the paper referred to in that first blog post (Jim Gao), who apparently was involved in DeepMind's project. Note the conspicuous lack of any reference to deep reinforcement learning.
Using forecasting for 'control' doesn't make much sense (why train a second ensemble to prevent overshoot if it's just supervised learning?), and the first author on that post is not Gao but Richard Evans, a DeepMind deep RL researcher (most recent publications: "Deep Reinforcement Learning in Large Discrete Action Spaces", "Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions", "Reinforcement Learning in a Neurally Controlled Robot Using Dopamine Modulated STDP").
https://blog.google/topics/environment/deepmind-ai-reduces-e...