Uhh, not all, but maybe most in production. You can use any optimization technique you want to train the weights, including things like evolutionary algorithms or simulated annealing, which are entirely different from what's listed here. Evolutionary-style methods may be SOTA for continuous-control reinforcement learning problems... Consider how backprop or hill climbing or L-BFGS performs on something basic like CartPole.
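To make that concrete, here's a minimal sketch of the kind of gradient-free search I mean: random hill climbing over a linear policy on CartPole, no backprop anywhere. It assumes the classic `gym` API (where `env.reset()` returns an observation and `env.step()` returns a 4-tuple); the function names and hyperparameters are just illustrative.

```python
# Minimal sketch: hill climbing a linear policy on CartPole (no gradients).
# Assumes the classic gym API: reset() -> obs, step(a) -> (obs, reward, done, info).
import numpy as np
import gym

def run_episode(env, weights, max_steps=500):
    """Roll out one episode with a linear policy: push right if w.obs > 0, else left."""
    obs = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = int(np.dot(weights, obs) > 0)
        obs, reward, done, _ = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward

def hill_climb(iterations=200, noise_scale=0.1):
    """Perturb the weights with Gaussian noise; keep the perturbation if the return improves."""
    env = gym.make("CartPole-v1")
    best_w = np.zeros(env.observation_space.shape[0])
    best_r = run_episode(env, best_w)
    for _ in range(iterations):
        candidate = best_w + noise_scale * np.random.randn(*best_w.shape)
        r = run_episode(env, candidate)
        if r >= best_r:
            best_w, best_r = candidate, r
    return best_w, best_r

if __name__ == "__main__":
    w, r = hill_climb()
    print("best return:", r)
```

This usually solves CartPole in a handful of iterations, which is the point: the weights are being optimized, just not by anything resembling gradient descent.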
I have seen this paper. The blog post does not report that they were able to train a better agent than A3C, only that ES allowed them to use more compute to train in parallel.