Unreasonable performance of a simple linear policy

There is no PyTorch here. I just wanted to share that 150 lines of NumPy code, with a simple linear policy and a basic SGD-style update, can reach strong performance in MuJoCo environments.

The algorithm, Augmented Random Search (ARS), comes from Ben Recht's team (http://www.argmin.net/2018/03/20/mujocoloco/).
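For readers who want to see what the update looks like, here is a minimal NumPy sketch of ARS in its V1-t form: antithetic random perturbations of the policy matrix, keeping only the top-b directions, with the step scaled by the standard deviation of the collected rewards. It is written from the published description, not taken from the 150-line script, and it leaves out the state normalization of the V2 variants. Since MuJoCo is not assumed here, the environment is a hypothetical stand-in (a 2-D linear system with a quadratic cost), and all names and hyperparameters (`rollout`, `ars`, `step_size`, `noise`, ...) are illustrative assumptions.

```python
import numpy as np

# Hypothetical stand-in for a MuJoCo task (an assumption): a discrete-time
# linear system x_{t+1} = A x_t + B u_t with a quadratic cost.
A = np.array([[1.0, 0.1], [0.0, 1.0]])   # double-integrator-like dynamics
B = np.array([[0.0], [0.1]])
X0 = np.array([1.0, 0.0])                # fixed initial state
HORIZON = 50

def rollout(M):
    """Total reward of the linear policy u = M @ x over one deterministic episode."""
    x, total = X0.copy(), 0.0
    for _ in range(HORIZON):
        u = M @ x
        total -= x @ x + 0.1 * (u @ u)   # reward = negative quadratic cost
        x = A @ x + B @ u
    return total

def ars(obs_dim=2, act_dim=1, n_dirs=8, top_b=4, step_size=0.02, noise=0.03,
        n_iters=100, seed=0):
    """Sketch of ARS V1-t (no state normalization); hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    M = np.zeros((act_dim, obs_dim))
    for _ in range(n_iters):
        # Sample random directions and evaluate M +/- noise * delta.
        deltas = rng.standard_normal((n_dirs, act_dim, obs_dim))
        r_plus = np.array([rollout(M + noise * d) for d in deltas])
        r_minus = np.array([rollout(M - noise * d) for d in deltas])
        # Keep the top_b directions ranked by max(r+, r-).
        top = np.argsort(np.maximum(r_plus, r_minus))[::-1][:top_b]
        # Scale the step by the std of the 2 * top_b rewards actually used.
        sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
        # Weighted sum of directions: shape (act_dim, obs_dim).
        step = np.tensordot(r_plus[top] - r_minus[top], deltas[top], axes=1)
        M += step_size / (top_b * sigma_r) * step
    return M
```

The zero policy never moves the state away from `[1, 0]` and scores exactly -50, so after `M = ars()` one would expect `rollout(M)` to be higher than that.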

My next step is to implement a dedicated PyTorch optimizer (or perhaps a module?) to turn these 150 NumPy lines into about 50 PyTorch lines.

That would be great. I am looking forward to your PyTorch implementation of ARS. I found one here:

I did this: https://github.com/alexis-jacq/Pytorch_Policy_Search_Optimizer

So it's possible to explore ARS performance with kinds of policies other than linear ones, using PyTorch tools.
But I wrote it before version 0.4, so it's probably a bit old-fashioned now.
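Non-linear policies work with ARS because the method only ever perturbs and updates one flat parameter vector and never needs gradients of the policy itself. The NumPy sketch here applies the same V1-t style update to a tiny tanh MLP by flattening its weights; in PyTorch, `torch.nn.utils.parameters_to_vector` and `torch.nn.utils.vector_to_parameters` would play the role of the hand-written `unflatten`. To stay self-contained, the objective is an assumed stand-in (imitating a fixed nonlinear target controller on sampled states) rather than a real RL task, and every name and hyperparameter is hypothetical.

```python
import numpy as np

# Hypothetical tiny MLP policy: obs (2,) -> 8 tanh units -> action (1,).
SHAPES = [(8, 2), (8,), (1, 8), (1,)]          # W1, b1, W2, b2
SIZES = [int(np.prod(s)) for s in SHAPES]
DIM = sum(SIZES)                                # 33 parameters in total

def unflatten(theta):
    """Split the flat parameter vector theta into the MLP's weight arrays."""
    parts, start = [], 0
    for shape, size in zip(SHAPES, SIZES):
        parts.append(theta[start:start + size].reshape(shape))
        start += size
    return parts

def mlp_policy(theta, obs):
    """Nonlinear policy: obs has shape (n, 2), returns actions of shape (n, 1)."""
    W1, b1, W2, b2 = unflatten(theta)
    return np.tanh(obs @ W1.T + b1) @ W2.T + b2

# Stand-in objective (an assumption, not an RL task): imitate a nonlinear
# target controller on fixed sample states; reward = negative mean squared error.
_rng = np.random.default_rng(1)
STATES = _rng.uniform(-1.0, 1.0, size=(64, 2))
TARGET = (np.sin(2.0 * STATES[:, 0]) - STATES[:, 1])[:, None]

def reward(theta):
    return -np.mean((mlp_policy(theta, STATES) - TARGET) ** 2)

def ars_flat(reward_fn, theta, n_dirs=8, top_b=4, step_size=0.02, noise=0.03,
             n_iters=300, seed=0):
    """ARS V1-t style update on a flat parameter vector, whatever the model."""
    rng = np.random.default_rng(seed)
    theta = theta.copy()
    for _ in range(n_iters):
        deltas = rng.standard_normal((n_dirs, theta.size))
        r_plus = np.array([reward_fn(theta + noise * d) for d in deltas])
        r_minus = np.array([reward_fn(theta - noise * d) for d in deltas])
        top = np.argsort(np.maximum(r_plus, r_minus))[::-1][:top_b]
        sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
        theta += step_size / (top_b * sigma_r) * ((r_plus[top] - r_minus[top]) @ deltas[top])
    return theta
```

Starting from small random weights, e.g. `theta0 = 0.1 * np.random.default_rng(0).standard_normal(DIM)`, `ars_flat(reward, theta0)` should return parameters with a higher reward (lower imitation error) than `theta0`.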

Thank you very much (also for your fast response), this is great. However, it would be even better if it also included a simple classification example (rather than RL), where I guess augmented random search can still be used (at least that is the case in the example above). That would be much easier for me to grasp than RL, in which I am a newbie, and I could give it a shot on my existing problems. Cheers.
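On the classification question: ARS does not care where the reward comes from, so one option is to use the negative training loss of a classifier as the reward. This is a minimal sketch under that assumption, with a linear (logistic-regression) classifier on a synthetic two-class dataset. The data, function names and hyperparameters are all illustrative, and this is not code from the linked repository.

```python
import numpy as np

def make_blobs(n=200, seed=0):
    """Synthetic two-class data (a stand-in dataset), with a bias column appended."""
    rng = np.random.default_rng(seed)
    X0 = rng.normal(loc=-2.0, scale=1.0, size=(n // 2, 2))
    X1 = rng.normal(loc=2.0, scale=1.0, size=(n // 2, 2))
    X = np.hstack([np.vstack([X0, X1]), np.ones((n, 1))])
    y = np.concatenate([np.zeros(n // 2), np.ones(n // 2)])
    return X, y

def reward(w, X, y):
    """ARS 'reward' = negative mean logistic loss of the linear classifier w."""
    z = X @ w
    # Per-sample loss log(1 + exp(z)) - y * z, computed stably with logaddexp.
    return -np.mean(np.logaddexp(0.0, z) - y * z)

def accuracy(w, X, y):
    return np.mean((X @ w > 0) == (y == 1))

def ars_classifier(X, y, n_dirs=8, top_b=4, step_size=0.05, noise=0.05,
                   n_iters=100, seed=0):
    """ARS V1-t style search over the classifier weights (full-batch reward)."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        deltas = rng.standard_normal((n_dirs, w.size))
        r_plus = np.array([reward(w + noise * d, X, y) for d in deltas])
        r_minus = np.array([reward(w - noise * d, X, y) for d in deltas])
        top = np.argsort(np.maximum(r_plus, r_minus))[::-1][:top_b]
        sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
        w += step_size / (top_b * sigma_r) * ((r_plus[top] - r_minus[top]) @ deltas[top])
    return w
```

With `X, y = make_blobs()` and `w = ars_classifier(X, y)`, `accuracy(w, X, y)` should be well above the 0.5 of the all-zero classifier. The full-batch loss keeps the reward deterministic; with real data one would probably evaluate r+ and r- on the same mini-batch for each direction. For a differentiable loss like this one, ordinary gradient descent is the more efficient choice; random search becomes more interesting when the objective (accuracy, say) has no useful gradient.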