Unreasonable performance of a simple linear policy

alexis-jacq · April 11, 2018, 11:21pm

There is no PyTorch here. I just wanted to share the fact that 150 lines of NumPy code, with a simple linear policy and a basic SGD, can reach surprisingly good performance in MuJoCo environments.

The algorithm, Augmented Random Search (ARS), is from Ben Recht’s team (http://www.argmin.net/2018/03/20/mujocoloco/).
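For readers who want to see the shape of the idea without MuJoCo, here is a minimal sketch of a basic random-search update for a linear policy. It is not the 150-line script mentioned above. The toy double-integrator "environment", the function names `rollout` and `ars_train`, and all hyperparameter values are assumptions chosen for illustration, and the state normalization used in the full method is left out.

```python
import numpy as np

def rollout(M, horizon=50):
    """Total reward of the linear policy u = M @ x on a toy double integrator.

    The dynamics and quadratic cost are illustrative assumptions, not MuJoCo.
    """
    A = np.array([[1.0, 0.1], [0.0, 1.0]])
    B = np.array([[0.0], [0.1]])
    x = np.array([1.0, 0.0])
    total = 0.0
    for _ in range(horizon):
        u = M @ x                          # linear policy, shape (1,)
        total -= float(x @ x + u @ u)      # reward = negative quadratic cost
        x = A @ x + B @ u
    return total

def ars_train(n_iters=100, n_dirs=8, n_top=4, step_size=0.02, noise=0.05, seed=0):
    """Basic random search: perturb the policy, compare paired rollouts, step."""
    rng = np.random.default_rng(seed)
    M = np.zeros((1, 2))
    for _ in range(n_iters):
        # Sample random directions with the same shape as the policy matrix.
        deltas = rng.standard_normal((n_dirs,) + M.shape)
        r_plus = np.array([rollout(M + noise * d) for d in deltas])
        r_minus = np.array([rollout(M - noise * d) for d in deltas])
        # Keep only the directions whose better rollout scored highest.
        top = np.argsort(-np.maximum(r_plus, r_minus))[:n_top]
        # Scale the step by the standard deviation of the rewards actually used.
        sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
        grad = np.tensordot(r_plus[top] - r_minus[top], deltas[top], axes=1)
        M = M + step_size / (n_top * sigma_r) * grad
    return M

if __name__ == "__main__":
    M0 = np.zeros((1, 2))
    M = ars_train()
    print("reward before:", rollout(M0))
    print("reward after: ", rollout(M))
```

The whole update uses only rollout returns, no backpropagation, which is why plain NumPy is enough.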

My next step is to implement a dedicated PyTorch optimizer (or a module?) to turn these 150 NumPy lines into 50 PyTorch lines.

Kamer_Ali_Yuksel · January 14, 2019, 10:45am

That would be great. I am looking forward to your PyTorch implementation of ARS. I found one here:

github.com

LAIRLAB/ARS-experiments/blob/8c832b8c5e996621469436464716234679457cbf/ars/mnist_ars.py

'''
Augmented random search for MNIST
Author: Anirudh Vemula
'''
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable
from envs.mnist.mnist import MNIST
from utils.ars import *
import numpy as np
import random
import ipdb

parser = argparse.ArgumentParser(description='PyTorch MNIST Example')
# Experiment parameters

This file has been truncated.

alexis-jacq · January 17, 2019, 9:58am

I did this: https://github.com/alexis-jacq/Pytorch_Policy_Search_Optimizer

So it’s possible to explore ARS performance with policies other than linear ones, using PyTorch tools.
But I wrote this before version 0.4, so it’s probably a bit old-fashioned now.

Kamer_Ali_Yuksel · January 17, 2019, 12:59pm

Thank you very much (also for your fast response), this is great. However, it would be even better if it also included a simple classification example (rather than RL), where I guess augmented random search can still be used (at least that is the case in the example above). It would be much easier for me to grasp it in such a simple classification setting than in RL, where I am a newbie, and then give it a shot on my existing problems. Cheers.
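As a rough illustration of what such a classification example could look like, here is a minimal NumPy sketch that trains a linear classifier with the same kind of random-search update, using the negative logistic loss as the "reward". It is not taken from the linked mnist_ars.py or from the Pytorch_Policy_Search_Optimizer repository. The synthetic two-blob dataset, the helper names (`make_blobs`, `reward`, `accuracy`, `ars_step`, `train`), and the hyperparameters are all assumptions made for the example.

```python
import numpy as np

def make_blobs(n_per_class=100, seed=0):
    """Two Gaussian blobs with labels -1 and +1 (synthetic, illustrative data)."""
    rng = np.random.default_rng(seed)
    x_neg = rng.normal(loc=-2.0, scale=1.0, size=(n_per_class, 2))
    x_pos = rng.normal(loc=2.0, scale=1.0, size=(n_per_class, 2))
    X = np.vstack([x_neg, x_pos])
    y = np.concatenate([-np.ones(n_per_class), np.ones(n_per_class)])
    return X, y

def reward(theta, X, y):
    """Negative mean logistic loss; theta = [w1, w2, bias]."""
    logits = X @ theta[:2] + theta[2]
    return -np.mean(np.logaddexp(0.0, -y * logits))

def accuracy(theta, X, y):
    """Fraction of points whose predicted sign matches the label."""
    return float(np.mean(np.sign(X @ theta[:2] + theta[2]) == y))

def ars_step(theta, reward_fn, rng, n_dirs=8, n_top=4, step_size=0.05, noise=0.05):
    """One random-search update using paired (+/-) perturbations of theta."""
    deltas = rng.standard_normal((n_dirs, theta.size))
    r_plus = np.array([reward_fn(theta + noise * d) for d in deltas])
    r_minus = np.array([reward_fn(theta - noise * d) for d in deltas])
    # Keep the best-performing directions and normalize by their reward spread.
    top = np.argsort(-np.maximum(r_plus, r_minus))[:n_top]
    sigma_r = np.concatenate([r_plus[top], r_minus[top]]).std() + 1e-8
    grad = (r_plus[top] - r_minus[top]) @ deltas[top]
    return theta + (step_size / (n_top * sigma_r)) * grad

def train(n_iters=100, seed=0):
    """Fit the linear classifier with random search only (no gradients)."""
    X, y = make_blobs(seed=seed)
    rng = np.random.default_rng(seed + 1)
    theta = np.zeros(3)
    for _ in range(n_iters):
        theta = ars_step(theta, lambda t: reward(t, X, y), rng)
    return theta, X, y

if __name__ == "__main__":
    theta, X, y = train()
    print("train accuracy:", accuracy(theta, X, y))
```

The only thing that changes relative to the RL setting is the reward function: here one "rollout" is simply an evaluation of the loss on the dataset, so any model whose parameters can be flattened into a vector could be plugged in the same way.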