I have a feeling that this code needs to be modified:
```python
for action, r in zip(model_Net.saved_actions, rewards):
    action.reinforce(r)  # attach each sample's reward to its stochastic action
optimizer.zero_grad()
autograd.backward(model_Net.saved_actions, [None for _ in model_Net.saved_actions])
```
My question is: how do I pass `r` and `action` in batch mode during back-propagation? It might be related to reshaping the `action` values in a way that allows back-propagation.
Right now my workaround is to compute `r` and `action` in batch mode in the forward passes, but to update the gradients sequentially (one sample at a time) in back-propagation, e.g. by running `finish_episode` several times; a sketch of this workaround is below. It's obviously not optimal, though.
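A minimal sketch of that sequential workaround, assuming the old (pre-0.4) stochastic-Variable API where `.reinforce()` exists; `optimizer`, `model_Net`, and the per-sample `finish_episode_single` helper are illustrative assumptions, not part of the original code:

```python
import torch
from torch import autograd

# Hypothetical one-sample update: each saved action came from its own
# forward pass, so each backward call touches a separate graph.
def finish_episode_single(action, r):
    action.reinforce(r)                  # attach this sample's scalar reward
    optimizer.zero_grad()
    autograd.backward([action], [None])  # REINFORCE gradient for one sample
    optimizer.step()

for action, r in zip(model_Net.saved_actions, rewards):
    finish_episode_single(action, r)
```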
If the shape of an action variable has batch dimension b, then you can call `action.reinforce` with a reward that's either a scalar or a vector of length b.
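For illustration, a minimal sketch of that batched form, again under the old (pre-0.4) stochastic-Variable API; `policy`, `states`, `optimizer`, and the tensor shapes are assumptions:

```python
import torch
from torch import autograd
from torch.autograd import Variable

# One forward pass over a batch of b states.
probs = policy(Variable(states))      # (b, num_actions) action probabilities
actions = probs.multinomial()         # stochastic Variable with batch dimension b

# Per the answer above: the reward can be a single scalar for the whole
# batch, or a FloatTensor of length b with one reward per sample.
actions.reinforce(rewards)

optimizer.zero_grad()
autograd.backward([actions], [None])  # one call for the whole batch
optimizer.step()
```

This replaces the per-sample Python loop with a single `reinforce`/`backward` pair over the whole batch.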