REINFORCE in batch mode

Hi everyone,
I hope to train the REINFORCE algorithm in batch mode, with a batch size larger than 1. From this example:

I have a feeling that this code needs to be modified:

for action, r in zip(model_Net.saved_actions, rewards):
    action.reinforce(r)
autograd.backward(model_Net.saved_actions, [None for _ in model_Net.saved_actions])

My question is how to pass ‘r’ and ‘action’ in batch mode during back-propagation. It might involve reshaping the ‘action’ values in a way that allows back-propagation.

Right now my idea is to compute ‘r’ and ‘action’ in batch mode in the forward pass, but to update the gradients sequentially (one sample at a time) in back-propagation (e.g., by running ‘finish_episode’ several times). But that’s obviously not optimal.

Thanks in advance,

If the shape of an action variable has batch dimension b, then you can call action.reinforce using a reward that’s either a scalar or a vector of length b.
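For readers on newer PyTorch releases, where .reinforce is no longer available, the same batched idea can be sketched with torch.distributions. This is only a minimal sketch, not the code from the example above: the helper name reinforce_loss and the toy shapes (a batch of 4 samples, 3 actions) are made up for illustration. Each sample's log-probability is weighted by its own reward, and a scalar reward simply broadcasts across the batch.

```python
import torch
from torch.distributions import Categorical


def reinforce_loss(logits, actions, rewards):
    """REINFORCE surrogate loss for a whole batch (hypothetical helper).

    logits:  (b, num_actions) unnormalized action scores from the policy
    actions: (b,) actions that were sampled from the policy
    rewards: a scalar, or a tensor of shape (b,) with one reward per sample
    """
    log_probs = Categorical(logits=logits).log_prob(actions)  # shape (b,)
    rewards = torch.as_tensor(rewards, dtype=log_probs.dtype)  # scalar broadcasts
    # Minimizing this loss ascends the reward-weighted log-likelihood.
    return -(log_probs * rewards).mean()


# Toy stand-in for a policy output: batch of 4 samples, 3 possible actions.
logits = torch.zeros(4, 3, requires_grad=True)
actions = Categorical(logits=logits).sample()   # one action per sample, shape (4,)
rewards = torch.tensor([1.0, 0.0, 0.5, 2.0])    # one reward per sample (made up)

loss = reinforce_loss(logits, actions, rewards)
loss.backward()                                 # one backward pass for the batch
```

A sample with reward 0 contributes no gradient, so after the backward pass the second row of logits.grad is all zeros.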


Hi, are there any very simple tutorials or code samples showing how to use .reinforce in batch mode?

Nothing complicated (or slow to train, like gym environments or Atari), just, say, a synthetic linear regression dataset with some noise added?

This would be really helpful to newbs!
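As a rough idea of what such a toy problem could look like, here is a small synthetic sketch. Everything in it is an assumption made for illustration: the two-feature noisy linear dataset, the linear policy, the batch size, the learning rate, and the mean-reward baseline. It uses torch.distributions rather than .reinforce, so it is not a drop-in answer for the .reinforce API, just a batch-wise REINFORCE loop that trains in a few seconds.

```python
import torch
from torch import nn
from torch.distributions import Categorical

torch.manual_seed(0)

policy = nn.Linear(2, 2)  # 2 input features -> scores for 2 actions
optimizer = torch.optim.Adam(policy.parameters(), lr=0.1)


def make_batch(batch_size):
    """Synthetic data: the correct action is the sign of a noisy linear score."""
    x = torch.randn(batch_size, 2)
    noisy_score = x[:, 0] - x[:, 1] + 0.1 * torch.randn(batch_size)
    target = (noisy_score > 0).long()
    return x, target


for step in range(300):
    x, target = make_batch(64)
    dist = Categorical(logits=policy(x))
    actions = dist.sample()                   # one sampled action per row
    rewards = (actions == target).float()     # reward 1 if correct, else 0
    advantages = rewards - rewards.mean()     # batch-mean baseline (assumption)
    loss = -(dist.log_prob(actions) * advantages).mean()

    optimizer.zero_grad()
    loss.backward()                           # single backward pass per batch
    optimizer.step()

# Evaluate the greedy policy on fresh data.
x, target = make_batch(1000)
with torch.no_grad():
    greedy = policy(x).argmax(dim=1)
accuracy = (greedy == target).float().mean().item()
print(f"greedy accuracy: {accuracy:.2f}")
```

The policy only ever sees the 0/1 reward for the action it sampled, never the target itself, which is what makes this a REINFORCE problem rather than ordinary supervised classification.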

Where’s the test code for .reinforce? Probably a good place to start from, if you wanted to write a batch-wise test problem?

OK, update: the .reinforce tests are in here,