A3C error at LSTM layer

Hey guys, need some help here. I’m trying to implement an A3C agent, and my implementation is based on this one [1].
My model and inputs have the same shapes as in [1]:

My __init__:

# num_inputs = 4
self.conv1 = nn.Conv2d(num_inputs, 32, 5, stride=1, padding=2)
self.conv2 = nn.Conv2d(32, 32, 5, stride=1, padding=1)
self.conv3 = nn.Conv2d(32, 64, 4, stride=1, padding=1)
self.conv4 = nn.Conv2d(64, 64, 3, stride=1, padding=1)

num_outputs = action_space.n # which is 4

self.lstm = nn.LSTMCell(1024, 512)
self.critic_linear = nn.Linear(512, 1)
self.actor_linear = nn.Linear(512, num_outputs)

And my forward:

inputs, (hx, cx) = inputs
# inputs is something with shape [1, 4, 80, 80]
x = F.relu(F.max_pool2d(self.conv1(inputs), kernel_size=2, stride=2))
x = F.relu(F.max_pool2d(self.conv2(x), kernel_size=2, stride=2))
x = F.relu(F.max_pool2d(self.conv3(x), kernel_size=2, stride=2))
x = F.relu(F.max_pool2d(self.conv4(x), kernel_size=2, stride=2))

x = x.view(x.size(0), -1)  # flatten to [1, 1024]

hx, cx = self.lstm(x, (hx, cx))

x = hx

return self.critic_linear(x), self.actor_linear(x), (hx, cx)
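(For reference, the 1024 input size of the LSTMCell can be sanity-checked with the usual conv/pool output-size formula. A quick sketch in plain Python, assuming the kernel/padding values above and an 80×80 input:)

```python
# Trace the spatial size of an 80x80 input through the four
# conv + max-pool stages to confirm the LSTMCell input size of 1024.

def conv_out(n, kernel, padding, stride=1):
    # standard Conv2d output-size formula
    return (n + 2 * padding - kernel) // stride + 1

def pool_out(n, kernel=2, stride=2):
    # MaxPool2d with kernel_size=2, stride=2
    return (n - kernel) // stride + 1

size = 80
for kernel, padding in [(5, 2), (5, 1), (4, 1), (3, 1)]:
    size = pool_out(conv_out(size, kernel, padding))

features = 64 * size * size  # 64 channels after conv4
print(size, features)  # 4, 1024
```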

I’m getting this error at hx, cx = self.lstm(x, (hx, cx)):

RuntimeError: size mismatch, m1: [1 x 256], m2: [512 x 2048] at /py/conda-bld/pytorch_1490980628440/work/torch/lib/TH/generic/THTensorMath.c:1229

[1] https://github.com/dgriff777/rl_a3c_pytorch/blob/master/model.py

Hey, from the looks of it, when you are creating the hx and cx Variables you have set the size incorrectly.

You will want something like this:

cx = Variable(torch.zeros(1, 512))
hx = Variable(torch.zeros(1, 512))

Since you are using my repo as a reference, it would be those lines in the train.py and test.py files.
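(For anyone reading this on a recent PyTorch, a minimal sketch of the same initialization without the now-deprecated Variable wrapper; the key point is that the second dimension must match the LSTMCell hidden size:)

```python
import torch

hidden_size = 512  # must match nn.LSTMCell(1024, 512)

# fresh state at the start of each episode
hx = torch.zeros(1, hidden_size)
cx = torch.zeros(1, hidden_size)

# between gradient updates, detach so the graph doesn't grow unboundedly
hx, cx = hx.detach(), cx.detach()
```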


You’re right, I’m following three or so implementations and the error was a silly thing I did. By the way, your implementation is very good. Thank you.

No problem, and thank you. Yeah, I had a lot of fun with it; it’s truly a great algorithm. One of my main goals was to see its true potential when implemented in what I thought was the most optimal way, while still keeping its generalized nature, and it showed great potential.

A small tip so you don’t waste time like I did worrying about the multiprocessing updating of parameters without locking: with locks, acquiring and releasing them just takes too much time and slows things down so much that it negates any potential benefit. Trust in the Hogwild training! It’s far superior.
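(For context, the lock-free Hogwild pattern boils down to keeping the model’s parameters in shared memory and letting each worker hand its gradients to them without acquiring a lock. A minimal sketch of the idea; the tiny nn.Linear stands in for the real A3C network, and push_gradients is a hypothetical name for the gradient-sharing step:)

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for the A3C network; the pattern is the same.
shared_model = nn.Linear(4, 2)
shared_model.share_memory()  # parameters move to shared memory

def push_gradients(local_model, shared_model):
    # Each worker computes gradients on its own copy, then hands them
    # to the shared model without any locking (Hogwild-style updates).
    for local_p, shared_p in zip(local_model.parameters(),
                                 shared_model.parameters()):
        if shared_p.grad is None:
            shared_p.grad = local_p.grad

# worker side: a local copy that syncs from the shared parameters
local_model = nn.Linear(4, 2)
local_model.load_state_dict(shared_model.state_dict())

loss = local_model(torch.ones(1, 4)).sum()
loss.backward()
push_gradients(local_model, shared_model)
```

A shared optimizer can then step on the shared model with those gradients; each worker runs this loop in its own process.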

Anyways have fun and good luck!
