The exact error is:
Variable._execution_engine.run_backward(
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
It's a simple fully connected network trained with the REINFORCE algorithm on a CPU. The loss is simply the negated mean of q_value * log_probability over all the actions in one batch.
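
To make the loss described above concrete, here is a minimal sketch of it in PyTorch. The helper name `reinforce_loss` and the random logits, actions, and Q-values are placeholders for illustration only, not part of my actual code; it assumes the log-probabilities are still attached to the autograd graph.

```python
import torch

def reinforce_loss(log_probs, q_vals):
    """Negated mean of q_value * log_probability over one batch.

    log_probs: 1-D tensor of log pi(a|s), still attached to the graph.
    q_vals:    1-D tensor of Q-value estimates, treated as constants.
    """
    return -(q_vals.detach() * log_probs).mean()

# Placeholder batch: 10 states, 3 actions (hypothetical values).
logits = torch.randn(10, 3, requires_grad=True)
actions = torch.randint(0, 3, (10,))

# Pick the log-probability of the action taken in each row.
log_probs = torch.log_softmax(logits, dim=1)[torch.arange(10), actions]
q_vals = torch.rand(10)

loss = reinforce_loss(log_probs, q_vals)
loss.backward()  # gradients flow back to `logits`
```

The loss is a scalar tensor with a `grad_fn`, which is what `backward()` needs.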
I thought the problem was that I collect everything as lists, convert them into NumPy arrays, calculate the loss, and then backpropagate (edit: that's not it). However, I'm not able to get things right. The following code (REINFORCE part omitted) replicates the error and is exactly what I have done.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    import statistics
    import torch.optim as optim
    import numpy as np

    class PGN(nn.Module):
        def __init__(self):
            super().__init__()
            self.fc1 = nn.Linear(5*5, 64)
            self.fc2 = nn.Linear(64, 64)
            self.fc3 = nn.Linear(64, 64)
            self.fc4 = nn.Linear(64, 3)

        def forward(self, x):
            x = F.relu(self.fc1(x))
            x = F.relu(self.fc2(x))
            x = F.relu(self.fc3(x))
            x = self.fc4(x)
            # x = F.softmax(x, dim=1)
            return x

    net = PGN()
    optimizer = optim.Adam(net.parameters(), lr=0.1, eps=1e-3)
    for params in net.parameters():
        params.requires_grad = True

    # forward pass and collection of log probabilities
    state = [torch.Tensor(np.random.rand(25)) for _ in range(10)]
    batch_log_probs = []
    for ele in state:
        logit = net(ele)
        # detached copy of the probabilities, used only to sample an action
        prob = F.softmax(logit, dim=0)
        prob = prob.detach().numpy()
        action = np.random.choice(len(prob), p=prob)
        # log-probability of the chosen action, still attached to the graph
        log_prob = F.log_softmax(logit, dim=0)
        log_prob = log_prob[action]
        batch_log_probs.append(log_prob)

    q_vals = torch.Tensor(np.random.rand(10))
    loss = -q_vals*batch_log_probs
    loss = loss.mean()

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
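
For comparison, here is a small sketch contrasting two ways of turning the list of log-probabilities into a loss. It is an illustration under assumptions, not a confirmed diagnosis of my code: a single `nn.Linear` stands in for the network, and `torch.distributions.Categorical` is swapped in for the NumPy sampling. Rebuilding a tensor from plain Python or NumPy values keeps the numbers but drops the autograd graph, while `torch.stack` keeps each element's `grad_fn`.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
net = torch.nn.Linear(25, 3)  # hypothetical stand-in for the full network
states = [torch.rand(25) for _ in range(10)]

log_probs = []
for s in states:
    logits = net(s)
    # Sample an action from the policy defined by the logits.
    action = torch.distributions.Categorical(logits=logits).sample()
    log_probs.append(F.log_softmax(logits, dim=0)[action])

q_vals = torch.rand(10)

# Route 1: rebuild a tensor from plain numbers. Values survive, graph does not.
rebuilt = torch.tensor([lp.item() for lp in log_probs])
detached_loss = -(q_vals * rebuilt).mean()
try:
    detached_loss.backward()
    backward_failed = False
except RuntimeError:
    # "element 0 of tensors does not require grad and does not have a grad_fn"
    backward_failed = True

# Route 2: stack the 0-dim tensors. Each one keeps its link to the network.
stacked = torch.stack(log_probs)
loss = -(q_vals * stacked).mean()
loss.backward()
```

After Route 2, `net.weight.grad` is populated, so an optimizer step would have gradients to apply.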