Reinforcement learning: element 0 of tensors does not require grad and does not have a grad_fn

The exact error is:

Variable._execution_engine.run_backward(
		RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn 

It's a simple fully connected network trained with the REINFORCE algorithm on a CPU, so the loss is simply the mean of q_value * log_probability over all the actions in one batch. I think the problem is (that's not it) that I collect everything as lists, convert them into NumPy arrays, calculate the loss, and then backpropagate. However, I'm not able to get things right. The following code (REINFORCE part omitted) replicates the error and is exactly what I have done.

import torch
import torch.nn as nn
import torch.nn.functional as F
import statistics
import torch.optim as optim
import numpy as np


class PGN(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1= nn.Linear(5*5,64)
        self.fc2= nn.Linear(64,64)
        self.fc3= nn.Linear(64,64)
        self.fc4= nn.Linear(64,3)
    def forward(self,x):
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        # x= F.softmax(x, dim=1) 
        return x


net = PGN()
optimizer = optim.Adam(net.parameters(), lr=0.1, eps=1e-3)
for params in net.parameters():
    params.requires_grad = True

# forward pass and collection log probability
state = [torch.Tensor(np.random.rand(25)) for _ in range(10)]
batch_log_probs =[]
for ele in state:
    logit = net(ele)
    prob = F.softmax(logit, dim=0)
    prob = prob.detach().numpy()
    action = np.random.choice(len(prob), p=prob)
    log_prob = F.log_softmax(logit, dim=0)
    log_prob = log_prob[action]
    batch_log_probs.append(log_prob)
q_vals = torch.Tensor(np.random.rand(10))
loss = -q_vals*batch_log_probs
loss = loss/mean()
optimizer.zero_grad() 
loss.backward()
optimizer.step()

Your code isn’t executable and raises an error at:

    loss = -q_vals*batch_log_probs

TypeError: only integer tensors of a single element can be converted to an index

After adding:

batch_log_probs = torch.stack(batch_log_probs)

before the problematic operation and fixing loss/mean() to loss.mean(), the code runs fine and no longer raises the reported error.
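For reference, a minimal sketch of the corrected tail of the posted script (everything above the loss computation left as posted); the only changes are the torch.stack call and loss.mean():

    # Stack the list of per-action log-prob tensors into one 1-D tensor.
    # Unlike rebuilding a tensor from a list/NumPy array, stacking keeps each
    # element's grad_fn, so the loss stays connected to the network.
    batch_log_probs = torch.stack(batch_log_probs)

    q_vals = torch.Tensor(np.random.rand(10))   # placeholder Q-values, as in the post
    loss = -(q_vals * batch_log_probs).mean()   # REINFORCE loss

    optimizer.zero_grad()
    loss.backward()                             # loss now has a grad_fn, so this succeeds
    optimizer.step()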


Thank you for your reply!!! torch.stack solved my problem!

Do you think it's faster to append tensors to a list and then convert the list to a tensor using torch.stack, or should I work with tensors only and use torch.cat?

For others: the main issue I was facing was that I had a list of tensors (batch_log_probs), each with a grad_fn. The grad_fn would disappear when I converted the list to a tensor.

torch.stack() keeps the grad_fn, so the stacked tensor stays connected to the autograd graph.
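A small self-contained sketch (toy values, not the REINFORCE code) illustrating the difference:

    import torch

    x = torch.randn(3, requires_grad=True)
    log_probs = [x[i] * 2 for i in range(3)]   # each element has a grad_fn

    stacked = torch.stack(log_probs)
    print(stacked.grad_fn)                     # a StackBackward node: graph preserved

    # Rebuilding a tensor from plain Python floats (or a NumPy array) creates a
    # new leaf tensor with no history, so backward() through it cannot reach x.
    rebuilt = torch.tensor([lp.item() for lp in log_probs])
    print(rebuilt.grad_fn)                     # None

    stacked.sum().backward()                   # gradients flow back to x
    print(x.grad)                              # tensor([2., 2., 2.])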