RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

I’m not sure how your use case is a binary classification if you are dealing with 64 classes?

Anyway, I just tried the linked code and it’s working. Are you seeing any issues with it?

import torch

# LabelSmoothingLoss is taken from the code linked above
criterion = LabelSmoothingLoss(64, smoothing=0.1)

x = torch.randn(10, 64, requires_grad=True)  # logits for 10 samples and 64 classes
y = torch.randint(0, 64, (10,))              # integer class targets

loss = criterion(x, y)
loss.backward()  # succeeds: loss has a valid grad_fn

Hey,
I’m having a similar problem with the RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn error.
I’m doing some facial analysis with this network. I have a bit of experience working in PyTorch, but I’m currently out of my depth. Since I’m doing this as part of a larger project, I decided to wrap the FAN in a PyTorch Lightning module, which, as far as I know, shouldn’t affect how freezing parameters works. I mention it in case there is a problem with wrapping a network like this:

class LightningFAN(LightningModule):
  def __init__(self):
    super().__init__()
    self._fan = FAN(num_modules=4)
    ...

  def train_freeze(self):
    # freeze everything, then re-enable only the modules of the last stage
    for param in self._fan.parameters():
      param.requires_grad = False
    trainable = [self._fan.l3, self._fan.m3, self._fan.top_m_3, self._fan.conv_last3, self._fan.bn_end3]
    for module in trainable:
      for param in module.parameters():
        param.requires_grad = True

  def forward(self, x):
    return self._fan(x)

  ...
  def other_lightning_module_methods(self):
    pass
  ...

  def configure_optimizers(self):
    optimizer = RMSprop(self.parameters(), lr=self.hparams.learning_rate, weight_decay=0.0)
    return optimizer

When I call the train method on this, it errors; if I don’t freeze anything, it works fine.
As a test, I started unfreezing layers until it trained correctly, and I wrote a quick function to check how each module was frozen:

for key, module in self._fan.named_children():
  print("Key: ", key)
  # submodules without their own weight/bias attributes are simply skipped
  weight = getattr(module, "weight", None)
  if weight is not None:
    print("Weight Requires Grad: ", weight.requires_grad)
    print("Weight Grad Function: ", weight.grad_fn, weight.grad)
  bias = getattr(module, "bias", None)
  if bias is not None:
    print("Bias Requires Grad: ", bias.requires_grad)
    print("Bias Grad Function: ", bias.grad_fn, bias.grad)
  print()

And as an output I get

Key:  conv1
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  bn1
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  conv2

Key:  conv3

Key:  conv4

Key:  m0

Key:  top_m_0

Key:  conv_last0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  bn_end0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  False
Bias Grad Function:  None None

Key:  l0
Weight Requires Grad:  False
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl0
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al0
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m1

Key:  top_m_1

Key:  conv_last1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al1
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m2

Key:  top_m_2

Key:  conv_last2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bl2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  al2
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  m3

Key:  top_m_3

Key:  conv_last3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  bn_end3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

Key:  l3
Weight Requires Grad:  True
Weight Grad Function:  None None
Bias Requires Grad:  True
Bias Grad Function:  None None

The problem actually occurs when I freeze the l0 module’s bias. Why would that be? Does this mean that none of the layers after that one are computing gradients? Also, why do none of the layers have gradient functions at this stage?

The .grad_fn of activations will be populated during the forward pass by default, not that of the parameters.
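For example:

import torch
import torch.nn as nn

lin = nn.Linear(4, 2)
out = lin(torch.randn(1, 4))

print(lin.weight.grad_fn)  # None: parameters are leaf tensors and never get a grad_fn
print(out.grad_fn)         # e.g. <AddmmBackward0 ...>: the activation records the op that created it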
Is your training working fine without using Lightning? If not, could you post an executable code snippet, so that we could have a look?

I am having a serious issue with this error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I think the biggest problem is that it doesn’t tell me which tensor is causing the error. I check and re-check and everything seems right, but I can’t seem to find the culprit. My model seems fine.

Can we have a more informative error message here, please? Perhaps once the error is caught it could print the tensor or something to identify the culprit?


Addendum:

My error was caused by a torch.no_grad(). I had used it during the evaluation/testing phase, but I was doing meta-learning, where grads are still needed for things like MAML… new territory for the bug explored!
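A minimal repro of that cause, and the local escape hatch (torch.enable_grad) that meta-learning-style evaluation needs:

import torch

w = torch.randn(2, requires_grad=True)

with torch.no_grad():
    loss = (w * 2).sum()
# loss.backward() would raise: element 0 of tensors does not require grad ...

# re-enable gradients locally when they are needed inside an outer no_grad:
with torch.no_grad():
    with torch.enable_grad():
        loss = (w * 2).sum()
loss.backward()  # works: the graph was recorded inside enable_grad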

Regardless, a better error message would be great.


Hi there,

I’m getting the same error as everyone else, and I’m pretty sure it’s in my computation of t, except I really don’t know how to fix it. I was kind of struggling with the data types of the tensors.

def load_data():
    left_foot = np.array(loadmat('LFoot1.mat')['LFoot1'])  # columns are x, y, z
    left_foot = left_foot.astype(np.float32)  # astype returns a new array, so assign it back
    left_foot = left_foot[300:]
    left_foot = sklearn.preprocessing.normalize(left_foot)
    dataset_size = len(left_foot)
    train_dataset = left_foot[0:int(0.7 * dataset_size), :]
    test_dataset = left_foot[int(0.7 * dataset_size):int(0.85 * dataset_size), :]
    validation_dataset = left_foot[int(0.85 * dataset_size):, :]
    train_loader = DataLoader(train_dataset, batch_size=int(0.7 * dataset_size), shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=int(0.15 * dataset_size), shuffle=False)
    validation_loader = DataLoader(validation_dataset, batch_size=int(0.15 * dataset_size), shuffle=False)

    true_y0 = torch.tensor(left_foot[0])
    t = torch.linspace(start=0, end=dataset_size, steps=500)  # 332259

    return train_loader, true_y0, t  # , test_loader, validation_loader

true_y, true_y0, t = load_data()
length_t = np.float32(len(t))
length_y0 = np.float32(len(true_y0))
length_y = np.float32(len(true_y))
batch_size_t = torch.tensor([length_t])
batch_size_y = torch.tensor([length_y])
batch_size_y0 = torch.tensor([length_y0])

Thanks for your help.

if __name__ == '__main__':

    ii = 0

    func = ODEFunc()

    optimizer = optim.RMSprop(func.parameters(), lr=1e-3)
    end = time.time()

    time_meter = RunningAverageMeter(0.97)

    loss_meter = RunningAverageMeter(0.97)

    for itr in range(1, args.niters + 1):
        optimizer.zero_grad()
        pred_y = odeint(func, batch_size_y0, batch_size_t)
        loss = torch.mean(torch.abs(pred_y - batch_size_y))
        loss.backward()  # error -> element 0 of tensors does not require grad
        optimizer.step()

        time_meter.update(time.time() - end)
        loss_meter.update(loss.item())

        if itr % args.test_freq == 0:
            with torch.no_grad():
                pred_y = odeint(func, true_y0, t)
                loss = torch.mean(torch.abs(pred_y - true_y))
                print('Iter {:04d} | Total Loss {:.6f}'.format(itr, loss.item()))
                visualize(true_y, pred_y, func, ii)
                ii += 1

        end = time.time()

Could you post or describe what ODEFunc and odeint are doing? Based on the posted code snippet and the description of the error, I guess that these methods might internally detach the tensor from the computation graph.
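Detaching anywhere in the chain would reproduce the error, e.g. in this small sketch:

import torch

x = torch.randn(3, requires_grad=True)
y = (x * 2).detach()  # the graph is cut here
loss = y.sum()
print(loss.requires_grad, loss.grad_fn)  # False None
# loss.backward() would now raise: element 0 of tensors does not require grad ...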

odeint is an ordinary differential equation solver; together with ODEFunc it needs to calculate the gradients so that the new y values can be predicted (I’m trying to build a neural ordinary differential equation model). My data is in the x, y, z directions.

Are these methods using PyTorch functions, and if so, did you make sure that all of them are differentiable (you can check each output for a valid .grad_fn)? If that’s not the case and you are using other libraries, you would have to write the backward pass manually via a custom autograd.Function as described here.
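For reference, a minimal sketch of such a custom autograd.Function (a toy example wrapping numpy’s exp, not code from torchdiffeq):

import numpy as np
import torch

class MyExp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # autograd cannot track the numpy call, so we detach explicitly
        result = torch.from_numpy(np.exp(input.detach().cpu().numpy()))
        ctx.save_for_backward(result)
        return result

    @staticmethod
    def backward(ctx, grad_output):
        # backward pass written manually: d/dx exp(x) = exp(x)
        result, = ctx.saved_tensors
        return grad_output * result

x = torch.randn(3, requires_grad=True)
y = MyExp.apply(x)
y.sum().backward()  # x.grad is now populated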

I used the libraries from others (GitHub - rtqichen/torchdiffeq: Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation, ode_demo.py). In their script everything works just fine, so I had to rewrite some things for my data.

It gives this error:

element 0 of tensors does not require grad and does not have a grad_fn

Does that mean I have to check the .grad_fn of some of the outputs? And if so, where?

Hi @pinocchio, it’s great to see that you too are working on MAML. I am also trying to implement MAML, but updating the outer-loop theta parameters is a bit tricky, and I am not able to figure it out. Can you share a general recipe that you used for implementing MAML? In my case, I have been creating a deepcopy of the model for each task.

Thanks! It’s always the most unexpected silly mistake.

Hey, thanks, your fix fixed Nikronic’s fix :D. It worked right away!

I had the same error when loading a vgg13 model and modifying the classifier. The issue was that I froze the parameters for backpropagation after modifying the classifier, while it should be the other way around!
This is what solved it for me:

import torch.nn as nn
from torchvision import models

model = models.vgg13(pretrained=True)

# Freeze parameters for backpropagation - done first!
for param in model.parameters():
    param.requires_grad = False

# Only now - modify the classifier! Replacing the last layer (rather than just
# setting .out_features, which does not change the actual weight matrix) creates
# fresh parameters with requires_grad=True:
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 10)
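To double-check that only the new head is trainable, something like this can be used:

print([name for name, p in model.named_parameters() if p.requires_grad])
# expected: ['classifier.6.weight', 'classifier.6.bias']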

Hopefully, this will help! :slight_smile:

Hi ptrblck,

Yes, indeed I did use torch.tensor().

But it seemed that I needed it to set the device of the variable to ‘cuda’.

Is there any way to get around this?

No, you don’t need to re-wrap a tensor in order to push it to a specific device; you can apply the .cuda() or .to() operation directly on the original tensor.
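For example (a small sketch, assuming a GPU is available):

import torch

x = torch.randn(3, requires_grad=True)

y = torch.tensor(x)  # re-wrapping detaches: y.grad_fn is None (PyTorch even warns here)
z = x.to('cuda')     # keeps the graph, so gradients still flow back to x
w = x.cuda()         # equivalent
print(y.grad_fn, z.grad_fn)  # None vs. a valid grad_fn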

In my case, I mistakenly put some loss-related code inside:
with torch.no_grad():

This is related: basically, you cannot introduce non-differentiable operations there.

Hi, I am having the same issue, any help would be appreciated, thanks.
The error disappears when I remove this line:
loss.backward()

# imports assumed for this snippet (PyTorch / PyTorch Geometric):
import torch
from torch import tanh
from torch.nn import Linear
from torch_geometric.loader import DataLoader
from torch_geometric.nn import GCNConv, global_mean_pool as gap, global_max_pool as gmp

atom_num_features = 11    # 11 atom features, 5 edge features
embedding_size = 25

class GCN(torch.nn.Module):
    def __init__(self):
        # Init parent
        super(GCN, self).__init__()
        torch.manual_seed(42)

        # 3 GCN layers. Learn info from 3 neighboor hops
        self.initial_conv = GCNConv(atom_num_features, embedding_size)
        self.conv1 = GCNConv(embedding_size, embedding_size)
        self.conv2 = GCNConv(embedding_size, embedding_size)
        self.conv3 = GCNConv(embedding_size, embedding_size)

        # Output layer
        self.out = Linear(embedding_size*2,1)

    def forward(self, x, edge_index, batch_index):
        # First Conv layer
        hidden = self.initial_conv(x, edge_index)
        hidden = tanh(hidden)

        # Other Conv layers
        hidden = self.conv1(hidden, edge_index)
        hidden = tanh(hidden)
        hidden = self.conv2(hidden, edge_index)
        hidden = tanh(hidden)
        hidden = self.conv3(hidden, edge_index)
        hidden = tanh(hidden)

        # Global Pooling (stack different aggregations)
        hidden = torch.cat([gmp(hidden, batch_index),
                            gap(hidden, batch_index)], dim=1)

        # Apply a final (linear) classifier
        out = self.out(hidden)
        return torch.tensor(out, dtype=float), torch.tensor(hidden, dtype=float)


model = GCN()
print(model)
print("Number of parameters: ", sum(p.numel() for p in model.parameters()))


# TRAINING THE GNN

# Root mean squared error
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0007)


# Wrap data in a data loader
data_size = len(dataset)  # size is 2301
NUM_GRAPHS_PER_BATCH = 25
loader = DataLoader(dataset,batch_size=NUM_GRAPHS_PER_BATCH, shuffle=False)

#test_loader = DataLoader(, batch_size=NUM_GRAPHS_PER_BATCH, shuffle=False)

def train(data):
    # Enumerate over the data
    for batch in loader:
      # Reset gradients
      optimizer.zero_grad()
      # Passing the node features and the connection info
      pred, embedding = model(batch.x.float(), batch.edge_index, batch.batch)
      # Calculating the loss and gradients
      loss = loss_fn(pred, batch.y)
      loss.backward()
      # Update using the gradients
      optimizer.step()
    return loss, embedding

print("Starting training...")
losses = []
for epoch in range(2000):
    loss, h = train(dataset)
    losses.append(loss)
    if epoch % 100 == 0:
      print(f"Epoch {epoch} | Train Loss {loss}")



Re-wrapping a tensor will detach it from the computation graph in:

return torch.tensor(out,dtype=float),torch.tensor(hidden,dtype=float)

so just return out and hidden directly instead and rerun your code.
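If the float64 dtype was the reason for the re-wrap, a differentiable cast keeps the graph intact, e.g. (a small sketch):

# inside GCN.forward, instead of re-wrapping:
return out, hidden
# or, if float64 outputs are really needed:
# return out.double(), hidden.double()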

I figured it out, thanks