RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

Hi there,

I’m getting the same error as everyone else, and I’m pretty sure it’s in my computation of t, but I really don’t know how to fix it. I was struggling a bit with the data types of the tensors.

def load_data():
    left_foot = np.array(loadmat('LFoot1.mat')['LFoot1'])  # this is now x, y, z
    left_foot = left_foot.astype(np.float32)
    left_foot = left_foot[300:]
    left_foot = sklearn.preprocessing.normalize(left_foot)
    dataset_size = len(left_foot)
    train_dataset = left_foot[0:int(0.7*dataset_size), :]
    test_dataset = left_foot[int(0.7*dataset_size):int(0.85*dataset_size), :]
    validation_dataset = left_foot[int(0.85*dataset_size):, :]
    train_loader = DataLoader(train_dataset, batch_size=int(0.7*dataset_size), shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=int(0.15*dataset_size), shuffle=False)
    validation_loader = DataLoader(validation_dataset, batch_size=int(0.15*dataset_size), shuffle=False)

    true_y0 = torch.tensor(left_foot[0])
    t = torch.linspace(start=0, end=dataset_size, steps=500)  # 332259

    return train_loader, true_y0, t  # , test_loader, validation_loader

true_y, true_y0, t = load_data()
length_t = np.float32(len(t))
length_y0 = np.float32(len(true_y0))
length_y = np.float32(len(true_y))
batch_size_t = torch.tensor([length_t])
batch_size_y = torch.tensor([length_y])
batch_size_y0 = torch.tensor([length_y0])

Thanks for your help.

if __name__ == '__main__':

    ii = 0

    func = ODEFunc()
    optimizer = optim.RMSprop(func.parameters(), lr=1e-3)
    end = time.time()

    time_meter = RunningAverageMeter(0.97)
    loss_meter = RunningAverageMeter(0.97)

    for itr in range(1, args.niters + 1):
        optimizer.zero_grad()
        pred_y = odeint(func, batch_size_y0, batch_size_t)
        loss = torch.mean(torch.abs(pred_y - batch_size_y))
        loss.backward()  # error -> element 0 of tensors does not require grad
        optimizer.step()

        time_meter.update(time.time() - end)
        loss_meter.update(loss.item())

        if itr % args.test_freq == 0:
            with torch.no_grad():
                pred_y = odeint(func, true_y0, t)
                loss = torch.mean(torch.abs(pred_y - true_y))
                print('Iter {:04d} | Total Loss {:.6f}'.format(itr, loss.item()))
                visualize(true_y, pred_y, func, ii)
                ii += 1

        end = time.time()

Could you post or describe what ODEFunc and odeint are doing? Based on the posted code snippet and the description of the error I guess that these methods might internally detach the tensor from the computation graph.

odeint is an ordinary differential equation solver; together with ODEFunc it needs to calculate the gradients so that the new y values can be predicted. (I’m trying to build a neural ordinary differential equation model.) My data has x, y, z components.
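For reference, the usage pattern from the torchdiffeq demo I adapted looks roughly like this (a minimal sketch with placeholder shapes, not my exact code):

import torch
import torch.nn as nn
from torchdiffeq import odeint  # rtqichen/torchdiffeq

class ODEFunc(nn.Module):
    # models dy/dt = net(y); the solver passes (t, y) to forward
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 50), nn.Tanh(), nn.Linear(50, 3))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
true_y0 = torch.randn(3)            # placeholder initial state (x, y, z)
t = torch.linspace(0., 25., 500)    # integration time points
true_y = torch.randn(500, 3)        # placeholder targets

pred_y = odeint(func, true_y0, t)   # shape (len(t), 3), still attached to the graph
loss = torch.mean(torch.abs(pred_y - true_y))
loss.backward()                     # gradients flow into func.parameters()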

Are these methods using PyTorch functions, and if so, did you make sure that all of them are differentiable (you can check each output for a valid .grad_fn)? If that’s not the case and you are using other libraries, you would have to write the backward pass manually via a custom autograd.Function, as described here.
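For example, checking .grad_fn and the skeleton of a custom autograd.Function could look like this (the clamp-based op is just a stand-in for an external computation):

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
print(y.grad_fn)            # <MulBackward0 ...> -> still attached to the graph
print(y.detach().grad_fn)   # None -> the graph was cut here

class MyOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)
        return inp.clamp(min=0)                  # stand-in for the external op

    @staticmethod
    def backward(ctx, grad_output):
        inp, = ctx.saved_tensors
        return grad_output * (inp > 0).float()   # hand-written gradient

z = MyOp.apply(x)
z.sum().backward()          # works, because backward is defined manually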

I used a library from others (GitHub - rtqichen/torchdiffeq: Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation; specifically ode_demo.py). In their script everything works just fine, so I had to rewrite some things for my data.

It gives this error:

element 0 of tensors does not require grad and does not have a grad_fn

Does that mean I have to add .grad_fn to some of the outputs? And if so, where?

Hi @pinocchio, it’s great to see that you too are working on MAML. I am also trying to implement MAML, but updating the outer-loop theta parameters is a bit tricky and I am not able to figure it out. Can you share a general recipe that you used for implementing MAML? In my case, I have been creating a deepcopy of the model for each task.

Thanks! It’s always the most unexpected silly mistake.

Hey, thanks! Your fix fixed Nikronic’s fix :D. It worked right away!

I had the same error when loading a vgg13 model and modifying the classifier. The issue was freezing the parameters for backpropagation after modifying the classifier, while it should be the other way around!
This is what solved it for me:

from torchvision import models
import torch.nn as nn

model = models.vgg13(pretrained=True)

# Freeze parameters for backpropagation - done first!
for param in model.parameters():
    param.requires_grad = False

# Only now - modify the classifier! The new final layer's parameters
# default to requires_grad=True, so only the new head stays trainable.
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 10)
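To double-check (assuming the setup above), only the new classifier head should remain trainable:

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)   # expect only the last classifier layer's weight and bias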

Hopefully, this will help!

Hi ptrblck,

Yes, indeed I did use torch.tensor().

But it seems that I need to do this to set the device of the variable to 'cuda'.

Is there any way to get around this?

No, you don’t need to re-wrap a tensor in order to push it to a specific device; you can apply the .cuda() or .to() operation directly on the original tensor.
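Something like this (assuming a CUDA device is available):

import torch

x = torch.randn(3, requires_grad=True) * 2      # x has a grad_fn

bad = torch.tensor(x, device='cuda')            # re-wrapping: detached, no grad_fn
good = x.to('cuda')                             # keeps the autograd history
print(bad.grad_fn, good.grad_fn)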

In my case, I mistakenly put some loss-related code inside:
with torch.no_grad():

This is related. Basically, you cannot introduce a non-differentiable operation into the computation graph.
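A minimal illustration of the no_grad() pitfall mentioned above (hypothetical model and data):

import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)
x, target = torch.randn(8, 4), torch.randn(8, 1)

# Wrong: the loss is built under no_grad(), so there is no graph to backprop through
with torch.no_grad():
    loss = F.mse_loss(model(x), target)
# loss.backward()  # -> element 0 of tensors does not require grad ...

# Right: keep the forward pass and the loss outside no_grad()
loss = F.mse_loss(model(x), target)
loss.backward()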

Hi, I am having the same issue, any help would be appreciated, thanks.
The error disappears when I remove this line:
loss.backward()

# Assumed imports for this snippet (PyTorch Geometric)
import torch
from torch import tanh
from torch.nn import Linear
from torch_geometric.nn import GCNConv
from torch_geometric.nn import global_mean_pool as gap, global_max_pool as gmp
from torch_geometric.loader import DataLoader

atom_num_features = 11    # 11 atom features, 5 edge features
embedding_size = 25

class GCN(torch.nn.Module):
    def __init__(self):
        # Init parent
        super(GCN, self).__init__()
        torch.manual_seed(42)

        # 3 GCN layers. Learn info from 3 neighboor hops
        self.initial_conv = GCNConv(atom_num_features, embedding_size)
        self.conv1 = GCNConv(embedding_size, embedding_size)
        self.conv2 = GCNConv(embedding_size, embedding_size)
        self.conv3 = GCNConv(embedding_size, embedding_size)

        # Output layer
        self.out = Linear(embedding_size*2,1)

    def forward(self, x, edge_index, batch_index):
        # First Conv layer
        hidden = self.initial_conv(x, edge_index)
        hidden = tanh(hidden)

        # Other Conv layers
        hidden = self.conv1(hidden, edge_index)
        hidden = tanh(hidden)
        hidden = self.conv2(hidden, edge_index)
        hidden = tanh(hidden)
        hidden = self.conv3(hidden, edge_index)
        hidden = tanh(hidden)

        # Global Pooling (stack different aggregations)
        hidden = torch.cat([gmp(hidden, batch_index),
                            gap(hidden, batch_index)], dim=1)

        # Apply a final (linear) classifier.
        out = self.out(hidden)
        return torch.tensor(out,dtype=float),torch.tensor(hidden,dtype=float)


model = GCN()
print(model)
print("Number of parameters: ", sum(p.numel() for p in model.parameters()))


# TRAINING THE GNN

# Root mean squared error
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0007)


# Wrap data in a data loader
data_size = len(dataset)  # size is 2301
NUM_GRAPHS_PER_BATCH = 25
loader = DataLoader(dataset,batch_size=NUM_GRAPHS_PER_BATCH, shuffle=False)

#test_loader = DataLoader(, batch_size=NUM_GRAPHS_PER_BATCH, shuffle=False)

def train(data):
    # Enumerate over the data
    for batch in loader:
      # Reset gradients
      optimizer.zero_grad()
      # Passing the node features and the connection info
      pred, embedding = model(batch.x.float(), batch.edge_index, batch.batch)
      # Calculating the loss and gradients
      loss = loss_fn(pred, batch.y)
      loss.backward()
      # Update using the gradients
      optimizer.step()
    return loss, embedding

print("Starting training...")
losses = []
for epoch in range(2000):
    loss, h = train(dataset)
    losses.append(loss)
    if epoch % 100 == 0:
      print(f"Epoch {epoch} | Train Loss {loss}")

Re-wrapping a tensor will detach it from the computation graph in:

return torch.tensor(out,dtype=float),torch.tensor(hidden,dtype=float)

so just return out and hidden directly instead and rerun your code.
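A tiny demonstration of why the re-wrap breaks the graph (with a stand-in linear layer):

import torch

lin = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)

out = lin(x)
print(out.grad_fn)                                   # <AddmmBackward0 ...>
print(torch.tensor(out, dtype=torch.float).grad_fn)  # None -> detached copy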

I figured it out, thanks

torch.set_grad_enabled(True)
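For context, torch.set_grad_enabled(True) re-enables gradient tracking globally if it was switched off somewhere earlier; a minimal illustration:

import torch

torch.set_grad_enabled(False)   # e.g. left over from an evaluation step
x = torch.ones(3, requires_grad=True)
print((x * 2).grad_fn)          # None -> no graph is recorded

torch.set_grad_enabled(True)    # re-enable autograd
y = x * 2
print(y.grad_fn)                # <MulBackward0 ...>
y.sum().backward()              # works again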

Hi ptrblck

I have the same problem; this is my training procedure.
Thanks in advance.


for epoch in range(201):
    model.train()
    for (x1, y1), (x2, y2) in zip(src_loader, trg_loader):
        xs = x1.squeeze().t()
        xt = x2.squeeze().t()
        # print(xs.shape, xt.shape)
        z1, z2 = model(xs, xt)
        # print(z1.shape, z2.shape)
        loss = criterion(z1, z2)
        # print(loss.item())
        opt.zero_grad()
        loss.backward()
        opt.step()

        if epoch%100 == 0:
            print(f"epoch={epoch}, loss={loss.item()}")
in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Could you post a minimal, executable code snippet reproducing this error, please?

Here is the code:

from torch.optim import Adam
from torch import nn as nn
import numpy as np
import torch
from scipy import linalg


def sSVD(X):
    u, s, v = linalg.svd(X.cpu().detach().numpy(), full_matrices=False, lapack_driver="gesvd") 
    u, s, v = np.nan_to_num(u), np.nan_to_num(s), np.nan_to_num(v)
    return torch.from_numpy(u).float(), torch.from_numpy(s).float(), torch.from_numpy(v).float()


class myLoss(nn.Module):

    def __init__(self):
        super().__init__()
        self.criterion = nn.MSELoss()

    def forward(self, x1, x2):
        u1, s1, v1 = sSVD(x1)    
        u2, s2, v2 = sSVD(x2)
        return self.criterion(s1, s2)



class ComW(nn.Module):
    def __init__(self, input_dim=80):
        super(ComW, self).__init__()
        self.fc = nn.Linear(in_features=80, out_features=80)

    def forward_once(self, x):
        return self.fc(x)

    def forward(self, x1, x2):
        z1 = self.forward_once(x1)
        z2 = self.forward_once(x2)
        return z1, z2


model = ComW()


opt = Adam(params=model.parameters(), lr=3e-4)
criterion = myLoss()

X1 = torch.randn(size=(80, 1, 700))
X2 = torch.randn(size=(80, 1, 700))

for epoch in range(201):
    model.train()
    print(X1.shape)
    xs = X1.squeeze().t()
    xt = X2.squeeze().t()

    z1, z2 = model(xs, xt)
    loss = criterion(z1, z2)
    opt.zero_grad()
    loss.backward()
    opt.step()

    if epoch%100 == 0:
        print(f"epoch={epoch}, loss={loss.item()}")