RuntimeError: element 0 of variables does not require grad and does not have a grad_fn

Hi there,

I’m getting the same error as everyone else, and I’m pretty sure it’s in my computation of t, but I really don’t know how to fix it. I was struggling a bit with the data types of the tensors.

def load_data():
    left_foot = np.array(loadmat('LFoot1.mat')['LFoot1'])  # this is now x, y, z
    left_foot = left_foot.astype(np.float32)
    left_foot = left_foot[300:]
    left_foot = sklearn.preprocessing.normalize(left_foot)
    dataset_size = len(left_foot)
    train_dataset = left_foot[0:int(0.7*dataset_size), :]
    test_dataset = left_foot[int(0.7*dataset_size):int(0.85*dataset_size), :]
    validation_dataset = left_foot[int(0.85*dataset_size):, :]
    train_loader = DataLoader(train_dataset, batch_size=int(0.7*dataset_size), shuffle=False)
    test_loader = DataLoader(test_dataset, batch_size=int(0.15*dataset_size), shuffle=False)
    validation_loader = DataLoader(validation_dataset, batch_size=int(0.15*dataset_size), shuffle=False)

    true_y0 = torch.tensor(left_foot[0])
    t = torch.linspace(start=0, end=dataset_size, steps=500)  # 332259

    return train_loader, true_y0, t  # , test_loader, validation_loader

true_y, true_y0, t = load_data()
length_t = np.float32(len(t))
length_y0 = np.float32(len(true_y0))
length_y = np.float32(len(true_y))
batch_size_t = torch.tensor([length_t])
batch_size_y = torch.tensor([length_y])
batch_size_y0 = torch.tensor([length_y0])

Thanks for your help.

if __name__ == '__main__':

    ii = 0

    func = ODEFunc()
    optimizer = optim.RMSprop(func.parameters(), lr=1e-3)
    end = time.time()

    time_meter = RunningAverageMeter(0.97)
    loss_meter = RunningAverageMeter(0.97)

    for itr in range(1, args.niters + 1):
        optimizer.zero_grad()
        pred_y = odeint(func, batch_size_y0, batch_size_t)
        loss = torch.mean(torch.abs(pred_y - batch_size_y))
        loss.backward()  # error -> element 0 of tensors does not require grad
        optimizer.step()

        time_meter.update(time.time() - end)
        loss_meter.update(loss.item())

        if itr % args.test_freq == 0:
            with torch.no_grad():
                pred_y = odeint(func, true_y0, t)
                loss = torch.mean(torch.abs(pred_y - true_y))
                print('Iter {:04d} | Total Loss {:.6f}'.format(itr, loss.item()))
                visualize(true_y, pred_y, func, ii)
                ii += 1

        end = time.time()

Could you post or describe what ODEFunc and odeint are doing? Based on the posted code snippet and the description of the error I guess that these methods might internally detach the tensor from the computation graph.

odeint is an ordinary differential equation solver; together with ODEFunc it needs to calculate the gradients so that the new y values can be predicted. (I’m trying to build a neural ordinary differential equation model.) My data has x, y, z components.
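For reference, the usage pattern from the torchdiffeq demo I adapted looks roughly like this (a minimal sketch with placeholder shapes, not my exact code):

import torch
import torch.nn as nn
from torchdiffeq import odeint  # rtqichen/torchdiffeq

class ODEFunc(nn.Module):
    # models dy/dt = net(y); the solver passes (t, y) to forward
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3, 50), nn.Tanh(), nn.Linear(50, 3))

    def forward(self, t, y):
        return self.net(y)

func = ODEFunc()
true_y0 = torch.randn(3)            # placeholder initial state (x, y, z)
t = torch.linspace(0., 25., 500)    # integration time points
true_y = torch.randn(500, 3)        # placeholder targets

pred_y = odeint(func, true_y0, t)   # shape (len(t), 3), still attached to the graph
loss = torch.mean(torch.abs(pred_y - true_y))
loss.backward()                     # gradients flow into func.parameters()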

Are these methods using PyTorch functions, and if so, did you make sure that all of them are differentiable (you can check each output for a valid .grad_fn)? If that’s not the case and you are using other libraries, you would have to write the backward pass manually via a custom autograd.Function, as described here.
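For example, checking .grad_fn and the skeleton of a custom autograd.Function could look like this (the clamp-based op is just a stand-in for an external computation):

import torch

x = torch.randn(3, requires_grad=True)
y = x * 2
print(y.grad_fn)            # <MulBackward0 ...> -> still attached to the graph
print(y.detach().grad_fn)   # None -> the graph was cut here

class MyOp(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp):
        ctx.save_for_backward(inp)
        return inp.clamp(min=0)                  # stand-in for the external op

    @staticmethod
    def backward(ctx, grad_output):
        inp, = ctx.saved_tensors
        return grad_output * (inp > 0).float()   # hand-written gradient

z = MyOp.apply(x)
z.sum().backward()          # works, because backward is defined manually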

I used a library from others (GitHub - rtqichen/torchdiffeq: Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation; specifically ode_demo.py). In their script everything works just fine, so I had to rewrite some things for my data.

It gives this error:

element 0 of tensors does not require grad and does not have a grad_fn

Does that mean I have to add .grad_fn to some of the outputs? And if so, where?

Hi @pinocchio, it’s great to see that you too are working on MAML. I am also trying to implement MAML, but updating the outer-loop theta parameters is a bit tricky and I am not able to figure it out. Can you share a general recipe that you used for implementing MAML? In my case, I have been creating a deepcopy of the model for each task.

Thanks! It’s always the most unexpected silly mistake.

Hey, thanks! Your fix fixed Nikronic’s fix :D. It worked right away!

I had the same error when loading a vgg13 model and modifying the classifier. The issue was freezing the parameters for backpropagation after modifying the classifier, while it should be the other way around!
This is what solved it for me:

from torchvision import models
import torch.nn as nn

model = models.vgg13(pretrained=True)

# Freeze parameters for backpropagation - done first!
for param in model.parameters():
    param.requires_grad = False

# Only now - modify the classifier! The new final layer's parameters
# default to requires_grad=True, so only the new head stays trainable.
model.classifier[-1] = nn.Linear(model.classifier[-1].in_features, 10)
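To double-check (assuming the setup above), only the new classifier head should remain trainable:

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)   # expect only the last classifier layer's weight and bias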

Hopefully, this will help!

Hi ptrblck,

Yes, indeed I did use torch.tensor().

But it seems that I need to do this to set the device of the variable to 'cuda'.

Is there any way to get around this?

No, you don’t need to re-wrap a tensor in order to push it to a specific device; you can apply the .cuda() or .to() operation directly on the original tensor.
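Something like this (assuming a CUDA device is available):

import torch

x = torch.randn(3, requires_grad=True) * 2      # x has a grad_fn

bad = torch.tensor(x, device='cuda')            # re-wrapping: detached, no grad_fn
good = x.to('cuda')                             # keeps the autograd history
print(bad.grad_fn, good.grad_fn)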

In my case, I mistakenly put some loss-related code inside:
with torch.no_grad():

This is related. Basically, you cannot introduce a non-differentiable operation into the computation graph.
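A minimal illustration of the no_grad() pitfall mentioned above (hypothetical model and data):

import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)
x, target = torch.randn(8, 4), torch.randn(8, 1)

# Wrong: the loss is built under no_grad(), so there is no graph to backprop through
with torch.no_grad():
    loss = F.mse_loss(model(x), target)
# loss.backward()  # -> element 0 of tensors does not require grad ...

# Right: keep the forward pass and the loss outside no_grad()
loss = F.mse_loss(model(x), target)
loss.backward()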

Hi, I am having the same issue, any help would be appreciated, thanks.
The error disappears when I remove this line:
loss.backward()

# Assumed imports for this snippet (PyTorch Geometric)
import torch
from torch import tanh
from torch.nn import Linear
from torch_geometric.nn import GCNConv
from torch_geometric.nn import global_mean_pool as gap, global_max_pool as gmp
from torch_geometric.loader import DataLoader

atom_num_features = 11    # 11 atom features, 5 edge features
embedding_size = 25

class GCN(torch.nn.Module):
    def __init__(self):
        # Init parent
        super(GCN, self).__init__()
        torch.manual_seed(42)

        # 3 GCN layers. Learn info from 3 neighboor hops
        self.initial_conv = GCNConv(atom_num_features, embedding_size)
        self.conv1 = GCNConv(embedding_size, embedding_size)
        self.conv2 = GCNConv(embedding_size, embedding_size)
        self.conv3 = GCNConv(embedding_size, embedding_size)

        # Output layer
        self.out = Linear(embedding_size*2,1)

    def forward(self, x, edge_index, batch_index):
        # First Conv layer
        hidden = self.initial_conv(x, edge_index)
        hidden = tanh(hidden)

        # Other Conv layers
        hidden = self.conv1(hidden, edge_index)
        hidden = tanh(hidden)
        hidden = self.conv2(hidden, edge_index)
        hidden = tanh(hidden)
        hidden = self.conv3(hidden, edge_index)
        hidden = tanh(hidden)

        # Global Pooling (stack different aggregations)
        hidden = torch.cat([gmp(hidden, batch_index),
                            gap(hidden, batch_index)], dim=1)

        # Apply a final (linear) classifier.
        out = self.out(hidden)
        return torch.tensor(out,dtype=float),torch.tensor(hidden,dtype=float)


model = GCN()
print(model)
print("Number of parameters: ", sum(p.numel() for p in model.parameters()))


# TRAINING THE GNN

# Root mean squared error
loss_fn = torch.nn.MSELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.0007)


# Wrap data in a data loader
data_size = len(dataset)  # size is 2301
NUM_GRAPHS_PER_BATCH = 25
loader = DataLoader(dataset,batch_size=NUM_GRAPHS_PER_BATCH, shuffle=False)

#test_loader = DataLoader(, batch_size=NUM_GRAPHS_PER_BATCH, shuffle=False)

def train(data):
    # Enumerate over the data
    for batch in loader:
      # Reset gradients
      optimizer.zero_grad()
      # Passing the node features and the connection info
      pred, embedding = model(batch.x.float(), batch.edge_index, batch.batch)
      # Calculating the loss and gradients
      loss = loss_fn(pred, batch.y)
      loss.backward()
      # Update using the gradients
      optimizer.step()
    return loss, embedding

print("Starting training...")
losses = []
for epoch in range(2000):
    loss, h = train(dataset)
    losses.append(loss)
    if epoch % 100 == 0:
      print(f"Epoch {epoch} | Train Loss {loss}")

Re-wrapping a tensor will detach it from the computation graph in:

return torch.tensor(out,dtype=float),torch.tensor(hidden,dtype=float)

so just return out and hidden directly instead and rerun your code.
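A tiny demonstration of why the re-wrap breaks the graph (with a stand-in linear layer):

import torch

lin = torch.nn.Linear(4, 2)
x = torch.randn(3, 4)

out = lin(x)
print(out.grad_fn)                                   # <AddmmBackward0 ...>
print(torch.tensor(out, dtype=torch.float).grad_fn)  # None -> detached copy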

I figured it out, thanks

torch.set_grad_enabled(True)
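For context, torch.set_grad_enabled(True) re-enables gradient tracking globally if it was switched off somewhere earlier; a minimal illustration:

import torch

torch.set_grad_enabled(False)   # e.g. left over from an evaluation step
x = torch.ones(3, requires_grad=True)
print((x * 2).grad_fn)          # None -> no graph is recorded

torch.set_grad_enabled(True)    # re-enable autograd
y = x * 2
print(y.grad_fn)                # <MulBackward0 ...>
y.sum().backward()              # works again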

Hi ptrblck

I have the same problem; this is my training procedure.
Thanks in advance.


for epoch in range(201):
    model.train()
    for (x1, y1), (x2, y2) in zip(src_loader, trg_loader):
        xs = x1.squeeze().t()
        xt = x2.squeeze().t()
        # print(xs.shape, xt.shape)
        z1, z2 = model(xs, xt)
        # print(z1.shape, z2.shape)
        loss = criterion(z1, z2)
        # print(loss.item())
        opt.zero_grad()
        loss.backward()
        opt.step()

        if epoch%100 == 0:
            print(f"epoch={epoch}, loss={loss.item()}")
in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Could you post a minimal, executable code snippet reproducing this error, please?

Here is the code:

from torch.optim import Adam
from torch import nn as nn
import numpy as np
import torch
from scipy import linalg


def sSVD(X):
    u, s, v = linalg.svd(X.cpu().detach().numpy(), full_matrices=False, lapack_driver="gesvd") 
    u, s, v = np.nan_to_num(u), np.nan_to_num(s), np.nan_to_num(v)
    return torch.from_numpy(u).float(), torch.from_numpy(s).float(), torch.from_numpy(v).float()


class myLoss(nn.Module):

    def __init__(self):
        super().__init__()
        self.criterion = nn.MSELoss()

    def forward(self, x1, x2):
        u1, s1, v1 = sSVD(x1)    
        u2, s2, v2 = sSVD(x2)
        return self.criterion(s1, s2)



class ComW(nn.Module):
    def __init__(self, input_dim=80):
        super(ComW, self).__init__()
        self.fc = nn.Linear(in_features=80, out_features=80)

    def forward_once(self, x):
        return self.fc(x)

    def forward(self, x1, x2):
        z1 = self.forward_once(x1)
        z2 = self.forward_once(x2)
        return z1, z2


model = ComW()


opt = Adam(params=model.parameters(), lr=3e-4)
criterion = myLoss()

X1 = torch.randn(size=(80, 1, 700))
X2 = torch.randn(size=(80, 1, 700))

for epoch in range(201):
    model.train()
    print(X1.shape)
    xs = X1.squeeze().t()
    xt = X2.squeeze().t()

    z1, z2 = model(xs, xt)
    loss = criterion(z1, z2)
    opt.zero_grad()
    loss.backward()
    opt.step()

    if epoch%100 == 0:
        print(f"epoch={epoch}, loss={loss.item()}")