Element 0 of tensors does not require grad and does not have a grad_fn

@ptrblck: Hello Sir, I read your post regarding the issue “Element 0 of tensors does not require grad and does not have a grad_fn”, and I am also trying to train a pre-trained model. I am not setting requires_grad to False, since I want the network to train and it is True by default. Here is my code snippet; I am getting the same error when training the model.

Could you please suggest a solution to this problem? It would be very helpful.

for epoch in range(10):
    running_loss = 0.0
    myNetwork.train()
    for i, data in enumerate(trainloader):
        image = data['image']
        landmarks = data['landmarks']
        image = F.interpolate(image, size=256)

        optimizer.zero_grad()
        output = myNetwork(image)[-1].detach()
        output_2d_landmarks = get_2d_landmarks_from_heatmap(output)

        loss = criterion(output_2d_landmarks, landmarks)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))

You are detaching the output of your model, so that the backward pass will stop at this point. Since it’s the output, no backpropagation will be possible.
What is the reason you are calling detach() on the output?

I need to take the output of one model as input to another, but I am not sure if that will work with detach(), so for now I have kept it. Also, I read that in PyTorch 1.3.1 requires_grad is False by default and we have to explicitly set requires_grad=True. Is that correct? I tried setting loss.requires_grad = True and it started training, but it is still unclear to me…

If you need to train both models, you shouldn’t call detach on the output of the first model.
detach will stop the gradient calculation, so your first model won’t get any gradients.
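For illustration, here is a minimal sketch of chaining two trainable models without detach; the model definitions, shapes, and the shared optimizer are made up, not your actual code:

import torch
import torch.nn as nn

# hypothetical stand-ins for your two networks
model_a = nn.Linear(10, 20)
model_b = nn.Linear(20, 1)
criterion = nn.MSELoss()

# one optimizer over both parameter sets, so both models get updated
optimizer = torch.optim.Adam(
    list(model_a.parameters()) + list(model_b.parameters()), lr=1e-3)

x = torch.randn(8, 10)
target = torch.randn(8, 1)

optimizer.zero_grad()
features = model_a(x)            # no .detach() here, so gradients can flow back into model_a
output = model_b(features)
loss = criterion(output, target)
loss.backward()                  # backpropagates through model_b and model_a
optimizer.step()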

Tensors are created with requires_grad=False by default.
If you need to have gradients in the input tensors, you would have to set this attribute to True. This is useful for e.g. adversarial training and not necessary in a vanilla classification use case.
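As a toy example of this default behavior and of explicitly requesting input gradients (the tensor names here are made up):

import torch

x = torch.randn(4, 3)
print(x.requires_grad)          # False: new tensors don't track gradients by default

# only needed if you want gradients w.r.t. the input itself (e.g. adversarial examples)
x_adv = torch.randn(4, 3, requires_grad=True)
y = (x_adv * 2).sum()
y.backward()
print(x_adv.grad.shape)         # torch.Size([4, 3])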

Your loss tensor should already require gradients and you shouldn’t have to reset it.
If the output or loss does not require gradients, you might have detached the tensor again at some point.


Thank you so much for your replies. They were of great help. :innocent:


Hi ptrblck,

I’ve set requires_grad to True and am still getting the error:

criteria=nn.MSELoss()
model = EC_FCN()
model=model.double()
#print(model)
lr=0.5
e=2

for p in model.parameters():
    print(p)
start=time.time()
epoch=1

for it in range(epoch):
    Y=list()
    
    if not it%5:
        lr=lr/1.5
    
    optimizer=torch.optim.Adam(model.parameters(), lr=lr)
    optimizer.zero_grad()
    
    cols=matrix.shape[0] # nXd matrix (e.g., n=10 here)
    
    for index in range(cols):
        
        val=matrix[index,:]
        #print(val.type())
        
        val.requires_grad_()
        val.double()
       # print("val=",val.requires_grad)
        
        
        out=model(val.double())
        Y.append(out)
    
    
    Y=torch.tensor(Y).view(cols,-1) # Y --> nX1 vector

    Y1 = calculate(torch.tensor(matrix).double(),Y,e)  # Y1--> nX1 vector
    
    loss=criteria(Y1,Y)
    loss.backward()
    optimizer.step()

This is throwing an error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-208-34b5b1c413f9> in <module>
     46 
     47     loss=criteria(Y1,Y)
---> 48     loss.backward()
     49     optimizer.step()
     50 

~\Anaconda3\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
    100                 products. Defaults to ``False``.
    101         """
--> 102         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    103 
    104     def register_hook(self, hook):

~\Anaconda3\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     88     Variable._execution_engine.run_backward(
     89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
     91 
     92 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Thanks in advance!

P.S.: The forward method is a fully connected network with 3 layers: 10 nodes -> 5 nodes -> 1 node

If you are wrapping the outputs of your model into a new tensor via torch.tensor(Y), you will break the graph.
Try to use torch.stack instead. :wink:
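Roughly, a self-contained sketch of that change; the linear layer and random matrix below are stand-ins for your EC_FCN and data:

import torch
import torch.nn as nn

model = nn.Linear(10, 1).double()       # stand-in for your EC_FCN
matrix = torch.randn(10, 10).double()   # stand-in for the n x d matrix

outputs = []
for index in range(matrix.shape[0]):
    val = matrix[index, :]
    outputs.append(model(val))          # each element keeps its grad_fn

Y = torch.stack(outputs).view(matrix.shape[0], -1)   # still attached to the graph
print(Y.requires_grad)                                # True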


Yes. My intention was to pass over all the rows, with each row generating a single-valued tensor, i.e. out = model(val.double()); then append it to a list, do the same for every row (getting an n-valued list), and finally convert that list into a tensor to be compared with another n-valued tensor using the loss function.

But by converting the list into a tensor, the elements of the resulting tensor no longer have a grad_fn associated with them. So would it work if I did something like:

Y=torch.tensor(Y).view(cols,-1).requires_grad_() # Y --> nX1 vector

I’ll check out your method as well and reply back. Thanks a lot !

No, this wouldn’t work, as the newly created tensor Y will require gradients, but the tensor passed into it will still be detached from the computation graph.
Also, compare your result to torch.cat, as I’m not sure which shape you are expecting.
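As a quick shape comparison with dummy single-element outputs (not your actual model outputs):

import torch

outs = [torch.randn(1, requires_grad=True) * 2 for _ in range(5)]

stacked = torch.stack(outs)   # shape [5, 1]: adds a new dimension
catted = torch.cat(outs)      # shape [5]: joins along the existing dimension

print(stacked.shape, catted.shape)
print(stacked.grad_fn, catted.grad_fn)   # both keep a grad_fn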


Hi ptrblck,
I also have the same problem, where I might be detaching my output somewhere, but I have no idea how to find it. I’d really appreciate it if you could help me. The loss stays constant after I changed requires_grad = True, but if I take it out, I get the “element 0 of tensors …” error. Below is the code:

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=6, kernel_size=3)
        #self.conv2 = nn.Conv2d(in_channels=6, out_channels=12, kernel_size=2)
        
        self.fc1 = nn.Linear(in_features=6*2*8, out_features=120)
        self.fc2 = nn.Linear(in_features=120, out_features=60)
        self.out = nn.Linear(in_features=60, out_features = 1)
        
    def forward(self, t):
        # (1) input layer
        t = t

        # (2) hidden conv layer
        t = self.conv1(t)
        t = F.relu(t)


        # (3) hidden conv layer
        #t = self.conv2(t)
        #t = F.relu(t)
        

        # (4) hidden linear layer
        #t = t.reshape(-1, 6*2*8)
        t = t.view(t.size(0),-1)
        t = self.fc1(t)
        t = F.relu(t)

        # (5) hidden linear layer
        t = self.fc2(t)
        t = F.relu(t)

        # (6) output layer
        t = torch.tensor(t, dtype = torch.float)
        t = self.out(t)
        #t = F.softmax(t, dim=1)
        return t

def train(epoch):
    model.train()
    tr_loss = 0
    # getting the training set
    x_train, y_train = Variable(train_x), Variable(train_y)
    # getting the validation set
    x_val, y_val = Variable(val_x), Variable(val_y)
    # converting the data into GPU format
    if torch.cuda.is_available():
        x_train = x_train.cuda()
        y_train = y_train.cuda()
        x_val = x_val.cuda()
        y_val = y_val.cuda()

    # clearing the Gradients of the model parameters
   # optimizer.zero_grad()
    
    # prediction for training and validation set
    output_train = model(x_train.float())
    output_val = model(x_val.float())

    # computing the training and validation loss
    
    loss_train = criterion(output_train, y_train)
    loss_val = criterion(output_val, y_val)
    train_losses.append(loss_train)
    val_losses.append(loss_val)

    # computing the updated weights of all the model parameters
    loss_train = Variable(loss_train, requires_grad = True)
    loss_train.backward()
    optimizer.step()
    tr_loss = loss_train.item()
    if epoch%2 == 0:
        # printing the validation loss
        print('Epoch : ',epoch+1, '\t', 'loss :', loss_val)

You are detaching t from the computation graph by creating a new tensor in:

# (6) output layer
t = torch.tensor(t, dtype = torch.float)

I’m not sure if you really need to change the dtype of the tensor, as it should already be a FloatTensor; however, if so, you should just call t = t.float().
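As a quick, self-contained illustration of the difference (toy tensors, not your model):

import torch
import torch.nn as nn

layer = nn.Linear(4, 2)
t = layer(torch.randn(3, 4))   # attached to the computation graph

detached = torch.tensor(t)     # re-wrapping creates a new leaf tensor without a grad_fn
kept = t.float()               # the dtype cast keeps the grad history

print(detached.grad_fn)        # None
print(kept.grad_fn)            # e.g. <AddmmBackward0 ...>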

Also, Variables are deprecated since PyTorch 0.4.0, so you can use tensors directly.

PS: I’ve formatted your code for easier readability and you can post code snippets by wrapping them into three backticks ``` :wink:


Following is the code for my model:

import torch
import numpy as np
import math


class DeCNN(torch.nn.Module):
    def __init__(self, gen_emb, domain_emb, num_classes=3, dropout=0.5):
        super(DeCNN, self).__init__()

        self.gen_embedding = torch.nn.Embedding(gen_emb.shape[0], gen_emb.shape[1])
        self.gen_embedding.weight = torch.nn.Parameter(torch.from_numpy(gen_emb), requires_grad=False)
        self.domain_embedding = torch.nn.Embedding(domain_emb.shape[0], domain_emb.shape[1])
        self.domain_embedding.weight = torch.nn.Parameter(torch.from_numpy(domain_emb), requires_grad=False)

        self.conv1 = torch.nn.Conv1d(gen_emb.shape[1] + domain_emb.shape[1], 128, 5, padding=2)
        self.conv2 = torch.nn.Conv1d(gen_emb.shape[1] + domain_emb.shape[1], 128, 3, padding=1)
        self.dropout = torch.nn.Dropout(dropout)

        self.conv3 = torch.nn.Conv1d(256, 256, 5, padding=2)
        self.conv4 = torch.nn.Conv1d(256, 256, 5, padding=2)
        self.conv5 = torch.nn.Conv1d(256, 256, 5, padding=2)
        self.linear_ae = torch.nn.Linear(256, num_classes)

    def forward(self, x, x_len):
        x_emb = torch.cat((self.gen_embedding(x), self.domain_embedding(x)), dim=2)
        x_emb = self.dropout(x_emb).transpose(1, 2)
        x_conv = torch.nn.functional.relu(torch.cat((self.conv1(x_emb), self.conv2(x_emb)), dim=1))
        x_conv = self.dropout(x_conv)
        x_conv = torch.nn.functional.relu(self.conv3(x_conv))
        x_conv = self.dropout(x_conv)
        x_conv = torch.nn.functional.relu(self.conv4(x_conv))
        x_conv = self.dropout(x_conv)
        x_conv = torch.nn.functional.relu(self.conv5(x_conv))
        x_conv = x_conv.transpose(1, 2)
        x_logit = self.linear_ae(x_conv)
        return x_logit

Following is the train function:

def train_epoch(model, training_data, optimizer, criterion, device):
    model.train()
    epoch_loss = 0
    epoch_acc = 0
    epoch_precision = 0
    epoch_recall = 0
    epoch_f1 = 0
    for batch in tqdm(training_data, mininterval=2, desc='  - (Training)   ', leave=False):
        # print(batch)
        sequences, targets, sequence_lengths = batch
        sequences = sequences.to(device)
        sequence_lengths = sequence_lengths.to(device)
        targets = targets.to(device)

        optimizer.zero_grad()
        pred = model(sequences, sequence_lengths)
        pred = torch.nn.functional.log_softmax(pred.data)
        pred = pred.permute(0, 2, 1)
        print(pred.size())
        loss = criterion(pred, targets)
        print(loss)
        precision, recall, f1 = get_performance(pred, targets, sequence_lengths)
        loss.backward()
        optimizer.step()

        epoch_loss += float(loss.item())
        epoch_precision += float(precision)
        epoch_recall += float(recall)
        epoch_f1 += float(f1)

    return epoch_loss / len(training_data), epoch_precision / len(
        training_data), epoch_recall / len(training_data), epoch_f1 / len(training_data)

I get the following error:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

The error does not occur if I delete the line

pred = torch.nn.functional.log_softmax(pred.data)

Another observation is that with the line above, the loss comes out as a plain tensor object, but without it I get tensor(-0.0182, grad_fn=<NllLoss2DBackward>), which is what I would ideally want.

However, that leads to other problems like a negative loss.
@ptrblck Could you help me with what I might be missing?

You are detaching pred from the computation graph by using the .data attribute in:

pred = torch.nn.functional.log_softmax(pred.data)

Pass pred directly to log_softmax.
Also, you should generally not use the .data attribute, as it might have other unwanted side effects.
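A rough sketch of the suggested change with stand-in shapes (the [batch, seq_len, num_classes] layout is an assumption based on your permute call):

import torch
import torch.nn as nn
import torch.nn.functional as F

# stand-in for the DeCNN output: [batch, seq_len, num_classes]
pred = nn.Linear(5, 3)(torch.randn(2, 7, 5))

pred = F.log_softmax(pred, dim=2)   # pass pred itself (not pred.data) and give an explicit dim
pred = pred.permute(0, 2, 1)        # [batch, num_classes, seq_len], as expected by nn.NLLLoss
print(pred.grad_fn)                 # not None, so loss.backward() can reach the model parameters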

Thanks! :slight_smile:
I’ll keep in mind not to use .data

for epoch in range(6):
    running_loss = 0.0

    for i, data in enumerate(train_dl, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs =(inputs)
        loss = criterion(outputs,labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 2000))
            running_loss = 0.0

I would like to train my model but I got this error. Could someone help me to check what’s wrong?

It seems you are just assigning the inputs to outputs without calling your model or any other operation:

outputs =(inputs)

The output should be created through the model, or through operations which involve parameters that require gradients.
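For reference, a minimal runnable sketch with hypothetical stand-ins for your model, data loader, and optimizer:

import torch
import torch.nn as nn

# hypothetical stand-ins for your model, data loader, and optimizer
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
train_dl = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(3)]

for epoch in range(6):
    running_loss = 0.0
    for i, data in enumerate(train_dl, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = model(inputs)            # run the forward pass instead of outputs = (inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()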

Hi Sir, I’ve got the same error, “RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.”
But I do not think I should set requires_grad=True, because I am using a LongTensor (batch_a in my case).
I am training a linear model to evaluate how well a pre-trained feature extraction model performs in speaker identification.

def train_linear_model(config, epoch_i, apc_model, linear_model, optimizer, criterion, device, data_loader):
    linear_model.train()
    losses = [] 
    # x:acoustic input, l:length of acoustic sequence, a:speaker index
    for batch_x, batch_l, batch_a in data_loader:
        batch_x, batch_l, batch_a = setup_inputs(config, batch_x, batch_l, batch_a, device)
        loss, _ = pass_inputs_through_model(config,apc_model, linear_model, criterion, batch_x,
                                            batch_l, batch_a, device)
        losses.append(loss.item())
        print(losses)
        
        optimizer.zero_grad()
        loss.backward()
        grad_norm = torch.nn.utils.clip_grad_norm_(linear_model.parameters(), config.clip_thresh)
        optimizer.step()
    return losses

def setup_inputs(config, batch_x, batch_l, batch_a, device):
    _, indices = torch.sort(batch_l, descending=True)
    batch_x = Variable(batch_x[indices]).to(device)
    batch_l = Variable(batch_l[indices]).to(device)
    batch_a = Variable(batch_a[indices]).to(device)

    return batch_x, batch_l, batch_a

def pass_inputs_through_model(config, apc_model, linear_model, criterion, batch_x, batch_l, batch_a, device, is_finetune=False):
    with torch.set_grad_enabled(is_finetune):
        _, internal_rep = apc_model.forward(batch_x, batch_l)  # last RNN layer

        speaker_scores = linear_model(internal_rep)
        loss_not_reduced = criterion(speaker_scores, batch_a)
        lengths_mask = get_lengths_mask(loss_not_reduced, batch_l, device)
        loss_not_reduced_masked = lengths_mask * loss_not_reduced
        loss = loss_not_reduced_masked.mean()
        return loss, speaker_scores

Here is my dataset:

class Speakers(data.Dataset):
    def __init__(self, audio_path, speaker_frequency, out_dim):
        self.audio_path = audio_path
        self.out_dim = out_dim
        # create speaker-to-index and index-to-speaker lookup tables
        self.speaker_to_indx = {}
        self.indx_to_speaker = {}
        for speaker in sorted(speaker_frequency):
            self.speaker_to_indx[speaker] = list(speaker_frequency.keys()).index(speaker)
            self.indx_to_speaker[list(speaker_frequency.keys()).index(speaker)] = speaker

        self.filename = [f.split('.')[0] for f in os.listdir(self.audio_path) if f.endswith('.pt')]  # 2703 files
     
        with open(join(audio_path, 'lengths.pkl'), 'rb') as f:
            self.lengths = pickle.load(f)

    def __len__(self):
        return len(self.filename)

    def __getitem__(self, index):
        # x:Mel features of audio file; l:length of feature; a: speaker index
        x = torch.load(join(self.audio_path, self.filename[index] + ".pt"))
        l = self.lengths[self.filename[index]]
        a = self.get_speaker_target_index(join(self.audio_path, self.filename[index] + ".pt"))
        return x, l, a

In pass_inputs_through_model is_finetune is set to False by default, which will disable the gradient calculation.
Could you set this argument to True or alternatively remove the with torch.set_grad_enabled(is_finetune): line?

PS: Variables are deprecated since PyTorch 0.4 so you can use tensors now. :wink:

Thank you so much for replying. In the function pass_inputs_through_model(), the with torch.set_grad_enabled(is_finetune): line is intended to keep the tensors out of the graph so that errors are not backpropagated through the upstream APC model. I only need the extracted features from _, internal_rep = apc_model.forward(batch_x, batch_l). The RuntimeError happens at loss.backward() in train_linear_model; this loss is used for training the downstream linear model.
What else do you think I should modify?

The complete forward pass including the loss calculation won’t be tracked by Autograd, if you keep the context manager to disable the gradient calculation.
If you want to detach the graph at the output of apc_model (internal_rep output), then use:

_, internal_rep = apc_model(batch_x, batch_l)
internal_rep = internal_rep.detach()
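To tie it together, here is a minimal, self-contained sketch of that pattern; the GRU, shapes, and ten-speaker setup are placeholders, not your actual APC model:

import torch
import torch.nn as nn

# hypothetical stand-ins for the pre-trained extractor and the trainable linear head
apc_model = nn.GRU(input_size=40, hidden_size=64, batch_first=True)   # pretend APC model
linear_model = nn.Linear(64, 10)                                      # 10 speakers
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(linear_model.parameters(), lr=1e-3)

batch_x = torch.randn(8, 50, 40)       # [batch, time, mel features]
batch_a = torch.randint(0, 10, (8,))   # speaker indices (a LongTensor target is fine as-is)

internal_rep, _ = apc_model(batch_x)   # run the extractor with grad enabled ...
internal_rep = internal_rep.detach()   # ... then cut the graph so apc_model gets no gradients

speaker_scores = linear_model(internal_rep[:, -1])   # the linear head is still tracked by autograd
loss = criterion(speaker_scores, batch_a)

optimizer.zero_grad()
loss.backward()                        # gradients reach linear_model's parameters only
optimizer.step()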