Element 0 of tensors does not require grad and does not have a grad_fn

You have to set the requires_grad attribute on the parameters, not on the nn.Module itself.
Change:

model_conv.classifier.requires_grad_=True

to

for param in model_conv.classifier.parameters():
    param.requires_grad = True
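To illustrate (a minimal sketch): the assignment silently shadows the in-place requires_grad_ method with the value True and leaves the parameters untouched:

import torch.nn as nn

module = nn.Linear(4, 2)
for p in module.parameters():
    p.requires_grad = False  # freeze

# Broken: creates a plain attribute named requires_grad_ holding True
# (shadowing the method); no parameter is changed:
module.requires_grad_ = True
print(any(p.requires_grad for p in module.parameters()))  # False

# Working alternatives (on a fresh module):
module = nn.Linear(4, 2)
module.requires_grad_(True)        # call the in-place method, or
for p in module.parameters():
    p.requires_grad = True         # set the attribute on each parameter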

Hello @ptrblck, I read through the comments, but I am still unsure why a similar error occurs in my case as well. Here’s my code for training:

import torch
from tqdm import tqdm
from sklearn.metrics import f1_score, accuracy_score, matthews_corrcoef

def train_fn(data_loader, model, optimizer, scheduler):
    model.train()
    total_train_loss = 0
    lst_active_labels = []
    lst_active_preds = []
    
    for batch in tqdm(data_loader, total = len(data_loader)):
        
        b_input_ids = batch[0].cuda()
        b_input_mask = batch[1].cuda()
        b_labels = batch[2].cuda()
        
        # Zero the gradients
        model.zero_grad()
        
        outputs = model(b_input_ids, 
                        attention_mask=b_input_mask,
                        labels=b_labels)

        loss = outputs[0]
        
#         loss.requires_grad =True 
        
        loss.backward()
        
        total_train_loss += loss.item()
        
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        optimizer.step()
        scheduler.step()
       
        labels = b_labels.view(-1) 
        active_logits = outputs[1].view(-1, 2)
        flattened_predictions = torch.argmax(active_logits, axis=1)

        active_accuracy = labels.view(-1) != -100
        labels_tmp = torch.masked_select(labels, active_accuracy) 
        pred_tmp = torch.masked_select(flattened_predictions, active_accuracy) 
        lst_active_labels.extend(labels_tmp.tolist())
        lst_active_preds.extend(pred_tmp.tolist())
    
    avg_f1_score_0 = f1_score(lst_active_labels, lst_active_preds, average='binary', pos_label=0)
    avg_f1_score_1 = f1_score(lst_active_labels, lst_active_preds, average='binary', pos_label=1)
    avg_accuracy_score = accuracy_score(lst_active_labels, lst_active_preds)
    avg_mcc_score = matthews_corrcoef(lst_active_labels, lst_active_preds)
    return (float(total_train_loss / len(data_loader)), avg_f1_score_0, avg_f1_score_1,
            avg_accuracy_score, avg_mcc_score, lst_active_labels, lst_active_preds)

The error occurs in loss.backward().
Thanks in advance

Most likely you are detaching an activation tensor from the computation graph inside the model. Could you post the model definition, please?
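For reference, a few common patterns that detach a tensor from the computation graph (a minimal, generic sketch, not specific to your model):

import torch

x = torch.randn(3, 4, requires_grad=True)
act = x * 2                                  # some differentiable activation

y = act.detach()                             # explicit detach
z = torch.tensor(act.tolist())               # round-trip through Python lists
w = torch.from_numpy(act.detach().numpy())   # round-trip through numpy

for t in (y, z, w):
    print(t.grad_fn, t.requires_grad)        # None False -> backward() fails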

Thanks @ptrblck for the quick response.
This is my model. I am trying to predict the quality (0/1) of each token in a pair of source and translated sentences. The training data looks like [SEP] source_sentence [SEP] target/translated_sentence [SEP] and the labels are 0 1 1 0 … for each token in the source and target.

import torch.nn as nn
from transformers import XLMRobertaForTokenClassification

class EntityModel(nn.Module):

    def __init__(self):
        super(EntityModel, self).__init__()
        self.bert = XLMRobertaForTokenClassification.from_pretrained(
            config.BASE_MODEL,
            output_attentions=False,
            output_hidden_states=False,
            num_labels=2)

    def forward(self, ids, attention_mask, labels):
        outputs = self.bert(ids,
                            attention_mask=attention_mask,
                            labels=labels,
                            return_dict=False)
        return outputs[0], outputs[1]

I can’t see anything obviously wrong in your code, and based on the docs the model should return the logits as well as the loss.
Check the .grad_fn attributes of both outputs and make sure they point to a valid backward function (i.e. are not None).
If the .grad_fn of these outputs is indeed None, then the tensor is detached from the computation graph and you might need to look into the model itself.
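For example, a self-contained check (using a toy module for illustration) looks like this:

import torch
import torch.nn as nn

model = nn.Linear(4, 2)
out = model(torch.randn(3, 4))
print(out.grad_fn)       # e.g. <AddmmBackward0 ...> -> attached to the graph

detached = out.detach()
print(detached.grad_fn)  # None -> calling backward() on anything derived
                         # from it raises exactly this RuntimeError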

Hello,

I have a problem. I trained a model and I want to train another model containing the first one without changing the weights of the first one. So here is my code :

import torch
from torch import nn
import pytorch_lightning as pl

class FinalModel(pl.LightningModule):

    def __init__(self, model, class_weights):
        
        super().__init__()
        
        self.model = model
        for param in self.model.parameters():
            param.requires_grad = False
        
        self.linear = nn.Linear(512, 242)

        self.train_loss_tracker = EMATracker()
        self.train_acc_tracker = EMATracker()
        self.train_bal_acc_tracker = EMATracker()
        self.train_ma_prec_tracker = EMATracker()
        self.train_ma_recall_tracker = EMATracker()
        self.train_ma_f1_tracker = EMATracker()
        self.train_mi_prec_tracker = EMATracker()
        self.train_mi_recall_tracker = EMATracker()
        self.train_mi_f1_tracker = EMATracker()
        
        self.class_weights = class_weights

    def forward(self, x):
        x = self.model(x)
        x = self.linear(x)
        return x

    def configure_optimizers(self):
        optimizer = torch.optim.Adam(self.parameters(), lr=3e-4)
        return optimizer

When I start training it, it says: element 0 of tensors does not require grad and does not have a grad_fn.

Can you please help me ?

Regards

Your model seems to work fine if I remove some unneeded stuff:

import torch
import torch.nn as nn
from torchvision import models

class FinalModel(nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model
        for param in self.model.parameters():
            param.requires_grad = False
        
        self.linear = nn.Linear(512, 242)

    def forward(self, x):
        x = self.model(x)
        x = self.linear(x)
        return x


tmp = models.resnet18()
tmp.fc = nn.Linear(512, 512)
model = FinalModel(model=tmp)

x = torch.randn(1, 3, 224, 224)
out = model(x)
print(out.grad_fn)
# <AddmmBackward0 object at 0x7f00ad895940>

out.mean().backward() # works

so I assume something else might be causing the issue in your setup. E.g. did you globally disable gradient computation?

I am sorry, you are right. The problem was somewhere else. Sorry for the inconvenience !

Try:
torch.set_grad_enabled(True)
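This helps when gradient computation was globally disabled somewhere. A minimal sketch of that failure mode:

import torch
import torch.nn as nn

torch.set_grad_enabled(False)   # e.g. accidentally left over from an eval script

model = nn.Linear(4, 2)
loss = model(torch.randn(3, 4)).mean()
print(loss.grad_fn)             # None -> loss.backward() would raise the error

torch.set_grad_enabled(True)    # re-enable gradients before training
loss = model(torch.randn(3, 4)).mean()
loss.backward()                 # works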

I have this same error… Hoping someone can help me ID whatever is causing detach!
Thanks!
Mark

from collections import OrderedDict
from time import time

import torch
from torch import nn, optim

# Note: the keys must be unique; a duplicated 'relu' key would silently
# collapse the two ReLU entries into one inside the OrderedDict.
classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(input_size, hidden_units)),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(hidden_units, 256)),
    ('relu2', nn.ReLU()),
    ('fc3', nn.Linear(256, 102)),
    ('output', nn.LogSoftmax(dim=1))
    ]))

model.classifier = classifier

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=learning_rate)
# epochs = 10

train_losses = []
valid_losses = []
valid_accuracies = []

def train():
    print('\nTraining classifier for {}.  Please wait...'.format(arch))
    model.to(device)
    for e in range(epochs):
        # print('Training Epoch {}...\r'.format(e+1), end =" ")
        # epoch timing
        epoch_start = time()
        running_train_loss = 0
        
        # enable dropout and gradient computing
        model.train()
        
        for images, labels in trainloader: # for each image input
            
            # Load the training image and label into GPU
            images, labels = images.to(device), labels.to(device)
            # Fresh start
            optimizer.zero_grad()
            
            # run fwd to get log probabilities
            log_ps = model.forward(images)
            # Check & calculate loss, hopefully improves over time
            train_loss = criterion(log_ps, labels)
            
            # drive the loss backward into the model
            train_loss.backward()
            # take a step and adjust weights
            optimizer.step()
            
            running_train_loss += train_loss.item()
            
        else: # loss check using validation data
            img_num = 0
            running_valid_loss = 0
            running_valid_accuracy = 0
            accuracy = 0
            
            # disable dropout
            model.eval()
            
            # gradient computing needs to be turned off explicitly
            with torch.no_grad():
                for images, labels in validloader:
                    
                    # print('Validing Epoch {} image {}...\r'.format(e+1, img_num+1), end =" ")
                    
                    # Load the validation image and label into GPU
                    images, labels = images.to(device), labels.to(device)
                    
                    # run fwd to get log probabilities
                    log_ps = model.forward(images)
                    # probabilities
                    ps = torch.exp(log_ps)
                    
                    # calculations for validation loss
                    valid_loss = criterion(log_ps, labels)
                    running_valid_loss += valid_loss.item()
                    
                    # calculations for accuracy
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    running_valid_accuracy += torch.mean(equals.type(torch.FloatTensor))
                    
            # epoch run time
            epoch_end = time()
            epoch_time = epoch_end - epoch_start
            
            # save epoch results
            train_losses.append(running_train_loss/len(trainloader))
            valid_losses.append(running_valid_loss/len(validloader))
            valid_accuracies.append(running_valid_accuracy/len(validloader))
            
            # print epoch results
            print(
                'Epoch# {}/{}: '.format(e+1, epochs),
                '{:.2f} sec. '.format(epoch_time),
                'Train Loss: {:.3f}. '.format(train_losses[-1]),
                'Validation Loss: {:.3f}. '.format(valid_losses[-1]),
                'Accuracy: {:.1f}%'.format(100.0*valid_accuracies[-1])
            )
    print('Training completed.')

Your general code looks alright and I cannot reproduce the issue by adding some missing pieces:

import torch
import torch.nn as nn
from collections import OrderedDict
from torch.utils.data import TensorDataset, DataLoader
from torchvision import models

model = models.vgg16()
input_size = model.classifier[0].in_features
hidden_units = 512

classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(input_size, hidden_units)),
    ('relu1', nn.ReLU()),
    ('fc2', nn.Linear(hidden_units, 256)),
    ('relu2', nn.ReLU()),
    ('fc3', nn.Linear(256, 102)),
    ('output', nn.LogSoftmax(dim=1))
]))

model.classifier = classifier

criterion = nn.NLLLoss()
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)

dataset = TensorDataset(torch.randn(10, 3, 224, 224), torch.randint(0, 102, (10,)))
trainloader = DataLoader(dataset)
device = 'cpu'

def train():
    for e in range(10):        
        running_train_loss = 0
        model.train()
        for images, labels in trainloader: # for each image input
            
            # Load the training image and label into GPU
            images, labels = images.to(device), labels.to(device)
            # Fresh start
            optimizer.zero_grad()
            
            # run fwd to get log probabilities
            log_ps = model.forward(images)
            # Check & calculate loss, hopefully improves over time
            train_loss = criterion(log_ps, labels)
            
            # drive the loss backward into the model
            train_loss.backward()
            # take a step and adjust weights
            optimizer.step()
        print(f"Epoch {e}, loss {train_loss.item()}")
            
train()

Could you check what the difference between our code snippets might be?

They look basically the same. Mine chokes on loss.backward(), BTW. I should also mention that it did not choke on vgg16, but this Udacity project says to try other pretrained models, and it died when I tried resnet18. I don’t know if that’s useful information. Thanks!

Yes, that’s useful, since other models might not use the .classifier attribute.
I assume you are freezing all parameters of the resnet and then want to train the newly initialized .classifier afterwards.
However, note that resnet18 uses .fc for the last layer:

model = models.resnet18()
print(model)
# ...
#  (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
#  (fc): Linear(in_features=512, out_features=1000, bias=True)
# )

model.classifier
# AttributeError: 'ResNet' object has no attribute 'classifier'

# create a *new* classifier attribute which is never used
model.classifier = nn.Linear(1, 1)
print(model)
# ...
#   (avgpool): AdaptiveAvgPool2d(output_size=(1, 1))
#   (fc): Linear(in_features=512, out_features=1000, bias=True)
#   (classifier): Linear(in_features=1, out_features=1, bias=True)
# )

So you are creating a new .classifier attribute that is never used, while your output comes from the frozen .fc layer, which thus raises the error.
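A minimal sketch of the fix, assuming the same 102-class setup as above: replace the .fc layer that resnet18 actually uses:

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18()
for param in model.parameters():
    param.requires_grad = False          # freeze the backbone

# Replace the layer the forward pass actually uses:
model.fc = nn.Sequential(
    nn.Linear(512, 256),
    nn.ReLU(),
    nn.Linear(256, 102),
    nn.LogSoftmax(dim=1)
)
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

out = model(torch.randn(1, 3, 224, 224))
print(out.grad_fn)                       # valid grad_fn -> backward() will work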

Thanks for the tip! I’m going to stick with some other models for now and come back to try to get a shim to get this model working later.

Hi Sir, may I ask: I want to train an ensembled model and only train the parameters of the first model, but when I call loss.backward() I get the error message “element 0 of tensors does not require grad and does not have a grad_fn”.

I have checked the network and the flags are set correctly (all of modelA’s parameters have requires_grad=True and all of modelB’s parameters have requires_grad=False).

Please post some code that reproduces your error, and make sure the tensors with respect to which you are calculating the gradients are being used in the computation graph of the tensor you wish to differentiate (i.e. in the calculations of the tensor you call backward on).
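For example, in a hypothetical two-model ensemble where only modelA should be trained:

import torch
import torch.nn as nn

modelA = nn.Linear(8, 4)                 # trainable
modelB = nn.Linear(8, 4)                 # frozen
for p in modelB.parameters():
    p.requires_grad = False

x = torch.randn(2, 8)
loss = (modelA(x) + modelB(x)).mean()    # modelA's output keeps the graph alive
print(loss.grad_fn)                      # not None
loss.backward()                          # works

# Pitfall: a loss computed only from the frozen model has no grad_fn,
# so calling backward() on it raises exactly this error:
frozen_loss = modelB(x).mean()
print(frozen_loss.grad_fn)               # None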

Hello @ptrblck,
I’ve built the latest C++ PyTorch from source with BUILD_MOBILE_AUTOGRAD : ON on iOS. However, when I try to reproduce the simplest autograd example, I get an error.

{
    torch::AutoGradMode enable_grad(true);

    auto x = torch::ones({2, 2}, torch::requires_grad());
    auto y = x + 2;
    y = y.sum();
    y.backward();
}

ERROR_LOG: element 0 of tensors does not require grad and does not have a grad_fn

My initial task is to collect the gradients from the forward pass in order to preprocess the input (perform the forward pass again and get more robust softmax scores).

Seems like autograd is not working on iOS.

I think it’s a known limitation and I have heard that changing some build arguments could re-enable autograd. However, based on your experience it doesn’t seem to work (anymore). Could you create a GitHub issue so that the code owners can take a look at it, please?

I am having the same issue following https://github.com/gagan3012/keytotext for a custom dataset. I am encountering the same error and I can’t trace it back in the code. Kindly help me solve this error.

/usr/local/lib/python3.10/dist-packages/torch/_tensor.py in backward(self, gradient, retain_graph, create_graph, inputs)
    485             inputs=inputs,
    486         )
--> 487     torch.autograd.backward(
    488         self, gradient, retain_graph, create_graph, inputs=inputs
    489     )

/usr/local/lib/python3.10/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    198     # some Python versions print out the first line of a multi-line function
    199     # calls in the traceback and some print out the last line
--> 200     Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    201         tensors, grad_tensors_, retain_graph, create_graph, inputs,
    202         allow_unreachable=True, accumulate_grad=True)  # Calls into the C++ engine to run the backward pass

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

Could you post a minimal and executable code snippet reproducing the error, please?