Element 0 of tensors does not require grad and does not have a grad_fn

Hi everybody,

I’ve been trying to debug what is happening but don’t know what’s wrong.

If you need more info let me know.

Regards!

epochs = 10
steps = 0
running_loss = 0
print_every = 5
for epoch in range(epochs):
    for inputs, labels in train_loader:
        steps += 1

        inputs, labels = inputs.to(device), labels.to(device)

        optimizer.zero_grad()
        logps = model.forward(inputs)
        loss = criterion(logps, labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()

        if steps % print_every == 0:
            val_loss = 0
            accuracy = 0
            model.eval()

            with torch.no_grad():
                for inputs, labels in val_loader:
                    inputs, labels = inputs.to(device), labels.to(device)
                    logps = model.forward(inputs)
                    batch_loss = criterion(logps, labels)

                    val_loss += batch_loss.item()

                    ps = torch.exp(logps)
                    top_p, top_class = ps.topk(1, dim=1)
                    equals = top_class == labels.view(*top_class.shape)
                    accuracy += torch.mean(equals.type(torch.FloatTensor)).item()
            print('Epoch {}/{}'.format(epoch + 1, epochs))
            print('Train loss: {}'.format(running_loss/print_every))
            print('Val loss: {}'.format(val_loss/len(val_loader)))
            print('Val accuracy: {}'.format(accuracy/len(val_loader)))
            running_loss = 0
            model.train()
3 Likes

Could you post the forward method of your model?
It seems your output gets detached somehow, e.g. by calling detach() directly on a tensor or by leaving PyTorch and using some other library like numpy.
If it’s possible the whole model code would be interesting to see, too.
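For illustration, here is a minimal, self-contained sketch (not your model) of how the graph typically gets cut and produces exactly this error:

import torch
import torch.nn as nn

layer = nn.Linear(10, 2)
x = torch.randn(4, 10)

out = layer(x)         # out.grad_fn is set here, backward would work
out = out.detach()     # cuts the graph; a round-trip through numpy,
                       # e.g. torch.from_numpy(out.numpy()), has the same effect

loss = out.sum()
loss.backward()        # RuntimeError: element 0 of tensors does not require grad ...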

As a small side note, you shouldn’t call the forward method of your model, but the model directly instead: model(inputs).
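In the training loop above that would simply be:

logps = model(inputs)   # __call__ runs registered hooks and then forward()
loss = criterion(logps, labels)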

4 Likes

Hi ptrblck,

I didn’t know that, thanks!

model = models.resnet50(pretrained=True)

for param in model.parameters():
    param.requires_grad = False

model.classifier = nn.Sequential(nn.Linear(2048, 1024),
                                 nn.ReLU(),
                                 nn.Linear(1024, 102),
                                 nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()
optimizer = optim.Adam(model.classifier.parameters(), lr=0.003)
model.to(device)

Thanks for the code.
It looks like you would like to swap the last linear layer of the pretrained ResNet with your nn.Sequential block.
However, resnet does not use self.classifier as its last layer, but self.fc. This also explains the error, since you are currently setting the requires_grad flag to False for every parameter, while your newly created model.classifier is not being used at all.

Try to assign your nn.Sequential block to model.fc instead, which should fix the error.
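For reference, a minimal sketch of that change applied to the code above (keeping your layer sizes and optimizer settings):

import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet50(pretrained=True)

# freeze the pretrained backbone
for param in model.parameters():
    param.requires_grad = False

# resnet50 ends in .fc, so replace that attribute instead of .classifier
model.fc = nn.Sequential(nn.Linear(2048, 1024),
                         nn.ReLU(),
                         nn.Linear(1024, 102),
                         nn.LogSoftmax(dim=1))

criterion = nn.NLLLoss()
# only the new head's parameters require gradients (the default for new layers) and are optimized
optimizer = optim.Adam(model.fc.parameters(), lr=0.003)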

15 Likes

That makes sense.

It works, thank you for your help!!!

Hello, I have a similar problem here. And I don’t know how to solve it.

model = models.resnet50(pretrained=True)
num_in_features = model.fc.in_features
cls_num = 5
model.fc.out_features = cls_num

for param in model.parameters():
    param.requires_grad = False

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

best_model_wts = copy.deepcopy(model.state_dict())
best_acc = 0.0

phase = 'train'
for epoch in range(30):
    print('Epoch {}/{}'.format(epoch+1, 30))
    print('-' * 10)

    # Each epoch has a training and validation phase
    for phase in ['train', 'valid']:
        if phase == 'train':
            model.train()  # Set model to training mode
        else:
            model.eval()   # Set model to evaluate mode

        running_loss = 0.0
        running_corrects = 0

        # Iterate over data.
        for inputs, labels in dataloaders[phase]:
            inputs = inputs.to(device)
            labels = labels.to(device)
            
            # zero the parameter gradients
            optimizer.zero_grad()

            # forward
            # track history if only in train
            with torch.set_grad_enabled(phase == 'train'):
                outputs = model(inputs)
                _, preds = torch.max(outputs, 1)
                loss = criterion(outputs, labels)

                # backward + optimize only if in training phase
                if phase == 'train':
                    #sched.step()
                    loss.backward()
                    optimizer.step()
                    ...

and it shows

RuntimeError                              Traceback (most recent call last)
<ipython-input-11-4b93aa6367ee> in <module>
      1 epochs = 40
      2 model.to(device)
----> 3 model = train_model(model, criterion, optimizer, epochs)

<ipython-input-10-608674d4de7a> in train_model(model, criterion, optimizer, num_epochs)
     37                     if phase == 'train':
     38                         #sched.step()
---> 39                         loss.backward()
     40                         optimizer.step()
     41 

D:\Anaconda\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
    105                 products. Defaults to ``False``.
    106         """
--> 107         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    108 
    109     def register_hook(self, hook):

D:\Anaconda\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94 
     95 

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

many thanks

You are setting all parameters to requires_grad = False, thus freezing the model.
If you want to train (some) parameters, you should set requires_grad = True for them (or just skip them as this is the default).
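A minimal sketch of that idea for your snippet, assuming you only want to train the final classification layer: freeze everything first, then create a new fc layer (newly constructed modules have requires_grad=True by default) and pass only its parameters to the optimizer.

import torch.nn as nn
import torch.optim as optim
from torchvision import models

cls_num = 5
model = models.resnet50(pretrained=True)

# freeze the pretrained backbone
for param in model.parameters():
    param.requires_grad = False

# replace the head; the new layer's parameters require gradients by default
num_in_features = model.fc.in_features
model.fc = nn.Linear(num_in_features, cls_num)

criterion = nn.CrossEntropyLoss()
# optimize only the trainable parameters
optimizer = optim.SGD(model.fc.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)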

2 Likes

It works. I had a wrong understanding of requires_grad, thanks for your help!

I’m using the vgg16 model. I got this error from this block of code:
num_ftrs = model_ft.classifier[0].out_features
model_conv.fc = nn.Linear(num_ftrs, 2)

model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that only parameters of final layer are being optimized as
# opposed to before.
optimizer_conv = optim.SGD(model_conv.fc.parameters(), lr=0.001, momentum=0.9)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)

######################################################################
# Train and evaluate
# ^^^^^^^^^^^^^^^^^^
#
# On CPU this will take about half the time compared to previous scenario.
# This is expected as gradients don't need to be computed for most of the
# network. However, forward does need to be computed.

model_conv = train_model(model_conv, criterion, optimizer_conv,
                         exp_lr_scheduler, num_epochs=25)

######################################################################

visualize_model(model_conv)

plt.ioff()
plt.show()
RuntimeError                              Traceback (most recent call last)

in ()
     24
     25 model_conv = train_model(model_conv, criterion, optimizer_conv,
---> 26                          exp_lr_scheduler, num_epochs=25)
     27
     28 ######################################################################

2 frames

/usr/local/lib/python3.6/dist-packages/torch/autograd/__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     91     Variable._execution_engine.run_backward(
     92         tensors, grad_tensors, retain_graph, create_graph,
---> 93         allow_unreachable=True)  # allow_unreachable flag
     94
     95

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn
Can you please help me solve this issue?
Thanks

Could you post the code of train_model?
Maybe the output was detached accidentally.

PS: You can add code snippets by wrapping them in three backticks ``` :wink:

1 Like

I used the example code mentioned in the tutorial, except that I changed the model to vgg16.
Here is the link
https://pytorch.org/tutorials/beginner/transfer_learning_tutorial.html
I changed the line

model_ft = models.vgg16(pretrained=True)
->num_ftrs = model_ft.classifier[0].out_features
model_ft.fc = nn.Linear(num_ftrs, 2)

model_ft = model_ft.to(device)
Do I need to change any other lines to run the VGG16 model?


vgg16 uses .classifier as the last classification block, not .fc.
Have a look at the architecture using print(model_ft):

...
    (29): ReLU(inplace)
    (30): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(7, 7))
  (classifier): Sequential(
    (0): Linear(in_features=25088, out_features=4096, bias=True)
    (1): ReLU(inplace)
    (2): Dropout(p=0.5)
    (3): Linear(in_features=4096, out_features=4096, bias=True)
    (4): ReLU(inplace)
    (5): Dropout(p=0.5)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

You should thus change the code to:

model_ft.classifier[6] = nn.Linear(4096, 2)
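A rough sketch of how that fits into the transfer learning setup above (reading in_features from the existing layer instead of hard-coding 4096):

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
from torchvision import models

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

model_conv = models.vgg16(pretrained=True)
for param in model_conv.parameters():
    param.requires_grad = False

# vgg16's classification head is model_conv.classifier; replace its last linear layer
num_ftrs = model_conv.classifier[6].in_features   # 4096
model_conv.classifier[6] = nn.Linear(num_ftrs, 2)
model_conv = model_conv.to(device)

criterion = nn.CrossEntropyLoss()
# optimize only the parameters of the newly created final layer
optimizer_conv = optim.SGD(model_conv.classifier[6].parameters(), lr=0.001, momentum=0.9)
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_conv, step_size=7, gamma=0.1)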

Hi,
I have the same issue, and I haven't set requires_grad=False anywhere. Would you please advise on my code?

# Model.
model = my_model()


criterium = nn.MSELoss()

# Adam optimizer with learning rate 0.1 and L2 regularization with weight 1e-4.
optimizer = torch.optim.Adam(model.parameters(),lr=0.1, weight_decay=1e-4)
# Set gradient to 0.
optimizer.zero_grad()

# Feed forward.
pred = model(data)
_, output = torch.max(pred, 1)
output = output.type(torch.FloatTensor)

# Loss calculation.
loss = criterium(output, target)


# Gradient calculation.
loss.backward()

# Print loss every 10 iterations.
if k % 10 == 0:
    print('Loss {:.4f} at iter {:d}'.format(loss.item(), k))

# Model weight modification based on the optimizer.
optimizer.step()

You are detaching output from the computation graph by assigning it the indices returned by torch.max here (the indices are not differentiable):

_, output = torch.max(pred, 1)

Could you explain your use case a bit?
Would it be possible to reformulate your loss function, e.g. to return a single output value in your model and pass it the MSELoss?
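For reference, a minimal self-contained sketch of the usual classification setup, with a stand-in for my_model() and nn.CrossEntropyLoss on the raw logits instead of MSELoss on the argmax indices (the indices can still be used for metrics):

import torch
import torch.nn as nn

model = nn.Linear(20, 5)                  # stand-in for my_model(), 5 classes
data = torch.randn(8, 20)
target = torch.randint(0, 5, (8,))        # class indices

criterium = nn.CrossEntropyLoss()         # replaces MSELoss on the argmax output
optimizer = torch.optim.Adam(model.parameters(), lr=0.1, weight_decay=1e-4)

optimizer.zero_grad()
pred = model(data)                        # raw logits, still attached to the graph
loss = criterium(pred, target)            # differentiable, so backward() works
loss.backward()
optimizer.step()

_, output = torch.max(pred, 1)            # argmax only for accuracy/metrics, not for the loss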

Actually, it was example code standing in for my original code. I want to define a custom loss function in my original code, but I couldn't, so I tried to manipulate the loss function in a simple classification problem to work through the issue. May I post the original code here? I need your kind support regarding the loss function; however, it is quite long.

Maybe creating a new topic would be a good idea, as the original issue should be answered. :slight_smile:

Hi,

I have the same error! But I don't want to train my model from scratch; I want to use the pretrained model as it is, except for the fc layer. Please let me know how to solve this.

Did you freeze some parameters by setting their requires_grad attribute to False?
If so, make sure that your last layer, which you would like to train, contains parameters which require gradients.
If you get stuck, could you post a code snippet so that we can have a look?
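One quick way to check (here model stands for your partially frozen network) is to list the parameters that still require gradients:

# parameters that will actually receive gradients and be trained
trainable = [name for name, param in model.named_parameters() if param.requires_grad]
print(trainable)   # e.g. ['fc.weight', 'fc.bias'] for a frozen resnet50 with a fresh fc layer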

Ah! I was wrongly freezing the parameters of fc as well, I guess.

    self.model = models.resnet50(pretrained=True)
    for param in self.model.parameters():
        param.requires_grad = False
    # 30 classes
    num_ftrs = self.model.fc.in_features
    self.model.fc = nn.Linear(num_ftrs, 30)

this works

1 Like

Hello @ptrblck. I use lovasz_hinge (link: https://github.com/bermanmaxim/LovaszSoftmax/blob/master/pytorch/lovasz_losses.py).
I have the problem RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn.
I can't change the function so that the error goes away; can you help fix this error of the Lovasz loss in PyTorch 1.12.0? Thanks.