Hello everyone. I am facing an issue, so let me explain what I am trying to do.
I have a traffic and road sign dataset that contains 43 classes, and I am trying to classify the images with a pre-trained resnet34 model. I run the model on my AMD RX6600 GPU. To use the AMD GPU I am using PyTorch DirectML, with this code

import torch_directml
dml = torch_directml.device()

to find the device. Using this dml instance, I push the model and the training data to the GPU. The problem is that the weights do not update. After a lot of debugging, I found that the model's gradients become None in the training loop when using the GPU. On the CPU it works totally fine.
I found this when I tried to look at the grad values. When the model is on the CPU, the print statement prints some numbers. But when I run the same code on the GPU, the following error occurs.

You can see that p.grad becomes None. That is why nothing is updated when I call optimizer.step(), and the model does not learn anything. Can anyone help me with this issue?
Your training loop is wrong. torch.no_grad() turns off gradient tracking (the dynamic graph), so of course no gradients are computed.

You use torch.no_grad() when you want to test (or validate) the model, not when you are training it.
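To make the point about torch.no_grad() concrete, here is a minimal sketch. It uses a tiny nn.Linear on the CPU as a stand-in (an assumption for illustration, not the resnet34 or the DirectML device from the question) and compares a forward pass inside and outside the context manager.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 2)   # hypothetical stand-in model, not the poster's resnet34
x = torch.randn(8, 4)

# Inside no_grad, the output is not attached to an autograd graph,
# so there is nothing for backward() to propagate through.
with torch.no_grad():
    out_eval = model(x)
no_grad_tracks = out_eval.requires_grad

# Outside no_grad, the output tracks gradients and backward() fills p.grad.
out_train = model(x)
out_train.sum().backward()
grads_present = all(p.grad is not None for p in model.parameters())

print(no_grad_tracks, grads_present)
```

If the training forward pass sits inside a `with torch.no_grad():` block, every p.grad stays None no matter which device is used.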

Take a look at this tutorial.

Ok, so I have figured out something. After one backward pass, I printed the value using this code

for param in base_model.parameters():

When the model is on the CPU, it prints some numbers.

But when the model is on the GPU, this error happens.

AttributeError                            Traceback (most recent call last)
Cell In[91], line 2
      1 for param in base_model.parameters():
----> 2     print((

AttributeError: 'NoneType' object has no attribute 'data'

Some of the model parameters' gradients are None. I think this is the root cause of the issue.
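One way to narrow this down is to list which parameters have no gradient instead of crashing on the first one. The sketch below is only an illustration: summarize_grads is a hypothetical helper, and the small nn.Sequential on the CPU stands in for base_model.

```python
import torch
import torch.nn as nn

def summarize_grads(model):
    """Map each parameter name to the sum of its gradient, or None if it has no gradient."""
    report = {}
    for name, param in model.named_parameters():
        if param.grad is None:
            # Checking for None first avoids the AttributeError from the traceback.
            report[name] = None
        else:
            # detach().cpu() copies the gradient to host memory before reading it.
            report[name] = param.grad.detach().cpu().sum().item()
    return report

# Hypothetical stand-in model; replace with base_model in the real notebook.
model = nn.Sequential(nn.Linear(4, 3), nn.ReLU(), nn.Linear(3, 2))

before = summarize_grads(model)             # no backward pass yet
model(torch.randn(5, 4)).sum().backward()
after = summarize_grads(model)              # gradients populated by backward()

for name, value in after.items():
    print(name, value)
```

Running the same helper right after loss.backward() on both devices shows whether every gradient is missing on the GPU or only those of particular layers.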

My code for one manual iteration is given below

num_classes = 2
num_epochs = 40
learning_rate = 1e-4
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(base_model.parameters(), lr=learning_rate)

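#taking only the first batch
for batch in dataloaders['train']:
    # assumed: the value expression was lost; moving each tensor to dml matches the description above
    batch = {k: v.to(dml) for k, v in batch.items()}
    break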

#forward pass
outputs = base_model(**batch)
labels = batch['Type']

#backward pass
loss = criterion(outputs, labels)

#to see any parameters get updated
for param in base_model.parameters():

Can you please tell me why this happens when I run the model in GPU?
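For comparison, here is a sketch of one complete training iteration that checks whether the weights actually change. Everything in it is an assumption made for illustration: a small nn.Linear replaces resnet34, the batch is synthetic, and the device is the CPU (on the AMD GPU one would swap in torch_directml.device()).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
device = torch.device("cpu")   # assumption: replace with torch_directml.device() on the AMD GPU

model = nn.Linear(10, 43).to(device)   # hypothetical stand-in; 43 classes as in the question
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Synthetic batch; Tensor.to() returns a new tensor, so the result must be kept.
images = torch.randn(16, 10).to(device)
labels = torch.randint(0, 43, (16,)).to(device)

# Snapshot the weights on the host so they can be compared after the step.
before = [p.detach().cpu().clone() for p in model.parameters()]

optimizer.zero_grad()
outputs = model(images)              # forward pass, with gradient tracking on
loss = criterion(outputs, labels)
loss.backward()                      # backward pass fills p.grad
optimizer.step()                     # update uses p.grad

changed = [
    not torch.equal(b, p.detach().cpu())
    for b, p in zip(before, model.parameters())
]
print(changed)
```

If changed is all True on the CPU but not on the DirectML device, that points at the backend rather than at the loop itself.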

Is the model on the GPU? If so, what's the point of .cpu()?

When the model is on the GPU, I cannot print the model params directly. That's why I have to use .cpu() to print the values; otherwise it throws an error.