Hi. I am moving from Keras to PyTorch. My code runs fine, but the model does not train regardless of the parameter settings. I have not initialised the weights of any Conv2d layer, so PyTorch must be applying its defaults. Could the uninitialised weights be the reason the model is not training? Also, how can I initialise the Conv2d weights from a normal distribution?
For the normal initialisation you can simply do:

```python
my_conv = nn.Conv2d(...)
nn.init.normal_(my_conv.weight)  # in-place; the old nn.init.normal is deprecated
```
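For instance, here is a minimal sketch (layer sizes and the std of 0.02 are arbitrary choices, not from the thread) that applies a normal initialisation to every Conv2d in a model via `Module.apply`:

```python
import torch
import torch.nn as nn

# A small example model; the layer sizes are arbitrary.
model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
)

def init_weights(m):
    # Initialise every Conv2d weight from N(0, 0.02^2) and zero the bias.
    if isinstance(m, nn.Conv2d):
        nn.init.normal_(m.weight, mean=0.0, std=0.02)
        if m.bias is not None:
            nn.init.zeros_(m.bias)

model.apply(init_weights)  # recursively visits every submodule
```

`apply` saves you from initialising each layer by hand when the model has many conv layers.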
What do you mean by “the model is not training”? Is your loss increasing? Or do the parameters remain unchanged?
My parameters are not changing, regardless of the learning rate.
Are you sure you defined your optimizer with the parameters of your model, and that you are calling `.backward()` on the loss before `optimizer.step()`? Can you show some lines of code?
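For reference, a minimal training step typically looks like the sketch below (the model, data, and learning rate here are placeholders, not taken from the original post):

```python
import torch
import torch.nn as nn

# Placeholder model and data, just to illustrate the update loop.
model = nn.Linear(10, 2)
criterion = nn.CrossEntropyLoss()
# The optimizer must receive the *model's* parameters,
# otherwise .step() has nothing to update.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

inputs = torch.randn(4, 10)
labels = torch.tensor([0, 1, 0, 1])

before = model.weight.clone()

optimizer.zero_grad()                     # clear old gradients
loss = criterion(model(inputs), labels)
loss.backward()                           # populate .grad on every parameter
optimizer.step()                          # update parameters using those gradients

# After one step the weights should have changed.
assert not torch.equal(before, model.weight)
```

If the parameters never change, the usual suspects are a missing `loss.backward()`, a missing `optimizer.step()`, or an optimizer built over the wrong (or empty) parameter list.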
Here is the code:

```python
for i, (train_images_1, train_labels_1) in enumerate(train_loader):
    running_loss = 0.0
    running_corrects = 0
    # wrap them in Variable
    if use_gpu:
        train_images_1, train_labels_1 = Variable(train_images_1.cuda()), \
                                         Variable(train_labels_1.cuda())
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward
    outputs = cnn(train_images_1)
    _, preds = torch.max(outputs.data, 1)
    loss = criterion(outputs, train_labels_1)
    loss.backward()
    optimizer.step()
```
Ok, if I understand correctly, your targets are the argmax of the outputs. That may not work; you should have a vector of zeros with a one at the index where you want an activation:

```python
for i, (train_images_1, train_labels_1) in enumerate(train_loader):
    # prepare target
    target = torch.zeros(1, num_labels)
    target[0, train_labels_1] = 1
    running_loss = 0.0
    running_corrects = 0
    # wrap them in Variable
    train_images_1 = Variable(train_images_1.cuda(), requires_grad=True)
    target = Variable(target.cuda(), requires_grad=False)
    # zero the parameter gradients
    optimizer.zero_grad()
    # forward
    outputs = cnn(train_images_1)
    loss = criterion(outputs, target)
    loss.backward()
    optimizer.step()
```
I was using a one-hot vector before; PyTorch treated it as a multi-target problem and threw a runtime error. I made the changes above too, but it's still not working.
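If the criterion here is `nn.CrossEntropyLoss`, that error is expected: it takes raw class indices as targets, not one-hot vectors, and (in the PyTorch versions of that era) a 2-D float target is exactly what triggers the “multi-target not supported” error. A small sketch of the expected target format (the shapes are illustrative):

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()

outputs = torch.randn(4, 5)          # batch of 4 samples, 5 classes (raw logits)
labels = torch.tensor([1, 0, 4, 2])  # class *indices*, shape (4,) — no one-hot needed

loss = criterion(outputs, labels)    # works: targets are indices

# A one-hot float target of shape (4, 5) is what older PyTorch
# versions rejected with "multi-target not supported".
```

So the original code that passed `train_labels_1` directly was closer to correct; the likely bug is elsewhere (e.g. the optimizer setup). Note that recent PyTorch versions do additionally accept class-probability targets, but that was not the case back then.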
Thanks for your efforts.
Hey, did you find any solution? I am facing the same problem.
Could you describe your problem a bit and what you’ve tried so far?
Do you get an error or is the model not learning at all?
If you have a working Keras model, we could try to compare both implementations and look for differences and code bugs.
Hi, my model is not training properly after I added one layer to the pretrained VGG16 model. Here is the code I modified:

```python
num_ftrs = model_conv.classifier.out_features
model_conv = model_conv.to(device)
```
When I run the code I get: `RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn`. To avoid this I added the line `loss = Variable(loss, requires_grad=True)` to the `train_model` function. Can you please help me?
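That error usually means no parameter in the computation graph requires gradients (for example, every pretrained layer was frozen and the new layer never made it into the model), and wrapping the loss in a new `Variable` with `requires_grad=True` only silences the error by detaching the loss from the model, so nothing is trained. A small sketch of the healthy setup, using plain linear layers as stand-ins for a frozen backbone plus a new head:

```python
import torch
import torch.nn as nn

# Simulate a frozen "pretrained" backbone plus a freshly added trainable layer.
backbone = nn.Linear(8, 8)
for p in backbone.parameters():
    p.requires_grad = False       # frozen: no gradients for these

new_head = nn.Linear(8, 2)        # new layer, requires_grad=True by default

model = nn.Sequential(backbone, new_head)

x = torch.randn(3, 8)
loss = model(x).sum()

# loss has a grad_fn because at least one parameter requires grad;
# if *all* parameters were frozen, loss.backward() would raise the
# exact RuntimeError from the question.
assert loss.requires_grad

loss.backward()
assert new_head.weight.grad is not None   # the new layer receives gradients
assert backbone.weight.grad is None       # the frozen layer does not
```

So instead of re-wrapping the loss, check that the layer you intend to train is actually part of the model and still has `requires_grad=True`.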
If you want to replace the last linear layer in `model_conv.classifier`, you would have to reassign the new `nn.Sequential` module back to `.classifier`, instead of wrapping or overwriting the whole model.
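A sketch of that, using a stand-in module with a VGG16-style `.classifier` (in torchvision's VGG16 the last classifier layer is `classifier[6]`, a `Linear(4096, 1000)`; the class count of 10 below is just an example):

```python
import torch.nn as nn

# Stand-in for a pretrained model with a VGG-style classifier head.
class FakeVGG(nn.Module):
    def __init__(self):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(25088, 4096), nn.ReLU(),
            nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, 1000),
        )

model_conv = FakeVGG()

# Read in_features from the *last linear layer*; the nn.Sequential
# container itself has no out_features attribute, which is why the
# original line num_ftrs = model_conv.classifier.out_features fails.
num_ftrs = model_conv.classifier[-1].in_features

# Reassign back into .classifier rather than replacing the whole model.
model_conv.classifier[-1] = nn.Linear(num_ftrs, 10)  # 10 = new class count
```

This keeps the rest of the pretrained model intact, so only the replaced layer starts from fresh weights.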
Thanks, I solved the problem. Before, I had wrapped the whole base model in an `nn.Sequential`, which is why I had trouble accessing the trainable weights.