Optimizer doesn't update weights

Hello, thanks for your attention. I ran into the following problem: the weights before and after training do not change. Here is the code:

import torch
import torch.nn as layer                  # assuming "layer" is an alias for torch.nn
from torchvision.ops import roi_pool

class fastRCNN(torch.nn.Module):

    def __init__(self, params):
        super(fastRCNN, self).__init__()

        # roi pooling size
        self.rsize = params['roi']['output_size']

        ###########################   model architecture
        ###########################   CNN BLOCK
        self.cnn_block = layer.Sequential(
            layer.Conv2d(**params['cnn11']),
            layer.Conv2d(**params['cnn12']),
            layer.AvgPool2d(**params['pool1']),
            ##################################
            layer.Conv2d(**params['cnn21']),
            layer.Conv2d(**params['cnn22']),
            layer.AvgPool2d(**params['pool2']),
            ##################################
            layer.Conv2d(**params['cnn31']),
            layer.Conv2d(**params['cnn32']),
            layer.Conv2d(**params['cnn33']),
            layer.AvgPool2d(**params['pool3']),
            ##################################
            layer.Conv2d(**params['cnn41']),
            layer.Conv2d(**params['cnn42']),
            layer.Conv2d(**params['cnn43']),
            layer.AvgPool2d(**params['pool4']),
        )

        ###########   ROI LAYER   ##########

        ###########################   FC
        self.fc_block = layer.Sequential(
            layer.Linear(**params['dense11']),
            layer.Linear(**params['dense12'])
        )

        ###########################   CLASSIFIER OUTPUT
        self.classifier_output = layer.Sequential(
            layer.Linear(**params['denseclf'])
            # layer.Softmax()
        )

        ###########################   REGRESSOR OUTPUT
        self.regressor_output = layer.Linear(**params['densereg'])

    def forward(self, Ximg, Xroi):
        cnn_inp = self.cnn_block(Ximg)
        ###########################
        roi_inp = roi_pool(cnn_inp, Xroi, self.rsize)
        ###########################
        inp = roi_inp.view(383, 512*7*7)
        inp = self.fc_block(inp)
        ###########################
        classif_out = self.classifier_output(inp)
        ###########################
        regress_out = self.regressor_output(inp)
        return torch.cat((classif_out, regress_out), dim=1)

params = {
    "cnn11": {"in_channels": 3, "out_channels": 64, "kernel_size": 3},
    "cnn12": {"in_channels": 64, "out_channels": 64, "kernel_size": 3},
    "pool1": {"kernel_size": 3},
    "cnn21": {"in_channels": 64, "out_channels": 128, "kernel_size": 3},
    "cnn22": {"in_channels": 128, "out_channels": 128, "kernel_size": 3},
    "pool2": {"kernel_size": 3},
    "cnn31": {"in_channels": 128, "out_channels": 256, "kernel_size": 3},
    "cnn32": {"in_channels": 256, "out_channels": 256, "kernel_size": 3},
    "cnn33": {"in_channels": 256, "out_channels": 256, "kernel_size": 3},
    "pool3": {"kernel_size": 3},
    "cnn41": {"in_channels": 256, "out_channels": 512, "kernel_size": 3},
    "cnn42": {"in_channels": 512, "out_channels": 512, "kernel_size": 3},
    "cnn43": {"in_channels": 512, "out_channels": 512, "kernel_size": 3},
    "pool4": {"kernel_size": 2},
    "roi": {"output_size": [7, 7]},
    "dense11": {"in_features": 512*7*7, "out_features": 4096},
    "dense12": {"in_features": 4096, "out_features": 4096},
    "denseclf": {"in_features": 4096, "out_features": 1},
    "densereg": {"in_features": 4096, "out_features": 4}
}

model = fastRCNN(params)

Code for training the model:

def lossf(tPred, tTrue):
    p = torch.autograd.Variable(tPred[0].data, requires_grad=True)
    print("test", p)
    b = torch.autograd.Variable(tPred[1:].data, requires_grad=True)
    a = torch.autograd.Variable(tTrue[1:].data, requires_grad=True)
    print("test", a)
    return torch.sum(torch.abs(b - a)) - torch.log(p)

optimizer.zero_grad()

labels = torch.from_numpy(out[0])
outputs = model(torch.from_numpy(rgb_matr),
                torch.from_numpy(np.insert(one_img[1], 0, np.arange(one_img[1].shape[0]), axis=1)))

for ypred, ytrue in zip(outputs, labels):
    loss = lossf(ypred, ytrue)
    loss.backward(retain_graph=True)

optimizer.step()

You are detaching the tensor(s) from the computation graph by creating new Variables:

p = torch.autograd.Variable(tPred[0].data, requires_grad=True)
...

Instead of recreating these tensors, you should directly calculate the loss.

Also, Variables are deprecated since PyTorch 0.4, so you can use tensors directly in newer versions. The usage of the .data attribute is also not recommended, as it might have unwanted side effects.
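
For example, a small standalone illustration (a toy linear layer, not your model) of how recreating the tensor cuts the graph:

import torch

lin = torch.nn.Linear(3, 1)
pred = lin(torch.randn(2, 3))

# Recreating the tensor detaches it: the loss has no path back to lin's parameters.
detached = torch.autograd.Variable(pred.data, requires_grad=True)
detached.sum().backward()
print(lin.weight.grad)    # None

# Using pred directly keeps the graph, so the parameters receive gradients.
pred.sum().backward()
print(lin.weight.grad)    # now populated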

Thank you very much for your reply! I was forced to use p = torch.autograd.Variable(tPred[0].data, requires_grad=True) because of the error “one of the variables needed for gradient computation has been modified by an inplace operation…”. How can I use tPred with slices in my custom loss, e.g. torch.log(tPred[0]) + torch.sum(torch.abs(tPred[1:] - tTrue[1:]))?

This loss calculation seems to work for this code snippet:

import torch
from torchvision import models

model = models.resnet50()
out = model(torch.randn(2, 3, 224, 224))
target = torch.zeros(2)

# Clip output, otherwise you might get NaNs
out = torch.clamp(out, 1e-6, out.max().item())
loss = torch.log(out[0]) + torch.sum(torch.abs(out[1:] - target[1:]))
loss.mean().backward()

Note that I had to clamp the output, as I would get NaN values from the log of negative numbers.

Thanks, and two last questions:

  1. How do I fix “one of the variables needed for gradient computation has been modified by an inplace operation…”? What are inplace operations in torch?
  2. Does backpropagation pass through the function roi_pool from torchvision.ops?
  1. Inplace operations manipulate the data in the tensor directly without creating a new return tensor. Such operations usually have a trailing underscore, such as tensor.sigmoid_(). If you index a tensor and assign a value to it, this error might also be raised, so you should instead create a new tensor from the return values (see the sketch after this list).

  2. Yes, the ROIPool layer has a backward method and thus Autograd can compute the gradient and pass the gradient to the previous layers.
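
A minimal sketch of the inplace failure and one way around it (a standalone toy example, not the model from this thread):

import torch
from torchvision.ops import roi_pool

x = torch.randn(2, 4, requires_grad=True)
y = torch.sigmoid(x)          # sigmoid's backward needs its own output
y[:, 0] = 0.                  # inplace assignment modifies that output
# y.sum().backward()          # -> RuntimeError: one of the variables needed for
                              #    gradient computation has been modified ...

# Out-of-place alternative: build a new tensor instead of writing into y
y = torch.sigmoid(x)
y = torch.cat([torch.zeros_like(y[:, :1]), y[:, 1:]], dim=1)
y.sum().backward()            # works, x.grad is populated

And gradients do flow through roi_pool; a quick check (again a toy feature map and a single box):

feat = torch.randn(1, 8, 16, 16, requires_grad=True)
boxes = torch.tensor([[0., 0., 0., 8., 8.]])     # (batch_idx, x1, y1, x2, y2)
pooled = roi_pool(feat, boxes, output_size=(7, 7))
pooled.sum().backward()
print(feat.grad.abs().sum() > 0)                 # tensor(True): gradient reached the features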


Thanks for your answers, I will try to solve the problem.

Hello, I used a ready-made loss function from torch.nn, but the error “one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [4096, 1]]” is still there. Maybe there is a problem in the forward method? The model has two inputs and two outputs, and I do not know what to do.

def forward(self, Ximg, Xroi):
    cnn_inp = self.cnn_block(Ximg)
    ###########################
    roi_inp = roi_pool(cnn_inp, Xroi, self.rsize)
    ###########################
    inp = roi_inp.view(500, 512*7*7)
    fc_inp = self.fc_block(inp)
    ###########################
    classif_out = self.classifier_output(fc_inp)
    ###########################
    regress_out = self.regressor_output(fc_inp)
    output = torch.cat((classif_out, regress_out), dim=1)
    return classif_out, regress_out

By the way, the model output has a dimension of, for example, (500, 5), so backpropagation should compute the gradient and update the weights for each output vector of dimension (5,). Maybe this causes some kind of error? Also, sometimes the error “Function ‘AddmmBackward’ returned nan values in its 2th output” appears.

Therefore, to summarize and group the information:

  1. I compute the loss from my outputs; the loss returns a scalar:
     loss1 = lossf1(clf_output[0], label_clf[0])
     loss2 = lossf2(reg_output[0], label_reg[0])
     loss = loss1 + loss2
     where clf_output[0] and reg_output[0] are vectors,
     and I get the error “Function ‘AddmmBackward’ returned nan values in its 2th output.”

  2. Before I got that error, I had placed the loss scalar in a new tensor (loss = torch.tensor([loss1+loss2], requires_grad=True)).

  3. So I am back where I started: the model weights are not updated.

  1. Have you checked the input to torch.log? As mentioned before, negative inputs will give you NaN outputs.

  2. This will also detach your tensors and your model won’t get valid gradients. You should not recreate tensors, but use them directly (see the sketch after this list).

  3. See 2.
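
For instance, a small standalone sketch of point 2 (a toy linear model, not the fastRCNN above):

import torch

model = torch.nn.Linear(4, 5)
out = model(torch.randn(3, 4))
loss1 = out[:, 0].mean()
loss2 = out[:, 1:].abs().sum()

# Detached: wrapping the summed losses in a new tensor creates a fresh leaf with no
# connection to the model, so backward() leaves every parameter's .grad as None.
bad_loss = torch.tensor([loss1 + loss2], requires_grad=True)

# Connected: simply add the loss tensors; the sum keeps the autograd history.
good_loss = loss1 + loss2
good_loss.backward()
print(model.weight.grad is not None)   # True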
