Loss Not Converging with SGD

Folks, I am relatively new to PyTorch and would appreciate your help, as my loss with SGD is not converging. If anyone can take a look and advise, I would really appreciate it. Thanks in advance. Please excuse any formatting errors.


**# Custom loss function for regression ("need to check if the formula is correct")**
**# Loss(h(x_n), y_n) = log(1 + exp(-y_n * w^T x_n))**

class Regress_Loss(nn.modules.Module):    
    def __init__(self):
        super(Regress_Loss,self).__init__()
    def forward(self, outputs, labels):
        batch_size = outputs.size()[0]
        mult = Variable((outputs*labels), requires_grad=True)
        loss = torch.sum(torch.log(1 + torch.exp(-mult)))
        return loss/batch_size
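
One quick way to sanity-check the formula would be to compare it against torch.nn.functional.softplus(-margin), which computes log(1 + exp(-margin)) in a numerically stable way (a standalone sketch with made-up numbers, not part of the training code):

import torch
import torch.nn.functional as F

# made-up scores and +/-1 labels, only to check the formula numerically
scores = torch.tensor([0.5, -1.2, 2.0])
targets = torch.tensor([1.0, -1.0, 1.0])

manual = torch.log(1 + torch.exp(-targets * scores)).mean()
stable = F.softplus(-targets * scores).mean()
print(manual, stable)  # the two values should agree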

**# Logistic regression model, loss, and SGD optimizer**
model = nn.Linear(input_size, num_classes)
loss_criteria = Regress_Loss()
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)

total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):
        # Reshape images to (batch_size, input_size)
        images = images.reshape(-1, 28*28) 
        #Convert labels from 0,1 to -1,1
        labels = Variable(2*(labels.float()-0.5))
        # Forward pass
        outputs = model(images)
        # Getting the max
        oneout = torch.max(outputs.data, 1)[0]
        ## Converting output to 1 and -1 ("not sure if the below statements are correct...")
        oneout[oneout < 0] = -1
        oneout[oneout > 0] = 1
        loss = loss_criteria(oneout, labels)

        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}'
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

Loss output: 0.8445,0.6883,0.7976,0.8133,0.8289,0.7195 and so forth…

In your loss function you are re-wrapping your inputs into a new Variable, which will detach them from the computation graph. I guess the model parameters are not being updated at all. Also, Variables are deprecated since PyTorch 0.4.0, so you don't need them anymore.
Could you try to calculate mult as mult = outputs * labels and run it again?
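Something along these lines should keep the graph intact (untested sketch, assuming outputs and labels already have matching shapes):

import torch
import torch.nn as nn

class Regress_Loss(nn.Module):
    def __init__(self):
        super(Regress_Loss, self).__init__()

    def forward(self, outputs, labels):
        batch_size = outputs.size(0)
        mult = outputs * labels          # keep this in the autograd graph, no Variable wrapping
        loss = torch.sum(torch.log(1 + torch.exp(-mult)))
        return loss / batch_size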

Thanks for your quick reply.
I seem to get a runtime error.


RuntimeError Traceback (most recent call last)
in
     38         # Backward and optimize
     39         optimizer.zero_grad()
---> 40         loss.backward()
     41         optimizer.step()
     42

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\tensor.py in backward(self, gradient, retain_graph, create_graph)
    100                 products. Defaults to False.
    101         """
--> 102         torch.autograd.backward(self, gradient, retain_graph, create_graph)
    103
    104     def register_hook(self, hook):

~\AppData\Local\Continuum\anaconda3\lib\site-packages\torch\autograd\__init__.py in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables)
     88     Variable._execution_engine.run_backward(
     89         tensors, grad_tensors, retain_graph, create_graph,
---> 90         allow_unreachable=True)  # allow_unreachable flag
     91
     92

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

I guess this error is thrown because you are converting oneout to -1 and 1.
Could you explain your use case a bit?
Maybe torch.tanh would also work if you want to bound your output into this range?
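As a rough idea inside the training loop (just a sketch, not tested, and the shapes still need to line up with your labels):

    outputs = model(images)          # shape: (batch_size, num_classes), stays in the graph
    scores = torch.tanh(outputs)     # differentiable and bounded to (-1, 1), unlike hard thresholding on .data
    loss = loss_criteria(scores, labels)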

Absolutely. I took your suggestion and tried torch.tanh instead of torch.max, and got the error "The size of tensor a (2) must match the size of tensor b (64) at non-singleton dimension 0" when I used mult = outputs * labels in the loss function.
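My guess at what is happening with the shapes (not verified): outputs from the linear layer is (batch_size, 2) while labels is (batch_size,), so the element-wise multiply in the loss cannot line up. Something like the below is what I was thinking of trying next (hypothetical, not run yet):

    outputs = model(images)                  # shape: (batch_size, 2)
    scores = outputs[:, 1] - outputs[:, 0]   # one score per sample, shape: (batch_size,)
    scores = torch.tanh(scores)              # bounded to (-1, 1) as suggested
    loss = loss_criteria(scores, labels)     # labels: (batch_size,) with values in {-1, 1}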
Use case -
a) I am using the MNIST dataset and implementing an image classifier for handwritten digits 0 and 1 only, using the code below.
b) Then using a logistic regression model:
model = nn.Linear(input_size, num_classes)
c) Created the custom loss function shown above: Regress_Loss
d) Training the model, where I convert the labels from 0,1 to -1,1:
#Convert labels from 0,1 to -1,1
labels = Variable(2*(labels.float()-0.5))
e) Determining the loss with the statements below:

    # Forward pass
    outputs = model(images)
    # maximum value of two class prediction
    oneout = torch.max(outputs.data, 1)[0]
    ## exploring if converting the outputs to be same as labels (1,-1) will help
    oneout[oneout < 0] = -1
    oneout[oneout > 0] = 1
    loss = loss_criteria(oneout, labels)
    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

   Then I am printing loss.

-----code for bullet (a)------------
subset_indices = ((train_data.train_labels == 0) + (train_data.train_labels == 1)).nonzero().view(-1)

train_loader = torch.utils.data.DataLoader(dataset=train_data,
                                           batch_size=batch_size,
                                           shuffle=False,
                                           sampler=SubsetRandomSampler(subset_indices))


I wanted to add a little more output:
labels = Variable(2*(labels.float()-0.5))
print(labels)

tensor([ 1., 1., 1., 1., 1., 1., -1., -1., 1., -1., 1., -1., 1., -1.,
1., -1., 1., -1., -1., 1., -1., 1., 1., 1., 1., 1., 1., -1.,
-1., 1., 1., -1., -1., 1., 1., 1., -1., 1., -1., -1., 1., 1.,
1., 1., 1., 1., -1., -1., 1., -1., 1., -1., 1., 1., 1., 1.,
-1., -1., -1., 1., 1., 1., -1., 1.])
tensor([-1., -1., -1., -1., -1., 1., 1., 1., 1., -1., 1., -1., 1., 1.,
1., 1., 1., 1., 1., 1., -1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., -1., -1., 1., 1., -1., -1., 1., 1., -1., 1., 1.,
-1., 1., 1., 1., -1., 1., 1., -1., -1., -1., 1., 1., 1., 1.,
1.])

#Logistic regression model and Loss
model = nn.Linear(input_size,num_classes)

    # Forward pass
    outputs = model(images)
    print(outputs)

tensor([[ 0.2859, 0.5506],
[ 0.1532, 0.3645],
[ 0.0835, 0.3317],
[-0.3868, 0.3129],
[ 0.2356, 0.4399],…

[-0.5188, 0.3862],
[ 0.0870, 0.2604]], grad_fn=)

oneout = torch.max(outputs.data,1)[0]
print(oneout)
tensor([ 0.1414, 0.9170, 0.2080, 0.1533, 0.4046, 0.1832, 0.6290, 0.5015,
0.0848, 0.0381, 0.6226, -0.2800, 0.3362, 0.2727, 0.1597, 0.0630,
0.3553, 0.0353, 0.1693, -0.0425, 0.1555, 0.1116, 0.0479, 0.3952,
0.0967, -0.0542, 0.1298, -0.2771, 0.7281, -0.3405, -0.0428, 0.1271,
0.0476, -0.1436, -0.0904, 0.1636, -0.1354, 0.1565, 0.3858, -0.0791,
0.2349, 0.2871, -0.1032, 0.0958, 0.0709, 0.2271, 0.0794, 0.0134,
-0.0602, 0.4522, 0.1116, 1.0347, 0.2286, -0.1382, -0.0443, 0.1548,
0.1367, -0.3741, -0.1599, 0.5601, -0.3454, 0.2412, 0.3693, -0.4265])

    loss = loss_criteria(oneout, labels)

    # Backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()        

    if (i+1) % 100 == 0:
        print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
               .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

So when I print the loss, it does not converge. Any help is much appreciated…