My logsigmoid loss does not seem to converge with torch.nn.Linear(2048, *)

Cherrybruin · December 16, 2018, 3:41am

I am using PyTorch 1.0 (CPU only, no CUDA) and Python 3.7.

I randomly generate three 2048-D tensors (i, j, k).

First, a Linear layer maps these tensors to 128-D tensors (other output sizes seem to show the same problem).

Second, I calculate loss = -logsigmoid(i*j - i*k).

However, when i*j - i*k < 0 the loss can increase rapidly, and when i*j - i*k > 0 the loss decreases only slowly.

In theory, the loss should decrease in either case, shouldn't it?

Here is a simple demo:

import torch as T
from torch.nn import Linear, functional as F
from torch.optim import SGD

vis = 2048
x = T.randn([vis], requires_grad=True)
W = Linear(vis, 128)
z = T.randn([vis])
bb = T.randn([vis])
for i in range(30):
    # dot products of the projected vectors, each of shape (1, 1)
    pos = W(x).unsqueeze(0).mm(W(z).unsqueeze(0).t())
    neg = W(x).unsqueeze(0).mm(W(bb).unsqueeze(0).t())
    y = -F.logsigmoid(pos - neg)
    # note: the optimizer is recreated on every iteration here
    opt = SGD(W.parameters(), lr=0.07, weight_decay=0.01)

    opt.zero_grad()
    print(pos, neg)
    print(y)
    y.backward()
    opt.step()

thanks a lot~!
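
On the observation that the loss moves quickly for negative inputs and slowly for positive ones: this matches the shape of -logsigmoid itself. The sketch below uses plain Python (no PyTorch) and two helper names, `neg_logsigmoid` and `neg_logsigmoid_grad`, that are made up for illustration. For a strongly negative input the loss is roughly linear with slope close to -1, while for a strongly positive input both the loss and its slope are close to 0. It only illustrates the shape of the function and does not by itself explain why the loss goes up between training steps.

```python
import math

def neg_logsigmoid(s):
    # -log(sigmoid(s)) = log(1 + exp(-s)), written in a numerically stable form
    return max(-s, 0.0) + math.log1p(math.exp(-abs(s)))

def neg_logsigmoid_grad(s):
    # derivative of -log(sigmoid(s)) with respect to s: -sigmoid(-s) = -1 / (1 + exp(s))
    return -1.0 / (1.0 + math.exp(s))

for s in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(s, neg_logsigmoid(s), neg_logsigmoid_grad(s))
```

Since the slope is nearly -1 whenever the score is very negative, a large learning rate combined with large inputs can produce a large parameter update in that regime.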

ptrblck · December 16, 2018, 7:04pm

I’ve fixed some minor issues in the code (moved the optimizer before the for loop to avoid recreation etc.) and your code seems to work fine in most runs. Sometimes, probably due to bad init values, the loss exploded, but in most cases your experiment seems to work.
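
The corrected code is not shown in the thread, so the following is only a guess at what the described change (creating the optimizer once, before the loop) might look like. The use of `torch.dot` instead of `unsqueeze(0).mm(...)` and the `losses` list are choices made for this sketch, not part of the original demo.

```python
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch.optim import SGD

vis = 2048
W = Linear(vis, 128)
x, z, bb = torch.randn(3, vis).unbind(0)

# Create the optimizer once, before the training loop
opt = SGD(W.parameters(), lr=0.07, weight_decay=0.01)

losses = []
for step in range(30):
    opt.zero_grad()
    # score = <W(x), W(z)> - <W(x), W(bb)>
    score = torch.dot(W(x), W(z)) - torch.dot(W(x), W(bb))
    loss = -F.logsigmoid(score)
    loss.backward()
    opt.step()
    losses.append(loss.item())

print(losses)
```

As noted above, individual runs can still diverge depending on the random inputs and initial weights.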

Cherrybruin · December 17, 2018, 2:19am

I have added a norm layer to my model, and this seems to have solved the problem.
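
The post does not say which norm layer was added. As one possible reading, the sketch below L2-normalizes the projected vectors with `F.normalize`; the helper `embed` and the name `proj` are assumptions for illustration. With unit-length vectors each dot product lies in [-1, 1], so the score is confined to [-2, 2] and the loss stays bounded.

```python
import torch
import torch.nn.functional as F
from torch.nn import Linear

proj = Linear(2048, 128)
x, z, bb = torch.randn(3, 2048).unbind(0)

def embed(v):
    # L2-normalize the projected vector (an assumed choice of "norm layer")
    return F.normalize(proj(v), dim=0)

# Both dot products are between unit vectors, so each lies in [-1, 1]
score = torch.dot(embed(x), embed(z)) - torch.dot(embed(x), embed(bb))
loss = -F.logsigmoid(score)
print(score.item(), loss.item())
```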

Also, since I am a beginner: if the formula can provably be optimized, why did the loss still explode?

And how can I predict whether the loss will explode next time? Should I just let it explode and then fix it?
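
One common way to notice this early, not something proposed in the thread, is to log the loss and the gradient norm at every step and stop as soon as the loss is no longer finite. The sketch below does that and additionally applies gradient clipping via `torch.nn.utils.clip_grad_norm_`. The function name `train_with_guard` and the `max_grad_norm` value are assumptions for illustration.

```python
import math
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch.optim import SGD

def train_with_guard(model, inputs, steps=30, lr=0.07, max_grad_norm=1.0):
    x, z, bb = inputs
    opt = SGD(model.parameters(), lr=lr)
    history = []
    for step in range(steps):
        opt.zero_grad()
        score = torch.dot(model(x), model(z) - model(bb))
        loss = -F.logsigmoid(score)
        if not math.isfinite(loss.item()):
            # stop as soon as the loss becomes inf or nan
            break
        loss.backward()
        # clip_grad_norm_ rescales the gradients in place and
        # returns their total norm as measured before clipping
        grad_norm = torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
        history.append((loss.item(), float(grad_norm)))
        opt.step()
    return history

history = train_with_guard(Linear(2048, 128), torch.randn(3, 2048).unbind(0))
for loss_value, grad_norm in history:
    print(loss_value, grad_norm)
```

A gradient norm that is orders of magnitude larger than the clipping threshold is a hint that the learning rate or the input scale may be too large for unclipped SGD.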

ptrblck · December 17, 2018, 9:52am

Most likely you could stabilize it using a more suitable weight initialization.
I’m not sure which method would work the best for your model, so you might want to try out some methods.
Here is a small example using xavier_uniform:

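import torch.nn as nn

def weights_init(m):
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
        # xavier_uniform_ needs a tensor with at least 2 dimensions,
        # so the 1-D bias is initialized to zeros instead
        nn.init.zeros_(m.bias)


# MyModel stands for your own nn.Module
model = MyModel()
model.apply(weights_init)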
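
For a version that runs on its own, here is a sketch with a stand-in `MyModel` (a hypothetical single-layer module, since the real model is not shown in the thread). Note that `xavier_uniform_` cannot compute fan-in and fan-out for a 1-D tensor, so this sketch initializes the bias with zeros, which is an assumed choice.

```python
import math
import torch
import torch.nn as nn

class MyModel(nn.Module):
    # Hypothetical stand-in for the poster's model
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(2048, 128)

    def forward(self, x):
        return self.fc(x)

def weights_init(m):
    if isinstance(m, nn.Linear):
        # Xavier/Glorot uniform init for the 2-D weight matrix
        nn.init.xavier_uniform_(m.weight)
        # the bias is 1-D, so it gets a simple constant init instead
        nn.init.zeros_(m.bias)

model = MyModel()
model.apply(weights_init)  # apply() visits every submodule recursively

# Xavier uniform samples from [-bound, bound], bound = sqrt(6 / (fan_in + fan_out))
bound = math.sqrt(6.0 / (2048 + 128))
print(model.fc.weight.abs().max().item(), bound)
```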

Cherrybruin · December 22, 2018, 5:28pm

This is a new technique to me, and it seems very useful.