My logsigmoid loss does not seem to converge with torch.nn.Linear(2048, *)

I am using PyTorch 1.0 (CPU only, no CUDA) and Python 3.7.

I randomly generate three 2048-D tensors (i, j, k).

First, a Linear layer maps each of these tensors to a 128-D tensor (other output sizes seem to have the same problem).

Second, I calculate loss = -logsigmoid(i*j - i*k), where i*j is the dot product of the projected tensors.

However, when i*j - i*k < 0 the loss can increase rapidly, and when i*j - i*k > 0 the loss decreases only slowly.

Theoretically, the loss should go down in either case, shouldn't it?

Here is a simple demo:

import torch as T
from torch.nn import Module, functional as F, Linear
from torch.optim import SGD, Adam
vis = 2048
x = T.randn([vis],requires_grad=True)
W = Linear(vis,128)
z = T.randn([vis])
bb = T.randn([vis])
for i in range(30):
    y = -F.logsigmoid((W(x)).unsqueeze(0).mm((W(z)).unsqueeze(0).t())- (W(x)).unsqueeze(0).mm((W(bb)).unsqueeze(0).t()))
    opt = SGD(W.parameters(), lr=0.07, weight_decay=0.01)
    opt.zero_grad()
    y.backward()
    opt.step()
    # print the two dot products W(x)·W(z) and W(x)·W(bb)
    print((W(x)).unsqueeze(0).mm((W(z)).unsqueeze(0).t()), (W(x)).unsqueeze(0).mm((W(bb)).unsqueeze(0).t()))

Thanks a lot!

I’ve fixed some minor issues in the code (moved the optimizer before the for loop to avoid recreation etc.) and your code seems to work fine in most runs. Sometimes, probably due to bad init values, the loss exploded, but in most cases your experiment seems to work.
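For reference, a cleaned-up loop along those lines might look like the sketch below. It is not the exact code from the reply: the fixed seed, the use of torch.dot instead of unsqueeze/mm, and the `losses` list are assumptions added for illustration, and `requires_grad=True` on the input is dropped because only the Linear layer is optimized. Depending on the seed, the values can still blow up, as described above.

```python
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch.optim import SGD

torch.manual_seed(0)  # assumption: fixed seed so runs are repeatable

vis = 2048
x = torch.randn(vis)
z = torch.randn(vis)
bb = torch.randn(vis)
W = Linear(vis, 128)

# Create the optimizer once, before the loop
opt = SGD(W.parameters(), lr=0.07, weight_decay=0.01)

losses = []  # hypothetical helper list, only for inspection
for step in range(30):
    opt.zero_grad()
    hx, hz, hb = W(x), W(z), W(bb)
    # score difference: W(x)·W(z) - W(x)·W(bb)
    diff = torch.dot(hx, hz) - torch.dot(hx, hb)
    loss = -F.logsigmoid(diff)
    loss.backward()
    opt.step()
    losses.append(loss.item())
    print(step, diff.item(), loss.item())
```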

I have added a norm layer to my model, and this problem seems to be solved.
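The post does not say which kind of norm layer was added. As one possible reading, the sketch below L2-normalizes the projected vectors with F.normalize; the class name `NormalizedProjector` and the choice of L2 normalization are assumptions, not taken from the thread.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NormalizedProjector(nn.Module):  # hypothetical name
    def __init__(self, in_dim=2048, out_dim=128):
        super().__init__()
        self.fc = nn.Linear(in_dim, out_dim)

    def forward(self, v):
        # assumption: L2-normalize the projection so every output has unit length
        return F.normalize(self.fc(v), dim=-1)

torch.manual_seed(0)
model = NormalizedProjector()
x, z, bb = torch.randn(3, 2048).unbind(0)

hx, hz, hb = model(x), model(z), model(bb)
diff = torch.dot(hx, hz) - torch.dot(hx, hb)
loss = -F.logsigmoid(diff)
print(diff.item(), loss.item())
```

With unit-length vectors each dot product lies in [-1, 1], so the difference stays in [-2, 2] and the loss stays between roughly 0.127 and 2.127, which would be consistent with the explosion disappearing.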

Also, since I am a beginner: the formula is proven to be optimizable, so why did this loss still explode?
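Part of the picture may be the shape of the loss itself. As a function of the score difference d, -logsigmoid(d) = log(1 + exp(-d)) has gradient -sigmoid(-d): close to -1 when d is very negative (the loss grows about as fast as |d|) and close to 0 when d is large and positive (the loss shrinks slowly). Since d is a product of two projections that share the same weights, the loss is generally not convex in those weights, so a large SGD step can push d further negative. A minimal autograd check of the first part, with sample values of d chosen only for illustration:

```python
import torch
import torch.nn.functional as F

# sample score differences (illustrative values, not from the thread)
d = torch.tensor([-20.0, -2.0, 0.0, 2.0, 20.0], requires_grad=True)

loss = -F.logsigmoid(d)
loss.sum().backward()

# analytic gradient: d/dd [-log sigmoid(d)] = -sigmoid(-d)
expected = -torch.sigmoid(-d.detach())

print(loss.detach())  # large for negative d, near zero for positive d
print(d.grad)         # near -1 for negative d, near 0 for positive d
```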

And how can I predict whether the loss will explode next time? Should I just let it explode and then fix it?
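One common way to spot this early is to log the gradient norm at every step and, optionally, clip it. This technique is not mentioned in the thread; the sketch below uses torch.nn.utils.clip_grad_norm_, and the threshold `max_norm = 1.0`, the fixed seed, and the `grad_norms` list are assumptions for illustration.

```python
import torch
import torch.nn.functional as F
from torch.nn import Linear
from torch.nn.utils import clip_grad_norm_
from torch.optim import SGD

torch.manual_seed(0)  # assumption: fixed seed
vis = 2048
x, z, bb = torch.randn(vis), torch.randn(vis), torch.randn(vis)
W = Linear(vis, 128)
opt = SGD(W.parameters(), lr=0.07, weight_decay=0.01)

max_norm = 1.0   # assumed clipping threshold, tune for your model
grad_norms = []  # hypothetical log of the pre-clipping gradient norms
for step in range(30):
    opt.zero_grad()
    hx = W(x)
    diff = torch.dot(hx, W(z)) - torch.dot(hx, W(bb))
    loss = -F.logsigmoid(diff)
    loss.backward()
    # returns the total gradient norm measured before clipping
    total_norm = clip_grad_norm_(W.parameters(), max_norm)
    grad_norms.append(float(total_norm))
    opt.step()
    print(step, loss.item(), grad_norms[-1])
```

A gradient norm that is very large compared with the weights themselves is a hint that the next step may overshoot.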

Most likely you could stabilize it using a more suitable weight initialization.
I’m not sure which method would work the best for your model, so you might want to try out some methods.
Here is a small example using xavier_uniform:

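import torch.nn as nn

def weights_init(m):
    # re-initialize the weight of every Linear layer
    if isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)

model = MyModel()
model.apply(weights_init)  # calls weights_init on every submodule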

This is a new technique to me; it seems very useful.