Why doesn't my loss value go down?

My model is defined as follows:

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4909, 1500)
        self.relu1 = nn.ReLU()
        self.dout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(1500, 300)
        self.prelu = nn.PReLU(1)
        self.out = nn.Linear(300, 1)
        self.out_act = nn.Sigmoid()
        
    def forward(self, input_):
        a1 = self.fc1(input_)
        h1 = self.relu1(a1)
        dout = self.dout(h1)
        a2 = self.fc2(dout)
        h2 = self.prelu(a2)
        a3 = self.out(h2)
        y = self.out_act(a3)
        return y

and here I define my model, loss function, and optimizer:

model = Net()

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01, betas=(0.9, 0.999))

Now I start training:

def train(epoch):
    model.train()
    for i in range(len(x_train_tensor)):
        optimizer.zero_grad()
        output = model(x_train_tensor[i])
        print(output.data[0], y_train_tensor[i])
        loss = criterion(output.data[0], y_train_tensor[i])
        loss.backward()
        optimizer.step()
        losses.append(loss.data[0])
        print("loss: {}".format(loss.data[0]))

but the output looks like this:

tensor(0.4984) tensor(1., grad_fn=)
loss: 0.47467532753944397
tensor(0.5021) tensor(1., grad_fn=)
loss: 0.4732956886291504
tensor(0.5000) tensor(0., grad_fn=)
loss: 0.9740557670593262
tensor(0.4942) tensor(1., grad_fn=)

When the label is 1, the loss value is around 0.47, but when the label is 0, the loss goes up to about 0.97. These values just repeat and never decrease.

How can I resolve this? Thank you!

nn.Sigmoid and nn.BCEWithLogitsLoss don’t fit together.
Either remove the nn.Sigmoid or use nn.BCELoss.
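
For illustration, a minimal sketch of those two options (the NetLogits name below is only for this sketch, not part of the original reply):

import torch.nn as nn

# Option 1 (sketch): return raw logits from forward and keep nn.BCEWithLogitsLoss
class NetLogits(nn.Module):

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4909, 1500)
        self.relu1 = nn.ReLU()
        self.dout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(1500, 300)
        self.prelu = nn.PReLU(1)
        self.out = nn.Linear(300, 1)

    def forward(self, input_):
        h1 = self.relu1(self.fc1(input_))
        h2 = self.prelu(self.fc2(self.dout(h1)))
        return self.out(h2)          # raw logits, no sigmoid here

model = NetLogits()
criterion = nn.BCEWithLogitsLoss()   # applies the sigmoid internally

# Option 2 (sketch): keep the nn.Sigmoid inside forward and switch the criterion
# model = Net()
# criterion = nn.BCELoss()

# Sanity check on the values printed above: they match a sigmoid applied twice
# (with F = torch.nn.functional), e.g.
# F.binary_cross_entropy_with_logits(torch.tensor([0.4984]), torch.tensor([1.]))  # ~0.4747
# F.binary_cross_entropy_with_logits(torch.tensor([0.5000]), torch.tensor([0.]))  # ~0.9741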


If I change nn.BCEWithLogitsLoss to nn.BCELoss, the error is:

RuntimeError: the derivative for ‘target’ is not implemented

but if I remove nn.Sigmoid and keep nn.BCEWithLogitsLoss, no error occurs.

Oh wait, there seems to be another issue.
Could you try to pass the output directly to your criterion instead of output.data[0]?

oh… same error… haha

I’m not sure why you get this error.
Your code runs fine for this dummy input and target:

def train(epoch):
    model.train()
    for i in range(len(x)):
        optimizer.zero_grad()
        output = model(x[i])
        print(output, target[i])
        loss = criterion(output, target[i])
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
        print("loss: {}".format(loss.item()))

x = torch.randn(1, 4909)
target = torch.tensor([[1.]])

losses = []
train(0)

In theory, high error on the training set is called bias.
Ways to reduce bias:

  1. Increase the number of hidden layers.

  2. Change the NN architecture.

  3. Train the NN for longer.

Maybe the dropout probability you are using is too high, which causes more units/neurons to be turned off as you train longer.
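
As a small, hypothetical sketch of that last point (the value 0.1 is arbitrary), lowering the dropout probability inside Net.__init__ would just be:

# hypothetical tweak inside Net.__init__: keep more units active during training
self.dout = nn.Dropout(0.1)   # instead of the original nn.Dropout(0.2)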


I extracted my data from a csv file and defined the tensors like below:

x_train_tensor = Variable(torch.FloatTensor(x_train.values))
x_test_tensor = Variable(torch.FloatTensor(x_test.values))
y_train_tensor = Variable(torch.Tensor(y_train.values), requires_grad = True)
y_test_tensor = Variable(torch.Tensor(y_test.values), requires_grad = True)

Is there any problem with this?

Could you try to use torch.from_numpy() to get the tensors instead of wrapping the numpy arrays directly? It’s the recommended way, but I’m not sure if it’s related to this issue.
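
For reference, a minimal sketch of that conversion (assuming x_train, y_train, etc. are the pandas objects from the snippet above; not part of the original reply). Note that the targets are created without requires_grad: the "derivative for 'target' is not implemented" error from nn.BCELoss typically shows up when the target tensor requires gradients, which it does not need to here.

import torch

# hypothetical conversion via torch.from_numpy()
x_train_tensor = torch.from_numpy(x_train.values).float()
x_test_tensor = torch.from_numpy(x_test.values).float()

# targets for BCE-style losses are plain float tensors; no requires_grad
y_train_tensor = torch.from_numpy(y_train.values).float()
y_test_tensor = torch.from_numpy(y_test.values).float()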

My experience with this was different. When I was training a test model to check this exact thing, I noticed that when I applied nn.Sigmoid to the model output and used BCELoss() I got very bad results; my loss actually went to NaN after some iterations.

Similarly, when I did not use nn.Sigmoid at the model output and used BCEWithLogitsLoss(), I again got bad results: no more NaNs, but the error did not drop from 0.999.

Then I used nn.Sigmoid and BCEWithLogitsLoss() together and got the expected results; the loss was dropping and so was the error.

So can you please explain why these two don't go well together, whereas my tests showed them working together?

Thanks

Since the sigmoid will be applied twice in this (wrong) approach, you might have scaled down the gradients and thus stabilized the training, e.g. if your learning rate was too high.
Here is a small example showing this effect:

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1)
)

data = torch.randn(1, 10)
target = torch.randint(0, 2, (1, 1)).float()

# 1) nn.BCEWithLogitsLoss
output = model(data)
loss = F.binary_cross_entropy_with_logits(output, target)
loss.backward()

print(model[0].weight.grad.norm())
> tensor(0.1741)
print(model[2].weight.grad.norm())
> tensor(0.2671)

# 2) nn.BCELoss
model.zero_grad()
output = model(data)
loss = F.binary_cross_entropy(torch.sigmoid(output), target)
loss.backward()
print(model[0].weight.grad.norm())
> tensor(0.1741)
print(model[2].weight.grad.norm())
> tensor(0.2671)

# 3) wrong
model.zero_grad()
output = model(data)
loss = F.binary_cross_entropy_with_logits(torch.sigmoid(output), target)
loss.backward()
print(model[0].weight.grad.norm())
> tensor(0.0595)
print(model[2].weight.grad.norm())
> tensor(0.0914)

Your loss might blow up and eventually get a NaN value, e.g. if the learning rate is set too high, which would also fit my assumption.

While applying the sigmoid twice might have helped in your use case, I would recommend trying to debug the exploding loss (or NaN values) instead.
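
For example (a hypothetical sketch, not from the reply above), two common ways to tackle an exploding loss are a lower learning rate and gradient-norm clipping inside the training loop:

# hypothetical: a smaller learning rate for the optimizer defined earlier
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# and/or clip the gradient norm between backward() and step()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()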


Many thanks for giving such a helpful reply to such an old topic! 🙂 ❤️

This really helped me a lot. 🙂
