Why doesn't my loss value go down?

My model is defined as follows:

import torch
import torch.nn as nn
import torch.optim as optim


class Net(nn.Module):
    
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4909, 1500)
        self.relu1 = nn.ReLU()
        self.dout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(1500, 300)
        self.prelu = nn.PReLU(1)
        self.out = nn.Linear(300, 1)
        self.out_act = nn.Sigmoid()
        
    def forward(self, input_):
        a1 = self.fc1(input_)
        h1 = self.relu1(a1)
        dout = self.dout(h1)
        a2 = self.fc2(dout)
        h2 = self.prelu(a2)
        a3 = self.out(h2)
        y = self.out_act(a3)
        return y

and here I define my model, loss function, and optimizer:

model = Net()

criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=0.01, betas=(0.9, 0.999))

Now I start training:

def train(epoch):
    model.train()
    for i in range(len(x_train_tensor)):
        optimizer.zero_grad()
        output = model(x_train_tensor[i])
        print(output.data[0], y_train_tensor[i])
        loss = criterion(output.data[0], y_train_tensor[i])
        loss.backward()
        optimizer.step()
        losses.append(loss.data[0])
        print("loss: {}".format(loss.data[0]))

but the output looks like this:

tensor(0.4984) tensor(1., grad_fn=)
loss: 0.47467532753944397
tensor(0.5021) tensor(1., grad_fn=)
loss: 0.4732956886291504
tensor(0.5000) tensor(0., grad_fn=)
loss: 0.9740557670593262
tensor(0.4942) tensor(1., grad_fn=)

When the label is 1, the loss value is around 0.47, but when the label is 0, the loss goes up to about 0.97. These values just repeat and never decrease.

How can I resolve this? Thank you!

nn.Sigmoid and nn.BCEWithLogitsLoss don’t fit together.
Either remove the nn.Sigmoid or use nn.BCELoss.
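
For illustration, a minimal sketch of those two options (the NetLogits name below is only for this sketch, not part of the original reply):

import torch.nn as nn

# Option 1 (sketch): return raw logits from forward and keep nn.BCEWithLogitsLoss
class NetLogits(nn.Module):

    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(4909, 1500)
        self.relu1 = nn.ReLU()
        self.dout = nn.Dropout(0.2)
        self.fc2 = nn.Linear(1500, 300)
        self.prelu = nn.PReLU(1)
        self.out = nn.Linear(300, 1)

    def forward(self, input_):
        h1 = self.relu1(self.fc1(input_))
        h2 = self.prelu(self.fc2(self.dout(h1)))
        return self.out(h2)          # raw logits, no sigmoid here

model = NetLogits()
criterion = nn.BCEWithLogitsLoss()   # applies the sigmoid internally

# Option 2 (sketch): keep the nn.Sigmoid inside forward and switch the criterion
# model = Net()
# criterion = nn.BCELoss()

# Sanity check on the values printed above: they match a sigmoid applied twice
# (with F = torch.nn.functional), e.g.
# F.binary_cross_entropy_with_logits(torch.tensor([0.4984]), torch.tensor([1.]))  # ~0.4747
# F.binary_cross_entropy_with_logits(torch.tensor([0.5000]), torch.tensor([0.]))  # ~0.9741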


If I change nn.BCEWithLogitsLoss to nn.BCELoss, the error is:

RuntimeError: the derivative for ‘target’ is not implemented

but if I remove nn.Sigmoid and keep nn.BCEWithLogitsLoss, no error occurs.

Oh wait, there seems to be another issue.
Could you try to pass the output directly to your criterion instead of output.data[0]?

oh… same error… haha

I’m not sure why you get this error.
Your code runs fine for this dummy input and target:

def train(epoch):
    model.train()
    for i in range(len(x)):
        optimizer.zero_grad()
        output = model(x[i])
        print(output, target[i])
        loss = criterion(output, target[i])
        loss.backward()
        optimizer.step()
        losses.append(loss.item())
        print("loss: {}".format(loss.item()))

x = torch.randn(1, 4909)
target = torch.tensor([[1.]])

losses = []
train(0)

In theory, high error on the training set is called bias.
Ways to reduce bias:

  1. Increase the number of hidden layers.

  2. Change the NN architecture.

  3. Train the NN for longer.

Maybe the dropout probability you are using is too high, which causes more units/neurons to be turned off as you train longer.
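
As a small, hypothetical sketch of that last point (the value 0.1 is arbitrary), lowering the dropout probability inside Net.__init__ would just be:

# hypothetical tweak inside Net.__init__: keep more units active during training
self.dout = nn.Dropout(0.1)   # instead of the original nn.Dropout(0.2)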


I extracted my data from a csv file and defined the tensors like below:

x_train_tensor = Variable(torch.FloatTensor(x_train.values))
x_test_tensor = Variable(torch.FloatTensor(x_test.values))
y_train_tensor = Variable(torch.Tensor(y_train.values), requires_grad = True)
y_test_tensor = Variable(torch.Tensor(y_test.values), requires_grad = True)

Is there any problem with this?

Could you try to use torch.from_numpy() to get the tensors instead of wrapping the numpy arrays directly? It’s the recommended way, but I’m not sure if it’s related to this issue.
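
For reference, a minimal sketch of that conversion (assuming x_train, y_train, etc. are the pandas objects from the snippet above; not part of the original reply). Note that the targets are created without requires_grad: the "derivative for 'target' is not implemented" error from nn.BCELoss typically shows up when the target tensor requires gradients, which it does not need to here.

import torch

# hypothetical conversion via torch.from_numpy()
x_train_tensor = torch.from_numpy(x_train.values).float()
x_test_tensor = torch.from_numpy(x_test.values).float()

# targets for BCE-style losses are plain float tensors; no requires_grad
y_train_tensor = torch.from_numpy(y_train.values).float()
y_test_tensor = torch.from_numpy(y_test.values).float()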

My experience with this was different. When I was training a test model to check this exact thing, I noticed that when I applied nn.Sigmoid to the model output and used BCELoss() I got very bad results; my loss actually went to NaN after some iterations.

Similarly, when I did not use nn.Sigmoid at the model output and used BCEWithLogitsLoss(), I again got bad results: no more NaNs, but the error did not drop from 0.999.

Then I used nn.Sigmoid and BCEWithLogitsLoss() together and got the expected results; the loss was dropping and so was the error.

So can you please explain why these two don't go well together, whereas my tests showed them working together?

Thanks

Since the sigmoid will be applied twice in this (wrong) approach, you might have scaled down the gradients and thus stabilized the training, e.g. if your learning rate was too high.
Here is a small example showing this effect:

import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Sequential(
    nn.Linear(10, 10),
    nn.ReLU(),
    nn.Linear(10, 1)
)

data = torch.randn(1, 10)
target = torch.randint(0, 2, (1, 1)).float()

# 1) nn.BCEWithLogitsLoss
output = model(data)
loss = F.binary_cross_entropy_with_logits(output, target)
loss.backward()

print(model[0].weight.grad.norm())
> tensor(0.1741)
print(model[2].weight.grad.norm())
> tensor(0.2671)

# 2) nn.BCELoss
model.zero_grad()
output = model(data)
loss = F.binary_cross_entropy(torch.sigmoid(output), target)
loss.backward()
print(model[0].weight.grad.norm())
> tensor(0.1741)
print(model[2].weight.grad.norm())
> tensor(0.2671)

# 3) wrong
model.zero_grad()
output = model(data)
loss = F.binary_cross_entropy_with_logits(torch.sigmoid(output), target)
loss.backward()
print(model[0].weight.grad.norm())
> tensor(0.0595)
print(model[2].weight.grad.norm())
> tensor(0.0914)

Your loss might blow up and eventually get a NaN value, e.g. if the learning rate is set too high, which would also fit my assumption.

While applying the sigmoid twice might have helped in your use case, I would recommend trying to debug the exploding loss (or NaN values) instead.
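
For example (a hypothetical sketch, not from the reply above), two common ways to tackle an exploding loss are a lower learning rate and gradient-norm clipping inside the training loop:

# hypothetical: a smaller learning rate for the optimizer defined earlier
optimizer = optim.Adam(model.parameters(), lr=1e-4)

# and/or clip the gradient norm between backward() and step()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
optimizer.step()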


Many thanks for giving such a helpful reply to such an old topic! 🙂 ❤️

This really helped me a lot. 🙂
