Output of regression model always 0 or 1

I have tried both MSE and KLDiv losses, and everything I can think of / search for online.

The model always starts with a real number output and during training immediately gravitates toward 0 or 1, even though most of the target values are fractional numbers.

Here is my model (apologies for any typos, I am typing this out):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(240*120*3, 300)  # flattened 240x120 RGB input
        self.fc2 = nn.Linear(300, 3)          # 3 regression targets

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = self.fc1(x)
        x = F.relu(x)
        x = self.fc2(x)
        output = torch.sigmoid(x)  # squashes every output into (0, 1)
        return output

Here is my training loop:

def train(model, device, train_loader, optimizer, lossfn, epoch):
    model.train()  # put the model into training mode
    for b_idx, (data, target) in enumerate(train_loader):
        data, target = data.to(device), target.to(device)
        optimizer.zero_grad()
        output = model(data)
        loss = lossfn(output, target)
        print(output, target, loss)
        loss.backward()
        optimizer.step()

I have tried both nn.KLDivLoss and nn.MSELoss as the loss function.
Both result in the same output for all inputs, and all of the outputs are 0 or 1.

MSELoss:

output: tensor([[0., 1., 1.],
                [0., 1., 1.],
                [0., 1., 1.],
                [0., 1., 1.],
                [0., 1., 1.],
                [0., 1., 1.],
                ...]], device='cuda:0', grad_fn=<SigmoidBackward>)
target: tensor([[0.3438, 0.5781, 0.9688],
                [0.0000, 0.6562, 0.0000],
                [0.3438, 0.5781, 0.9688],
                [0.3438, 0.5781, 0.9688],
                [0.0000, 0.0000, 0.0000],
                [0.3438, 0.5781, 0.9688],
                ...]], device='cuda:0')

KLDiv loss is pretty much the same except all of the outputs are [1., 1., 1.] instead of [0., 1., 1.].

Before training, the outputs look much more normal:

output: tensor([[0.5651, 0.4700, 0.5182],
                [0.5704, 0.4738, 0.5086],
                ...etc]]

My guess is that the issue is with how I'm computing the loss, since the outputs look fine before training.
Any help would be appreciated.

I would guess this is due to the nature of the sigmoid activation: its gradient is largest around x = 0 (where f(x) = 0.5) and vanishes towards the extremes, which would push the outputs towards the limits.
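You can see this directly by backpropagating through the sigmoid at a few sample points (the values below are just for illustration):

import torch

# sigmoid'(x) = sigmoid(x) * (1 - sigmoid(x)): largest at x = 0, tiny near the extremes
x = torch.tensor([-6., -2., 0., 2., 6.], requires_grad=True)
y = torch.sigmoid(x)
y.sum().backward()
print(y)       # tensor([0.0025, 0.1192, 0.5000, 0.8808, 0.9975], grad_fn=<SigmoidBackward>)
print(x.grad)  # tensor([0.0025, 0.1050, 0.2500, 0.1050, 0.0025])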

Usually you would use this activation for a binary or multi-label classification use case.
If you are dealing with a regression, I would try removing the output activation and check if it works.
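A minimal sketch of that change, reusing the layer names from the model above and just dropping the final sigmoid:

import torch
import torch.nn as nn
import torch.nn.functional as F

class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(240*120*3, 300)
        self.fc2 = nn.Linear(300, 3)

    def forward(self, x):
        x = torch.flatten(x, 1)
        x = F.relu(self.fc1(x))
        return self.fc2(x)  # raw, unbounded regression output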

Can this be solved by using tanh to restrict the output?

I don't know if this was the issue, as the author of the post never followed up.
Note that tanh has a similar gradient behavior, and of course your output would be in the range [-1, 1], but you could try it out for your model.
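If you want to experiment with it, here is a rough sketch; the random tensors just stand in for your model output and targets, assuming the targets lie in [0, 1] and need rescaling to tanh's range:

import torch
import torch.nn.functional as F

logits = torch.randn(4, 3)            # stand-in for self.fc2(x)
output = torch.tanh(logits)           # bounded to [-1, 1]

target = torch.rand(4, 3)             # stand-in for targets in [0, 1]
target_scaled = target * 2.0 - 1.0    # rescale to [-1, 1] to match tanh's range
loss = F.mse_loss(output, target_scaled)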

Thank you! I will give it a try!

This was the issue. If I recall correctly, I discretized the output data and treated it as a multiclass classification problem to fix it, although that workaround wouldn't apply to every problem like this. Using a linear activation (i.e. no activation) for the output layer is a good thing to try if gradients are the problem.
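Roughly the idea, sketched here for a single target column with an arbitrary bin count (the exact setup I used may have differed):

import torch
import torch.nn as nn

num_bins = 32
edges = torch.linspace(0.0, 1.0, num_bins + 1)[1:-1]  # interior bin boundaries in [0, 1]

target = torch.tensor([0.3438, 0.5781, 0.9688, 0.0])   # continuous targets
target_cls = torch.bucketize(target, edges)            # integer class index per sample

logits = torch.randn(4, num_bins)                      # stand-in for the model's class scores
loss = nn.CrossEntropyLoss()(logits, target_cls)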
