I have tried both MSE and KLDiv losses, and everything I can think of / search for online.
The model always starts with a real number output and during training immediately gravitates toward 0 or 1, even though most of the target values are fractional numbers.
Here is my model (apologies for any typos, I am typing this out):
class Net(nn.Module):
def __init__(self):
super(Net, self).__init__()
self.fc1 = nn.Linear(240*120*3, 300)
self.fc2 = nn.Linear(300,3)
def forward(self, x):
x = torch.flatten(x, 1)
x = self.fc1(x)
x = F.relu(x)
x = self.fc2(x)
output = torch.sigmoid(x,)
return output
Here is my training loop:
def train(model, device, train_loader, optimizer, lossfn, epoch):
model.train(False)
for b_idx, (data, target) in enumerate(train_loader):
data, target = data.to(device), target.to(device)
optimizer.zero_grad()
output = model(data)
loss = lossfn(output, target)
print(output, target, loss)
loss.backward()
optimizer.step()
The loss function is both nn.KLDivLoss/MSELoss
Both result in the same output for all inputs, and all the outputs are 0 or 1.
MSELoss:
output: tensor([[0., 1., 1.],
[0., 1., 1.],
[0., 1., 1.],
[0., 1., 1.],
[0., 1., 1.],
[0., 1., 1.],
...]], device='cuda:0', grad_fn=<SigmoidBackward>)
target: tensor([[0.3438, 0.5781, 0.9688],
[0.0000, 0.6562, 0.0000],
[0.3438, 0.5781, 0.9688],
[0.3438, 0.5781, 0.9688],
[0.0000, 0.0000, 0.0000],
[0.3438, 0.5781, 0.9688],
...]], device='cuda:0')
KLDiv loss is pretty much the same except all of the outputs are [1., 1., 1.] instead of [0., 1., 1.].
Before training, the outputs look much more normal:
output: tensor([[0.5651, 0.4700, 0.5182],
[0.5704, 0.4738, 0.5086],
....etc]]
I would guess the issue is with how I’m performing the loss, since the outputs look fine before training.
Any help would be appreciated.