Simple feed-forward network with targets between 0 and 1 doesn't learn well

I’m trying to fit this very simple data with targets in the range (0, 1), but even when a prediction and its target are fairly far apart, like 0.2 and 0.45, the loss is very small. So I end up with a very small loss but very wrong predictions. How can I improve my predictions?

Here’s my data and the model’s predictions after training for 1000 epochs with Adam, MSE loss, and a learning rate of 0.001. Trying BCELoss instead didn’t improve the results.

Inputs = [[14.0110, 2.0000], [18.9990, 3.0000], [12.0110, 0.0000], [16.9990, 1.0000]]
Targets = [0.6774895, 0.12747164, 0.02246823, 0.00056751847]
Predictions = [0.37114963, 0.31669545, 0.1016788, 0.09256815]

The last sample’s prediction is 163 times its target, yet its squared error is tiny (0.0084).

Here’s my model:

import torch

class Net(torch.nn.Module):
    def __init__(self, num_features):
        super().__init__()

        self.linear_hidden_1 = torch.nn.Linear(num_features, num_features * 2)
        self.linear_out = torch.nn.Linear(num_features * 2, 1)

    def forward(self, inputs):
        # Hidden layer with ReLU activation.
        hidden = torch.relu(self.linear_hidden_1(inputs))
        # Sigmoid squashes the output into (0, 1); flatten to 1-D to match the targets.
        out = torch.sigmoid(self.linear_out(hidden)).flatten()
        return out
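
For reference, a minimal training loop matching this setup (Adam, MSE loss, learning rate 0.001, 1000 epochs) looks something like the sketch below; the tensor construction is just illustrative:

import torch

inputs = torch.tensor([[14.0110, 2.0000], [18.9990, 3.0000],
                       [12.0110, 0.0000], [16.9990, 1.0000]])
targets = torch.tensor([0.6774895, 0.12747164, 0.02246823, 0.00056751847])

model = Net(num_features=2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = torch.nn.MSELoss()

for epoch in range(1000):
    optimizer.zero_grad()                   # reset gradients from the previous step
    predictions = model(inputs)             # forward pass
    loss = criterion(predictions, targets)  # MSE against the raw targets
    loss.backward()                         # backpropagate
    optimizer.step()                        # update the weights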

I would really appreciate any suggestions on how to get it to predict properly.

If your target values span a small range and are thus also yielding a small loss, you could try to e.g. increase the learning rate. Alternatively, you could normalize the targets to a “proper” range during training and “unnormalize” the model’s predictions during evaluation to recover the real predicted values.
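
A minimal sketch of the normalization idea, assuming simple min-max scaling (the normalize/unnormalize helpers are just illustrative names):

import torch

targets = torch.tensor([0.6774895, 0.12747164, 0.02246823, 0.00056751847])

# Statistics computed once from the training targets.
t_min, t_max = targets.min(), targets.max()

def normalize(t):
    # Map the targets onto [0, 1] for training.
    return (t - t_min) / (t_max - t_min)

def unnormalize(t):
    # Map model outputs back to the original target scale for evaluation.
    return t * (t_max - t_min) + t_min

train_targets = normalize(targets)  # train against these
# real_predictions = unnormalize(model(inputs))  # at evaluation time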

If the target range is so small, does it really matter that the loss is small too?

Since the target values are packed into such a small range, the weight updates have to be small too, so I feel like the small loss might not be a problem in itself (e.g. if the range were (0, 10), the weights and loss might simply scale up by 10 and 10² without changing learning performance much).
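
As a quick check of that scaling intuition, using the numbers from the post above (purely illustrative):

import torch

targets = torch.tensor([0.6774895, 0.12747164, 0.02246823, 0.00056751847])
predictions = torch.tensor([0.37114963, 0.31669545, 0.1016788, 0.09256815])

# MSE on the original scale vs. with targets and predictions scaled up by 10.
mse = torch.nn.functional.mse_loss(predictions, targets)
mse_scaled = torch.nn.functional.mse_loss(predictions * 10, targets * 10)
print(mse.item(), mse_scaled.item())  # the scaled loss is exactly 10**2 = 100x larger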

I tried training with the targets scaled up to (0, 100), but it didn’t noticeably improve performance.