Neural network could not learn on this type of data, why?

Yes, this could work and is sometimes used when the target values are not in an "optimal" range for training.
You could normalize the input (which is usually done anyway) and also normalize the target to train the model.
After training, you can denormalize the predictions using the target statistics (mean and std) and calculate the "real" loss based on this denormalized prediction tensor.

Here is a code snippet using your initial dataset, which successfully learns the data with this approach:

import torch
import torch.nn as nn
import torch.nn.functional as F
import matplotlib.pyplot as plt


class Network(nn.Module):
    def __init__(self):
        super(Network, self).__init__()
        self.fc1 = nn.Linear(in_features=1, out_features=50, bias=True)
        self.fc2 = nn.Linear(in_features=50, out_features=1, bias=True)
    
    def forward(self, x):
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        return x
        
model = Network()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.MSELoss()

# create the dataset: target = input**3
input = torch.tensor([x for x in range(1, 4000)]).float()
target = torch.tensor([d**3 for d in input]).float()
input = input.unsqueeze(1)
target = target.unsqueeze(1)

# normalize input
input_mean = torch.mean(input)
input_std = torch.std(input)
input = input - input_mean
input = input / input_std

# normalize target
target_mean = torch.mean(target)
target_std = torch.std(target)
target_norm = target - target_mean
target_norm = target_norm / target_std

for epoch in range(1000):
    optimizer.zero_grad()
    out = model(input)
    loss = criterion(out, target_norm)
    loss.backward()
    optimizer.step()
    print('Epoch {}, loss {}, max. pred val {}'.format(
        epoch, loss.item(), out.max()))
        
with torch.no_grad():
    pred = model(input)
    # unnormalize the prediction back to the original target scale
    pred = pred * target_std
    pred = pred + target_mean
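    # As mentioned above, you could now also calculate the "real" loss in the
    # original (unnormalized) scale -- these lines are an illustrative addition,
    # not part of the original snippet:
    real_loss = criterion(pred, target)
    print('Loss in the original scale: {}'.format(real_loss.item()))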

plt.figure(figsize=(16, 9))
plt.xlabel('Sample index -->', size=20)
plt.ylabel('y -->', size=20)
plt.xticks(size=20)
plt.yticks(size=20)
plt.title('Prediction vs. target')
plt.plot(pred.numpy(), label='prediction')
plt.plot(target.numpy(), label='target')
plt.legend()
plt.show()

If you comment out the normalization, you will see that the loss is huge (~1e20) and thus the gradients will also be very large. While this might be alright at the beginning of training, I think you would have to play around with a learning rate scheduler in order to fit the original dataset properly, so I would recommend taking a look at the normalization approach instead.
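
If you nevertheless wanted to train directly on the raw target, one option would be to let a scheduler such as ReduceLROnPlateau shrink the learning rate once the loss stops improving. Here is a rough sketch of how it could be wired into the training loop above; the factor and patience values are just assumptions for illustration and would need tuning:

# illustrative variation of the loop above, training on the unnormalized target
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode='min', factor=0.1, patience=50)

for epoch in range(1000):
    optimizer.zero_grad()
    out = model(input)
    loss = criterion(out, target)  # raw (unnormalized) target, so the loss starts out huge
    loss.backward()
    optimizer.step()
    scheduler.step(loss)  # reduce the learning rate once the loss plateaus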
