Output of the model depends on the shape of the weights tensor

I want to train a model to sum its three inputs, so the problem is as simple as possible.

At first the weights are initialized randomly. This gives a poor error (approx. 0.5).

Then I initialize the weights with zeros. There are two options:

  1. the shape of the weights tensor is [1, 3]
  2. the shape of the weights tensor is [3]

With the 1st option the model still performs badly and can't learn this simple formula.

With the 2nd option it works perfectly, with an error of about 10e-12.

Why does the result depend on the shape of the weights? Why do I need to initialize the model with zeros to solve this simple problem?

    import torch
    from torch.nn import Sequential as Seq, Linear as Lin
    from torch.optim.lr_scheduler import ReduceLROnPlateau
    X = torch.rand((1024, 3))
    y = (X[:,0] + X[:,1] + X[:,2])
    m = Seq(Lin(3, 1, bias=False))
    # Option 1: weight of shape [1, 3]
    m[0].weight = torch.nn.parameter.Parameter(torch.tensor([[0, 0, 0]], dtype=torch.float))
    # Option 2: weight of shape [3]
    #m[0].weight = torch.nn.parameter.Parameter(torch.tensor([0, 0, 0], dtype=torch.float))
    optim = torch.optim.SGD(m.parameters(), lr=10e-2)
    scheduler = ReduceLROnPlateau(optim, 'min', factor=0.5, patience=20, verbose=True)
    mse = torch.nn.MSELoss()
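    for epoch in range(500):
        out = m(X)
        loss = mse(out, y)
        if epoch % 20 == 0:
            print(epoch, loss.item())
        # The post is cut off here; the lines below are an assumed standard training step.
        optim.zero_grad()
        loss.backward()
        optim.step()
        scheduler.step(loss)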

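Your first option raises a warning, which you seem to be ignoring:

    Using a target size (torch.Size([1024])) that is different to the input size (torch.Size([1024, 1])). This will likely lead to incorrect results due to broadcasting. Please ensure they have the same size.

With the [1, 3] weight the model output has shape [1024, 1], while y has shape [1024]. MSELoss broadcasts both to [1024, 1024] and averages the squared difference over every output/target pair instead of over matching pairs. With the [3] weight the output has shape [1024], which happens to match y, so that option trains correctly.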
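The shape mismatch can be checked directly. This is a minimal sketch, assuming the same X and y as in the question; it calls torch.nn.functional.linear by hand instead of going through the Sequential model, and the variable names are only illustrative.

```python
import torch
import torch.nn.functional as F

X = torch.rand(1024, 3)
y = X.sum(dim=1)                          # target, shape [1024]

# Option 1 weight ([1, 3]) gives a column of outputs.
out_2d = F.linear(X, torch.zeros(1, 3))   # shape [1024, 1]
# Option 2 weight ([3]) gives a flat vector of outputs.
out_1d = F.linear(X, torch.zeros(3))      # shape [1024]

pairwise = out_2d - y                     # broadcast: every output vs. every target
elementwise = out_1d - y                  # shapes already match
fixed = out_2d - y.unsqueeze(1)           # target reshaped to [1024, 1]

print(pairwise.shape, elementwise.shape, fixed.shape)
```

Only the first difference has 1024 x 1024 entries, which is the case the warning is about.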

After fixing this issue via

    loss = mse(out, y.unsqueeze(1))

both options produce the same losses.
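To compare the two weight shapes side by side once the target matches the output, something like the following could be used. It is only a sketch under a few assumptions: the helper `train` is hypothetical, the seed is fixed for repeatability, the learning-rate scheduler from the question is left out, and `y.reshape(out.shape)` stands in for `y.unsqueeze(1)` so that the same line works for both output shapes.

```python
import torch

torch.manual_seed(0)
X = torch.rand(1024, 3)
y = X.sum(dim=1)

def train(weight, steps=500, lr=0.1):
    """Train a bias-free linear layer starting from `weight`; return the final loss."""
    layer = torch.nn.Linear(3, 1, bias=False)
    layer.weight = torch.nn.Parameter(weight.clone())
    optim = torch.optim.SGD(layer.parameters(), lr=lr)
    mse = torch.nn.MSELoss()
    for _ in range(steps):
        out = layer(X)
        # Make the target shape match the output: [1024, 1] or [1024].
        loss = mse(out, y.reshape(out.shape))
        optim.zero_grad()
        loss.backward()
        optim.step()
    return loss.item()

loss_2d = train(torch.zeros(1, 3))   # option 1: weight shape [1, 3]
loss_1d = train(torch.zeros(3))      # option 2: weight shape [3]
print(loss_2d, loss_1d)
```

With matching shapes, both runs should end with a very small loss.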