Sometimes getting all-zeros output when training a small network on the XOR task

I am training a small network on the XOR task. However, it sometimes outputs an all-zeros tensor on the training data.

    import torch

    train = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
    label = torch.tensor([0, 1, 1, 0], dtype=torch.float32).reshape(4, 1)
    loss_fn = torch.nn.MSELoss()
    lr = 0.005
    net = torch.nn.Sequential(
        torch.nn.Linear(2, 2),
        torch.nn.ReLU(),
        torch.nn.Linear(2, 1),
        torch.nn.ReLU(),
    )

    for i in range(1000):
        net.zero_grad()  # clear gradients from the previous iteration
        output = net(train)
        loss = loss_fn(output, label)
        # print(loss.item())
        loss.backward()  # populate p.grad for the manual update below

        with torch.no_grad():
            # plain gradient-descent step
            for p in net.parameters():
                p.sub_(lr * p.grad)


So is my code wrong, or is this just caused by the randomly initialized parameter values?


Are you sure you need the last ReLU in your net?
Also, I loosely remember seeing that the point where all weights are zero is problematic for XOR with very small nets.
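
To make the suggestion concrete, here is a minimal sketch of the same training loop with the ReLU after the last Linear removed. The fixed seed is my own addition, only there to make the run repeatable. With a linear output the last bias always receives a gradient unless the mean output already equals the mean label, so the net should not get stuck at an all-zeros output with all-zeros gradients. This does not promise that such a tiny net actually learns XOR in 1000 steps at this learning rate.

```python
import torch

torch.manual_seed(0)  # assumption: fixed seed, only for repeatability

train = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
label = torch.tensor([0, 1, 1, 0], dtype=torch.float32).reshape(4, 1)
loss_fn = torch.nn.MSELoss()
lr = 0.005

# Same layers as in the question, minus the ReLU after the last Linear.
net = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1),
)

initial_loss = loss_fn(net(train), label).item()

for i in range(1000):
    net.zero_grad()
    loss = loss_fn(net(train), label)
    loss.backward()
    with torch.no_grad():
        # plain gradient-descent step, as in the question
        for p in net.parameters():
            p.sub_(lr * p.grad)

output = net(train).detach()
final_loss = loss_fn(output, label).item()
print(initial_loss, final_loss)
print(output)
```

If the loss still plateaus, the hidden layer of two units may simply be too small or the learning rate too low. That is a separate issue from the all-zeros output.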

When a Linear layer is constructed, reset_parameters is called:

    def reset_parameters(self):
        stdv = 1. / math.sqrt(self.weight.size(1))
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.uniform_(-stdv, stdv)
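
As a quick sanity check on that initialization, the sketch below builds a fresh Linear(2, 2) and compares its parameters with the bound 1 / sqrt(fan_in). It assumes the default initialization in your installed version still draws weights and biases from within that range, which I believe is the case for current releases even though the code there looks different.

```python
import math

import torch

layer = torch.nn.Linear(2, 2)
fan_in = layer.weight.size(1)     # number of input features, here 2
bound = 1.0 / math.sqrt(fan_in)   # about 0.707 for fan_in = 2

# Largest absolute values actually drawn for this layer.
max_weight = layer.weight.abs().max().item()
max_bias = layer.bias.abs().max().item()
print(bound, max_weight, max_bias)
```

Since the values are spread over a continuous interval around zero, an all-zeros initialization is practically impossible, but negative values are just as likely as positive ones.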

I think the initialized parameters have a very low probability of being all zeros, but the all-zeros output happens frequently. When it happens, I checked .grad for every network parameter in each iteration, and the gradients are also all zeros.
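
One way to get exactly this symptom without any parameter being zero is a ReLU after the last Linear whose input is negative for all four samples. The sketch below forces that situation by hand; the values written into the last layer are purely illustrative assumptions, not something the initializer is guaranteed to produce. The output is then zero everywhere, and because ReLU passes no gradient for negative inputs, every parameter gradient is zero as well.

```python
import torch

train = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
label = torch.tensor([0, 1, 1, 0], dtype=torch.float32).reshape(4, 1)

net = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1),
    torch.nn.ReLU(),  # the output ReLU under discussion
)

# Illustrative assumption: make the last Linear negative for every input.
# The hidden activations are >= 0, so with weights -1 and bias -1 the
# pre-activation of the final ReLU is at most -1.
with torch.no_grad():
    net[2].weight.fill_(-1.0)
    net[2].bias.fill_(-1.0)

output = net(train)
loss = torch.nn.MSELoss()(output, label)
loss.backward()

all_zero_output = bool((output == 0).all())
all_zero_grads = all(bool((p.grad == 0).all()) for p in net.parameters())
print(all_zero_output, all_zero_grads)  # prints: True True
print(loss.item())                      # 0.5, the loss of predicting 0 everywhere
```

Once the net is in this state, gradient descent cannot leave it, which would match a run that stays at all zeros for the rest of training.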