I am training a small network on the XOR task. However, it sometimes outputs an all-zeros tensor for the training data.

```
import torch

# XOR inputs and targets
train = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
label = torch.tensor([0, 1, 1, 0], dtype=torch.float32).reshape(4, 1)
loss_fn = torch.nn.MSELoss()
lr = 0.005
net = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1),
    torch.nn.ReLU()
)
# Plain gradient descent: parameters are updated by hand
for i in range(1000):
    output = net(train)
    loss = loss_fn(output, label)
    net.zero_grad()
    loss.backward()
    # print(loss.item())
    with torch.no_grad():
        for p in net.parameters():
            p.sub_(lr * p.grad)
print(net(train))
```
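One possible explanation worth checking (an assumption, not something confirmed above): the last layer is followed by a ReLU, so if the last `Linear` layer's output is negative for all four inputs, the network outputs zeros and the ReLU passes back a zero gradient, leaving plain gradient descent nothing to update. The minimal sketch below forces that state deterministically by setting the last bias to `-10`, a hypothetical value chosen only because it is large enough to make the pre-activation negative for every input under PyTorch's default `Linear` initialization, and then inspects the gradients after one backward pass.

```python
import torch

# Same XOR data as in the question
train = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
label = torch.tensor([0, 1, 1, 0], dtype=torch.float32).reshape(4, 1)

torch.manual_seed(0)  # fixed seed only for reproducibility of this sketch
net = torch.nn.Sequential(
    torch.nn.Linear(2, 2),
    torch.nn.ReLU(),
    torch.nn.Linear(2, 1),
    torch.nn.ReLU(),
)

# Hypothetical worst case: keep the random weights but shift the last bias
# so the final Linear output is negative for every sample.
with torch.no_grad():
    net[2].bias.fill_(-10.0)

output = net(train)
loss = torch.nn.MSELoss()(output, label)
net.zero_grad()
loss.backward()

print(output.squeeze())  # the final ReLU clips every output to zero
grad_norms = [p.grad.abs().sum().item() for p in net.parameters()]
print(grad_norms)  # the ReLU's zero derivative blocks every gradient
```

If a random initialization (or an early update) lands in this state, `p.sub_(lr * p.grad)` subtracts zero on every iteration and the output stays at zero for the rest of training.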

So is my code wrong, or is this just caused by the random initialization of the network parameters?
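One way to test the initialization hypothesis is to rerun the same training loop from several fixed seeds and count how often the outputs end up all zero, with and without the final ReLU. This is a hedged sketch: the helper names `make_net` and `train_once`, the seed range, and `n_seeds = 10` are illustrative choices, not part of the original code, and no particular count is claimed here.

```python
import torch

train = torch.tensor([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=torch.float32)
label = torch.tensor([0, 1, 1, 0], dtype=torch.float32).reshape(4, 1)


def make_net(final_relu):
    # Hypothetical helper: the question's architecture, final ReLU optional
    layers = [torch.nn.Linear(2, 2), torch.nn.ReLU(), torch.nn.Linear(2, 1)]
    if final_relu:
        layers.append(torch.nn.ReLU())
    return torch.nn.Sequential(*layers)


def train_once(seed, final_relu, lr=0.005, steps=1000):
    # Hypothetical helper: the question's training loop for one seed
    torch.manual_seed(seed)
    net = make_net(final_relu)
    loss_fn = torch.nn.MSELoss()
    for _ in range(steps):
        loss = loss_fn(net(train), label)
        net.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in net.parameters():
                p.sub_(lr * p.grad)
    with torch.no_grad():
        return net(train)


n_seeds = 10
counts = {}
for final_relu in (True, False):
    counts[final_relu] = sum(
        bool(torch.all(train_once(s, final_relu) == 0)) for s in range(n_seeds)
    )
    print(f"final ReLU={final_relu}: {counts[final_relu]}/{n_seeds} runs output all zeros")
```

Because `torch.nn.ReLU` has no parameters, a given seed yields the same initial weights in both variants, so any difference between the two counts comes from the final activation rather than from different starting points.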