Why can't a tensor be trained when it is initialized with all zeros?


The pseudocode for the simulated network looks like:
x = x*att
y = mlp(x)

As for att:
att is initialized with all ones, random values, or all zeros
att = mlp(relu(att))
att = softmax(att)

When I initialize it with torch.ones or torch.rand, it trains properly. However, if I initialize it with all zeros, it stays the same until the end. Just curious: what is the difference between all ones and all zeros, since they are exactly the same after the softmax operation at the very beginning (say, [0.25, 0.25, 0.25, 0.25])?
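For reference, a quick check confirms the two initializations really do produce the same uniform distribution after softmax at the start:

```python
import torch

# Both initializations give the same uniform distribution after softmax.
uniform_from_zeros = torch.softmax(torch.zeros(4), dim=0)
uniform_from_ones = torch.softmax(torch.ones(4), dim=0)

print(uniform_from_zeros)  # tensor([0.2500, 0.2500, 0.2500, 0.2500])
print(torch.allclose(uniform_from_zeros, uniform_from_ones))  # True
```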

Thank you!

If there is no bias, an input of 0 always produces an output of 0.
Think about how the forward/backward propagations work: with a zero input, the weight gradients of the linear layer are zero, and ReLU's derivative at 0 is also zero, so no gradient ever reaches the zero-initialized tensor.
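A minimal sketch of the difference (the layer sizes and the scalar loss below are made up for illustration): the forward outputs start out identical, but the gradient flowing back to a zero-initialized att is exactly zero, because backprop through ReLU multiplies by its derivative, which is 0 at an input of 0.

```python
import torch

def att_grad(init):
    """Return the gradient reaching `att` for a given initialization."""
    torch.manual_seed(0)                     # same mlp weights for both runs
    att = init.clone().requires_grad_(True)
    mlp = torch.nn.Linear(4, 4, bias=False)  # no bias, as in the reply above
    out = torch.softmax(mlp(torch.relu(att)), dim=-1)
    out[0].backward()                        # arbitrary scalar loss
    return att.grad

print(att_grad(torch.zeros(4)))  # tensor([0., 0., 0., 0.]) -> att never updates
print(att_grad(torch.ones(4)))   # nonzero gradient -> att can be trained
```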

I get the point. Thanks very much.

Thanks. Please mark the reply as the solution so this issue can be closed as solved.