Hi!
The pseudocode of the network looks like:

input: x
x = x * att
y = mlp(x)

As for att: it is a learnable parameter initialized with all ones, all zeros, or random values, and transformed as:

att = mlp(relu(att))
att = softmax(att)
When I initialize att with torch.ones or torch.rand, the network trains properly. However, if I initialize it with all zeros, att stays the same until the end of training. I'm just curious: what is the difference between all ones and all zeros, given that they produce exactly the same output after the softmax at the very beginning (say, [0.25, 0.25, 0.25, 0.25])?
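For reference, here is a minimal runnable sketch of the setup that reproduces what I'm seeing. The dimension (4) and the choice of a single nn.Linear for each mlp are my own simplifications, not necessarily the real architecture; the script prints the gradient that reaches att for the "ones" and "zeros" initializations after one backward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

DIM = 4  # toy feature dimension (assumption for this sketch)

class Net(nn.Module):
    def __init__(self, init):
        super().__init__()
        init_fns = {"ones": torch.ones, "zeros": torch.zeros, "rand": torch.rand}
        # learnable attention vector, initialized as described above
        self.att = nn.Parameter(init_fns[init](DIM))
        # stand-ins for the two MLPs (assumed single Linear layers here)
        self.att_mlp = nn.Linear(DIM, DIM)
        self.out_mlp = nn.Linear(DIM, 1)

    def forward(self, x):
        # att = softmax(mlp(relu(att))), then x is scaled elementwise by it
        a = torch.softmax(self.att_mlp(torch.relu(self.att)), dim=-1)
        return self.out_mlp(x * a)

grads = {}
for init in ("ones", "zeros"):
    net = Net(init)
    net(torch.randn(8, DIM)).sum().backward()
    grads[init] = net.att.grad.clone()
    print(init, grads[init])
```

With this sketch, the gradient printed for the all-zeros initialization is an all-zeros tensor (note that torch.relu has zero gradient at input 0), while the all-ones initialization receives a nonzero gradient, matching the behavior described above.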
Thank you!