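One more thing you can try is applying nonlinear activations between the linear layers. Without them, stacked linear layers collapse into a single linear transformation, so the extra layers add no expressive power. The following discussion may be helpful.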
Link: Loss does not change and weights remain zero - #6 by ptrblck
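
To illustrate why the activation matters, here is a minimal pure-Python sketch (no PyTorch needed). The weights `W1`, `B1`, `W2`, `B2` and the helper names are made up for illustration and are not taken from the linked thread. In PyTorch, the equivalent change would be to place something like `nn.ReLU()` between your `nn.Linear` layers.

```python
# Hypothetical scalar "layers"; the weights are arbitrary illustration values.
W1, B1 = 2.0, -1.0
W2, B2 = -3.0, 0.5


def linear(x, w, b):
    """Affine layer for a scalar input: w * x + b."""
    return w * x + b


def relu(z):
    """Rectified linear unit: passes positive values, zeroes out the rest."""
    return max(0.0, z)


def stacked_linear(x):
    """Two linear layers with no activation in between."""
    return linear(linear(x, W1, B1), W2, B2)


def collapsed_linear(x):
    """The single linear layer that the stack above is equivalent to.

    W2 * (W1 * x + B1) + B2 == (W2 * W1) * x + (W2 * B1 + B2)
    """
    return (W2 * W1) * x + (W2 * B1 + B2)


def with_activation(x):
    """Same two layers, but with a ReLU between them."""
    return linear(relu(linear(x, W1, B1)), W2, B2)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
for x in xs:
    print(x, stacked_linear(x), collapsed_linear(x), with_activation(x))
```

The second and third columns always match, which shows that the two-layer stack without an activation is just one linear layer. The last column is flat for inputs where the first layer's output is negative and sloped elsewhere, so it can no longer be written as a single linear layer.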