I was building a neural network with only categorical features, so I used nn.Embedding followed by a linear layer.
I found that the output distribution does not have a standard deviation of 1. This seems to be because embeddings are initialized from a normal(0, 1) distribution while the linear layer is initialized from a uniform distribution.
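A minimal sketch of how this can be checked, assuming PyTorch's default initializers. The sizes below are hypothetical and chosen only for illustration:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, picked only for illustration.
num_categories, emb_dim, hidden_dim, batch = 1000, 256, 256, 4096

emb = nn.Embedding(num_categories, emb_dim)  # default init: weights ~ N(0, 1)
lin = nn.Linear(emb_dim, hidden_dim)         # default init: uniform weights and bias

# Random category indices standing in for real categorical data.
idx = torch.randint(0, num_categories, (batch,))

with torch.no_grad():
    e = emb(idx)
    out = lin(e)

emb_std = e.std().item()
out_std = out.std().item()
print(f"embedding output std: {emb_std:.3f}")
print(f"linear output std:    {out_std:.3f}")
```

With the default nn.Linear init, the weights are drawn from U(-1/sqrt(fan_in), 1/sqrt(fan_in)), which has variance 1/(3 * fan_in). For roughly unit-variance inputs this gives an output variance of about 1/3, so the std after one linear layer should land noticeably below 1.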
If the std falls below 1 at each layer, then the output std must tend towards 0 as the network gets deeper, hence vanishing gradients.
So should I change the type of initialization, or will it work fine as is, with no problems?
It seems a bit odd to me.
I tried a deep network with 8 layers and found that the output std is 0.09, and the gradient std is about the same, which looks like vanishing gradients to me.
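A rough sketch of this kind of experiment, under stated assumptions: 8 plain nn.Linear layers with default init, no activation functions in between, hypothetical widths, and a random unit-std upstream gradient standing in for a real loss. The exact numbers depend on the widths and activations used, so they need not match the 0.09 above:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes, picked only for illustration.
num_categories, dim, depth, batch = 1000, 256, 8, 4096

emb = nn.Embedding(num_categories, dim)
# Assumption: a stack of linear layers with no nonlinearity between them.
layers = nn.Sequential(*[nn.Linear(dim, dim) for _ in range(depth)])

idx = torch.randint(0, num_categories, (batch,))
x = emb(idx)
x.retain_grad()  # keep the gradient of this non-leaf tensor for inspection

out = layers(x)
# Assumption: a unit-std random upstream gradient replaces a real loss.
out.backward(torch.randn_like(out))

in_std = x.detach().std().item()
out_std = out.detach().std().item()
grad_std = x.grad.std().item()
print(f"input std:              {in_std:.4f}")
print(f"output std after {depth}:     {out_std:.4f}")
print(f"grad std at the input:  {grad_std:.4f}")
```

Both the output std and the gradient std at the input come out far below 1 here, which matches the shrinking pattern described above. Printing the std after each layer instead of only at the end would show how quickly it decays.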