Hi,
Let's say my input is a sparse vector with 5000 dimensions. If I declare a normal linear layer
L=nn.Linear(5000,500)
output=L(input_sparse_vector)
and optimize it as usual instead of using torch.sparse, what problems might I run into?
Will the gradient calculation at every step be the same?
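
To make the question concrete, here is a minimal sketch of how the two paths could be compared. The batch size of 4, the roughly 1% density, and the sum() loss are assumptions chosen only for illustration, and torch.sparse.mm is just one possible way to do the sparse forward pass:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical input: a batch of 4 vectors of dim 5000, about 1% nonzero.
mask = torch.rand(4, 5000) < 0.01
dense_x = torch.randn(4, 5000) * mask
sparse_x = dense_x.to_sparse()  # same values, stored in sparse COO layout

L = nn.Linear(5000, 500)

# Path 1: ordinary dense forward/backward on the mostly-zero tensor.
L(dense_x).sum().backward()
grad_dense = L.weight.grad.clone()

# Path 2: same layer weights, but the input is a sparse tensor.
L.zero_grad()
out_sparse = torch.sparse.mm(sparse_x, L.weight.t()) + L.bias
out_sparse.sum().backward()
grad_sparse = L.weight.grad.clone()

# Compare the weight gradients from the two paths.
grads_match = torch.allclose(grad_dense, grad_sparse, atol=1e-5)
print("weight gradients match:", grads_match)

# Weight columns whose input feature is zero for the whole batch
# should receive an exactly zero gradient.
zero_cols = grad_dense.abs().sum(dim=0) == 0
print("weight columns with zero gradient:", int(zero_cols.sum()))
```

If grads_match comes out True, the two paths give the same weight gradient for this toy case, and the weight gradient is itself a dense tensor in both. The dense path still stores and multiplies all the zeros, so any difference would presumably be in memory and speed rather than in the result.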