What are the advantages of using the torch.sparse API?

Hi,
Let's say my input is a sparse 5000-dimensional vector. If I declare a normal linear layer

import torch.nn as nn
L = nn.Linear(5000, 500)
output = L(input_sparse_vector)

and optimize it as usual instead of using torch.sparse, what problems might I run into?

Will the gradient calculation at every step be the same?
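
To make the comparison concrete, here is a minimal sketch, not a definitive benchmark. It assumes a single 5000-dim input with 50 nonzero entries (a made-up sparsity level) stored as a mostly-zero dense tensor, and it uses `torch.sparse.mm` on a COO copy of the same data for the sparse path. Both paths compute the same matrix product, so the outputs and weight gradients should agree up to floating-point error; any difference would be in memory use and speed rather than in the result.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical input: a batch of one 5000-dim vector with 50 nonzero entries.
x_dense = torch.zeros(1, 5000)
idx = torch.randperm(5000)[:50]
x_dense[0, idx] = torch.randn(50)
x_sparse = x_dense.to_sparse()  # COO sparse copy of the same data

layer = nn.Linear(5000, 500)

# Dense path: ordinary forward and backward pass through nn.Linear.
out_dense = layer(x_dense)
out_dense.sum().backward()
grad_dense = layer.weight.grad.clone()
layer.zero_grad()

# Sparse path: the same math (x @ W^T + b), computed with torch.sparse.mm.
out_sparse = torch.sparse.mm(x_sparse, layer.weight.t()) + layer.bias
out_sparse.sum().backward()
grad_sparse = layer.weight.grad.clone()

same_output = torch.allclose(out_dense, out_sparse, atol=1e-5)
same_grad = torch.allclose(grad_dense, grad_sparse, atol=1e-5)

# Count weight-gradient columns that are entirely zero; these correspond
# to input features whose value is zero.
zero_cols = (grad_dense.abs().sum(dim=0) == 0).sum().item()
print(same_output, same_grad, zero_cols)
```

In this sketch the weight gradient is a dense 500 x 5000 tensor in both paths; the columns belonging to zero input features simply contain zeros. The sparse representation changes how the input is stored and multiplied, not what the optimizer sees.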