ValueError: Sparse params at indices [0, 1]: SparseAdam requires dense parameter tensors

when I initialize my nn.Module layer with a sparse weight matrix (as an nn.Parameter).

I wonder whether this is a bug or intended behavior. If it is intended, why would you need a SparseAdam optimizer if it does not work with sparse weights?

The inputs, outputs, and weights of the model are initialized as torch.sparse.FloatTensor, and the only model layer is a simple sparse matrix multiplication. Where do I have to change the gradients?

Do I have to define the forward and backward passes myself with a custom autograd function?

Yes, this might be a possible approach, and you could then use torch.sparse.addmm() to create the sparse gradients. However, a custom nn.Module might be sufficient, since the operation should be differentiable, so you wouldn’t need to implement the backward pass yourself. This list shows the supported operations and whether the resulting gradient will be sparse.
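
To make this a bit more concrete, here is a minimal sketch of both situations. The class name SparseInputLinear, the shapes, and the learning rates are made up for illustration and are not taken from your model. As far as I understand, SparseAdam expects dense parameters whose gradients are sparse (for example the weight of nn.Embedding(sparse=True)), which would explain the ValueError for a sparse nn.Parameter. The first part multiplies a sparse input by a dense weight with torch.sparse.mm and relies on autograd for the backward pass. The second part shows a dense parameter with a sparse gradient being updated by SparseAdam.

```python
import torch
import torch.nn as nn


class SparseInputLinear(nn.Module):
    """Hypothetical layer: sparse input multiplied by a dense weight parameter."""

    def __init__(self, in_features, out_features):
        super().__init__()
        # The parameter itself stays a dense tensor.
        self.weight = nn.Parameter(torch.randn(in_features, out_features) * 0.1)

    def forward(self, x_sparse):
        # torch.sparse.mm is differentiable, so no custom backward is written here.
        return torch.sparse.mm(x_sparse, self.weight)


torch.manual_seed(0)

# Part 1: sparse input, dense weight. The weight gradient is dense,
# so a regular optimizer such as Adam is used (assumption: plain Adam is acceptable).
x = torch.sparse_coo_tensor(
    indices=torch.tensor([[0, 1, 2], [0, 2, 4]]),
    values=torch.tensor([1.0, 2.0, 3.0]),
    size=(3, 5),
)
layer = SparseInputLinear(5, 3)
adam = torch.optim.Adam(layer.parameters(), lr=0.01)
layer(x).pow(2).sum().backward()
dense_grad_is_sparse = layer.weight.grad.is_sparse
adam.step()

# Part 2: dense parameter with a sparse gradient, which is what SparseAdam handles.
embedding = nn.Embedding(num_embeddings=10, embedding_dim=4, sparse=True)
sparse_adam = torch.optim.SparseAdam(list(embedding.parameters()), lr=0.1)

lookup = torch.tensor([1, 3, 3, 7])
before = embedding.weight.detach().clone()
embedding(lookup).pow(2).sum().backward()

weight_is_sparse = embedding.weight.is_sparse
grad_is_sparse = embedding.weight.grad.is_sparse
sparse_adam.step()

# Only the rows that were looked up receive an update.
changed_rows = (
    (embedding.weight.detach() != before).any(dim=1).nonzero().flatten().tolist()
)
print(dense_grad_is_sparse, weight_is_sparse, grad_is_sparse, changed_rows)
```

If your weight really has to be stored as a sparse tensor, this sketch does not cover that case; it only shows the two combinations that I am sure the built-in optimizers accept.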