_sparse_mask attribute in sparse gradients computed with custom torch.autograd.Function

I have a custom torch.autograd.Function that returns a torch.sparse.FloatTensor as the gradient in its backward pass, and I'd like to use the built-in sparse optimizers such as SparseAdam and Adagrad to update my variables with these sparse gradients. SGD has no problem consuming my custom torch.sparse.FloatTensor gradient, but SparseAdam and Adagrad appear to expect a _sparse_mask attribute. When I call optimizer.step() I get:

AttributeError: 'torch.sparse.FloatTensor' object has no attribute '_sparse_mask'

I couldn't find much information about _sparse_mask in the documentation. What is the proper way to construct sparse gradient tensors in backward() so that they work with the sparse optimizers? A minimal sketch of my setup is below.
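For reference, here is a stripped-down version of the kind of Function I mean. The names, shapes, and the toy "sparse linear" operation are purely illustrative; the point is only that backward() hand-builds a COO sparse tensor as the gradient of a dense parameter:

```python
import torch


class SparseLinear(torch.autograd.Function):
    """Toy example: dense matmul forward, hand-built sparse gradient
    for the weight in backward (illustrative only)."""

    @staticmethod
    def forward(ctx, input, weight):
        ctx.save_for_backward(input, weight)
        return input @ weight

    @staticmethod
    def backward(ctx, grad_output):
        input, weight = ctx.saved_tensors
        # Dense gradient w.r.t. the input
        grad_input = grad_output @ weight.t()
        # Build the weight gradient as a sparse COO tensor
        dense_grad_weight = input.t() @ grad_output
        indices = dense_grad_weight.nonzero().t()
        values = dense_grad_weight[indices[0], indices[1]]
        grad_weight = torch.sparse.FloatTensor(
            indices, values, dense_grad_weight.size())
        return grad_input, grad_weight


weight = torch.randn(10, 5, requires_grad=True)
x = torch.randn(3, 10)

# SGD handles the sparse gradient fine; swapping in SparseAdam or
# Adagrad raises the _sparse_mask AttributeError on step()
optimizer = torch.optim.SparseAdam([weight])

loss = SparseLinear.apply(x, weight).sum()
loss.backward()
optimizer.step()  # AttributeError here
```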