How to update a sparse matrix within a model?

I want to introduce a sparse matrix into my neural network. After initialization, I would like this sparse matrix to take part in autograd and to update its values only at the non-zero positions. But I haven't found a good way to do this, and I have several questions about it. Here is a test script that shows the problem:

import torch
import torch.nn as nn
from torch import optim

class MyNet(nn.Module):
    def __init__(self, device):
        super(MyNet, self).__init__()
        # a single non-zero entry: value 5 at position (10, 10)
        idx = [[10], [10]]
        i = torch.LongTensor(idx)
        value = [5]
        v = torch.FloatTensor(value)
        self.At = torch.sparse_coo_tensor(i, v, torch.Size([10000, 10000]), requires_grad=True, device=device)

    def forward(self, x):
        vector = torch.sparse.mm(self.At, x)
        pre = torch.sum(vector, 0)
        return pre

if __name__ == '__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    input = torch.rand(10000, 1).to(device)
    output = torch.rand(1).to(device)

    net = MyNet(device)
    net.to(device=device)

    print(net.At)

    optimizer = optim.SGD([net.At], lr=0.001)
    scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
    criterion = nn.MSELoss()

    epochs = 10
    for epoch in range(epochs):
        net.train()

        pre = net(input)
        loss = criterion(pre, output)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        # merge the duplicated entries produced by optimizer.step()
        net.At = net.At.coalesce()

    print(net.At)

(1) For the optimizer, it seems only SGD can work.
If I use Adam, it just says:

RuntimeError: Adam does not support sparse gradients, please consider SparseAdam instead
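
For that run, the only change I make is the optimizer line, roughly:

    optimizer = optim.Adam([net.At], lr=0.001)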

But when I use SparseAdam, again only changing the optimizer line, roughly:
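
    optimizer = optim.SparseAdam([net.At], lr=0.001)

it gives me the following error: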

Traceback (most recent call last):
  File "D:/Pytorch-UNet/test.py", line 44, in <module>
    optimizer.step()
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\optim\lr_scheduler.py", line 67, in wrapper
    return wrapped(*args, **kwargs)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\autograd\grad_mode.py", line 15, in decorate_context
    return func(*args, **kwargs)
  File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\optim\sparse_adam.py", line 87, in step
    old_exp_avg_values = exp_avg.sparse_mask(grad)._values()
RuntimeError: sparse tensors do not have is_contiguous

So I'm confused: is SGD currently the only way to update a sparse matrix in PyTorch? And how/when is SparseAdam supposed to be used?

(2) At the very beginning, only one position [10, 10] in the sparse matrix has a non-zero value (5), but once I call optimizer.step(), it does not update that value in place; instead it adds another entry with the same index. That is why I also call .coalesce() at the end of each iteration. Is it correct to call coalesce() here to get the updated result? Besides, is there a way to avoid this duplication and update the value directly? In my real project the sparse matrix has many more non-zero entries, so this duplication can cost a lot of memory.
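
One workaround I am considering (I am not sure whether this is the recommended way, and the class name here is just for illustration) is to keep the index pattern fixed, store only the non-zero values as a dense nn.Parameter, and rebuild the sparse matrix inside forward(). A minimal sketch, assuming gradients do flow back to the dense values through torch.sparse_coo_tensor:

class MyNetFixedPattern(nn.Module):
    def __init__(self, device):
        super(MyNetFixedPattern, self).__init__()
        # fixed sparsity pattern: a single entry at (10, 10)
        self.register_buffer('idx', torch.LongTensor([[10], [10]]).to(device))
        # only the non-zero values are learnable, stored densely
        self.val = nn.Parameter(torch.FloatTensor([5]).to(device))
        self.mat_size = torch.Size([10000, 10000])

    def forward(self, x):
        # rebuild the sparse matrix from the dense values on every forward pass
        At = torch.sparse_coo_tensor(self.idx, self.val, self.mat_size)
        vector = torch.sparse.mm(At, x)
        return torch.sum(vector, 0)

With this setup self.val receives a dense gradient, so I would expect a plain optimizer such as optim.SGD(net.parameters(), lr=0.001) (or even Adam) to update the values in place, with no duplicate entries and no need for coalesce(). Is this the intended approach, or is there a way to update the values of a sparse tensor directly?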