I want to introduce a sparse matrix into my neural network. After initialization, I would like this sparse matrix to take part in autograd and to have its values updated only at the non-zero positions. However, I haven't found a good way to do this, and I have several questions about it. Here is a test script that demonstrates the problem:
import torch
import torch.nn as nn
from torch import optim


class MyNet(nn.Module):
    def __init__(self, device):
        super(MyNet, self).__init__()
        # a single non-zero entry: value 5 at position [10, 10]
        idx = [[10], [10]]
        i = torch.LongTensor(idx)
        value = [5]
        v = torch.FloatTensor(value)
        self.At = torch.sparse_coo_tensor(i, v, torch.Size([10000, 10000]),
                                          requires_grad=True, device=device)

    def forward(self, x):
        vector = torch.sparse.mm(self.At, x)
        pre = torch.sum(vector, 0)
        return pre


if __name__ == '__main__':
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    input = torch.rand(10000, 1).to(device)
    output = torch.rand(1).to(device)
    net = MyNet(device)
    net.to(device=device)
    print(net.At)
    optimizer = optim.SGD([net.At], lr=0.001)
    scheduler = optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.9)
    criterion = nn.MSELoss()
    epochs = 10
    for epoch in range(epochs):
        net.train()
        pre = net(input)
        loss = criterion(pre, output)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        net.At = net.At.coalesce()  # merge duplicated entries (see question 2)
        print(net.At)
(1) For the optimizer, it seems that only SGD works. If I use Adam, it fails immediately with

RuntimeError: Adam does not support sparse gradients, please consider SparseAdam instead

But when I switch to SparseAdam, I get the following error:
Traceback (most recent call last):
File "D:/Pytorch-UNet/test.py", line 44, in <module>
optimizer.step()
File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\optim\lr_scheduler.py", line 67, in wrapper
return wrapped(*args, **kwargs)
File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\autograd\grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "D:\Anaconda3\envs\pytorch\lib\site-packages\torch\optim\sparse_adam.py", line 87, in step
old_exp_avg_values = exp_avg.sparse_mask(grad)._values()
RuntimeError: sparse tensors do not have is_contiguous
So I'm confused: is SGD currently the only optimizer that can update a sparse matrix in PyTorch? And if so, how and when is SparseAdam meant to be used?
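For comparison, the only configuration in which I have seen SparseAdam work is a dense parameter that merely receives a sparse gradient, e.g. an nn.Embedding with sparse=True (a minimal standalone sketch; the sizes and ids here are arbitrary):

import torch
import torch.nn as nn
from torch import optim

# dense parameter, sparse gradient: only the gathered rows receive gradient
emb = nn.Embedding(10000, 16, sparse=True)
optimizer = optim.SparseAdam(emb.parameters(), lr=0.001)

ids = torch.tensor([3, 7, 42])
loss = emb(ids).sum()
optimizer.zero_grad()
loss.backward()
print(emb.weight.grad.layout)  # torch.sparse_coo
optimizer.step()  # succeeds: the parameter itself is dense

So my guess is that SparseAdam expects sparse gradients on dense parameters rather than sparse parameters like my At, but I would like to confirm that.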
(2) At the very beginning, only one position [10, 10] in the sparse matrix has a non-zero value (5), but once I call optimizer.step(), it does not update this value in place; instead it generates another value at the same index. That is why I also call .coalesce() at the end of each iteration. Is it correct to call coalesce() here to get the updated result? And is there any way to avoid this duplication and update the value directly? In my real project the sparse matrix has many non-zero entries, so this duplication can cost a lot of memory. One workaround I have been experimenting with is sketched below.
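A minimal sketch of that idea, assuming a fixed sparsity pattern (SparseLayer and its constructor arguments are names I made up for illustration): the indices are registered as a buffer, only the values are a dense nn.Parameter, and the sparse matrix is rebuilt inside forward. The gradient then lands on the dense values tensor, so any optimizer, including plain Adam, should be usable:

import torch
import torch.nn as nn
from torch import optim

class SparseLayer(nn.Module):
    """Fixed sparsity pattern; only the non-zero values are trained."""
    def __init__(self, size, indices, values):
        super().__init__()
        self.size = size
        # the sparsity pattern is fixed; a buffer so .to(device) moves it
        self.register_buffer('indices', indices)
        # the non-zero values are an ordinary dense parameter
        self.values = nn.Parameter(values)

    def forward(self, x):
        # rebuild the sparse matrix from the current values on every call;
        # torch.sparse_coo_tensor is differentiable w.r.t. its values
        At = torch.sparse_coo_tensor(self.indices, self.values, self.size)
        return torch.sparse.mm(At, x)

layer = SparseLayer(torch.Size([10000, 10000]),
                    torch.LongTensor([[10], [10]]),
                    torch.FloatTensor([5.0]))
optimizer = optim.Adam(layer.parameters(), lr=0.001)  # dense grad on values, so Adam works

Would this be considered the idiomatic way to train only the non-zero entries, or is there a more direct way to update a sparse parameter in place?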