Derivative for aten::scatter_ is not implemented

I am having problems backpropagating the loss (loss.backward()) when my model uses the aten::scatter_ function to compute the loss.

First, I define my model. In the forward function, I use scatter_ to compute the product across individuals that share the same id.

import torch
import torch.nn as nn
import torch.optim as optim
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.beta = nn.Linear(1, 1, bias=False)

    def forward(self, x, id):
        b = self.beta(torch.ones(1, 1))
        xb = x*b
        N = torch.unique(id).shape[0]
        scattering = torch.ones(N, 1, dtype=x.dtype)
        # Compute the product across rows that share the same id
        scattering_res = scattering.scatter_(0, id - 1, xb, reduce='multiply')
        loss = torch.sum(scattering_res)
        return loss

Here I create some tensors to apply the model and replicate the error message I am getting:

id = torch.tensor([1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=torch.int64).reshape(10, 1)
x = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=torch.float).reshape(10, 1)
y = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=torch.float).reshape(10, 1)
net = Net() 
print(net)
optimizer = optim.SGD(net.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Update params
optimizer.zero_grad()
loss = net(x, id)
loss.backward()
## RuntimeError: derivative for aten::scatter_ is not implemented

Do you know how to solve this issue? Additionally, I came across a GitHub issue (Derivative issue when using scatter_max · Issue #63 · rusty1s/pytorch_scatter · GitHub) where a similar problem was posted, but I couldn’t adapt it to solve my problem.


Below you can see the full traceback of the error message.

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Untitled-2 in <cell line: 32>()
     32 optimizer.zero_grad()
     33 loss = net(x, id)
---> 34 loss.backward()

File c:\Users\u0133260\Anaconda3\envs\pyt\lib\site-packages\torch\_tensor.py:396, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    387 if has_torch_function_unary(self):
    388     return handle_torch_function(
    389         Tensor.backward,
    390         (self,),
   (...)
    394         create_graph=create_graph,
    395         inputs=inputs)
--> 396 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

File c:\Users\u0133260\Anaconda3\envs\pyt\lib\site-packages\torch\autograd\__init__.py:173, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    168     retain_graph = create_graph
    170 # The reason we repeat same the comment below is that
    171 # some Python versions print out the first line of a multi-line function
    172 # calls in the traceback and some print out the last line
--> 173 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    175     allow_unreachable=True, accumulate_grad=True)

RuntimeError: derivative for aten::scatter_ is not implemented

crossposted at python - PyTorch - derivative for aten::scatter_ is not implemented - Stack Overflow

It’s the in-place variant that does not have a backward. If you really need to scatter onto a “background” of ones, you could use torch.scatter (the out-of-place version) on xb - 1 and then add 1 (scattering_res += 1); that should work.
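For example, a minimal self-contained sketch of that idea (toy data with unique ids for simplicity; with duplicate ids the non-reduced scatter keeps only one of the colliding values per index):

import torch

# hypothetical toy data: unique ids, one value per target row
id = torch.tensor([1, 2, 3, 4, 5], dtype=torch.int64).reshape(5, 1)
xb = torch.randn(5, 1, requires_grad=True)

N = torch.unique(id).shape[0]
background = torch.zeros(N, 1, dtype=xb.dtype)
# the out-of-place torch.scatter has a backward: scatter xb - 1 onto zeros,
# then add 1 so untouched positions end up as the "background" of ones
scattering_res = torch.scatter(background, 0, id - 1, xb - 1) + 1
scattering_res.sum().backward()
print(xb.grad)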

Best regards

Thomas

Hi Torch team, I am getting the same error message even though I am using the non in-place version. The documentation does not mention what the input tensor parameter is; I believe it is the same as self in the in-place version. I am using torch 1.12.*

class testScatter(torch.nn.Module):
    def __init__(self):
        super(testScatter, self).__init__()
        self.nn1 = torch.nn.Linear(in_features=11, out_features=3, dtype=torch.float)
        self.nl1 = torch.nn.Tanh()
        self.nn2 = torch.nn.Linear(in_features=3, out_features=1, dtype=torch.float)
        self.nl2 = torch.nn.ReLU()

    def forward(self, x, scatterIndx):
        preScatter  = self.nl1(self.nn1(x))
        scatterRes1 = torch.zeros((2, 3), dtype=torch.float)
        scatterRes2 = torch.scatter(input=scatterRes1, dim=0, index=scatterIndx, src=preScatter, reduce='add')
        ret         = self.nl2(self.nn2(scatterRes2))
        return ret

RuntimeError: derivative for aten::scatter is not implemented
Any advice?
Thank you.

It works for me as seen in this minimal code snippet:

src = torch.randn(2, 5).float()
src.requires_grad_()
index = torch.tensor([[0, 1, 2, 0, 2], [0, 1, 2, 0, 1]])
input = torch.zeros(3, 5, dtype=src.dtype)
out = torch.scatter(input, dim=0, index=index, src=src)
out.mean().backward()
print(src.grad)

Could you check what the difference between this code and yours might be?

Thank you ptrblck for the reply. If I only add reduce='add' to the torch.scatter call mentioned in your reply, I get "RuntimeError: derivative for aten::scatter is not implemented".
So how does reduce work for the non in-place scatter function? Also, if it is invalid for the non in-place version, the parser does not raise an error. The in-place version allows 'add' and 'multiply' as reduce values.

Thank you again.

Thanks for the clarification!
Indeed, scatter lacks Autograd support when the reduce argument is used, so you should use the explicit torch.scatter_add method instead:

src = torch.randn(2, 5).float()
src.requires_grad_()
index = torch.tensor([[0, 1, 2, 0, 2], [0, 1, 2, 0, 1]])
input = torch.zeros(3, 5, dtype=src.dtype)
ref = torch.scatter(input, dim=0, index=index, src=src, reduce="add")
out = torch.scatter_add(input, dim=0, index=index, src=src)
print((ref - out).abs().max())
# tensor(0., grad_fn=<MaxBackward1>)
out.mean().backward()
print(src.grad)

which works fine.
This issue is also discussed here and the reduce argument will be deprecated soon.
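As a side note for the original multiply-reduce use case: one autograd-friendly alternative (not discussed in this thread, and it assumes the values being multiplied are strictly positive) is to sum logarithms with the out-of-place torch.scatter_add and exponentiate the result:

import torch

id = torch.tensor([1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=torch.int64).reshape(10, 1)
xb = torch.rand(10, 1) + 0.1   # assumed strictly positive
xb.requires_grad_()

N = torch.unique(id).shape[0]
zeros = torch.zeros(N, 1, dtype=xb.dtype)
# product per id == exp(sum of logs per id); scatter_add supports autograd
log_prod = torch.scatter_add(zeros, 0, id - 1, torch.log(xb))
prod = torch.exp(log_prod)
prod.sum().backward()
print(xb.grad)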


Thank you. Appreciate it.