Derivative for aten::scatter_ is not implemented

I am having trouble backpropagating the error (loss.backward()) when my model uses the aten::scatter_ function to compute the loss.

First, I define my model. In the forward function, I use scatter_ to compute the product across individuals with the same id.

import torch
import torch.nn as nn
import torch.optim as optim
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.beta = nn.Linear(1, 1, bias=False)

    def forward(self, x, id):
        b = self.beta(torch.ones(1, 1))
        xb = x*b
        N = torch.unique(id).shape[0]
        scattering = torch.ones(N, 1, dtype=x.dtype)
        # Here I am computing the product across rows of the same id
        scattering_res = scattering.scatter_(0, id-1, xb, reduce='multiply')
        loss = torch.sum(scattering_res)
        return loss

Here I create some tensors to run the model and reproduce the error message I am getting:

id = torch.tensor([1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=torch.int64).reshape(10, 1)
x = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=torch.float).reshape(10, 1)
y = torch.tensor([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], dtype=torch.float).reshape(10, 1)
net = Net()
optimizer = optim.SGD(net.parameters(), lr=0.01)
criterion = nn.MSELoss()

# Update params
optimizer.zero_grad()
loss = net(x, id)
loss.backward()
## RuntimeError: derivative for aten::scatter_ is not implemented

Do you know how to solve this issue? I also came across a GitHub issue (Derivative issue when using scatter_max · Issue #63 · rusty1s/pytorch_scatter · GitHub) where a similar problem was posted, but I couldn't adapt it to solve my problem.

Below you can see the whole traceback error message.

RuntimeError                              Traceback (most recent call last)
Untitled-2 in <cell line: 32>()
     32 optimizer.zero_grad()
     33 loss = net(x, id)
---> 34 loss.backward()

File c:\Users\u0133260\Anaconda3\envs\pyt\lib\site-packages\torch\, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    387 if has_torch_function_unary(self):
    388     return handle_torch_function(
    389         Tensor.backward,
    390         (self,),
    394         create_graph=create_graph,
    395         inputs=inputs)
--> 396 torch.autograd.backward(self, gradient, retain_graph, create_graph, inputs=inputs)

File c:\Users\u0133260\Anaconda3\envs\pyt\lib\site-packages\torch\autograd\, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    168     retain_graph = create_graph
    170 # The reason we repeat same the comment below is that
    171 # some Python versions print out the first line of a multi-line function
    172 # calls in the traceback and some print out the last line
--> 173 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    174     tensors, grad_tensors_, retain_graph, create_graph, inputs,
    175     allow_unreachable=True, accumulate_grad=True)

RuntimeError: derivative for aten::scatter_ is not implemented

Cross-posted on Stack Overflow: "PyTorch - derivative for aten::scatter_ is not implemented".

It's the in-place variant that does not have a backward. If you really need to scatter onto a "background" of ones, you could apply torch.scatter (the out-of-place version) to xb - 1 and then add 1 (scattering_res += 1); that should work.
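
As an alternative sketch for the per-id product itself: assuming a PyTorch version that provides Tensor.scatter_reduce with the (dim, index, src, reduce) signature (I believe it appeared around 1.12, marked as beta), the out-of-place reduce="prod" variant can be differentiated. The names ids, beta, and n_groups below are illustrative and not from the original post, and beta is a plain tensor standing in for the nn.Linear weight.

```python
import torch

# Same toy data as in the question; ids are 1-based, so they are shifted to 0-based below.
ids = torch.tensor([1, 1, 2, 2, 3, 3, 4, 4, 5, 5], dtype=torch.int64).reshape(10, 1)
x = torch.arange(1, 11, dtype=torch.float).reshape(10, 1)

# Hypothetical stand-in for the learnable coefficient (value chosen for illustration only).
beta = torch.tensor([[0.5]], requires_grad=True)

xb = x * beta
n_groups = torch.unique(ids).shape[0]
ones = torch.ones(n_groups, 1, dtype=x.dtype)

# Out-of-place scatter_reduce: multiplies together all rows of xb that share an id,
# starting from the "background" of ones (include_self defaults to True).
group_prod = ones.scatter_reduce(0, ids - 1, xb, reduce="prod")

loss = group_prod.sum()
loss.backward()  # gradient flows back to beta through the reduction
print(loss.item(), beta.grad)
```

If scatter_reduce is not available in your version, and all values of xb are strictly positive, summing torch.log(xb) per id with torch.scatter_add and exponentiating the result is another option.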

Best regards


Hi Torch team, I am getting the same error message even though I am using the non-in-place version. The documentation does not say what the input tensor parameter is; I believe it is the same as self in the in-place version. I am using torch 1.12*.

class testScatter(torch.nn.Module):

    def __init__(self):
        super(testScatter, self).__init__()
        self.nn1 = torch.nn.Linear(in_features=11, out_features=3, dtype=torch.float)
        self.nl1 = torch.nn.Tanh()
        self.nn2 = torch.nn.Linear(in_features=3, out_features=1, dtype=torch.float)
        self.nl2 = torch.nn.ReLU()

    def forward(self, x, scatterIndx):
        preScatter  = self.nl1(self.nn1(x))
        scatterRes1 = torch.zeros((2,3), dtype=torch.float)
        scatterRes2 = torch.scatter(input=scatterRes1, dim=0, index=scatterIndx, src=preScatter, reduce='add')
        ret         = self.nl2(self.nn2(scatterRes2))
        return ret

RuntimeError: derivative for aten::scatter is not implemented

Any advice?
Thank you.

It works for me as seen in this minimal code snippet:

src = torch.randn(2, 5).float()
index = torch.tensor([[0, 1, 2, 0, 2], [0, 1, 2, 0, 1]])
input = torch.zeros(3, 5, dtype=src.dtype)
out = torch.scatter(input, dim=0, index=index, src=src)

Could you check what the difference between this code and yours might be?

Thank you, ptrblck, for the reply. If I only add reduce='add' to the torch.scatter call in your reply, I get "RuntimeError: derivative for aten::scatter is not implemented".
So how does reduce work for the non-in-place scatter function? Also, if it is invalid for the non-in-place version, the parser does not raise an error. The in-place version allows 'add' and 'multiply' as reduce values.

Thank you again.

Thanks for the clarification!
Indeed, torch.scatter lacks autograd support when the reduce argument is used, so you should use the explicit torch.scatter_add function instead:

src = torch.randn(2, 5).float()
index = torch.tensor([[0, 1, 2, 0, 2], [0, 1, 2, 0, 1]])
input = torch.zeros(3, 5, dtype=src.dtype)
ref = torch.scatter(input, dim=0, index=index, src=src, reduce="add")
out = torch.scatter_add(input, dim=0, index=index, src=src)
print((ref - out).abs().max())
# tensor(0., grad_fn=<MaxBackward1>)

which works fine.
This issue is also discussed here and the reduce argument will be deprecated soon.
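
Applied to the testScatter module from the earlier post, the swap might look like the sketch below. The input batch of 4 rows and the contents of scatter_index are assumptions made for illustration, since the original post does not show them; the index is expanded to the same shape as the source tensor, which is the usual pattern for scatter_add along dim 0.

```python
import torch

torch.manual_seed(0)


class TestScatter(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.nn1 = torch.nn.Linear(in_features=11, out_features=3)
        self.nl1 = torch.nn.Tanh()
        self.nn2 = torch.nn.Linear(in_features=3, out_features=1)
        self.nl2 = torch.nn.ReLU()

    def forward(self, x, scatter_index):
        pre_scatter = self.nl1(self.nn1(x))
        base = torch.zeros((2, 3), dtype=pre_scatter.dtype)
        # scatter_add replaces scatter(..., reduce='add') and supports autograd
        pooled = torch.scatter_add(base, 0, scatter_index, pre_scatter)
        return self.nl2(self.nn2(pooled))


# Assumed inputs: 4 rows, the first two pooled into row 0 and the last two into row 1.
x = torch.randn(4, 11)
scatter_index = torch.tensor([0, 0, 1, 1]).unsqueeze(1).expand(4, 3)

model = TestScatter()
out = model(x, scatter_index)
out.sum().backward()  # no "derivative ... not implemented" error here
print(out.shape, model.nn1.weight.grad.shape)
```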


Thank you. Appreciate it.