Does scatter_ support autograd?

ginobilinie · October 12, 2017, 2:31am

Given a categorical feature map, for example, mat (batch x nclasses x H x W), I’d like to encode it into one-hot format. I know one way is to use scatter_. My question is: does this kind of operation support autograd?

Thanks.

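import torch
from torch.autograd import Variable

result1 = torch.unsqueeze(results, 1)  # class-index map (LongTensor) -> batch x 1 x H x W
results_one_hot = Variable(torch.cuda.FloatTensor(inputSZ).zero_())  # inputSZ: batch x nclasses x H x W
results_one_hot.scatter_(1, result1, 1)  # write a 1 at each class index along dim 1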
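For readers on current PyTorch (0.4 and later, where Variable is no longer needed), here is a minimal sketch of the same one-hot encoding. It assumes the class map is an int64 tensor of shape (B, H, W); the helper name `one_hot_scatter` is hypothetical. The result is checked against `torch.nn.functional.one_hot`, which puts the class dimension last.

```python
import torch
import torch.nn.functional as F


def one_hot_scatter(labels: torch.Tensor, num_classes: int) -> torch.Tensor:
    """Hypothetical helper: turn an integer class map of shape (B, H, W)
    into a float one-hot tensor of shape (B, num_classes, H, W)."""
    b, h, w = labels.shape
    out = torch.zeros(b, num_classes, h, w, dtype=torch.float32, device=labels.device)
    # scatter_ needs an int64 index with the same number of dims as `out`,
    # so add a singleton class dimension at dim 1.
    out.scatter_(1, labels.unsqueeze(1).long(), 1.0)
    return out


labels = torch.randint(0, 4, (2, 3, 3))
onehot = one_hot_scatter(labels, num_classes=4)

# F.one_hot returns int64 with the class dim last, so permute and cast to compare.
ref = F.one_hot(labels, num_classes=4).permute(0, 3, 1, 2).float()
print(torch.equal(onehot, ref))  # True
```

Note that the labels are only used as indices here, so no gradient can flow back to them.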

ginobilinie · October 12, 2017, 3:33am

Personally, I don’t think this operation supports autograd.

albanD · October 12, 2017, 9:30am

Hi,
It does support autograd, but it can compute gradients only with respect to the input tensor, not the indices (gradients with respect to the indices do not exist).
This code snippet should make it clear:

import torch
from torch.autograd import Variable


inp = Variable(torch.zeros(10), requires_grad=True)
# You need to set requires_grad=False because scatter does not give a gradient wrt the indices
indices = Variable(torch.Tensor([2, 5]).long(), requires_grad=False)

# We need this; otherwise we would modify a leaf Variable in-place
inp_clone = inp.clone()
inp_clone.scatter_(0, indices, 1)

inp_clone.sum().backward()
# So the values that are not modified by scatter have a gradient of 1
# The values changed by scatter have a gradient of 0, as they were overwritten by the scatter
print(inp.grad)
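Since Variable has been merged into Tensor (PyTorch 0.4 and later), the same example can be written with plain tensors. This is a sketch of the equivalent modern code, not albanD's original:

```python
import torch

inp = torch.zeros(10, requires_grad=True)
# Index tensors are int64 and never require grad: there is no gradient
# with respect to indices.
indices = torch.tensor([2, 5])

# Clone first: scatter_ is in-place, and in-place ops on a leaf tensor
# that requires grad raise an error.
inp_clone = inp.clone()
inp_clone.scatter_(0, indices, 1.0)

inp_clone.sum().backward()
# Untouched positions get gradient 1; overwritten positions (2 and 5) get 0.
print(inp.grad)
# tensor([1., 1., 0., 1., 1., 0., 1., 1., 1., 1.])
```

The out-of-place `inp.scatter(0, indices, 1.0)` produces the same result and gradient without the explicit clone.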

ginobilinie · October 12, 2017, 3:21pm

Thanks. @albanD

I’m wondering how I can tell whether an operation supports autograd. For example, torch.dot doesn’t support autograd, but torch.mm does. What is the rule for deciding whether an operation supports autograd?

albanD · October 12, 2017, 3:24pm

Hi,

All functions that work when you feed them Variables support autograd.
torch.dot actually supports autograd:

import torch
from torch.autograd import Variable

a = Variable(torch.rand(10))
out = torch.dot(a, a)
assert isinstance(out, Variable)  # works

The output is actually a Variable with a Tensor containing one element.
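In current PyTorch a rough check (not a guarantee) is to look at the output's `grad_fn`: when an input requires grad and an op is differentiable, the output records how to backpropagate through it. Actually calling `backward()` is the real test. A minimal sketch for torch.dot and torch.mm:

```python
import torch

a = torch.rand(10, requires_grad=True)
out = torch.dot(a, a)

# A differentiable op records a grad_fn on its output.
print(out.grad_fn is not None)  # True
print(out.dim())                # 0: a zero-dimensional tensor

out.backward()
# d(a . a)/da = 2a
print(torch.allclose(a.grad, 2 * a.detach()))  # True

# torch.mm behaves the same way.
m = torch.rand(3, 4, requires_grad=True)
print(torch.mm(m, m.t()).grad_fn is not None)  # True
```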

ginobilinie · October 12, 2017, 3:27pm

Thanks.

But I’m quite confused: according to the official documentation, torch.dot returns a float. How can it become a Variable without being wrapped?

torch.dot(tensor1, tensor2) → float

albanD · October 12, 2017, 3:30pm

Unfortunately, right now (this will change in the near future), a Variable can only contain a Tensor and not directly a number. To get around this, the function will return a Variable containing a Tensor with one element instead of a Variable containing just a number. See below:

import torch
from torch.autograd import Variable

a = torch.rand(10)

print("Operating on Tensor")
print(torch.dot(a, a))

v_a = Variable(a)

print("Operating on Variable")
print(torch.dot(v_a, v_a))

ginobilinie · October 12, 2017, 3:44pm

I see. @albanD

If we operate on a Tensor (without Variable), torch.dot returns a float.
If we operate on a Variable, torch.dot returns a Variable containing a one-element Tensor.

Then I think the official documentation should make this clearer. I like PyTorch a lot, but I think some parts of the official documentation are not quite clear.

albanD · October 12, 2017, 3:45pm

Work is currently in progress to allow a Variable to contain either a Tensor or a Python number.
When this is out, this will work as you expect.
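For later readers: this landed in PyTorch 0.4, where Tensor and Variable were merged and zero-dimensional tensors were introduced. torch.dot now returns a 0-dim tensor whether or not its inputs require grad, and `.item()` extracts the Python number:

```python
import torch

a = torch.rand(10)
out = torch.dot(a, a)

print(type(out))    # <class 'torch.Tensor'>
print(out.shape)    # torch.Size([])
value = out.item()  # convert the 0-dim tensor to a Python float
print(type(value))  # <class 'float'>
```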

ginobilinie · October 12, 2017, 4:35pm

Thanks a lot. Looking forward to it.

MasterofPLM · June 5, 2019, 1:36pm

Hello, I’ve run into a similar problem with Tensor.scatter_(), and I’d appreciate any suggestions. I use Tensor.scatter_() as follows:

import torch
import torch.nn.functional as F

# Inside the module's forward(); self.cos_m / self.sin_m are presumably cos(m) / sin(m)
cosine = F.linear(F.normalize(input), F.normalize(self.weight))
sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
phi = cosine * self.cos_m - sine * self.sin_m  # phi = cos(theta + m)
one_hot = torch.zeros(cosine.size(), device='cuda')
one_hot.scatter_(1, label.view(-1, 1).long(), 1)

output = (one_hot * phi) + ((1.0 - one_hot) * cosine)  # can update without this line
# output = phi  # can update
# output = cosine  # can update

I want to take the target-class value from phi and all other values from cosine. However, I found that the parameter self.weight isn’t updated: self.weight.grad is not all zeros, but self.weight.grad.sum() is zero. Without the `output = (one_hot * phi) + ...` line, self.weight is updated.
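A minimal CPU sketch that checks whether mixing phi and cosine through a scatter_-built mask lets gradients reach the weight. The sizes, margin m, scale s, the clamp before sqrt, and the cross-entropy loss are illustrative assumptions, not the poster's setup. It also shows the same selection written with torch.where:

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, margin and scale, chosen only for illustration.
batch, in_features, num_classes, m, s = 4, 8, 5, 0.5, 30.0
weight = torch.randn(num_classes, in_features, requires_grad=True)
x = torch.randn(batch, in_features)
label = torch.randint(0, num_classes, (batch,))

cosine = F.linear(F.normalize(x), F.normalize(weight))
# Assumption: clamp guards against tiny negative values from rounding before sqrt.
sine = torch.sqrt((1.0 - cosine.pow(2)).clamp(min=1e-7))
phi = cosine * math.cos(m) - sine * math.sin(m)  # cos(theta + m)

# zeros_like does not copy requires_grad, so one_hot is a plain mask.
one_hot = torch.zeros_like(cosine)
one_hot.scatter_(1, label.view(-1, 1), 1.0)
output = one_hot * phi + (1.0 - one_hot) * cosine

# Same selection with torch.where, without float mask arithmetic.
output_where = torch.where(one_hot.bool(), phi, cosine)
print(torch.allclose(output, output_where))  # True

loss = F.cross_entropy(s * output, label)
loss.backward()
print(weight.grad.abs().sum() > 0)  # tensor(True)
```

In this sketch the mask itself carries no gradient; gradients reach `weight` through both phi and cosine.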
I also tried your advice:

inp = Variable(torch.zeros(10), requires_grad=True)
# You need to set requires_grad=False because scatter does not give a gradient wrt the indices
indices = Variable(torch.Tensor([2, 5]).long(), requires_grad=False)

# We need this; otherwise we would modify a leaf Variable in-place
inp_clone = inp.clone()
inp_clone.scatter_(0, indices, 1)

But this didn’t work either. This problem has really confused me, and I’d appreciate any advice.

albanD · June 6, 2019, 12:59pm

Hi,

This issue is quite old and a lot has changed since. In particular, Variables have been removed!
Would you have a small code sample that shows the weights not updating?

MasterofPLM · June 8, 2019, 6:44am

My test code is:

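if phase == 'train':
    loss.backward()
    print(metric.weight.grad[0][:10])  # first 10 gradient entries of row 0

    a = metric.weight.data.clone()  # weights before the step
    print(a[0][:10])
    optimizer.step()
    b = metric.weight.data.clone()  # weights after the step
    print(b[0][:10])
    diff = (a - b).abs()
    print(torch.sum(diff > 1e-7))  # number of entries that changed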
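The same before/after check can be run over every parameter of a model to spot any that the optimizer is not updating. A minimal sketch, using a hypothetical tiny `nn.Linear` model, SGD optimizer, and random data purely to exercise the check:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical tiny model and data, only to exercise the check.
model = nn.Linear(4, 3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randint(0, 3, (8,))

loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()

# Snapshot every parameter before the step.
before = {name: p.detach().clone() for name, p in model.named_parameters()}
optimizer.step()

changed = {}
for name, p in model.named_parameters():
    changed[name] = (p.detach() - before[name]).abs().gt(1e-7).sum().item()
    print(f"{name}: {changed[name]}/{p.numel()} entries changed")
```

A parameter reporting 0 changed entries is one the optimizer step did not move.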

output is:

tensor([-1.4628e-09, -7.1153e-09, -3.9151e-09, -4.6540e-09,  6.4179e-09,
        -3.9719e-09, -5.8276e-09, -7.8870e-09, -5.6572e-09,  4.5232e-09],
       device='cuda:0') # metric.weight.grad
tensor([ 0.0327, -0.0318,  0.0316, -0.0522, -0.0627,  0.0217,  0.0545,  0.0484,
         0.0454,  0.0652], device='cuda:0') # weight before update
tensor([ 0.0323, -0.0314,  0.0312, -0.0515, -0.0618,  0.0214,  0.0537,  0.0477,
         0.0448,  0.0644], device='cuda:0') # weight after update
tensor(359384, device='cuda:0')

I found some mistakes in my test code, and my weights are actually being updated. But the loss and accuracy still stay constant.