Does scatter_ support autograd?

(No Name) #1

Given a categorical feature map, for example mat (batch x nclasses x H x W), I’d like to encode it in one-hot format. I know one way is to use scatter_. My question is whether this kind of operation is supported by autograd.

Thanks.

result1 = torch.unsqueeze(results, 1) 
results_one_hot = Variable(torch.cuda.FloatTensor(inputSZ).zero_()) 
results_one_hot.scatter_(1,result1,1)
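For reference, the same encoding in later PyTorch releases (without the Variable wrapper) might look like the sketch below. The helper name one_hot_scatter and the (batch, H, W) layout of the label map are assumptions made for illustration; they are not part of the original post.

```python
import torch


def one_hot_scatter(labels, num_classes):
    """Hypothetical helper: labels is a LongTensor of shape (B, H, W)
    holding class indices in [0, num_classes)."""
    b, h, w = labels.shape
    one_hot = torch.zeros(b, num_classes, h, w, device=labels.device)
    # Along dim 1, write 1.0 at the channel given by each label.
    one_hot.scatter_(1, labels.unsqueeze(1), 1.0)
    return one_hot


labels = torch.tensor([[[0, 2], [1, 1]]])  # shape (1, 2, 2)
encoded = one_hot_scatter(labels, num_classes=3)
print(encoded.shape)
```

Taking argmax over dim 1 of the result recovers the original label map, which is a quick sanity check for the encoding.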
(No Name) #2

Personally, I do not think this operation supports autograd.

(Alban D) #3

Hi,
It does support autograd, but it can compute gradients only wrt the input tensor and not the indices (the gradient wrt the indices does not exist, since they are integers).
This code snippet should make it clear:

import torch
from torch.autograd import Variable


inp = Variable(torch.zeros(10), requires_grad=True)
# You need to set requires_grad=False because scatter does not give gradient wrt to indices
indices = Variable(torch.Tensor([2, 5]).long(), requires_grad=False)

# We need this otherwise we would modify a leaf Variable inplace
inp_clone = inp.clone()
inp_clone.scatter_(0, indices, 1)

inp_clone.sum().backward()
# So the values that are not modified by scatter have a 1 gradient
# The values changed by scatter have a 0 gradient as they were overwritten by the scatter
print(inp.grad)
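In releases where Variable and Tensor are merged (PyTorch 0.4 and later), the same check can be written without the wrapper. This is a minimal sketch of the snippet above, not a different technique; the variable name out is chosen here for illustration.

```python
import torch

inp = torch.zeros(10, requires_grad=True)
indices = torch.tensor([2, 5])  # integer index tensor, never requires grad

out = inp.clone()               # clone so the leaf tensor is not modified in place
out.scatter_(0, indices, 1.0)   # overwrite positions 2 and 5 with 1.0
out.sum().backward()

# Gradient is 1 where inp passed through untouched,
# and 0 at the positions that scatter_ overwrote.
print(inp.grad)
```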
(No Name) #4

Thanks. @albanD

I’m wondering how I can tell whether an operation supports autograd. For example, torch.dot doesn’t support autograd, but torch.mm does. What is the rule that decides whether an operation supports autograd?

(Alban D) #5

Hi,

Every function that works when you feed it Variables supports autograd.
torch.dot actually supports autograd:

import torch
from torch.autograd import Variable

a = Variable(torch.rand(10))
out = torch.dot(a, a)
assert isinstance(out, Variable)  # works

The output is actually a Variable with a Tensor containing one element.

(No Name) #6

Thanks.

But I’m quite confused: on the official website, torch.dot is documented as returning a float. How can I get a Variable out of it without wrapping the result myself?

torch.dot(tensor1, tensor2) → float

(Alban D) #7

Unfortunately, right now (this will change in the near future), a Variable can only contain a Tensor and not directly a number. To get around this, the function will return a Variable containing a Tensor with one element instead of a Variable containing just a number. See below:

import torch
from torch.autograd import Variable

a = torch.rand(10)

print("Operating on Tensor")
print(torch.dot(a, a))

v_a = Variable(a)

print("Operating on Variable")
print(torch.dot(v_a, v_a))
(No Name) #8

I see. @albanD

If we operate on a Tensor (without Variable), torch.dot returns a float.
If we operate on a Variable, torch.dot returns a Variable wrapping a Tensor that contains one element.

Then I think the official documentation should make this clearer. I like PyTorch a lot, but some parts of the official documentation are not quite clear.

(Alban D) #9

Work is currently in progress to make a Variable able to contain either a Tensor or a Python number.
When this is out, this will work as you expect.
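For readers on later releases: once Variable and Tensor were merged (PyTorch 0.4), torch.dot on tensors returns a zero-dimensional tensor that autograd tracks, and .item() extracts the Python number. A minimal sketch, assuming such a release:

```python
import torch

a = torch.rand(10, requires_grad=True)
out = torch.dot(a, a)   # zero-dimensional tensor, still part of the autograd graph

print(out.dim())        # prints 0
print(out.item())       # plain Python float, no longer tracked by autograd

out.backward()
# The gradient of a . a with respect to a is 2a.
print(torch.allclose(a.grad, 2 * a.detach()))
```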

(No Name) #10

Thanks a lot. Looking forward to it.

#11

Hello, I ran into a similar problem when using Tensor.scatter_(), and I hope to get some suggestions from you. I use Tensor.scatter_() as follows:

cosine = F.linear(F.normalize(input), F.normalize(self.weight))
sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
phi = cosine * self.cos_m - sine * self.sin_m  # phi = cos(theta + m)
one_hot = torch.zeros(cosine.size(), device='cuda')
one_hot.scatter_(1, label.view(-1, 1).long(), 1)

output = (one_hot * phi) + ((1.0 - one_hot) * cosine)  # weights do not seem to update with this line
# output = phi  # weights update
# output = cosine  # weights update

I want to take the value at the label position from phi and all other values from cosine. I found that the parameter self.weight can’t be updated: self.weight.grad is not all zero, but self.weight.grad.sum() is zero. Without the last line, self.weight is updated.
I also tried your advice:

inp = Variable(torch.zeros(10), requires_grad=True)
# You need to set requires_grad=False because scatter does not give gradient wrt to indices
indices = Variable(torch.Tensor([2, 5]).long(), requires_grad=False)

# We need this otherwise we would modify a leaf Variable inplace
inp_clone = inp.clone()
inp_clone.scatter_(0, indices, 1)

But this didn’t work either. This problem really confuses me, and I hope to get some advice.
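One way to narrow this down is to check in isolation whether gradients reach the weight through the one-hot blend. The sketch below is a standalone approximation of the code above: the margin m, the tensor shapes, the random inputs, and the use of cross-entropy as the loss are all assumptions, since the original module and loss are not shown.

```python
import math

import torch
import torch.nn.functional as F

torch.manual_seed(0)
m = 0.5                                         # hypothetical margin
weight = torch.randn(4, 8, requires_grad=True)  # 4 classes, 8 features (assumed sizes)
x = torch.randn(3, 8)                           # batch of 3 random inputs
label = torch.tensor([0, 2, 3])

cosine = F.linear(F.normalize(x), F.normalize(weight))
sine = torch.sqrt(1.0 - torch.pow(cosine, 2))
phi = cosine * math.cos(m) - sine * math.sin(m)  # phi = cos(theta + m)

# The mask is built from a fresh zero tensor, so it carries no gradient itself.
one_hot = torch.zeros_like(cosine)
one_hot.scatter_(1, label.view(-1, 1), 1.0)

output = (one_hot * phi) + ((1.0 - one_hot) * cosine)
loss = F.cross_entropy(output, label)            # assumed loss, for illustration only
loss.backward()

print(one_hot.requires_grad)
print(weight.grad.abs().sum())
```

If the gradient here is nonzero and finite, the mask built with scatter_ is not what blocks the update, and the cause is more likely elsewhere in the training loop.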

(Alban D) #12

Hi,

This thread is quite old and a lot has changed since. In particular, Variables have been removed!
Would you have a small code sample that shows the weights not updating?

#15

My test code is:

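if phase == 'train':
    loss.backward()
    print(metric.weight.grad[0][:10])  # gradient of the first weight row

    a = metric.weight.data.clone()  # weights before the step
    print(a[0][:10])
    optimizer.step()
    b = metric.weight.data.clone()  # weights after the step
    print(b[0][:10])
    diff = abs(a - b)
    print(torch.sum(diff > 1e-7))  # number of entries that changed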

The output is:

tensor([-1.4628e-09, -7.1153e-09, -3.9151e-09, -4.6540e-09,  6.4179e-09,
        -3.9719e-09, -5.8276e-09, -7.8870e-09, -5.6572e-09,  4.5232e-09],
       device='cuda:0') # metric.weight.grad
tensor([ 0.0327, -0.0318,  0.0316, -0.0522, -0.0627,  0.0217,  0.0545,  0.0484,
         0.0454,  0.0652], device='cuda:0') # weight before update
tensor([ 0.0323, -0.0314,  0.0312, -0.0515, -0.0618,  0.0214,  0.0537,  0.0477,
         0.0448,  0.0644], device='cuda:0') # weight after update
tensor(359384, device='cuda:0')

I found some mistakes in my test code, and my weights are actually updated. But I still get a constant loss and accuracy.
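A self-contained version of this kind of before/after check, as a sketch: the linear model, random data, MSE loss, and SGD optimizer are placeholders chosen for illustration and are not the setup from the posts above.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 2)                  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 4)                          # placeholder inputs
target = torch.randn(8, 2)                     # placeholder targets

loss = torch.nn.functional.mse_loss(model(x), target)
optimizer.zero_grad()
loss.backward()

before = model.weight.detach().clone()         # snapshot before the step
optimizer.step()
after = model.weight.detach().clone()          # snapshot after the step

# Largest absolute change in any weight entry caused by the step.
max_change = (after - before).abs().max().item()
print(max_change)
```

Taking the snapshots with detach().clone() keeps them independent of the parameter, so the later in-place update by the optimizer does not alter the "before" copy.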