[solved] Apply a `nn.Linear()` layer only to a part of a Variable

I wish to apply an nn.Linear() layer to only part of a Variable. How do I go about it?

As an example, consider a Variable feat of type torch.cuda.FloatTensor of size 256x20x51, stored in a batch-first representation. I wish to apply the linear layer to just one of the 20 (1x51) vectors for each batch index.

So for example if the feat variable is something like:

Variable containing:
(0 ,.,.) = 
   0   1   2   3
   4   5   6   7
   8   9  10  11

(1 ,.,.) = 
  12  13  14  15
  16  17  18  19
  20  21  22  23
[torch.FloatTensor of size 2x3x4]

I need to convert this into

Variable containing:
(0 ,.,.) = 
   0   1   2   3
   4   5   6   7
   fc([8   9  10  11])

(1 ,.,.) = 
  12  13  14  15
  fc([16  17  18  19])
  20  21  22  23
[torch.FloatTensor of size 2x3x4]

where idx = [2, 1] gives, for each batch element, the row I wish to apply the linear layer to; and fc = nn.Linear(4, 4).

Also, is the following method correct?

for batch_i, idx_i in enumerate(idx):
    feat.data[batch_i][idx_i] = fc(feat[batch_i][idx_i]).data

Does it mess up the autograd procedure in any way?

Using .data always messes up autograd because it doesn’t track history. Since you are modifying a Variable that requires grad in-place, something like the following will never work:

feat[2, 1] = fc(feat[2, 1])

because the original feat is definitely needed to calculate the gradient. But the following should work:

featc = feat.clone()
featc[2, 1] = fc(feat[2, 1])

and then use featc in the subsequent computation.
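
Applied to the original 256x20x51 example, a minimal sketch of this clone-then-assign pattern might look like the following (the contents of idx and the per-batch loop are illustrative assumptions, not code from this thread):

import torch
import torch.nn as nn
from torch.autograd import Variable

fc = nn.Linear(51, 51)
feat = Variable(torch.randn(256, 20, 51))  # batch-first, as in the question
idx = [b % 20 for b in range(256)]         # hypothetical: one row index per batch element

featc = feat.clone()                       # the clone keeps autograd history intact
for b, i in enumerate(idx):
    # overwrite only the chosen row of the clone with the fc output
    featc[b, i] = fc(feat[b, i])

# use featc (not feat) in the rest of the computation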

Yes, I tried using that. But it results in the following error:

input_x[1][1] = fc(input_x[1][1])
*** RuntimeError: in-place operations can be only used on variables that don't share storage with any other variables, but detected that there are 20 objects sharing it

In general, consider the following code:

(Pdb) feat = torch.autograd.Variable(torch.Tensor([[1, 2, 3], [4, 5, 6]]))
(Pdb) feat
Variable containing:
 1  2  3
 4  5  6
[torch.FloatTensor of size 2x3]

(Pdb) feat[0][0] = 9
*** RuntimeError: in-place operations can be only used on variables that don't share storage with any other variables, but detected that there are 2 objects sharing it
  • Any idea about how to get around it?
  • Also, are you sure that using .data always messes up autograd?

You are doing exactly what I said doesn’t work…

This is bound to fail. feat[0][0] = 9 first does a slicing and gets feat[0], and then does a __setitem__ on it. But once the slicing is done, you already have two objects sharing the same storage (feat and feat[0]), both in scope, so it doesn’t work. feat[0, 0] = 9 works as expected.
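
In code, on this older Variable API (same toy example as above):

import torch
from torch.autograd import Variable

feat = Variable(torch.Tensor([[1, 2, 3], [4, 5, 6]]))

# feat[0][0] = 9  # fails: feat[0] is a second Variable sharing feat's storage
feat[0, 0] = 9    # works: a single __setitem__ directly on feat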

Yes, that works.
Thank you very much!

Can you give some references for studying autograd? In particular, where is the following explained:

.data accesses the underlying tensor. History tracking is on the wrapping Variable, so it should really only be used if you don’t want autograd.
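
A small sketch of that distinction (old Variable API, illustrative values only):

import torch
from torch.autograd import Variable

x = Variable(torch.ones(3), requires_grad=True)

y = x * 2          # built from the Variable: the multiply is recorded
y.sum().backward()
print(x.grad)      # 2, 2, 2 -- gradients flowed back through the multiply

t = x.data * 2     # built from the raw tensor: plain math, no history is kept
# t is just a FloatTensor; nothing computed from it can ever reach x.grad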

Hi Simon. I had one more doubt regarding the .data access. Suppose I have a network which outputs probabilities like:

>>> probs
Variable containing:
-0.1406  0.3101
[torch.FloatTensor of size 1x2]

Now, I want to define the loss using the value of the elements of probs, i.e.

if probs[0, 0] > probs[0, 1]:
    # use loss function 1
else:
    # use loss function 2

Directly using this results in
*** RuntimeError: bool value of Variable objects containing non-empty torch.ByteTensor is ambiguous.

So instead, as suggested in the thread “How to use condition flow?”, we can use something like:

if probs.data[0, 0] > probs.data[0, 1]:
    # use loss function 1
else:
    # use loss function 2

So my question is: does this not interfere with autograd? After all, how is the history tracked back to the actual values at the first and second indices of probs?

  • If it does not interfere with autograd, please explain why not, and
  • if it does interfere with autograd, what is the correct way to do this?

Note: I am using an earlier version of PyTorch, which is why Variable objects appear.

Thanks.

Comparison is nondifferentiable, so you are fine.
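
Concretely, a hedged sketch of why that works (loss_fn_1 and loss_fn_2 are placeholder loss functions, not anything defined in this thread): the .data comparison only decides which branch runs, while the loss itself is still built from the Variable probs, so the chosen branch stays fully tracked by autograd.

# The branch condition reads raw values; a comparison carries no gradient anyway.
if probs.data[0, 0] > probs.data[0, 1]:
    loss = loss_fn_1(probs)  # built from the Variable: history is tracked
else:
    loss = loss_fn_2(probs)  # likewise tracked

loss.backward()              # gradients flow through whichever branch actually ran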