Partially reset a Variable, in-place vs. new Variable

I would like to partially reset a Variable, for a specific batch index.
At the moment I am using an in-place operation, which works “fine”.

def selective_zero(s, new):
    # s: list of hidden-state Variables, one per layer, batch dimension first
    # new: byte mask over the batch, non-zero where the state must be reset
    for b, reset in enumerate(new):
        if reset:
            for state_layer in s:
                state_layer.data[b].zero_()

selective_zero(state, y[t + 1] != y[t])
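For context, a minimal sketch (against current PyTorch, where plain tensors carry autograd state; the names are made up for illustration) of why zeroing `.data` alone is not enough: it changes the forward value but leaves the autograd history intact.

```python
# Sketch: in-place .data reset is invisible to autograd.
import torch

x = torch.ones(3, requires_grad=True)
state_layer = x * 2          # stands in for one entry of `state`
state_layer.data[1] = 0      # in-place reset, bypassing autograd

state_layer.sum().backward()
print(x.grad)                # tensor([2., 2., 2.]) -- index 1 still gets gradient
```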

To complete this, I was thinking of registering a hook in order to zero the corresponding gradient as well.

I am now wondering whether returning a new state Variable, multiplied by an appropriate mask, would instead have the combined effect I am actually after. This, though, would involve more computation. Hmm…

Something like this:

for state_layer in s:
    state_layer.data[b].zero_()
    # bind b as a default argument, or every hook would see the loop's last b
    state_layer.register_hook(lambda grad, b=b: grad[b].zero_())

Are you sure you want to do state_layer.data[b].zero_() and not state_layer[b].zero_()? In the first case the operation is not registered with autograd and can have unexpected behaviour.

If what you want is to reset the content of the tensor so as to use it again independently of how you were using it before, you should repack it in a new Variable; otherwise it will still carry the history of the previous usage.

That’s why I am registering a backward hook, to kill the history for a given batch index.
Repackaging the whole Variable would kill all the history, which is unacceptable.
I am now wondering whether

state_layer.data[b].zero_()
state_layer.register_hook(lambda grad: grad[b].zero_())

is equivalent to your

state_layer[b].zero_()

If you have the indices of the layers you want to reset (the indices as a LongTensor), I think the cleanest way would be state_layer.index_fill_(0, indices, 0). That would remove one for loop.
I guess if state is a Python list, you cannot avoid that loop. If state is actually a single Variable, you can do state.index_fill_(1, indices, 0), which would be the most efficient, I think.

Whether or not the two formulations are equivalent, I don’t know enough to say yes for sure; @apaszke would have to confirm.

Yes, state is a list (each layer has its own dimensionality).
About the equivalence, I’m trying some basic tests now… perhaps I will get it on my own.
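One way to run such a test (a sketch with current PyTorch, where tensors subsume Variables; note the hook returns a replacement gradient rather than mutating it in place):

```python
import torch

b = 1                                   # batch index to reset

# Variant A: zero .data by hand and zero the gradient in a hook
xa = torch.rand(4, requires_grad=True)
sa = xa * 2
sa.data[b] = 0
sa.register_hook(lambda g, b=b: g.index_fill(0, torch.tensor([b]), 0))
sa.sum().backward()

# Variant B: overwrite through autograd (the state_layer[b].zero_() route,
# written as a differentiable out-of-place fill)
xb = xa.detach().clone().requires_grad_()
sb = (xb * 2).index_fill(0, torch.tensor([b]), 0)
sb.sum().backward()

print(torch.equal(xa.grad, xb.grad))    # same gradients in this example
```

In this toy case both variants zero the forward value and the incoming gradient at index b, so the leaf gradients agree.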

Edit: oh my… I misread your post. Yes, indeed state_layer.index_fill_(0, indices, 0) is what I was after. And new.nonzero() is a LongTensor too. Sorry… All that’s needed is to wrap it in a Variable.
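Spelled out (current PyTorch, with made-up labels y_t / y_t1 standing in for y[t] and y[t + 1]):

```python
import torch

y_t  = torch.tensor([0, 1, 1, 2])      # labels at step t (made up)
y_t1 = torch.tensor([0, 2, 1, 3])      # labels at step t + 1

new = y_t1 != y_t                      # which sequences just changed
indices = new.nonzero().squeeze(1)     # LongTensor of batch indices: [1, 3]

state_layer = torch.rand(4, 5)         # batch-first state for one layer
state_layer.index_fill_(0, indices, 0) # one call instead of the inner loop
```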

From my (personal) point of view, I would advise against playing with the Variable’s .data by hand (your hack may break in the future), especially in this case, where there is an autograd function that does what you want.

Hmm, Variable does not support the fill_() or zero_() methods… I need a mask, which I don’t really want, but which would perform the two steps my hack involves.
So, is there a workaround for this selective zeroing?

OK, making progress:

In [22]: c
Out[22]: 
Variable containing:
 0.8491
 0.1877
 0.1560
 0.5188
 0.7464
[torch.FloatTensor of size 5]

In [23]: set(dir(c)) - set(dir(c.data))
Out[23]: 
{'__getattr__',
 '__rpow__',
 '__setstate__',
 '_add',
 '_addcop',
 '_backward_hooks',
 '_blas',
 '_creator',
 '_do_backward',
 '_execution_engine',
 '_fallthrough_methods',
 '_get_type',
 '_grad',
 '_static_blas',
 '_sub',
 '_version',
 'backward',
 'creator',
 'data',
 'detach',
 'detach_',
 'grad',
 'index_add',
 'index_copy',
 'index_fill',
 'masked_copy',
 'masked_fill',
 'output_nr',
 'register_hook',
 'reinforce',
 'requires_grad',
 'resize',
 'resize_as',
 'scatter',
 'volatile'}

Will myvar[:] = 0 do the fill along one dimension?
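Checking that idea against current PyTorch (where in-place assignment on a non-leaf tensor is tracked by autograd), slice or index assignment does both the forward fill and the gradient cut:

```python
import torch

x = torch.ones(5, requires_grad=True)
h = x * 2            # non-leaf, so in-place assignment is allowed
h[3] = 0             # tracked by autograd, unlike h.data[3] = 0

h.sum().backward()
print(x.grad)        # tensor([2., 2., 2., 0., 2.])
```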

>>> a = V(torch.rand(5), requires_grad=True)
Variable containing:
 0.8491
 0.1877
 0.1560
 0.5188
 0.7464
[torch.FloatTensor of size 5]

>>> c = a[2] * 0
Variable containing:
 0
[torch.FloatTensor of size 1]

I need to end up with the same size…
I’ve also tried

>>> c = a.index_fill(0, 3, 0)
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-25-ad8f493be136> in <module>()
----> 1 c = a.index_fill(0, 3, 0)

/home/atcold/anaconda3/lib/python3.5/site-packages/torch/autograd/variable.py in index_fill(self, dim, index, value)
    627 
    628     def index_fill(self, dim, index, value):
--> 629         return IndexFill(dim, value)(self, index)
    630 
    631     def index_fill_(self, dim, index, value):

RuntimeError: expected a Variable argument, but got int

The only thing that worked is masking, but it’s not acceptable…

In [12]: mask = V(torch.Tensor([1, 1, 1, 0, 1]))

In [13]: b = a * mask

In [14]: b
Out[14]: 
Variable containing:
 0.8491
 0.1877
 0.1560
 0.0000
 0.7464
[torch.FloatTensor of size 5]

In [15]: m = b.mean()

In [16]: m
Out[16]: 
Variable containing:
 0.3879
[torch.FloatTensor of size 1]

In [17]: m.backward()

In [18]: a.grad
Out[18]: 
Variable containing:
 0.2000
 0.2000
 0.2000
 0.0000
 0.2000
[torch.FloatTensor of size 5]

You need to give it the indices as a LongTensor.
See example below:

import torch
from torch.autograd import Variable

a = Variable(torch.rand(5, 3), requires_grad=True)
a = a.clone() # otherwise we would modify a leaf Variable in place
print(a)

ind = Variable(torch.LongTensor([3]))
a.index_fill_(0, ind, 0)

print(a)

a[1, :] = 0

print(a)

Love you :heart:

Just a correction, "You need to give it indices as [a Variable containing a] LongTensor".
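For completeness, a quick sanity check (again with current PyTorch tensors in place of Variables) that the registered in-place fill zeroes the gradient for those rows as well:

```python
import torch

leaf = torch.rand(5, 3, requires_grad=True)
a = leaf.clone()              # clone first, as above, to avoid in-place on a leaf

ind = torch.tensor([3])
a.index_fill_(0, ind, 0)      # differentiable in-place fill
a[1, :] = 0                   # differentiable slice assignment

a.sum().backward()
print(leaf.grad)              # rows 1 and 3 are zeros, everything else is 1
```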
