Let input be a tensor, dim a dimension to mask, and mask a ByteTensor, such that the following holds:
len(mask.size()) == 1 and input.size(dim) == mask.size(0)
I wrote this simple function for the task:

def masked_index(input, dim, mask):
    # note: this implementation effectively assumes dim == 0 and a 3-D input
    assert len(mask.size()) == 1 and input.size(dim) == mask.size(0)
    sizes = input.size()
    # reshape the 1-D mask to (N, 1, 1, ...) so it can broadcast over input
    for i in xrange(len(sizes) - 1):
        mask = mask.unsqueeze(1)
    mask = mask.expand_as(input)
    # boolean selection flattens the result, so restore the trailing dims
    return input[mask].view(-1, sizes[1], sizes[2])
However, I don't know whether there is a better solution for this.
The gist is that sometimes we want to select indices along a dimension using a mask (ByteTensor), which usually comes from comparison ops (e.g. torch.eq()), instead of indices (LongTensor).
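For instance, a minimal sketch of what that looks like (values made up for illustration; torch.eq() returned a ByteTensor in the 0.x versions, a BoolTensor in recent ones):

```python
import torch

x = torch.tensor([[1., 2.], [3., 4.], [1., 6.]])
# comparison ops give back a mask over dim 0, not indices
mask = torch.eq(x[:, 0], 1.)
# boolean selection keeps the matching rows
rows = x[mask]
assert rows.tolist() == [[1.0, 2.0], [1.0, 6.0]]
```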
I used these expressions and encountered some problems.
My mask is a Variable; when I use mask.nonzero() it errors with Variable object has no attribute 'nonzero'.
When I use mask.data.nonzero(), it shows {RuntimeError} invalid argument 3: expecting vector of indices at /opt/conda/conda-bld/pytorch_1502006348621/work/torch/lib/THC/generic/THCTensorIndex.cu:405
So I used input.index_select(dim, mask.data.nonzero().squeeze(1)), but it threw another error: {RuntimeError} save_for_backward can only save input or output tensors, but argument 0 doesn't satisfy this condition. Here my input is a Variable.
Indeed, this works well with a Tensor, but Variable doesn't have the nonzero method.
To do that, you need to get the underlying tensor with mask.data, apply the nonzero method, and then wrap the result back into a Variable, since index_select expects a Variable as input.
Also, it's good to note that when you get the error below, it is often because you passed a Tensor where a Variable was expected:
{RuntimeError} save_for_backward can only save input or output tensors, but argument 0 doesn't satisfy this condition,
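Putting the pieces together, a sketch of the workaround (shapes and mask values are made up for illustration; mask is a Variable holding a ByteTensor, as in 0.x-era PyTorch):

```python
import torch
from torch.autograd import Variable

input = Variable(torch.randn(5, 3), requires_grad=True)
mask = Variable(torch.tensor([1, 0, 1, 0, 1], dtype=torch.uint8))

# nonzero() on the underlying tensor returns a (k, 1) LongTensor of indices
indices = mask.data.nonzero().squeeze(1)
# wrap the indices back into a Variable before calling index_select
selected = input.index_select(0, Variable(indices))
assert selected.shape == (3, 3)
```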
I just did exactly what you said, and it works!
But another question comes with it: would this conversion, Variable -> Tensor -> Variable, destroy the chain that conducts the gradient of mask back to its creator? Are there specific cases where the mask is part of the network and this operation should propagate gradients to the mask?
Yes, indeed, it destroys the chain. I guess in many contexts this is not a problem.
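To be precise about what is destroyed: gradients still flow to input through index_select; it is only the mask that drops out of the graph. A quick check (values made up):

```python
import torch
from torch.autograd import Variable

x = Variable(torch.randn(4, 2), requires_grad=True)
idx = torch.tensor([0, 2])  # e.g. obtained from mask.data.nonzero()

# index_select is differentiable w.r.t. x, but not w.r.t. idx
y = x.index_select(0, Variable(idx)).sum()
y.backward()

# gradient is 1 for the selected rows, 0 for the rest
assert x.grad[0].tolist() == [1., 1.]
assert x.grad[1].tolist() == [0., 0.]
```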
I think a context where you want to optimize on the mask is more likely to be some kind of RL problem where you have a discrete action space.
Variable.nonzero is not implemented yet, as discussed in the link below. However, if it were, I wonder how the backward would be implemented, since it outputs indices…