masked_fill_ seems to behave differently on CPU and GPU. On CPU, it also produces a different output from its out-of-place counterpart, masked_fill. Here is a minimal working example that demonstrates the behavior on CPU:
>>> import torch
>>> torch.__version__
'1.5.0'
>>> tensor_cpu = torch.LongTensor([[0], [1]]).expand(2, 4)
>>> tensor_cpu
tensor([[0, 0, 0, 0],
        [1, 1, 1, 1]])
>>> mask_cpu = torch.BoolTensor(
... [[False, True, False, False],
... [False, False, False, False]]
... )
>>> mask_cpu
tensor([[False, True, False, False],
        [False, False, False, False]])
>>> tensor_cpu.masked_fill(mask_cpu, 3) # expected behavior
tensor([[0, 3, 0, 0],
        [1, 1, 1, 1]])
>>> tensor_cpu.masked_fill_(mask_cpu, 3) # unexpected behavior?
tensor([[3, 3, 3, 3],
        [1, 1, 1, 1]])
And here is the GPU equivalent:
>>> tensor_cuda = torch.LongTensor([[0], [1]]).expand(2, 4).to('cuda')
>>> tensor_cuda
tensor([[0, 0, 0, 0],
        [1, 1, 1, 1]], device='cuda:0')
>>> mask_cuda = torch.BoolTensor(
... [[False, True, False, False],
... [False, False, False, False]]
... ).to('cuda')
>>> mask_cuda
tensor([[False, True, False, False],
        [False, False, False, False]], device='cuda:0')
>>> tensor_cuda.masked_fill(mask_cuda, 3) # expected behavior
tensor([[0, 3, 0, 0],
        [1, 1, 1, 1]], device='cuda:0')
>>> tensor_cuda.masked_fill_(mask_cuda, 3) # expected behavior
tensor([[0, 3, 0, 0],
        [1, 1, 1, 1]], device='cuda:0')
Apparently, the in-place masked_fill_ gives a different output on the CPU. I’m guessing this has something to do with the use of expand when initializing tensor_cpu; if I do not use expand, I get the expected output:
>>> other_cpu = torch.LongTensor(
... [[0, 0, 0, 0],
... [1, 1, 1, 1]]
... )
>>> other_cpu
tensor([[0, 0, 0, 0],
        [1, 1, 1, 1]])
>>> other_cpu.masked_fill_(mask_cpu, 3) # expected behavior
tensor([[0, 3, 0, 0],
        [1, 1, 1, 1]])
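One way to probe the expand guess is to look at the strides of the expanded tensor. The snippet below is only a sketch under my assumption that expand returns a view whose expanded dimension has stride 0, so every column in a row aliases the same memory location. The names base, expanded, mask, and safe are mine, and calling contiguous() before the in-place fill is a possible workaround, not a confirmed fix for the CPU/GPU discrepancy.

```python
import torch

# Same construction as tensor_cpu above: a (2, 1) tensor expanded to (2, 4).
base = torch.LongTensor([[0], [1]])
expanded = base.expand(2, 4)

# expand does not copy data; the expanded dimension gets stride 0,
# so all four columns of a row refer to one memory location.
print(expanded.stride())         # (1, 0)
print(expanded.is_contiguous())  # False

mask = torch.BoolTensor(
    [[False, True, False, False],
     [False, False, False, False]]
)

# contiguous() materializes a real (2, 4) copy with its own storage,
# so the in-place fill only touches the masked element.
safe = expanded.contiguous()
safe.masked_fill_(mask, 3)
print(safe.tolist())             # [[0, 3, 0, 0], [1, 1, 1, 1]]
```

If the stride-0 assumption holds, a single in-place write to an expanded row would be visible in all four columns of that row, which would match the [3, 3, 3, 3] output seen on CPU.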
Am I missing something here?