masked_fill_ behaves differently on CPU and GPU

masked_fill_ seems to behave differently on the CPU and the GPU. On the CPU, it also produces an output different from its out-of-place counterpart masked_fill. Here is a minimal working example demonstrating the behavior on the CPU:

>>> import torch
>>> torch.__version__
'1.5.0'
>>> tensor_cpu = torch.LongTensor([[0], [1]]).expand(2, 4)
>>> tensor_cpu
tensor([[0, 0, 0, 0],
        [1, 1, 1, 1]])
>>> mask_cpu = torch.BoolTensor(
...     [[False,  True, False, False],
...      [False, False, False, False]]
...     )
>>> mask_cpu
tensor([[False,  True, False, False],
        [False, False, False, False]])
>>> tensor_cpu.masked_fill(mask_cpu, 3)        # expected behavior
tensor([[0, 3, 0, 0],
        [1, 1, 1, 1]])
>>> tensor_cpu.masked_fill_(mask_cpu, 3)       # unexpected behavior?
tensor([[3, 3, 3, 3],
        [1, 1, 1, 1]])

and here is the GPU equivalent:

>>> tensor_cuda = torch.LongTensor([[0], [1]]).expand(2, 4).to('cuda')
>>> tensor_cuda
tensor([[0, 0, 0, 0],
        [1, 1, 1, 1]], device='cuda:0')
>>> mask_cuda = torch.BoolTensor(
...     [[False,  True, False, False],
...      [False, False, False, False]]
...     ).to('cuda')
>>> mask_cuda
tensor([[False,  True, False, False],
        [False, False, False, False]], device='cuda:0')
>>> tensor_cuda.masked_fill(mask_cuda, 3)      # expected behavior
tensor([[0, 3, 0, 0],
        [1, 1, 1, 1]], device='cuda:0')
>>> tensor_cuda.masked_fill_(mask_cuda, 3)     # expected behavior
tensor([[0, 3, 0, 0],
        [1, 1, 1, 1]], device='cuda:0')

Apparently, the in-place masked_fill_ gives a different output on the CPU. I’m guessing this has something to do with using expand when initializing tensor_cpu; if I do not use expand, I get the expected output:

>>> other_cpu = torch.LongTensor(
...     [[0, 0, 0, 0],
...      [1, 1, 1, 1]]
...     )
>>> other_cpu
tensor([[0, 0, 0, 0],
        [1, 1, 1, 1]])
>>> other_cpu.masked_fill_(mask_cpu, 3)        # expected behavior
tensor([[0, 3, 0, 0],
        [1, 1, 1, 1]])

Am I missing something here?

Also tracked here.