PyTorch Index Error for Boolean Tensors

I'm seeing an unexpected indexing error when reading from an expanded boolean tensor:

expanded_mask shape: torch.Size([1105, 10, 2]), i: 0, j: 0
aten/src/ATen/native/cuda/IndexKernel.cu:84: operator(): block: [22,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.

I have the following code:

import torch


def create_mask(sizes: torch.Tensor, mask_dim: int) -> torch.Tensor:
    batch_size = sizes.shape[0]
    mask = torch.arange(mask_dim, dtype=sizes.dtype, device=sizes.device).expand(
        batch_size, mask_dim
    ) < sizes.unsqueeze(1)
    return mask


old_mask = create_mask(sizes, 10) # old_mask has shape (1105, 10)
batch_size, mask_dim = old_mask.shape
expanded_mask = old_mask.reshape(batch_size, mask_dim, 1).expand(batch_size, mask_dim, 2)

for i in range(batch_size):
    for j in range(mask_dim):
        print(float(expanded_mask[i][j][0]))  # this raises the index error

I am fairly sure the index cannot be out of bounds, so I am not sure whether this is the real error or a bug somewhere else. It only happens when I convert other float tensors to half precision, which should have nothing to do with how the mask is created here. Without that conversion I don't see the error, which is very strange.

Based on the code snippet, it does indeed look a bit fishy.
Could you post an executable code snippet, so that we could reproduce this issue and debug it, please?
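
As a starting point, a self-contained version of the posted code might look like the sketch below. The `sizes` tensor here is hypothetical (random integers in [0, 10]), since the original post does not show how it is built, and the half-precision conversion of the other tensors is not included because it was not posted either. Device-side asserts in CUDA kernels are reported asynchronously, so it may also be worth checking the indexing operations that run before this loop.

```python
import torch


def create_mask(sizes: torch.Tensor, mask_dim: int) -> torch.Tensor:
    # True where the position index is smaller than the per-row size.
    positions = torch.arange(mask_dim, dtype=sizes.dtype, device=sizes.device)
    return positions.unsqueeze(0) < sizes.unsqueeze(1)


# Assumption: use the GPU when one is present, otherwise fall back to the CPU.
device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical input: random sizes in [0, 10], one per batch row.
torch.manual_seed(0)
sizes = torch.randint(0, 11, (1105,), device=device)

old_mask = create_mask(sizes, 10)  # shape (1105, 10)
batch_size, mask_dim = old_mask.shape
expanded_mask = old_mask.reshape(batch_size, mask_dim, 1).expand(batch_size, mask_dim, 2)

# Same element-by-element read as in the original loop.
for i in range(batch_size):
    for j in range(mask_dim):
        value = float(expanded_mask[i][j][0])  # the line reported to fail

print("read", batch_size * mask_dim, "elements without an index error")
```

If this runs cleanly on your setup, adding your half-precision conversion step by step should narrow down which operation triggers the assert.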

Also, which PyTorch and CUDA versions are you using?
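
In case it helps, a minimal way to print that information from Python is sketched below; it only uses attributes that PyTorch exposes publicly.

```python
import torch

# Version of the installed PyTorch build.
print("PyTorch:", torch.__version__)

# CUDA version PyTorch was compiled against (None for CPU-only builds).
print("CUDA (build):", torch.version.cuda)

# Whether a usable GPU is visible at runtime, and which one.
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
```

Running `python -m torch.utils.collect_env` prints a more complete report that can be pasted into the thread.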