"invalid shape dimension <huge negative number>" on tensor masking operation

Algomorph · March 12, 2021, 4:28pm

I’m (mostly) running someone else’s code in combination with PWC-Net, using their existing inputs. I’m getting an error in some cases (not every case). Here is all the code that I think is relevant:

    print("Shape of mask is:", mask.shape)
    s = (mask > 0.999).shape
    print("Shape of (mask > 0.999) is:", s)
    mask[mask > 0.999] = 1.0  # <-- Error occurs here!

Here, mask is a PyTorch tensor. The output is:

Shape of mask is: torch.Size([1, 1, 28, 40])
Shape of (mask > 0.999) is: torch.Size([1, 1, 28, 40])
Traceback (most recent call last):

  File "/home/algomorph/Workbench/NeuralTracking/model/pwcnet.py", line 41, in Backward
    mask[mask > 0.999] = 1.0  # <-- Error occurs here!
RuntimeError: invalid shape dimension -1096498240

The only clue I found online is the C++ header file that raises this error in PyTorch.

Any clues as to what might be happening here?

Aside: I’m using PyTorch 1.7.1 compiled from source, i.e. version ‘1.7.0a0+57bffc3’ [also tested with the official 1.8.1 CUDA 11.1 release for pip]
Another aside: mask is of type torch.cuda.FloatTensor and is located on the cuda:0 device.

Here is the mask tensor, serialized via torch.save(…). When I load it in a separate Python shell and perform the same operation on it, the error does not occur.
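For anyone who wants to repeat the standalone check described above, here is a minimal sketch of the save / load / mask round trip. It makes some assumptions: the tensor is random stand-in data with the same shape as the reported mask, an in-memory buffer replaces the file on disk, and everything runs on the CPU, so it is not expected to trigger the error.

```python
import io

import torch

# Stand-in for the real mask (assumption: random data, same shape as reported).
mask = torch.rand(1, 1, 28, 40)

# Serialize to an in-memory buffer instead of a file on disk.
buffer = io.BytesIO()
torch.save(mask, buffer)
buffer.seek(0)

# map_location controls which device the tensor is restored onto.
restored = torch.load(buffer, map_location="cpu")
round_trip_ok = torch.equal(restored, mask)

# The operation from the original post, applied to the restored copy.
restored[restored > 0.999] = 1.0
```

To mirror the failing setup more closely, the restored tensor could be moved to the GPU with `restored.to("cuda:0")` before the masking step.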

Algomorph · March 12, 2021, 4:54pm

Big thanks to Ahmed Taha, who (in another thread / forum) suggested replacing
mask[mask > 0.999] = 1.0
with
mask = torch.where(mask > 0.999, torch.ones_like(mask), mask)

This solution hasn’t produced any errors so far, so I think it is valid. Why the original code produced an error in some cases remains a mystery to me.
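As a sanity check that the replacement computes the same values as the original line, here is a small sketch on a toy tensor (the values are made up for illustration). It also includes `masked_fill`, which was not mentioned in the suggestion but is another standard way to write the same operation.

```python
import torch

# Toy mask with one value just above the 0.999 threshold.
mask = torch.tensor([[0.5, 0.9995], [1.0, 0.2]])

# Original approach: in-place assignment through a boolean index.
in_place = mask.clone()
in_place[in_place > 0.999] = 1.0

# Suggested approach: build a new tensor with torch.where.
via_where = torch.where(mask > 0.999, torch.ones_like(mask), mask)

# Alternative (not from the thread): out-of-place masked_fill.
via_fill = mask.masked_fill(mask > 0.999, 1.0)

assert torch.equal(in_place, via_where)
assert torch.equal(in_place, via_fill)
```

One difference to keep in mind: `torch.where` and `masked_fill` return a new tensor, so any other reference to the original `mask` object is left unmodified, whereas the indexed assignment changes it in place.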

Algomorph · March 12, 2021, 6:56pm

My intuition is that this is a bug in PyTorch. Consider this code:

        object_estimate = self.moduleSix(pyramid_first[-1], pyramid_second[-1], None)
        flow6 = object_estimate['flow']

        mask2 = torch.load("mask.pt")
        mask2[mask2 > 0.999] = 1.0 # <-- No error here...

        object_estimate = self.moduleFiv(pyramid_first[-2], pyramid_second[-2], object_estimate)
        flow5 = object_estimate['flow']

        mask2 = torch.load("mask.pt")
        mask2[mask2 > 0.999] = 1.0 # <-- Error here...

mask2, in both cases, is just a copy of the mask tensor I describe above, loaded from disk. moduleFiv and moduleSix are Decoder objects based on torch.nn.Module; they differ only in the in_channels counts of their convolutional layers.
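Since the failure only shows up after an unrelated module has run, one thing that may be worth ruling out is a delayed CUDA error: CUDA kernels are launched asynchronously, so a problem in an earlier kernel can be reported at a later, unrelated call. Whether that is what happens here is not established. The sketch below is only a way to test the idea; `run_and_sync` is a hypothetical helper, and the small convolution is a stand-in for the real decoder modules.

```python
import torch


def run_and_sync(fn, *args):
    """Run fn, then wait for all queued CUDA work (if any) to finish.

    Hypothetical helper: if an earlier kernel failed, the error should
    surface at the synchronize call rather than at a later operation.
    """
    out = fn(*args)
    if torch.cuda.is_available():
        torch.cuda.synchronize()
    return out


device = "cuda" if torch.cuda.is_available() else "cpu"

# Stand-in for moduleFiv / moduleSix (assumption: any conv layer will do).
conv = torch.nn.Conv2d(3, 4, kernel_size=3, padding=1).to(device)
x = torch.rand(1, 3, 28, 40, device=device)
y = run_and_sync(conv, x)

# The masking step from the post, executed after the synchronized call.
mask2 = torch.rand(1, 1, 28, 40, device=device)
mask2[mask2 > 0.999] = 1.0
```

Running the original script with the environment variable `CUDA_LAUNCH_BLOCKING=1` serves a similar purpose, since it makes kernel launches synchronous.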

ShirAmir · August 3, 2021, 8:37pm

@Algomorph I too encountered this issue when loading a tensor from the disk and trying to filter it with a mask.
Your solution helped me as well, but it is indeed confusing why…

If anyone has any insights pls let us know.

anonymous_anonymous · November 10, 2021, 6:48am

I ran into this error as well. I used a padding mask tensor to filter out the non-padding results: tensor[padding_mask]. After I moved padding_mask to the cuda device as well, the error disappeared, but the final result seems wrong.
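A minimal sketch of the device handling described above, with made-up values. It assumes that `True` in `padding_mask` marks padding positions; if the mask uses the opposite convention, the two selections swap, which is one possible (unconfirmed) source of an unexpected result.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"

# Toy data: values 0..5 in a 2x3 tensor on the working device.
values = torch.arange(6, dtype=torch.float32, device=device).reshape(2, 3)

# Mask created on the CPU (assumption: True marks a padding position).
padding_mask = torch.tensor([[False, False, True],
                             [False, True, True]])

# Move the mask to the same device as the data before indexing.
padding_mask = padding_mask.to(values.device)

padding = values[padding_mask]       # elements at padding positions
non_padding = values[~padding_mask]  # everything else
```

Note that boolean indexing returns a flattened 1-D tensor, so the original 2-D layout is not preserved in either result.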

anonymous_anonymous · November 10, 2021, 7:06am

In another piece of code, I do the same operation, but no error occurs.

rdesh26 · March 15, 2022, 4:47pm

I ran into the same issue with indexing recently, and had to fix it using torch.where(). Is there a relevant issue created on the repo to track this?