Error in torch.where: "Trying to create tensor with negative dimension -2133151913: [-2133151913]"

Algomorph · March 22, 2021, 12:54pm

Hi, all!

Trying to circumvent another apparent bug in PyTorch.
This might be related to this post (which was never fully resolved).

Relevant code fragment (with print statements for debugging, mask is also made for debugging):

mask = torch.where(graph_clusters_i == cluster_id, torch.ones_like(graph_clusters_i), torch.zeros_like(graph_clusters_i))
index_equals_cluster_id = torch.where(graph_clusters_i == cluster_id)
print("mask.sum()......................: ", mask.sum())
print("type(graph_clusters_i)..........: ", type(graph_clusters_i))
print("graph_clusters_i.shape..........: ", graph_clusters_i.shape)
print("graph_clusters_i.device.........: ", graph_clusters_i.device)
print("type(cluster_id)................: ", type(cluster_id))
print("cluster_id......................: ", cluster_id)
print("len(index_equals_cluster_id)....: ", len(index_equals_cluster_id))
print("type(index_equals_cluster_id)...: ", type(index_equals_cluster_id))
print("type(index_equals_cluster_id[0]): ", type(index_equals_cluster_id[0]))
print("index_equals_cluster_id[0].shape: ", index_equals_cluster_id[0].shape)

x = index_equals_cluster_id[0].tolist()  # <-- error here!

The output is:

mask.sum()......................:  tensor(939, device='cuda:0')
type(graph_clusters_i)..........:  <class 'torch.Tensor'>
graph_clusters_i.shape..........:  torch.Size([1165, 1])
graph_clusters_i.device.........:  cuda:0
type(cluster_id)................:  <class 'int'>
cluster_id......................:  0
len(index_equals_cluster_id)....:  2
type(index_equals_cluster_id)...:  <class 'tuple'>
type(index_equals_cluster_id[0]): <class ‘torch.Tensor’>
index_equals_cluster_id[0].shape: torch.Size([-2133151913])
Traceback (most recent call last):
(A few lines omitted for brevity)
  File "/home/algomorph/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/algomorph/Workbench/NeuralTracking/model/model.py", line 437, in forward
    x = index_equals_cluster_id[0].tolist()  # <-- error here!
RuntimeError: Trying to create tensor with negative dimension -2133151913: [-2133151913]

Any idea about how to fix or avoid this?
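
For comparison, here is a minimal CPU sketch of what the single-argument torch.where call is expected to return for a tensor of the same shape. The data is synthetic (939 zeros followed by ones, chosen only to match the count printed above) and is not the tensor from the model.

```python
import torch

# Synthetic stand-in for graph_clusters_i: shape [1165, 1], 939 entries equal to 0.
graph_clusters_i = torch.ones(1165, 1, dtype=torch.int64)
graph_clusters_i[:939] = 0
cluster_id = 0

# Single-argument torch.where returns one index tensor per input dimension.
index_equals_cluster_id = torch.where(graph_clusters_i == cluster_id)

print(len(index_equals_cluster_id))      # 2, one entry per dimension
print(index_equals_cluster_id[0].shape)  # torch.Size([939]), row indices of the matches
x = index_equals_cluster_id[0].tolist()
```

A non-negative shape equal to the number of matches is the expected result here, so the negative dimension in the output above suggests the index tensor's size was computed incorrectly or corrupted on the CUDA side.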

Algomorph · March 22, 2021, 1:04pm

The answer to “how to circumvent” was staring me right in the face: just use torch.nonzero on the mask. Here’s the inelegant, dirty workaround. It relies on an extra sum operation that shouldn’t be needed, but without it the bug still shows up:

```python
mask = torch.where(graph_clusters_i == cluster_id, torch.ones_like(graph_clusters_i), torch.zeros_like(graph_clusters_i))
dummy = mask.sum()  # TODO: figure out why there is some kind of lazy-evaluation bug in PyTorch without this line...
del dummy
x = torch.nonzero(mask, as_tuple=True)[0].tolist()
```
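
As a side note, the PyTorch documentation describes torch.where(condition) as identical to torch.nonzero(condition, as_tuple=True), so the integer mask is not strictly needed to get the indices. The sketch below checks that equivalence on a small, hypothetical CPU tensor. Whether calling torch.nonzero directly on the boolean condition avoids the failure in the original CUDA setup is untested.

```python
import torch

# Hypothetical small input; the real graph_clusters_i lives on cuda:0.
graph_clusters_i = torch.tensor([[0], [2], [0], [1], [0]])
cluster_id = 0

condition = graph_clusters_i == cluster_id  # boolean mask, shape [5, 1]

via_where = torch.where(condition)
via_nonzero = torch.nonzero(condition, as_tuple=True)

# Both calls yield the same per-dimension index tensors.
same = all(torch.equal(a, b) for a, b in zip(via_where, via_nonzero))

x = via_nonzero[0].tolist()  # row indices where the cluster id matches
print(same, x)  # True [0, 2, 4]
```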

Algomorph · March 22, 2021, 1:50pm

Finally, I got PyTorch itself to admit there is some kind of bug here. Not sure how to report it though, because it’s a hell of a lot of code to get down to a minimal reproducible example.

The code:

index_equals_cluster_id = torch.where(graph_clusters_i == cluster_id)
dummy = index_equals_cluster_id[0].sum()  # This triggers some kind of bug

Output:

Traceback (most recent call last):
(Some intermediate calls omitted for brevity).
  File "/home/algomorph/.local/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/algomorph/Workbench/NeuralTracking/model/model.py", line 442, in forward
    dummy = index_equals_cluster_id[0].sum()  # This triggers some kind of bug
RuntimeError: iter.numel() > 0 && iter.ntensors() - iter.noutputs() == 1 && iter.noutputs() >= 1 INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Reduce.cuh":896, please report a bug to PyTorch.
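
One possible way to shrink this toward a reportable example is to dump the inputs right before the failing call and replay them in a standalone script. The sketch below assumes the failure depends only on graph_clusters_i and cluster_id, which nothing in this thread establishes; the helper names and the file path are hypothetical. The torch.cuda.synchronize() call is there because CUDA kernels run asynchronously, so an error reported at one line can originate in an earlier operation.

```python
import os
import tempfile

import torch


def dump_inputs(graph_clusters_i, cluster_id, path):
    """Save the inputs of the failing call (hypothetical helper)."""
    if graph_clusters_i.is_cuda:
        # Wait for pending kernels so earlier asynchronous errors surface here.
        torch.cuda.synchronize()
    torch.save(
        {"graph_clusters_i": graph_clusters_i.cpu(), "cluster_id": cluster_id},
        path,
    )


def replay(path, device="cpu"):
    """Reload the saved inputs and rerun the two calls from the post above."""
    saved = torch.load(path)
    graph_clusters_i = saved["graph_clusters_i"].to(device)
    index_equals_cluster_id = torch.where(graph_clusters_i == saved["cluster_id"])
    return index_equals_cluster_id[0].sum().item()


# Usage with synthetic data; in the model, dump_inputs would go just before torch.where.
path = os.path.join(tempfile.mkdtemp(), "repro_inputs.pt")
dump_inputs(torch.tensor([[0], [1], [0]]), 0, path)
total = replay(path)
print(total)  # 2, the sum of the matching row indices 0 and 2
```

If replay(path, device="cuda") fails in a fresh process, the saved file plus these few lines may already be enough for a bug report. If it succeeds, the cause more likely sits in code that runs before this point in the forward pass.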