Assigning a tensor to multiple rows on GPU

Hello!

I recently updated from PyTorch 1.9 to 1.11, and the code below started throwing the error shown after it, but only on GPU; on CPU it works fine. The odd part is that the failure happens with boolean indexing, and integer-list indexing throws the same error, while slicing works perfectly. If I reshape the value tensor so that its first dimension matches the number of rows I am assigning to, it works again (the other cases are sketched below the error message). Is there a proper way to assign multiple rows of a tensor without creating a value tensor of exactly matching size? I assume this is supposed to work, since it works on CPU.

x = torch.zeros((5,4), device=torch.device('cuda:0'))
x[[False,True,False,True,True]] = torch.tensor([1.0, 1.0, 1.0, 1.0], device=torch.device('cuda:0'), dtype=torch.float32)
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == expandedValue.numel()INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1646755897462/work/aten/src/ATen/native/cuda/Indexing.cu":268, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor: 12 vs 4
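
For reference, here are the other cases I mentioned, sketched with the same x as above (this is just what I observe on my machine):

# integer-list indexing of the same three rows throws the same error
x[[1, 3, 4]] = torch.tensor([1.0, 1.0, 1.0, 1.0], device=torch.device('cuda:0'))
# slicing works fine
x[1:3] = torch.tensor([1.0, 1.0, 1.0, 1.0], device=torch.device('cuda:0'))
# a (3, 4) value tensor matching the three selected rows also works
x[[False, True, False, True, True]] = torch.ones((3, 4), device=torch.device('cuda:0'))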

Hi Ken!

This appears to be a known issue, but specifically in the context of using
“deterministic” cuda algorithms. It looks like some work-arounds are
discussed in the relevant (closed?!) github issue:

I can reproduce your issue, both in version 1.11 and in a recent nightly
(1.13.0.dev20220604), but only if I set:

torch.use_deterministic_algorithms(True)

>>> import torch
>>> torch.__version__
'1.11.0'
>>> torch.version.cuda
'11.3'
>>> torch.cuda.get_device_name()
'GeForce GTX 1050 Ti'
>>> x = torch.zeros((5,4), device=torch.device('cuda:0'))
>>> x[[False,True,False,True,True]] = torch.tensor([1.0, 1.0, 1.0, 1.0], device=torch.device('cuda:0'), dtype=torch.float32)
>>> x
tensor([[0., 0., 0., 0.],
        [1., 1., 1., 1.],
        [0., 0., 0., 0.],
        [1., 1., 1., 1.],
        [1., 1., 1., 1.]], device='cuda:0')
>>> torch.use_deterministic_algorithms(True)
>>> x[[False,True,False,True,True]] = torch.tensor([1.0, 1.0, 1.0, 1.0], device=torch.device('cuda:0'), dtype=torch.float32)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == expandedValue.numel()INTERNAL ASSERT FAILED at "/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/Indexing.cu":268, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor: 12 vs 4
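
Until this is fixed, a couple of work-arounds along the lines you noted should do the trick: give the value tensor a leading dimension that matches the number of selected rows, or turn deterministic algorithms off just around the assignment. A quick sketch, continuing the session above (I haven't tested this exhaustively):

>>> v = torch.tensor([1.0, 1.0, 1.0, 1.0], device=torch.device('cuda:0'))
>>> # expand v to (3, 4) so it matches the three selected rows
>>> x[[False, True, False, True, True]] = v.expand(3, -1)
>>> # or disable deterministic algorithms just for this assignment
>>> torch.use_deterministic_algorithms(False)
>>> x[[False, True, False, True, True]] = v
>>> torch.use_deterministic_algorithms(True)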

Best.

K. Frank

CC @eqy, could you take a look at this failure, since you worked on the last fix?
This might be a new (previously untested) issue, or the same error popping up again.