Expected output.numel() <= std::numeric_limits<int32_t>::max() to be true, but got false

How can I avoid this issue? It appears to be related to the tensor's type. Does it mean the tensor is float64 instead of float32?
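(A quick way to inspect what the failing upsample actually sees could look like the sketch below; y is only a stand-in for whatever tensor reaches self.upsample in models.py.)

import torch

# y stands in for the tensor passed to self.upsample in models.py
y = torch.randn(1, 16, 64, 64, 64)
print(y.dtype)                              # float32 or float64
print(y.numel())                            # element count of the input
print(torch.iinfo(torch.int32).max)         # the limit named in the error message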
Below is the stack trace.

File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/runpy.py”, line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/runpy.py”, line 87, in _run_code
exec(code, run_globals)
File “/home/csgrad/mbhosale/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/…/…/debugpy/launcher/…/…/debugpy/main.py”, line 39, in
cli.main()
File “/home/csgrad/mbhosale/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/…/…/debugpy/launcher/…/…/debugpy/…/debugpy/server/cli.py”, line 430, in main
run()
File “/home/csgrad/mbhosale/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/adapter/…/…/debugpy/launcher/…/…/debugpy/…/debugpy/server/cli.py”, line 284, in run_file
runpy.run_path(target, run_name=“main”)
File “/home/csgrad/mbhosale/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py”, line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File “/home/csgrad/mbhosale/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py”, line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File “/home/csgrad/mbhosale/.vscode-server/extensions/ms-python.python-2023.4.1/pythonFiles/lib/python/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py”, line 124, in _run_code
exec(code, run_globals)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/mainvm.py”, line 250, in
mp.spawn(run_parallel,
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/multiprocessing/spawn.py”, line 240, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method=‘spawn’)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/multiprocessing/spawn.py”, line 198, in start_processes
while not context.join():
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/multiprocessing/spawn.py”, line 160, in join
raise ProcessRaisedException(msg, error_index, failed_process.pid)
torch.multiprocessing.spawn.ProcessRaisedException:

– Process 1 terminated with the following error:
Traceback (most recent call last):
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/multiprocessing/spawn.py”, line 69, in _wrap
fn(i, *args)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/mainvm.py”, line 132, in run_parallel
train.run()
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 210, in run
self.train_epoch(optimizer, scheduler, epoch)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 107, in train_epoch
loss, trdice = self.trainIter(fixed, moving, fixed_label, moving_label, fixed_nopad=fixed_nopad)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 34, in trainIter
sim_loss, grad_loss, seg_loss, dice = self.model.forward(fix, moving,
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/parallel/distributed.py”, line 1008, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/parallel/distributed.py”, line 969, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/models.py”, line 191, in forward
unet_out = self.unet(x)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/models.py”, line 149, in forward
y = self.upsample(y)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/upsampling.py”, line 153, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/functional.py”, line 3912, in interpolate
return torch._C._nn.upsample_nearest3d(input, output_size, scale_factors)
RuntimeError: Expected output.numel() <= std::numeric_limits<int32_t>::max() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/multiprocessing/resource_tracker.py:216: UserWarning: resource_tracker: There appear to be 26 leaked semaphore objects to clean up at shutdown
warnings.warn('resource_tracker: There appear to be %d ’

It appears it's not the value but the size of the tensor that is causing the issue. However, I checked, and the tensor size is well below the int32 max value.
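(What I measured was the size of the input tensor; the check in the error message is on the output of the upsample, which for a 3D nearest upsample with scale_factor=2 has eight times as many elements. A rough sketch with made-up sizes:)

import torch

# made-up batch/channel/spatial sizes for the tensor entering self.upsample,
# and an assumed scale_factor of 2 for the nn.Upsample layer
n, c, d, h, w = 2, 16, 256, 256, 256
scale = 2

input_numel = n * c * d * h * w
output_numel = n * c * (d * scale) * (h * scale) * (w * scale)

print(input_numel)                                    # 536870912  -> below the int32 limit
print(output_numel)                                   # 4294967296 -> above the int32 limit
print(output_numel > torch.iinfo(torch.int32).max)    # True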

Hi Mahesh!

I don’t know how to fix your issue, but …

This error message does seem a bit odd. Could you tell us what version
of pytorch you are using?

My understanding is that pytorch supports tensors that are quite large
(assuming adequate memory) and pytorch indexes tensors with int64s
(longs).

Here is an illustration (using python, rather than drilling down into c++
to get std::numeric_limits<int32_t>::max()):

>>> import torch
>>> torch.__version__
'1.13.1'
>>> torch.iinfo (torch.int32).max
2147483647
>>> torch.zeros (2**32, dtype = torch.int8).numel()
4294967296

If your pytorch version doesn't explain this, could you post a minimal,
fully-self-contained, runnable script that reproduces the error, preferably
without multiprocessing, unless you think that multiprocessing itself is part
of the cause of the error?

Best.

K. Frank

Based on the stack trace it seems F.interpolate fails, but I cannot reproduce it on the CPU or with CUDA via:

import torch
import torch.nn.functional as F

x = torch.randn(1, 2**10, 2**10, 2**10)
out = F.interpolate(x, size=(2**11+1, 2**11))
out.numel() > 2**32
# True
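(For completeness, a similar check with a 5D input, matching the upsample_nearest3d call in the stack trace, could be sketched as follows. The sizes are arbitrary; the input is about 0.5 GB and the output roughly 9 GB as float32, and whether the int32 assert actually fires depends on the device and PyTorch version.)

import torch
import torch.nn.functional as F

# 5D input, as used by upsample_nearest3d in the trace above
x = torch.randn(1, 1, 512, 512, 512)
out = F.interpolate(x, size=(1300, 1300, 1300), mode="nearest")
print(out.numel() > torch.iinfo(torch.int32).max)
# True (1300**3 = 2_197_000_000 elements)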

I don’t know how, but the cache error is resolved after emptying the cache.

Could you give me more information which “cache error” was raised and what exactly you were emptying?

The error is the same as the one I posted earlier. In addition, I also sometimes got this out-of-memory error:

CUDA out of memory. Tried to allocate 6.85 GiB (GPU 0; 23.69 GiB total capacity; 9.79 GiB already allocated; 2.73 GiB free; 16.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I cleared the cache with the call below:

torch.cuda.empty_cache()
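(For reference, the effect of that call can be checked with the allocator statistics; memory_allocated and memory_reserved are standard torch.cuda utilities:)

import torch

device = torch.device("cuda:0")
print(torch.cuda.memory_allocated(device) / 1024**3, "GiB held by live tensors")
print(torch.cuda.memory_reserved(device) / 1024**3, "GiB reserved by the caching allocator")

torch.cuda.empty_cache()  # returns cached, currently unused blocks to the driver

print(torch.cuda.memory_reserved(device) / 1024**3, "GiB reserved after empty_cache()")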

Are you seeing this error:

RuntimeError: Expected output.numel() <= std::numeric_limits<int32_t>::max() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

when running my code snippet?

I also encounter this problem when I use the torch.nn.functional.interpolate function, but why?

I don’t know as I’m not able to reproduce the issue using my provided code snippet and the author did not follow up after my last question.

Apologies for the delayed reply. Caught up with deadlines.

I ran the script that @ptrblck provided; I don’t see any error and the output is True.

Okay, so I am getting this error again.

– Process 1 terminated with the following error:
Traceback (most recent call last):
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/multiprocessing/spawn.py”, line 69, in _wrap
fn(i, *args)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/mainvm.py”, line 135, in run_parallel
train.run()
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 159, in run
self.train_epoch(optimizer, scheduler, epoch)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 119, in train_epoch
loss, trdice = self.trainIter(fixed, moving, fixed_label, moving_label, fixed_nopad=fixed_nopad, seg_f=seg_fname)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 43, in trainIter
sim_loss, grad_loss, seg_loss, dice = self.model.forward(fix, moving,
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/parallel/distributed.py”, line 1008, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/parallel/distributed.py”, line 969, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/models.py”, line 244, in forward
unet_out, enc_out = self.unet(x)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/models.py”, line 157, in forward
y = self.upsample(y)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/upsampling.py”, line 153, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/functional.py”, line 3912, in interpolate
return torch._C._nn.upsample_nearest3d(input, output_size, scale_factors)
RuntimeError: Expected output.numel() <= std::numeric_limits<int32_t>::max() to be true, but got false. (Could this error message be improved? If so, please report an enhancement request to PyTorch.)

Oddly enough, when I switch to the single-GPU setup, I instead get an out-of-memory error.

– Process 0 terminated with the following error:
Traceback (most recent call last):
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/multiprocessing/spawn.py”, line 69, in _wrap
fn(i, *args)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/mainvm.py”, line 135, in run_parallel
train.run()
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 159, in run
self.train_epoch(optimizer, scheduler, epoch)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 119, in train_epoch
loss, trdice = self.trainIter(fixed, moving, fixed_label, moving_label, fixed_nopad=fixed_nopad, seg_f=seg_fname)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/train.py”, line 43, in trainIter
sim_loss, grad_loss, seg_loss, dice = self.model.forward(fix, moving,
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/parallel/distributed.py”, line 1008, in forward
output = self._run_ddp_forward(*inputs, **kwargs)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/parallel/distributed.py”, line 969, in _run_ddp_forward
return module_to_run(*inputs[0], **kwargs[0])
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/models.py”, line 244, in forward
unet_out, enc_out = self.unet(x)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/Image_registration/registration_copy/registration/models.py”, line 157, in forward
y = self.upsample(y)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/module.py”, line 1130, in _call_impl
return forward_call(*input, **kwargs)
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/modules/upsampling.py”, line 153, in forward
return F.interpolate(input, self.size, self.scale_factor, self.mode, self.align_corners,
File “/home/csgrad/mbhosale/anaconda3/envs/registration/lib/python3.8/site-packages/torch/nn/functional.py”, line 3912, in interpolate
return torch._C._nn.upsample_nearest3d(input, output_size, scale_factors)
RuntimeError: CUDA out of memory. Tried to allocate 8.59 GiB (GPU 0; 23.69 GiB total capacity; 10.37 GiB already allocated; 1.84 GiB free; 12.23 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Maybe this really is the wrong error message in the multi-GPU training case? That would also explain why I was able to resolve the issue after increasing the available memory.

Let me know.

I ran into the same problem (image segmentation, PyTorch 1.13.1+cu117, Ubuntu 20.04).
Basically, I ran a try-except loop with decreasing input sizes to find optimal inference parameters.

I can confirm that this problem occurred (rarely) when I ran out of memory. I used a single GPU in a subprocess.

More info:

I added this error to my list of excepted errors and everything worked as before. I also monitored RAM usage, which made it easy to pinpoint the problem. With PyTorch 1.9, however, I did not see this error message.
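(The get_memory helper used in the snippet further below comes from my own code; a minimal stand-in built only on public torch.cuda calls could look like this. torch.cuda.mem_get_info is available in recent PyTorch releases.)

import torch

def get_memory(device):
    # rough GPU memory snapshot: (used, total, free) in GiB for the given device
    free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    used = (total_bytes - free_bytes) / 1024**3
    return used, total_bytes / 1024**3, free_bytes / 1024**3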

Here are further related errors I have faced over time, all indicating a CUDA out-of-memory condition (from PyTorch 1.6 to 1.13):

NON_INTUITIVE_RAM_ERRORS = [
    'CUDA out of memory',  # some are of course intuitive
    'CUDA error: out of memory',
    'Unable to find a valid cuDNN algorithm to run convolution',
    'cuDNN error: CUDNN_STATUS_NOT_SUPPORTED',
    'DefaultCPUAllocator: not enough memory:',
    'Expected output.numel() <= std::numeric_limits<int32_t>::max() to be true, but got false',
]

The try-except loop was:

import torch

# Function name and signature are a reconstruction for readability; model_loader,
# get_data_loader, get_memory and constants are helpers from my own code base.
def try_inference(model_loader, batch_size, input_size, input_depth):
    model, optimizer, loss_func = model_loader.setup_model_optimizer_loss()
    device = model_loader.device
    channels = model.in_channels
    model.eval()
    # RAM usage with workers seems stable after 3 iterations
    iterations = 3
    num_samples = iterations * batch_size
    data_loader = get_data_loader(model, num_samples, batch_size, channels, input_size, input_depth)
    ram_used = None  # stays None if no iteration completes
    with torch.no_grad():
        with torch.cuda.amp.autocast(enabled=constants.ENABLE_AMP):
            try:
                for i, (input_batch, target_batch) in enumerate(data_loader):
                    input_batch = input_batch.to(device)
                    out_logits = model(input_batch)
                    out_logits = out_logits.float().cpu().numpy()
                    ram_used, ram_total, _ = get_memory(device)
            except (RuntimeError, ValueError) as e:
                # any of the known out-of-memory messages means "this input size does not fit"
                for non_intuitive_error in constants.NON_INTUITIVE_RAM_ERRORS:
                    if non_intuitive_error in e.args[0]:
                        model_loader.wipe_memory()
                        return False, None, None
                # anything else is reported back to the caller
                return False, None, e.args[0]
    model_loader.wipe_memory()
    return True, ram_used, None
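(The outer search over decreasing input sizes is not shown above; a minimal sketch of how the check could be driven follows. try_inference is the reconstructed name from the snippet above, and model_loader, the batch size and the candidate sizes are placeholders from my own setup.)

# walk down a list of candidate input sizes until one fits in memory
candidate_sizes = [(512, 320), (448, 288), (384, 256), (320, 224)]  # (input_size, input_depth), made-up values
batch_size = 1

for input_size, input_depth in candidate_sizes:
    fits, ram_used, error = try_inference(model_loader, batch_size, input_size, input_depth)
    if fits:
        print(f"selected input_size={input_size}, depth={input_depth}, ~{ram_used:.1f} GiB used")
        break
    if error is not None:
        # a genuinely unexpected error, not one of the known out-of-memory messages
        raise RuntimeError(error)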