Torch deterministic algorithms error

Hi, when I use torch.use_deterministic_algorithms(True) and run this code, I get the error below.
sum_img holds images converted into a torch tensor with shape batch_size x channels x img_size (128x128).

If I move sum_img to the CPU instead of CUDA, I get no error.

The code gives me the error even if I assign a single number instead of a value from the list.
If I run this code with deterministic algorithms set to False, it runs without a problem. How can I fix it? Thank you in advance.

list_n = [0.85, 0.5, 0.3]
for i in range(3):
    # Select one channel: shape [batch_size, 128, 128] (a view into sum_img)
    channel = sum_img[:, i, :, :]
    # Boolean-mask assignment of a scalar; this is the line that fails
    channel[channel >= 5] = list_n[i]




RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel()INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Indexing.cu":253, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor2684801
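For reference, this is the kind of minimal snippet I am running (just a sketch, with random data standing in for my images and a lower threshold so that the mask is non-empty):

import torch

torch.use_deterministic_algorithms(True)

# Random data standing in for the images: batch_size x channels x 128 x 128
sum_img = torch.rand(64, 3, 128, 128, device='cuda')

channel = sum_img[:, 0, :, :]
# Boolean-mask assignment of a scalar; fails on CUDA, works on the CPU
channel[channel >= 0.5] = 0.85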

Could you post the output of python -m torch.utils.collect_env as well as the shapes for all tensors, please?
Based on the error message it could be a valid error, which we would have to debug and fix (if it’s still observed in the current nightly release).

Thank you for answering. The output is:

Collecting environment information...
PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0
Libc version: glibc-2.26

Python version: 3.7 (64-bit runtime)
Python platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.0.221
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0+cu102
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.10.0
[pip3] torchvision==0.10.0+cu102
[conda] Could not collect

The shapes are:

sum_img = torch.Size([64, 3, 128, 128])
channel = torch.Size([64, 128, 128])

Do you need more information?

Hi, I am assuming that you want to make the output of your model deterministic. I had exactly the same issue in my project and solved it with the following:

torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False      # disable cuDNN autotuning
torch.manual_seed(seed)                     # seed the CPU RNG
torch.cuda.manual_seed(seed)                # seed the current GPU
torch.cuda.manual_seed_all(seed)            # seed all GPUs

Avoid use_deterministic_algorithms(True). The above code solved the determinism issue for me. If you run your code on Colab, a couple of additional lines are needed; let me know if that is the case so that I can provide them to you.


Yes, thank you, this fixed my problem. I think torch.cuda.manual_seed and manual_seed_all are not needed, because torch.manual_seed should seed both the CPU and the GPU.

As described in the reproducibility docs, it’s not sufficient to only seed the code and disable non-deterministic algorithms in cuDNN, so the suggestion above is not a complete solution.
Do not avoid torch.use_deterministic_algorithms(True); it is the proper way to get fully deterministic code.
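For completeness, a sketch of the setup described in the reproducibility docs (assuming a seed variable is defined in your script) would look like this:

import os
import torch

# Required for deterministic cuBLAS behavior on CUDA >= 10.2 (see the reproducibility docs);
# set it before any CUDA work is done.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.manual_seed(seed)                   # seeds the CPU and all CUDA RNGs
torch.backends.cudnn.benchmark = False    # disable cuDNN autotuning
torch.use_deterministic_algorithms(True)  # error out on known non-deterministic ops

With the flag enabled, operations that have no deterministic implementation raise an error instead of silently producing non-deterministic results.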

In any case, thanks for the additional information. We are tracking the issue here and will fix it.
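Until the fix lands, one possible workaround for the snippet above (just a sketch, not the official solution) is to rewrite the masked assignment with torch.where, which avoids the index_put_ path that hits the assert:

list_n = [0.85, 0.5, 0.3]
for i in range(3):
    channel = sum_img[:, i, :, :]
    # Build the thresholded channel out-of-place and copy it back into the slice
    sum_img[:, i, :, :] = torch.where(
        channel >= 5,
        torch.full_like(channel, list_n[i]),
        channel,
    )

torch.where produces the same result as the boolean-mask assignment here, but it does not go through the failing indexing kernel.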

Oh OK, thank you for making that clear. I’ll wait until the fix is ready.

Thanks! Solved my issue too