Torch deterministic algorithms error

Hi, when I use torch.use_deterministic_algorithms(True) and run this code, I get the error below.
sum_img holds images converted into a torch tensor with shape batch_size x channels x img_size (128x128).

If I move sum_img to the CPU instead of CUDA, I get no error.

The code gives me the error even if I assign a single number instead of a value from the list.
If I run this code with deterministic algorithms set to False, it runs without a problem. How can I fix it? Thank you in advance.

list_n = [0.85, 0.5, 0.3]
for i in range(3):
    # Select one channel: shape [batch_size, 128, 128] (a view into sum_img)
    channel = sum_img[:, i, :, :]
    # Boolean-mask assignment of a scalar; this is the line that fails
    channel[channel >= 5] = list_n[i]




RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel()INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/Indexing.cu":253, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor2684801
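For reference, this is the kind of minimal snippet I am running (just a sketch, with random data standing in for my images and a lower threshold so that the mask is non-empty):

import torch

torch.use_deterministic_algorithms(True)

# Random data standing in for the images: batch_size x channels x 128 x 128
sum_img = torch.rand(64, 3, 128, 128, device='cuda')

channel = sum_img[:, 0, :, :]
# Boolean-mask assignment of a scalar; fails on CUDA, works on the CPU
channel[channel >= 0.5] = 0.85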

Could you post the output of python -m torch.utils.collect_env as well as the shapes for all tensors, please?
Based on the error message it could be a valid error, which we would have to debug and fix (if it’s still observed in the current nightly release).

Thank you for answering. The output is:

Collecting environment information...
PyTorch version: 1.9.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.5 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: 6.0.0-1ubuntu2 (tags/RELEASE_600/final)
CMake version: version 3.12.0
Libc version: glibc-2.26

Python version: 3.7 (64-bit runtime)
Python platform: Linux-5.4.104+-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: 11.0.221
GPU models and configuration: GPU 0: Tesla P100-PCIE-16GB
Nvidia driver version: 460.32.03
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.6.5
/usr/lib/x86_64-linux-gnu/libcudnn.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_adv_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_cnn_train.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_infer.so.8.0.4
/usr/lib/x86_64-linux-gnu/libcudnn_ops_train.so.8.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.0+cu102
[pip3] torchsummary==1.5.1
[pip3] torchtext==0.10.0
[pip3] torchvision==0.10.0+cu102
[conda] Could not collect

The shapes are:

sum_img = torch.Size([64, 3, 128, 128])
channel = torch.Size([64, 128, 128])

Do you need more information?

Hi, I am assuming that you want to make the output of your model deterministic. I had exactly the same issue in my project and solved it with the following:

torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False      # disable cuDNN autotuning
torch.manual_seed(seed)                     # seed the CPU RNG
torch.cuda.manual_seed(seed)                # seed the current GPU
torch.cuda.manual_seed_all(seed)            # seed all GPUs

Avoid use_deterministic_algorithms(True). The above code solved the determinism issue for me. If you run your code on Colab, a couple of additional lines are needed; let me know if that is the case so that I can provide them to you.


Yes, thank you, this fixed my problem. I think torch.cuda.manual_seed and manual_seed_all are not needed, because torch.manual_seed should seed both the CPU and the GPU.

As described in the reproducibility docs, it’s not sufficient to only seed the code and disable non-deterministic algorithms in cuDNN, so the suggestion above is not a complete solution.
Do not avoid torch.use_deterministic_algorithms(True); it is the proper way to get fully deterministic code.
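For completeness, a sketch of the setup described in the reproducibility docs (assuming a seed variable is defined in your script) would look like this:

import os
import torch

# Required for deterministic cuBLAS behavior on CUDA >= 10.2 (see the reproducibility docs);
# set it before any CUDA work is done.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

torch.manual_seed(seed)                   # seeds the CPU and all CUDA RNGs
torch.backends.cudnn.benchmark = False    # disable cuDNN autotuning
torch.use_deterministic_algorithms(True)  # error out on known non-deterministic ops

With the flag enabled, operations that have no deterministic implementation raise an error instead of silently producing non-deterministic results.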

In any case, thanks for the additional information. We are tracking the issue here and will fix it.
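Until the fix lands, one possible workaround for the snippet above (just a sketch, not the official solution) is to rewrite the masked assignment with torch.where, which avoids the index_put_ path that hits the assert:

list_n = [0.85, 0.5, 0.3]
for i in range(3):
    channel = sum_img[:, i, :, :]
    # Build the thresholded channel out-of-place and copy it back into the slice
    sum_img[:, i, :, :] = torch.where(
        channel >= 5,
        torch.full_like(channel, list_n[i]),
        channel,
    )

torch.where produces the same result as the boolean-mask assignment here, but it does not go through the failing indexing kernel.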

Oh OK, thank you for making that clear. I’ll wait until the fix is ready.

Thanks! Solved my issue too