Use_deterministic_algorithms INTERNAL ASSERT FAILED

I get the following internal error when I try to use torch.use_deterministic_algorithms(True). The code runs fine without that line.

RuntimeError: linearIndex.numel()*sliceSize*nElemBefore == value.numel() INTERNAL ASSERT FAILED at "/pytorch/aten/src/ATen/native/cuda/":253, please report a bug to PyTorch. number of flattened indices did not match number of elements in the value tensor71

At this line of code:
l1_values[torch.arange(len(max_idxs), device="cuda"), max_idxs] = 1

l1_values.shape is torch.Size([71, 1500])
max_idxs.shape is torch.Size([71])
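A possible workaround, sketched under an assumption: the assert compares the number of flattened indices (71 here) with `value.numel()`, and the scalar `1` on the right-hand side is a single element. Passing a value tensor with one element per index pair should satisfy that check. This is untested against 1.9.1 on CUDA. Only the shapes come from the post; the tensor contents are random placeholders.

```python
import torch

# Enable deterministic mode, as in the original post.
torch.use_deterministic_algorithms(True)

device = "cuda" if torch.cuda.is_available() else "cpu"

# Shapes taken from the post; the contents are placeholders for illustration.
l1_values = torch.zeros(71, 1500, device=device)
max_idxs = torch.randint(0, 1500, (71,), device=device)

rows = torch.arange(len(max_idxs), device=device)

# Workaround sketch: instead of the Python scalar 1, assign a tensor with one
# element per (row, column) index pair, so value.numel() equals the number of
# flattened indices (71).
ones = torch.ones(len(max_idxs), dtype=l1_values.dtype, device=device)
l1_values[rows, max_idxs] = ones
```

Each row receives exactly one 1 at its `max_idxs` column, which is the same result the original scalar assignment is meant to produce.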

My environment:
$ python3 -m torch.utils.collect_env
Collecting environment information…
PyTorch version: 1.9.1+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.13.4
Libc version: glibc-2.25

Python version: 3.6 (64-bit runtime)
Python platform: Linux-5.4.0-87-lowlatency-x86_64-with-Ubuntu-18.04-bionic
Is CUDA available: True
CUDA runtime version: Could not collect
GPU models and configuration:
GPU 0: GeForce GTX 980 Ti

Nvidia driver version: 460.73.01
cuDNN version: Probably one of the following:
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] torch==1.9.1+cu111
[pip3] torchaudio==0.9.1
[pip3] torchtext==0.8.1
[pip3] torchvision==0.10.1+cu111
[conda] blas 1.0 mkl
[conda] mkl 2020.2 256
[conda] mkl-service 2.3.0 py38he904b0f_0
[conda] mkl_fft 1.2.0 py38h23d657b_0
[conda] mkl_random 1.1.1 py38h0573a6f_0
[conda] numpy 1.19.2 py38h54aff64_0
[conda] numpy-base 1.19.2 py38hfa32c7d_0
[conda] numpydoc 1.1.0 pyhd3eb1b0_1

This issue should already be fixed in the current master branch. Could you update to the nightly release and rerun your code? The upcoming 1.10 release will also ship with the fix.
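To confirm the fix after upgrading, a minimal rerun script could print the installed version and run the original scalar assignment under deterministic mode. This is a sketch: the helper `torch_version_tuple` is hypothetical, and the tensor contents are random placeholders with the shapes from the post.

```python
import torch


def torch_version_tuple():
    # Hypothetical helper: "1.9.1+cu111" -> (1, 9), "1.10.0.dev20211008" -> (1, 10).
    major, minor = torch.__version__.split("+")[0].split(".")[:2]
    return int(major), int(minor)


print("PyTorch", torch.__version__)
if torch_version_tuple() < (1, 10):
    print("This build predates 1.10; the fix may not be included.")

torch.use_deterministic_algorithms(True)
device = "cuda" if torch.cuda.is_available() else "cpu"

# Shapes from the post; contents are placeholders.
l1_values = torch.zeros(71, 1500, device=device)
max_idxs = torch.randint(0, 1500, (71,), device=device)

# The original statement, with a scalar right-hand side.
l1_values[torch.arange(len(max_idxs), device=device), max_idxs] = 1
print("Assignment succeeded under deterministic mode.")
```

On a build that contains the fix, the last line should print instead of raising the internal assert.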