CUDA error when trying to perform backward pass on `cdist`

I have a loss function that computes `cdist` between a predicted and a ground-truth vector. However, it runs into a CUDA error when performing the backward pass:
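For reference, here is a minimal sketch of the pattern described above. The shapes, variable names, and loss reduction are assumptions for illustration, not the original code:

```python
import torch

# Hypothetical repro sketch; shapes and the min/mean reduction are assumptions.
device = "cuda" if torch.cuda.is_available() else "cpu"

pred = torch.randn(128, 3, device=device, requires_grad=True)  # predicted points
gt = torch.randn(128, 3, device=device)                        # ground-truth points

# Pairwise Euclidean distances between prediction and ground truth
dists = torch.cdist(pred, gt, p=2)

# Reduce to a scalar loss and backprop; on GPU this backward call
# is what dispatches to cdist's CUDA backward kernel
loss = dists.min(dim=1).values.mean()
loss.backward()
```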

Traceback (most recent call last):
  File "/home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/__main__/atg/experimental/rnd/upsnet/upsnet/upsnet_end2end_train.py", line 441, in <module>
    upsnet_train()
  File "/home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/__main__/atg/experimental/rnd/upsnet/upsnet/upsnet_end2end_train.py", line 250, in upsnet_train
    loss.backward()
  File "/home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/tensor.py", line 198, in backward
    torch.autograd.backward(self, gradient, retain_graph, create_graph)
  File "/home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/autograd/__init__.py", line 100, in backward
    allow_unreachable=True)  # allow_unreachable flag
RuntimeError: CUDA error: invalid configuration argument (cdist_backward_kernel_impl at /torch_dir/build.work/pytorch.build_pytorch_wheel/aten/src/ATen/native/cuda/DistanceKernel.cu:361)
frame #0: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x47 (0x7f77bedb3ad7 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: at::native::(anonymous namespace)::cdist_backward_kernel_impl(at::Tensor&, at::Tensor const&, at::Tensor const&, at::Tensor const&, double, at::Tensor const&) + 0x10de (0x7f774dcbd79e in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cuda.so)
frame #2: at::native::_cdist_backward(at::Tensor const&, at::Tensor const&, at::Tensor const&, double, at::Tensor const&) + 0x309 (0x7f773cecff39 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #3: <unknown function> + 0x144f7e1 (0x7f773d56c7e1 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #4: <unknown function> + 0x14c13e6 (0x7f773d5de3e6 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #5: <unknown function> + 0x250e05a (0x7f773e62b05a in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #6: <unknown function> + 0x250df04 (0x7f773e62af04 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #7: <unknown function> + 0x32964e2 (0x7f773f3b34e2 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #8: <unknown function> + 0x14c13e6 (0x7f773d5de3e6 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #9: <unknown function> + 0x250e05a (0x7f773e62b05a in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #10: <unknown function> + 0x250df04 (0x7f773e62af04 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #11: torch::autograd::generated::CdistBackward::apply(std::vector<at::Tensor, std::allocator<at::Tensor> >&&) + 0x2cb (0x7f773f09470b in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #12: <unknown function> + 0x36c030c (0x7f773f7dd30c in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #13: torch::autograd::Engine::evaluate_function(std::shared_ptr<torch::autograd::GraphTask>&, torch::autograd::Node*, torch::autograd::InputBuffer&) + 0x5c2 (0x7f773f7d3f52 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #14: torch::autograd::Engine::thread_main(std::shared_ptr<torch::autograd::GraphTask> const&, bool) + 0x2e5 (0x7f773f7d3205 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #15: torch::autograd::Engine::thread_init(int) + 0x9b (0x7f773f7d2e1b in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_cpu.so)
frame #16: torch::autograd::python::PythonEngine::thread_init(int) + 0x2e (0x7f77c7f6ff9e in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/execroot/__main__/bazel-out/k8-fastbuild/bin/atg/experimental/rnd/upsnet/upsnet_end2end_train.runfiles/pythonhome_pypi/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #17: <unknown function> + 0xf9493 (0x7f77ce047493 in /home/justin.liang/.cache/bazel/_bazel_justin.liang/2da55068597246a7ff0741296fd2e52a/external/python3/bin/../.libs/libstdc++.so.6)
frame #18: <unknown function> + 0x76db (0x7f77cd5a66db in /lib/x86_64-linux-gnu/libpthread.so.0)
frame #19: clone + 0x3f (0x7f77ccb2ab2f in /lib/x86_64-linux-gnu/libc.so.6)

Does anyone know what the cause is? I am running Python 3.7.7 with PyTorch 1.5.0+cu101.post2.

Could you update to the latest nightly binary?
The launch configurations for pdist and cdist with large tensors should have been fixed there recently.
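If updating is not an immediate option, one possible workaround (an assumption on my part, not something confirmed in this thread) is to build the pairwise distances from elementary ops, so autograd uses their individual backward kernels instead of the cdist backward kernel:

```python
import torch

def pairwise_dist(x, y, eps=1e-8):
    """Manual pairwise Euclidean distance as a stand-in for torch.cdist.

    x: (n, d), y: (m, d) -> (n, m). The eps term keeps sqrt differentiable
    at zero distance. Note this materializes an (n, m, d) tensor, so it
    uses more memory than cdist.
    """
    diff = x.unsqueeze(1) - y.unsqueeze(0)        # (n, m, d)
    return torch.sqrt((diff ** 2).sum(-1) + eps)

x = torch.randn(64, 3, requires_grad=True)
y = torch.randn(128, 3)
d = pairwise_dist(x, y)
d.mean().backward()  # backward never touches cdist_backward_kernel_impl
```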

If I want to get the latest binary, do you know how I can install it through requirements.txt? I’m not sure what the name needs to be for torch==xxx.
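For what it's worth, pip supports options like `--pre` and `-f`/`--find-links` inside a requirements file, so a sketch might look like the following. The index URL and CUDA tag (cu101) are assumptions; check the PyTorch "Get Started" page for the current nightly instructions:

```text
# requirements.txt — sketch for installing a PyTorch nightly via pip
--pre
-f https://download.pytorch.org/whl/nightly/cu101/torch_nightly.html
torch
```

With `--pre`, pip is allowed to select pre-release builds such as the dated nightly versions.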