3D convolution: non-contiguous / cuDNN error for 256x256x256 volume

Hi, I am trying to run a 3D convolution on a 256x256x256 volume. The error occurs only at 256x256x256 and not at smaller sizes. I have called .contiguous() on all the tensors, but I still get a cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

The link below has a minimal example that reproduces the bug.

Can someone please suggest a workaround?
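
For context, here is a rough sketch of how dense tensor sizes scale with the volume edge length. The helper name conv3d_tensor_bytes, the batch size, the channel count of 16, and the float32 element size are all illustrative assumptions, not values taken from the actual model, and this is not meant to identify the cause of the error.

```python
def conv3d_tensor_bytes(edge, channels=1, batch=1, bytes_per_elem=4):
    """Bytes for one dense (batch, channels, edge, edge, edge) tensor.

    Hypothetical helper; float32 (4 bytes per element) is assumed by default.
    """
    return batch * channels * edge ** 3 * bytes_per_elem


# Compare a few cubic volume sizes at an assumed 16 channels.
for edge in (64, 128, 256):
    mib = conv3d_tensor_bytes(edge, channels=16) / 2**20
    print(f"{edge}^3 volume, 16 channels: {mib:.0f} MiB")
```

Doubling the edge length multiplies each tensor's size by 8, so the 256x256x256 case is much heavier than 128x128x128 even with the same layer configuration.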