The problem arose from concatenating the outputs of convolutions along the second dimension. That led to conv's backward being called with non-contiguous gradients, and since we were overly smart about reusing cuDNN descriptors, the backend assumed they were contiguous. The fix is to either disable cuDNN or rebuild PyTorch.
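For reference, here's a minimal sketch of the kind of pattern that triggers this (my reconstruction, not the exact network from this thread — the layer sizes are made up), along with the cuDNN workaround:

```python
import torch

# Two convs whose outputs get concatenated along dim 1 (channels).
# In backward, each conv then receives a channel-slice of the upstream
# gradient, which is non-contiguous.
conv_a = torch.nn.Conv2d(3, 8, 3, padding=1)
conv_b = torch.nn.Conv2d(3, 8, 3, padding=1)
x = torch.randn(2, 3, 16, 16, requires_grad=True)

out = torch.cat([conv_a(x), conv_b(x)], dim=1)  # concat along second dim
out.sum().backward()  # conv backward gets non-contiguous grad slices

# Workaround until you can rebuild: disable cuDNN globally so the
# non-contiguous gradient layout is handled by the fallback path.
torch.backends.cudnn.enabled = False
```

Disabling cuDNN will cost some speed, so rebuilding with the fix is preferable if you can.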
I’m sorry if that also affected your network.