Hi. I am running my code on four L4 GPUs on GCP with distributed training, using torch 1.8 and CUDA 11.8.
The following code gives:
```python
self.proj = nn.Conv3d(
    in_channels=in_chans,
    out_channels=embed_dim,
    kernel_size=(self.tubelet_size, patch_size[0], patch_size[1]),
    stride=(self.tubelet_size, patch_size[0], patch_size[1]))
x = self.proj(x).flatten(2).transpose(1, 2)
```

```
RuntimeError: cuDNN error: CUDNN_STATUS_NOT_INITIALIZED
```
The same code works fine if I don't use distributed training. Please let me know what could be the issue.
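For context, here is a small diagnostic sketch I can run on each node. It only checks assumptions that commonly underlie `CUDNN_STATUS_NOT_INITIALIZED` under DDP: whether the installed wheel's CUDA/cuDNN build matches the driver, and whether each worker pins itself to its own GPU (if every rank defaults to GPU 0, cuDNN can fail to initialize). The `LOCAL_RANK` environment variable is an assumption about how the launcher passes ranks; it is not from the original post.

```python
# Hedged diagnostic sketch, not a fix: print the versions PyTorch was built
# against and verify cuDNN is usable before any distributed work starts.
import os
import torch

print("torch:", torch.__version__)            # installed PyTorch version
print("built for CUDA:", torch.version.cuda)  # CUDA version of the wheel, not the driver
print("cuDNN:", torch.backends.cudnn.version())  # None means cuDNN is unavailable
print("CUDA available:", torch.cuda.is_available())

# In each DDP worker, pin the process to its own GPU *before* creating the
# model or any CUDA tensors (LOCAL_RANK is assumed to come from the launcher):
if torch.cuda.is_available() and "LOCAL_RANK" in os.environ:
    torch.cuda.set_device(int(os.environ["LOCAL_RANK"]))
```

One thing the printout would surface: torch 1.8 wheels were not built against CUDA 11.8, so `torch.version.cuda` would show the wheel's actual build version regardless of the system toolkit.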