My code is simple:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
m = torch.nn.Linear(20, 30).to(device)    # layer expects 20 input features
input = torch.randn(128, 40).to(device)   # input has 40 features
output = m(input)
I expected this code to raise a shape-mismatch error, since the layer takes 20 input features and the input has 40. On the GPU, however, it runs without error and returns an output with output.shape == (128, 30). On the CPU, the same code raises the mismatch error as expected. If I change the input shape to (128, N) with N < 20, the GPU also raises an error.
My torch.__version__ is 1.9.0+cu111 and torch.version.cuda is 11.1. My hardware is an NVIDIA DGX A100 640GB.
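Until the cause is clear, an explicit check before the forward pass would make the mismatch fail the same way on both devices. This is only a minimal sketch: `check_linear_input` is a hypothetical helper name, not part of PyTorch, and it assumes the layer exposes `in_features` (as `torch.nn.Linear` does) and the input exposes `shape` (as a tensor does).

```python
def check_linear_input(layer, x):
    """Raise ValueError if the last dimension of x does not match layer.in_features.

    Hypothetical helper: it reads only `layer.in_features` and `x.shape`,
    so the check does not depend on whether the data lives on CPU or GPU.
    """
    expected = layer.in_features
    actual = x.shape[-1]
    if actual != expected:
        raise ValueError(
            f"Linear layer expects {expected} input features, got {actual} "
            f"(input shape {tuple(x.shape)})"
        )
    return x
```

With the code from my post, calling `output = m(check_linear_input(m, input))` should raise a ValueError for the (128, 40) input on either device, instead of relying on the backend to detect the mismatch.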