Strange Behavior of nn.Linear()

Hi,
My code is simple and as follows:
import torch
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
m = torch.nn.Linear(20, 30).to(device)
input = torch.randn(128, 40).to(device)
output = m(input)
print(output.shape)

This code is expected to raise a shape-mismatch error, since the input has 40 features but the layer expects 20. On the GPU, however, it runs without complaint and returns output.shape = (128, 30). On the CPU it raises the expected mismatch error, and the GPU also raises an error if the input shape is changed to (128, N) with N < 20.
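For reference, a minimal check of the documented contract: nn.Linear(20, 30) requires the input's last dimension to equal in_features = 20, so (128, 20) is the valid shape here.

```python
import torch

# Linear(in_features=20, out_features=30): input's last dim must be 20.
m = torch.nn.Linear(20, 30)
ok = torch.randn(128, 20)
print(m(ok).shape)  # torch.Size([128, 30])
```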
My torch.__version__ is 1.9.0+cu111, torch.version.cuda is 11.1, and my GPUs are in an NVIDIA DGX A100 (640 GB).

Could you update PyTorch to the latest stable or nightly release, please?
If I'm not mistaken, this was a known issue that was fixed in a patch release a while ago.
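After upgrading, a quick way to confirm the fix is to check that the mismatched input raises a RuntimeError on every available device, CUDA included. This is just a verification sketch, not an official regression test:

```python
import torch

m = torch.nn.Linear(20, 30)
x = torch.randn(128, 40)  # wrong shape on purpose: layer expects 20 features

devices = ["cpu"] + (["cuda"] if torch.cuda.is_available() else [])
for device in devices:
    try:
        m.to(device)(x.to(device))
        print(f"{device}: no error raised (bug still present)")
    except RuntimeError:
        print(f"{device}: shape-mismatch RuntimeError raised as expected")
```

On a fixed build, both loop iterations should hit the except branch.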