When testing matrix multiplication with PyTorch:
If the matrix multiplication size is m=10240, n=5120, k=5120, the CUDA kernel used by PyTorch's matmul is:
But when the size is m=40960, n=20480, k=10240, the result is:
Question:
When m=40960, n=20480, k=10240, why is that CUDA kernel not used?
The code is:
import torch
import time

# Allow TF32 tensor-core math for float32 matmuls
torch.backends.cuda.matmul.allow_tf32 = True

m = 40960
n = 20480
k = 5120

input = torch.randn(m, k, dtype=torch.float32, device='cuda')
weight = torch.randn(k, n, dtype=torch.float32, device='cuda')
output = torch.matmul(input, weight)
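One way to see which CUDA kernel is actually launched for a given problem size is to wrap the matmul in torch.profiler and list the GPU kernels. Below is a minimal sketch using the shapes from the code above (the warm-up call and the sort key are my own choices, not from the original post):

import torch
from torch.profiler import profile, ProfilerActivity

torch.backends.cuda.matmul.allow_tf32 = True

m, n, k = 40960, 20480, 5120
a = torch.randn(m, k, dtype=torch.float32, device='cuda')
b = torch.randn(k, n, dtype=torch.float32, device='cuda')

# Warm up first so cuBLAS heuristic/handle setup does not appear in the trace
torch.matmul(a, b)
torch.cuda.synchronize()

with profile(activities=[ProfilerActivity.CUDA]) as prof:
    torch.matmul(a, b)
    torch.cuda.synchronize()

# Print the CUDA kernels that were launched, sorted by GPU time
print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))

Comparing this table for the two problem sizes should show whether the same GEMM kernel is picked in both cases or whether cuBLAS switches to a different one (or splits the work) at the larger size.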