I use functional.conv2d to do cross-correlation operations, but I found that the same function becomes unacceptably slow after being called many times.
For a simple test, I created zero tensors the same size as my data and ran the exact same function in a loop.
The first few hundred iterations finish almost instantly, but after roughly 500 iterations each call takes about 0.3 s.
I also noticed that GPU utilization stays at 100% after the script finishes, and GPU memory is not exhausted.
I tried wrapping the call in an nn.Module as suggested in this post, but the problem persists.
Could anyone help me with this?
import torch
import datetime

xa = torch.zeros(1, 1, 1785, 1785, device=torch.device('cuda:0'))
kernel = torch.zeros(1, 1, 129, 129, device=torch.device('cuda:0'))

for i in range(1000):
    start = datetime.datetime.now()
    torch.nn.functional.conv2d(xa, kernel, padding='same')
    end = datetime.datetime.now()
    print(end - start)
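For what it's worth, one likely explanation for this pattern is that CUDA kernels launch asynchronously: `conv2d` returns as soon as the kernel is queued, so the early iterations only measure the cheap launch, and once the launch queue fills up each call blocks for the real kernel time. Below is a minimal sketch of timing with explicit `torch.cuda.synchronize()` calls so every iteration measures actual GPU work; it uses smaller tensor sizes than the original post and falls back to CPU when no GPU is available.

```python
import time
import torch

# Pick CUDA if available; the synchronize calls are no-ops on CPU paths.
device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

# Smaller tensors than the original repro, just for illustration.
xa = torch.zeros(1, 1, 256, 256, device=device)
kernel = torch.zeros(1, 1, 33, 33, device=device)

for i in range(5):
    if device.type == 'cuda':
        torch.cuda.synchronize()  # drain any pending GPU work first
    start = time.perf_counter()
    out = torch.nn.functional.conv2d(xa, kernel, padding='same')
    if device.type == 'cuda':
        torch.cuda.synchronize()  # wait for this conv to actually finish
    print(f'iter {i}: {time.perf_counter() - start:.4f}s')

# padding='same' keeps the spatial size unchanged
print(out.shape)
```

With synchronization in place, per-iteration times should be roughly constant from the start rather than jumping after a few hundred calls.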