Hi everyone,
I’m currently seeing some weird behavior with the nn.Conv2d module.
When I run the following code on a more or less empty GPU, it is about 10x slower than when GPU memory is almost full.
import time

import torch
import torch.nn as nn

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

with torch.no_grad():
    filter = nn.Conv2d(
        in_channels=1,
        out_channels=1,
        kernel_size=3,
        padding=1,
        bias=False,
        device=device)

total_start = time.time()
for i in range(400):
    # one small and one large batch per iteration
    a = torch.rand((1, 1, 128, 128), device=device)
    b = torch.rand((4096, 1, 128, 128), device=device)
    with torch.no_grad():
        c = filter(a)
        d = filter(b)
print(f'Total execution time {time.time()-total_start:.6f} seconds')
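(Side note on measurement: since CUDA kernels launch asynchronously, a synchronized variant of the timing loop could be used to rule out measurement artifacts. This is just a sketch, assuming the same filter and device as above; torch.cuda.synchronize() waits until all queued kernels have finished.)

torch.cuda.synchronize()  # make sure no earlier work is still queued
total_start = time.time()
for i in range(400):
    a = torch.rand((1, 1, 128, 128), device=device)
    b = torch.rand((4096, 1, 128, 128), device=device)
    with torch.no_grad():
        c = filter(a)
        d = filter(b)
torch.cuda.synchronize()  # wait for all launched kernels to finish before stopping the clock
print(f'Total execution time {time.time()-total_start:.6f} seconds')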
The code can be sped up by ~10x by first blocking GPU memory, i.e. by running the following snippet in a separate Python terminal. The batch size has to be adjusted so that the combined memory consumption exceeds ~90% of GPU memory; for an Nvidia GeForce RTX 3090 that is also driving a display, 260000 seems to be a good fit.
import torch
a = torch.rand((260000, 1, 128, 128), device='cuda')
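For reference, a float32 tensor of shape (N, 1, 128, 128) occupies N × 128 × 128 × 4 bytes, so N = 260000 comes to roughly 16 GiB. A small helper could compute the batch size for a target amount of blocked memory (blocker_batch_size is just my own hypothetical name, not a PyTorch API):

import torch

def blocker_batch_size(target_bytes, sample_shape=(1, 128, 128)):
    # hypothetical helper: float32 elements are 4 bytes each
    bytes_per_sample = 4
    for dim in sample_shape:
        bytes_per_sample *= dim
    return target_bytes // bytes_per_sample

n = blocker_batch_size(16 * 1024**3)  # aim to block ~16 GiB
a = torch.rand((n, 1, 128, 128), device='cuda')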
I tested the behavior at various levels of GPU memory utilization and found that once the combined utilization surpasses ~90%, execution speed increases tremendously. I have also reproduced this on two different systems, both equipped with an Nvidia GeForce RTX 3090, and both showed the same behavior.
With 8656MiB / 24576MiB memory usage I get the following output:
Total execution time 20.186182 seconds
With 22908MiB / 24576MiB memory usage I get the following output:
Total execution time 2.597108 seconds
Has anybody experienced this before, and can you point me to how I can get the fast execution speed at low GPU memory utilization?
My torch version is 1.11.0
My cudatoolkit version is 11.3.1
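(Both can be checked from Python directly:)

import torch
print(torch.__version__)   # 1.11.0
print(torch.version.cuda)  # CUDA version this build was compiled against, e.g. 11.3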
Thank you in advance!