I’ve run into a bit of a problem when trying to use a pre-trained model. I wrote the model myself, based on a U-Net architecture.
When running inference repeatedly on a GPU (an Nvidia GeForce GTX 750 Ti), the first ~25 iterations are very fast, then the remaining iterations slow down dramatically.
For example, running the following:
from convDiff_model import convDiff
import torch
import time

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

net = convDiff()
net.to(device)

# Dummy input: batch of 1, 2 channels, 256x256
ran = torch.rand(1, 2, 256, 256).to(device)

with torch.no_grad():
    tic = time.process_time()
    for i in range(48):
        net(ran.float())
        if i % 5 == 4:  # report the mean time over each block of 5 iterations
            toc = time.process_time()
            print('Iter. %2d to %2d: Mean time: %.3f' % (i - 4, i + 1, (toc - tic) / 5.))
            tic = time.process_time()
produces the output:
Iter. 0 to 5: Mean time: 0.004
Iter. 5 to 10: Mean time: 0.002
Iter. 10 to 15: Mean time: 0.002
Iter. 15 to 20: Mean time: 0.002
Iter. 20 to 25: Mean time: 0.084
Iter. 25 to 30: Mean time: 0.442
Iter. 30 to 35: Mean time: 0.442
Iter. 35 to 40: Mean time: 0.442
Iter. 40 to 45: Mean time: 0.444
so you can see that there’s a factor of ~200x slowdown between the first twenty iterations and the last twenty. Oddly, including or removing the with torch.no_grad(): context makes no difference. Inference does stay fast if I call empty_cache() every, say, 15 iterations (see the sketch just below), but that call itself takes ~0.45 seconds, which rules it out as a working solution.
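For reference, the empty_cache() workaround looks roughly like this (a sketch, reusing net and ran from above; the 15-iteration interval is just a value I happened to try):

with torch.no_grad():
    for i in range(48):
        net(ran.float())
        if i % 15 == 14:
            # Release cached GPU memory back to the driver. This keeps the
            # iterations fast, but the call itself takes ~0.45 s.
            torch.cuda.empty_cache()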
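Separately, in case the measurement itself is suspect: I understand CUDA kernels are launched asynchronously, so time.process_time() (which only counts CPU time anyway) might partly be capturing launch/queueing behaviour rather than the kernels themselves. A sketch of the timing loop with explicit synchronisation, again reusing net and ran from above, would be something like:

with torch.no_grad():
    torch.cuda.synchronize()          # make sure the GPU is idle before timing
    tic = time.perf_counter()         # wall-clock timer instead of CPU time
    for i in range(48):
        net(ran.float())
        if i % 5 == 4:
            torch.cuda.synchronize()  # wait for all queued kernels to finish
            toc = time.perf_counter()
            print('Iter. %2d to %2d: Mean time: %.3f' % (i - 4, i + 1, (toc - tic) / 5.))
            tic = time.perf_counter()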
[To clarify: in the code snippets above I’ve removed the loading of the trained weights, as this particular behaviour doesn’t seem to depend on them.]
The full model architecture is available at this GitHub repo: https://github.com/jrmullaney/ConvDiff/tree/master/code
Is this behaviour normal? I’d have thought each iteration would be independent.
Apologies if this has already been addressed in another post; I had a good look round but couldn’t find any solutions.
Thanks for your help.