I’ve run into a bit of a problem when trying to utilise a pre-trained model. I wrote the model myself, based on a UNet architecture.
When iterating inference on a GPU (Nvidia GeForce GTX 750 Ti), I find that it infers incredibly quickly for the first ~25 iterations, then slows dramatically for the remaining iterations.
For example, running the following:
```python
from convDiff_model import convDiff
import torch
import time

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
net = convDiff()
net.to(device)
ran = torch.rand(1, 2, 256, 256).to(device)

with torch.no_grad():
    tic = time.process_time()
    for i in range(48):
        net(ran.float())
        if i % 5 == 4:
            toc = time.process_time()
            print('Iter. %2d to %2d: Mean time: %.3f' % (i - 4, i + 1, (toc - tic) / 5.))
            tic = time.process_time()
```
produces the output:
```
Iter.  0 to  5: Mean time: 0.004
Iter.  5 to 10: Mean time: 0.002
Iter. 10 to 15: Mean time: 0.002
Iter. 15 to 20: Mean time: 0.002
Iter. 20 to 25: Mean time: 0.084
Iter. 25 to 30: Mean time: 0.442
Iter. 30 to 35: Mean time: 0.442
Iter. 35 to 40: Mean time: 0.442
Iter. 40 to 45: Mean time: 0.444
```
so you can see that there’s a ~200x slowdown between the first twenty iterations and the last twenty. Oddly, including or removing the `with torch.no_grad():` context makes no difference. Inference remains fast if I call `torch.cuda.empty_cache()` every, say, 15 iterations, but that call itself takes ~0.45 seconds, which rules it out as a working solution.
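One thing I wondered about is whether the numbers above are trustworthy at all, since CUDA kernel launches are asynchronous and `time.process_time()` only counts CPU time. A variant of the loop that calls `torch.cuda.synchronize()` before each clock read, and uses wall-clock `time.perf_counter()`, would look something like the sketch below. (Note: I've substituted a plain `Conv2d` as a stand-in for my `convDiff` network here, just to keep the snippet self-contained.)

```python
import time
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# Stand-in for the real convDiff network, so the snippet runs on its own.
net = torch.nn.Conv2d(2, 2, kernel_size=3, padding=1).to(device)
ran = torch.rand(1, 2, 256, 256, device=device)

with torch.no_grad():
    if device.type == "cuda":
        torch.cuda.synchronize()  # ensure setup work has finished before timing
    tic = time.perf_counter()
    for i in range(48):
        net(ran.float())
        if i % 5 == 4:
            if device.type == "cuda":
                # wait for all queued kernels to finish before reading the clock
                torch.cuda.synchronize()
            toc = time.perf_counter()
            print('Iter. %2d to %2d: Mean time: %.3f' % (i - 4, i + 1, (toc - tic) / 5.))
            tic = time.perf_counter()
```

If the early iterations were only *appearing* fast because the launches were being queued without blocking, this version should show a flatter per-iteration time.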
[To clarify: in the above code snippet I’ve removed the loading of the trained model, as this particular behaviour doesn’t seem to depend on the weights.]
The full model architecture is available at this GitHub repo: https://github.com/jrmullaney/ConvDiff/tree/master/code
Is this behaviour normal? I’d have thought each iteration would be independent.
Apologies if this has already been addressed in another post; I had a good look round but couldn’t find any solutions.
Thanks for your help.