Slow forward inference for traced model on second inference: data dependent or independent?

Hello,

I have a traced model that runs very slowly on its second inference run on PyTorch 1.7 and newer. From these posts and others, this seems to be the expected behavior.

I have one question about the exact cost of the JIT/graph optimization run (the second run). Say my model can accept differently sized tensors during the forward pass, e.g. [batch_size, c, h, w] and then [batch_size, c * 10, h, w], both being valid inputs. Would it be expected that two forward passes (including the second, longer one) would be faster with an input sized [batch_size, c, h, w] than with an input sized [batch_size, c * 10, h, w]? My experiments show that the second run (the one that optimizes the graph) is equally slow with both inputs. I just want to confirm that this is the expected behavior.
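For reference, here is a minimal sketch of the kind of timing experiment I mean (the elementwise model and the concrete shapes are made up for illustration; any module that accepts both shapes would do):

```python
import time
import torch
import torch.nn as nn

class Model(nn.Module):
    # Elementwise ops, so the module accepts any channel count
    # and the JIT fuser has something to optimize.
    def forward(self, x):
        return torch.relu(x * 2.0 + 1.0)

traced = torch.jit.trace(Model().eval(), torch.randn(8, 4, 32, 32))

def timed(x):
    with torch.no_grad():
        start = time.perf_counter()
        traced(x)
        return time.perf_counter() - start

small = torch.randn(8, 4, 32, 32)   # [batch_size, c, h, w]
large = torch.randn(8, 40, 32, 32)  # [batch_size, c * 10, h, w]

# The first runs per shape trigger profiling/optimization and are slow;
# later runs with the same shape hit the already-optimized plan.
for i in range(4):
    print(f"small run {i}: {timed(small) * 1e3:.2f} ms")
for i in range(4):
    print(f"large run {i}: {timed(large) * 1e3:.2f} ms")
```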

Thank you!

If I’m not mistaken, the default JIT backend uses up to 3 profiling runs to optimize the graph for each new input shape, i.e. the new and larger input would trigger the optimization again. The nvFuser backend would relax this slowdown a bit, as it’s more flexible when it comes to different input shapes (in case the generated kernels cannot be reused, they are regenerated and optimized again).
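As a rough sketch of what trying the nvFuser backend could look like (assuming a CUDA build of a PyTorch release that ships nvFuser, where it is selected via the "fuser2" name; the elementwise function and shapes are placeholders):

```python
import torch

# Elementwise function, so differently shaped inputs are all valid.
def f(x):
    return torch.relu(x * 2.0 + 1.0)

def run(m, x, n=4):
    # Several forward passes so the profiling runs complete.
    with torch.no_grad():
        for _ in range(n):
            m(x)

if torch.cuda.is_available():
    scripted = torch.jit.script(f)
    x_small = torch.randn(8, 4, 32, 32, device="cuda")
    x_large = torch.randn(8, 40, 32, 32, device="cuda")

    # "fuser2" selects nvFuser in the releases that ship it; the exact
    # selection API and its availability vary across PyTorch versions.
    with torch.jit.fuser("fuser2"):
        run(scripted, x_small)  # profiling + kernel generation for this shape
        run(scripted, x_large)  # new shape: kernels may be regenerated
```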