I have a question about the speed of the JIT graph-optimisation run (the second run). Say my model can take input tensors of different sizes during the forward pass, for example [batch_size, c, h, w] and [batch_size, c * 10, h, w], both of which are valid inputs. Would it be expected that the first two forward passes (including the second, longer one) are faster with input [batch_size, c, h, w] than with input [batch_size, c * 10, h, w]? My experiments show that the second run (the one that optimises the graph) takes the same, longer time with both inputs. I just want to confirm that this is the expected behavior.
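For reference, here is a minimal sketch of how the per-run timings for the two input shapes could be measured. It assumes a TorchScript model created with `torch.jit.script`. `ToyModel`, `time_runs`, and the concrete shapes are hypothetical stand-ins, not the actual model from this post. A fresh scripted module is created for each shape, so neither shape benefits from graph profiling or optimisation already done for the other. On CUDA the clock is stopped only after `torch.cuda.synchronize()`, because kernels launch asynchronously.

```python
import time

import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Hypothetical stand-in model that accepts any channel count."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Element-wise ops followed by a reduction over the channel dim,
        # so both [N, c, H, W] and [N, c * 10, H, W] are valid inputs.
        y = torch.relu(x * 2.0 + 1.0)
        return y.mean(dim=1)


def time_runs(x: torch.Tensor, n_runs: int = 4) -> list:
    """Script a fresh model and time n_runs consecutive forward passes on x."""
    model = torch.jit.script(ToyModel())
    model.eval()
    times = []
    with torch.no_grad():
        for _ in range(n_runs):
            start = time.perf_counter()
            model(x)
            if x.is_cuda:
                # Wait for queued CUDA kernels before stopping the clock.
                torch.cuda.synchronize()
            times.append(time.perf_counter() - start)
    return times


batch_size, c, h, w = 8, 3, 32, 32
small = torch.randn(batch_size, c, h, w)
large = torch.randn(batch_size, c * 10, h, w)

small_times = time_runs(small)
large_times = time_runs(large)
for i, (ts, tl) in enumerate(zip(small_times, large_times), start=1):
    print(f"run {i}: small {ts * 1e3:.2f} ms, large {tl * 1e3:.2f} ms")
```

Comparing the first few runs against the later, steady-state runs separates the one-time profiling and optimisation overhead from the cost that actually scales with input size.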