PyTorch Dynamo is unstable in the 1st epoch and takes more time than without compile

I tested the compile feature of PyTorch 2.0 on my project.

Although the 2nd epoch and later ones are a bit faster, PyTorch Dynamo is unstable in the 1st epoch, which takes more time than it does without compile.


Is this expected behavior?
If it isn’t, how can I debug it?

In the first epoch, I get lots of warning messages like these:

to diagnose recompilation issues, see https://github.com/pytorch/torchdynamo/blob/main/TROUBLESHOOTING.md.
torch._inductor.ir: [WARNING] Using FallbackKernel: torch.ops.aten.randint.default

Could you create an issue on GitHub, please, as the code owners of TorchDynamo are rarely active here?

I see. Should I post the issue on pytorch/torchdynamo instead of pytorch/pytorch?

I would create the issue directly in pytorch/pytorch as dynamo should now be part of the core framework. The code owners could cross-post it into the pytorch/torchdynamo repo if needed.

I see. However, I found the troubleshooting guide on this page, so I will read it first and post an issue if needed.
https://pytorch.org/docs/master/dynamo/troubleshooting.html

Thank you for the advice.

You may be encountering dynamic shapes in your model, causing recompilation. The periods of low GPU utilization could be compilation time, and the second epoch could be better if the majority of input shapes have been seen before. You can use .explain or the recompilation profiler (see the Troubleshooting guide).
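One rough way to check whether the slow steps line up with new input shapes is to time each training step and group the timings by shape. The sketch below is not part of PyTorch: profile_steps_by_shape is a hypothetical helper name, and it assumes each batch exposes a .shape attribute (as tensors and NumPy arrays do) and that step_fn runs one step on a single batch.

```python
import time


def profile_steps_by_shape(step_fn, batches):
    """Time each call to step_fn and group the timings by input shape.

    Hypothetical helper, not a PyTorch API. Returns a dict of the form
    {shape: {"first": seconds, "repeat": [seconds, ...]}} so the first call
    for a shape (where compilation would happen) can be compared with later
    calls for the same shape.
    """
    timings = {}
    for batch in batches:
        shape = tuple(batch.shape)  # assumes the batch has a .shape attribute
        start = time.perf_counter()
        # On a GPU, step_fn should end with torch.cuda.synchronize(),
        # otherwise the timer stops before the queued kernels have finished.
        step_fn(batch)
        elapsed = time.perf_counter() - start
        if shape not in timings:
            timings[shape] = {"first": elapsed, "repeat": []}
        else:
            timings[shape]["repeat"].append(elapsed)
    return timings
```

If the "first" time for most shapes is much larger than the "repeat" times, and new shapes keep appearing throughout the first epoch, that would be consistent with recompilation being the cause. It does not prove it, so the tools in the Troubleshooting guide are still the place to confirm.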

Posting issues on pytorch/pytorch is also OK. We may end up redirecting all issues there soon, but it’s true that we originally asked for issues to be filed on pytorch/torchdynamo, since that’s where we developed before launch.
