I’m encountering an issue with enabling `compiled_autograd` in PyTorch, which is causing my model’s training behavior to differ significantly.
Problem Description
Under the same environment, model, and dataset, toggling the following configuration:
torch._dynamo.config.compiled_autograd = True
produces a noticeably different trend in loss and gradient norm compared to setting it to `False`.
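For reference, here is a minimal sketch of how the flag is used in a training step (the model, data, and optimizer below are placeholders, not my actual setup):

```python
import torch

torch._dynamo.config.compiled_autograd = True  # the flag being toggled

# Placeholder model/optimizer standing in for the real training setup.
model = torch.compile(torch.nn.Linear(10, 1).cuda())
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)

x = torch.randn(32, 10, device="cuda")
y = torch.randn(32, 1, device="cuda")

optimizer.zero_grad()
loss = torch.nn.functional.mse_loss(model(x), y)
loss.backward()  # with the flag on, the backward pass is captured and compiled too
optimizer.step()
```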
Observations
- When `compiled_autograd` is off (`False`), the loss and gradient norm trends behave as expected.
- However, turning it on (`True`) results in abnormal trends in both metrics (see the logging sketch after this list).
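For context, a minimal sketch of how the two metrics could be logged per step; the `clip_grad_norm_` call with an infinite `max_norm` is just one common way to read back the total gradient norm without actually clipping, and `model`/`loss` are placeholders:

```python
import torch

def log_step_metrics(model: torch.nn.Module, loss: torch.Tensor, step: int) -> None:
    # Call after loss.backward(). Returns the total L2 norm over all parameter
    # gradients; max_norm=inf means nothing is clipped, we only read the norm.
    grad_norm = torch.nn.utils.clip_grad_norm_(
        model.parameters(), max_norm=float("inf")
    )
    print(f"step={step} loss={loss.item():.4f} grad_norm={grad_norm.item():.4f}")
```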
Visual Comparison
I’ve attached the plots below:
- Red: `compiled_autograd = False`
- Gray: `compiled_autograd = True`
Environment Details
- PyTorch version: 2.5.1
- CUDA version: 12.1
- GPU: NVIDIA GeForce RTX 3090
- Operating System: Ubuntu 22.04
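If it helps, the full environment report can be regenerated with PyTorch's built-in collector:

```python
# Equivalent to running `python -m torch.utils.collect_env` from the shell;
# prints PyTorch, CUDA, GPU, and OS details.
from torch.utils import collect_env

collect_env.main()
```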
Any insights or suggestions would be greatly appreciated!