Hi PyTorch community,
We are evaluating distributed training with PyTorch 2.0 and compilation. We noticed that compiling a ~1B-parameter model makes the first few steps slower, and it can take ~10 minutes for training to reach stable, full throughput. Can a compiled model be saved in some intermediate format so that re-launching training with the same model takes less time?
We do cache the compiles, so when you run the script again you shouldn’t run into recompiles (unless the cache is full).
But we can do a lot more, including a saveable cache as well as a moveable (or distributed) cache. At the moment, we don’t have these.
Thanks @smth! Where is the cache saved? Also, a related question: we saw this warning when compiling fairseq RoBERTa 1.3B:
4: function: 'gelu' (/fairseq/fairseq/modules/gelu.py:24)
4: reasons: tensor 'x' size mismatch at index 0. expected 492, actual 440
4: to diagnose recompilation issues, see https://pytorch.org/docs/master/dynamo/troubleshooting.html.
2: [2023-01-07 09:23:10,654] torch._dynamo.convert_frame: [WARNING] torch._dynamo hit config.cache_size_limit (64)
2: function: 'gelu' (/fairseq/fairseq/modules/gelu.py:24)
2: reasons: tensor 'x' size mismatch at index 0. expected 368, actual 506
2: to diagnose recompilation issues, see https://pytorch.org/docs/master/dynamo/troubleshooting.html.
Could you share some insight into what this means? Is it an error or just a warning? Thanks!
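For anyone reading the log above, here is a toy model in plain Python of what it seems to describe: each distinct size of `x` at index 0 fails the existing guard and triggers a fresh compile, and once the number of cached entries reaches the limit (`config.cache_size_limit`, 64 in the log), further new sizes are no longer compiled. This is only a sketch under assumptions. The class `ShapeSpecializedCache`, its counters, and the "fall back to the uncompiled function" behavior are hypothetical illustrations, not TorchDynamo's actual implementation.

```python
from typing import Callable, Dict, List, Tuple


class ShapeSpecializedCache:
    """Toy stand-in for a per-function compile cache keyed on input shape.

    Hypothetical illustration only; this is not how TorchDynamo is implemented.
    """

    def __init__(self, fn: Callable[[List[float]], List[float]],
                 cache_size_limit: int = 64) -> None:
        self.fn = fn
        self.cache_size_limit = cache_size_limit
        self.compiled: Dict[Tuple[int, ...], Callable] = {}
        self.compiles = 0   # how many times we "compiled" for a new shape
        self.fallbacks = 0  # how many calls ran uncompiled because the cache was full

    def _compile(self, shape: Tuple[int, ...]) -> Callable:
        # Pretend compilation: a real compiler would specialize on `shape`.
        self.compiles += 1
        return self.fn

    def __call__(self, x: List[float]) -> List[float]:
        shape = (len(x),)  # the "guard": size at index 0
        entry = self.compiled.get(shape)
        if entry is None:
            if len(self.compiled) >= self.cache_size_limit:
                # Cache is full: run the original function without compiling.
                self.fallbacks += 1
                return self.fn(x)
            entry = self._compile(shape)
            self.compiled[shape] = entry
        return entry(x)


def double(x: List[float]) -> List[float]:
    return [2.0 * v for v in x]


# Sizes taken from the log above; the limit is shrunk to 3 to keep the demo short.
cached = ShapeSpecializedCache(double, cache_size_limit=3)
for n in [492, 440, 368, 506, 492]:
    cached([0.0] * n)

print(cached.compiles, cached.fallbacks)  # prints "3 1"
```

In this toy model the fourth distinct size (506) arrives after the cache is full, so it runs uncompiled, while the repeated size (492) is a cache hit. If the real behavior is similar, variable batch or sequence sizes would explain both the repeated recompiles and the slow warm-up.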