torch.compile cache_size_limit best practice

rickyx · April 11, 2024, 10:24pm

Hey, I am getting the error below in my tests, where I run inputs of various shapes through my model.

msg = 'cache_size_limit reached'

    def unimplemented(msg: str) -> NoReturn:
        assert msg != os.environ.get("BREAK", False)
>       raise Unsupported(msg)
E       torch._dynamo.exc.Unsupported: cache_size_limit reached

../../anaconda3/lib/python3.9/site-packages/torch/_dynamo/exc.py:193: Unsupported

I understand this happens when the model is recompiled too many times. I could get around it by setting cache_size_limit to a higher value.
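For reference, a minimal sketch of that workaround, assuming the limit is raised through torch._dynamo.config before the model is compiled. The value 64 is an arbitrary illustration, not a recommendation.

```python
import torch
import torch._dynamo

# cache_size_limit caps how many compiled variants Dynamo keeps for one
# piece of code before it gives up on recompiling it.
# 64 is an arbitrary illustrative value, not a recommended setting.
torch._dynamo.config.cache_size_limit = 64
```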

But my question is: what's the expected best practice if I am hosting in a production environment? Should I just set this to a high value?

marksaroufim · April 12, 2024, 12:02am

Quite the opposite: my suggestion would be to figure out why you're getting so many recompilations in the first place, by running TORCH_LOGS="recompiles" python your_script.py
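
If the log shows that the recompilations come from a changing input shape (an assumption here, though the question mentions inputs of various shapes), one option that may help is compiling with dynamic shapes. The sketch below is illustrative only: the toy model, the count_compiles helper, and the counting backend are hypothetical names made up for this example. The backend runs the captured graph eagerly and just counts how often Dynamo asks for a compilation.

```python
import torch
import torch._dynamo


def count_compiles(dynamic):
    """Run a toy model on several batch sizes and count compilations."""
    torch._dynamo.reset()  # start from an empty compile cache
    count = 0

    def counting_backend(gm, example_inputs):
        # Hypothetical backend for illustration: no real compilation,
        # it only records that Dynamo requested one.
        nonlocal count
        count += 1
        return gm.forward

    def model(x):
        return torch.relu(x) * 2 + 1

    fn = torch.compile(model, backend=counting_backend, dynamic=dynamic)
    for batch in (2, 3, 4, 5):
        fn(torch.randn(batch, 8))
    return count


static_count = count_compiles(dynamic=False)   # shapes treated as constants
dynamic_count = count_compiles(dynamic=True)   # shapes treated as symbolic
print("static:", static_count, "dynamic:", dynamic_count)
```

With static shapes, each new batch size is expected to trigger another compilation, while the dynamic variant should need fewer. If only one dimension varies, torch._dynamo.mark_dynamic(tensor, dim) is a narrower alternative to dynamic=True.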