Hello,
I have a use case for `torch.compile()` with a model that takes varying input shapes, where compiling with dynamic shapes fails because of some of the operations in my model. To work around this, I compile multiple specializations of the graph, one for each of a small set of fixed input shapes, and pad my inputs up to those shapes. This works well and speeds things up a lot, but it makes exporting the model tricky, since I have to recompile all of the specializations ahead of time every time I want to do inference. A rough sketch of my setup is below, followed by my two questions.
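The model and bucket sizes in this sketch are just placeholders for my real setup, but it shows the general pattern (compile with `dynamic=False`, warm up one specialization per bucket, then pad inputs to the nearest bucket at inference time):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in model and bucket sizes; my real model and shapes differ.
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768)).cuda().eval()
compiled = torch.compile(model, dynamic=False)  # static shapes: one specialization per input shape

BUCKET_LENGTHS = (128, 256, 512)  # the small set of fixed input lengths

def pad_to_bucket(x):
    # Pad the sequence dimension (dim 1) up to the nearest bucket length.
    target = next(b for b in BUCKET_LENGTHS if b >= x.shape[1])
    return F.pad(x, (0, 0, 0, target - x.shape[1]))

# Warm-up: run each bucket once so every specialization is compiled before inference.
with torch.no_grad():
    for length in BUCKET_LENGTHS:
        compiled(torch.randn(1, length, 768, device="cuda"))

# At inference time, inputs of arbitrary length are padded to a known bucket,
# so only the pre-compiled specializations are ever hit.
with torch.no_grad():
    out = compiled(pad_to_bucket(torch.randn(1, 200, 768, device="cuda")))
```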
- Do the multiple specializations of the compiled graph each store their own copy of the model weights on the GPU, or are they able to share the weights between them? Empirically I don’t seem to see GPU memory usage increasing, but it’s a little hard to tell, since memory usage varies with the input shape, so I wanted to check exactly what is going on here.
- If the specializations do share the weights, is there any way to replicate that behaviour using `torch.export`? Presumably, if I exported the model for each input shape in turn and loaded them all back into a single process, I would end up with n copies of the weights on the GPU? (A sketch of what I mean is below.)
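To make the second question concrete, this is the kind of per-shape export I have in mind; again, the model, shapes, and file names are placeholders:

```python
import torch
import torch.nn as nn
from torch.export import export

# Same stand-in model as above; shapes and file names are made up for illustration.
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768)).cuda().eval()
LENGTHS = (128, 256, 512)

# Export one program per fixed input shape and save each to disk.
for length in LENGTHS:
    ep = export(model, (torch.randn(1, length, 768, device="cuda"),))
    torch.export.save(ep, f"model_{length}.pt2")

# Later, in the inference process, load them all back.
modules = {length: torch.export.load(f"model_{length}.pt2").module()
           for length in LENGTHS}
# Question: do these modules each end up holding their own copy of the weights on the GPU?
```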
Many thanks,
Angus