Hello,
I have a use case for `torch.compile()` with a model that takes varying input shapes, where compiling with dynamic shapes fails because of some of the operations in my model. To work around this, I compile multiple specializations of the graph, one for each of a small set of fixed input shapes, and pad my inputs up to those shapes. This works well and speeds things up a lot, but it makes exporting the model tricky, since I have to recompile all of the specializations ahead of time every time I want to do inference. A rough sketch of my setup is below, followed by my two questions.
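The model and bucket sizes in this sketch are just placeholders for my real setup, but it shows the general pattern (compile with `dynamic=False`, warm up one specialization per bucket, then pad inputs to the nearest bucket at inference time):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Stand-in model and bucket sizes; my real model and shapes differ.
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768)).cuda().eval()
compiled = torch.compile(model, dynamic=False)  # static shapes: one specialization per input shape

BUCKET_LENGTHS = (128, 256, 512)  # the small set of fixed input lengths

def pad_to_bucket(x):
    # Pad the sequence dimension (dim 1) up to the nearest bucket length.
    target = next(b for b in BUCKET_LENGTHS if b >= x.shape[1])
    return F.pad(x, (0, 0, 0, target - x.shape[1]))

# Warm-up: run each bucket once so every specialization is compiled before inference.
with torch.no_grad():
    for length in BUCKET_LENGTHS:
        compiled(torch.randn(1, length, 768, device="cuda"))

# At inference time, inputs of arbitrary length are padded to a known bucket,
# so only the pre-compiled specializations are ever hit.
with torch.no_grad():
    out = compiled(pad_to_bucket(torch.randn(1, 200, 768, device="cuda")))
```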
- Do the multiple specializations of the compiled graph each store their own copy of the model weights on the GPU, or are they able to share the weights between them? Empirically I don’t seem to see GPU memory usage increasing, but it’s a little hard to tell, since memory usage varies with the input shape, so I wanted to check exactly what is going on here.
- If the specializations do share the weights, is there any way to replicate that behaviour using `torch.export`? Presumably, if I exported the model for each input shape in turn and loaded them all back into a single process, I would end up with n copies of the weights on the GPU? (A sketch of what I mean is below.)
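To make the second question concrete, this is the kind of per-shape export I have in mind; again, the model, shapes, and file names are placeholders:

```python
import torch
import torch.nn as nn
from torch.export import export

# Same stand-in model as above; shapes and file names are made up for illustration.
model = nn.Sequential(nn.Linear(768, 768), nn.GELU(), nn.Linear(768, 768)).cuda().eval()
LENGTHS = (128, 256, 512)

# Export one program per fixed input shape and save each to disk.
for length in LENGTHS:
    ep = export(model, (torch.randn(1, length, 768, device="cuda"),))
    torch.export.save(ep, f"model_{length}.pt2")

# Later, in the inference process, load them all back.
modules = {length: torch.export.load(f"model_{length}.pt2").module()
           for length in LENGTHS}
# Question: do these modules each end up holding their own copy of the weights on the GPU?
```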
Many thanks,
Angus