Why does a TorchScript module not take less GPU memory than the PyTorch model?

You shouldn't expect to see memory savings right now, depending on which utilities you are using.
E.g., once the model is scripted, utilities such as AOTAutograd can try to cut down memory usage by avoiding storing unneeded activations (recomputing them in the backward pass instead). However, these utilities are still in development and should be considered experimental.
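To see why scripting by itself doesn't shrink memory, here is a minimal sketch (the model is hypothetical) comparing parameter storage before and after `torch.jit.script`: scripting changes how the model is executed, not how much memory its weights occupy.

```python
import torch
import torch.nn as nn

# A small example model (hypothetical, just for illustration).
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 128))

# Scripting produces a TorchScript module, but the underlying parameter
# tensors are the same size as in the eager model.
scripted = torch.jit.script(model)

eager_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
scripted_bytes = sum(p.numel() * p.element_size() for p in scripted.parameters())
print(eager_bytes == scripted_bytes)  # True: no memory savings from scripting alone
```

Activation memory during training behaves the same way unless a utility such as AOTAutograd explicitly rewrites the graph to recompute activations instead of storing them.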

That’s not entirely true. By default PyTorch is “eager”, i.e. each line of Python code is executed as it’s written. Scripting a model can fuse operations into a larger block and thus avoid expensive memory reads and writes. In the latest 1.12.0 release our nvFuser backend is enabled by default for CUDA workloads and is able to fuse pointwise operations etc.
We are working on a blog post and tutorial explaining these fusions in more detail, but you could also take a look at e.g. this topic, which shows a speedup of ~3.8x over eager execution for a custom normalization layer.
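As a rough sketch of the kind of code that benefits from fusion, here is a chain of pointwise ops (the function itself is made up for illustration). On CUDA with a fuser backend such as nvFuser, the JIT can merge these into a single kernel after a few profiling runs, cutting the intermediate reads and writes; the check below only verifies that the scripted function stays numerically equivalent.

```python
import torch

def pointwise_chain(x):
    # Several pointwise ops in a row: a fuser can execute these as one
    # kernel instead of materializing each intermediate tensor.
    return torch.sigmoid(x) * torch.tanh(x) + x.relu()

fused = torch.jit.script(pointwise_chain)

x = torch.randn(1024)
# A few warm-up calls let the profiling executor optimize (and, on CUDA,
# fuse) the graph before steady-state execution.
for _ in range(3):
    fused(x)

print(torch.allclose(fused(x), pointwise_chain(x)))  # True
```

The speedup from fusion shows up on CUDA workloads; on CPU this sketch mainly demonstrates that scripting preserves the eager semantics.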
