Computation graph optimization during training

Hi, is it possible or necessary to optimize the dynamic computation graph generated during training for higher throughput? If so, what is the recommended way to achieve that? Thanks in advance.

Hi,

This is not necessary in general.
If you really want to squeeze out the best performance, you can use a TorchScript model with C++ inference to strip away the Python interpreter.
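
For context, here is a minimal sketch of the Python side of that workflow. The toy model and file name are placeholders; torch.jit.trace is an alternative to torch.jit.script if the model has no data-dependent control flow.

```python
import torch
import torch.nn as nn

# Hypothetical model -- replace with your own network.
model = nn.Sequential(nn.Linear(128, 256), nn.ReLU(), nn.Linear(256, 10)).eval()

scripted = torch.jit.script(model)   # compile the module to TorchScript
scripted.save("model_scripted.pt")   # serialize so it can be loaded without Python

# In C++ (libtorch), the archive can then be loaded with:
#   torch::jit::script::Module m = torch::jit::load("model_scripted.pt");
```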

Thank you for the reply. But my use case is to improve the training throughput. If I understand correctly, TorchScript can only improve performance for network inference rather than training (forward & backward). Do you have any advice on how to improve PyTorch forward & backward efficiency?

You can actually perform training with TorchScript.
You can script your Python model and keep running your usual training loop on the scripted module.
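
For example, here is a minimal sketch of training through a scripted module; the toy model, sizes, and hyperparameters are placeholders.

```python
import torch
import torch.nn as nn

# Hypothetical toy model -- substitute your own architecture.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))

model = TinyNet()
scripted = torch.jit.script(model)                 # compile the module with TorchScript
optimizer = torch.optim.SGD(scripted.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

# Both the forward and the backward pass run through the scripted graph.
for _ in range(3):
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))
    optimizer.zero_grad()
    loss = loss_fn(scripted(x), y)
    loss.backward()
    optimizer.step()
```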

That being said, if your network is a regular architecture, we try to make sure that performance for these is as good as possible out of the box.

Hi Dale, have you solved this problem? Does TorchScript work?

Can TorchScript actually optimize the computation graph during training?

Yes, TorchScript does optimize the graph at train time. See:
https://pytorch.org/blog/optimizing-cuda-rnn-with-torchscript/#writing-custom-rnns.
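
As a rough illustration of the idea in that post, a hand-written RNN cell can be scripted and still trained through autograd. The cell below is a simplified placeholder, not the fused LSTM from the blog; the benefit comes from TorchScript fusing the pointwise operations inside forward.

```python
import torch
import torch.nn as nn
from torch import Tensor

class ScriptedRNNCell(nn.Module):
    def __init__(self, input_size: int, hidden_size: int):
        super().__init__()
        self.weight_ih = nn.Parameter(torch.randn(hidden_size, input_size) * 0.1)
        self.weight_hh = nn.Parameter(torch.randn(hidden_size, hidden_size) * 0.1)
        self.bias = nn.Parameter(torch.zeros(hidden_size))

    def forward(self, x: Tensor, h: Tensor) -> Tensor:
        # Elementwise ops here are candidates for fusion by the TorchScript JIT.
        return torch.tanh(x @ self.weight_ih.t() + h @ self.weight_hh.t() + self.bias)

cell = torch.jit.script(ScriptedRNNCell(64, 128))
x, h = torch.randn(8, 64), torch.zeros(8, 128)
h = cell(x, h)          # the optimized graph is used in the forward pass...
h.sum().backward()      # ...and autograd still produces gradients for training
```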