I think there’s some way to do it in PyTorch 0.3, but I don’t remember exactly since it has been quite some time since I used 0.3, sorry.
It may have been the volatile=True setting you mentioned. In any case, due to this and various other improvements, I really recommend switching to PyTorch 1.0 if you can.
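For reference, from PyTorch 0.4 onward the replacement for volatile=True is the torch.no_grad() context manager. Here is a minimal sketch of inference with it. The nn.Linear model and the random input are just stand-ins I made up for illustration, not anything from your code:

```python
import torch
import torch.nn as nn

# Hypothetical stand-in model; any nn.Module is used the same way.
model = nn.Linear(4, 2)
model.eval()  # put layers such as dropout/batchnorm into inference mode

inputs = torch.randn(8, 4)  # made-up batch of 8 samples with 4 features

# Inside no_grad() autograd records no graph, so nothing accumulates
# in memory across forward passes.
with torch.no_grad():
    outputs = model(inputs)

print(outputs.requires_grad)
```

Since no graph is built inside the block, the outputs carry no autograd history and can be stored or accumulated safely.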
EDIT: if you don’t want to make any big modifications to your 0.3 code, you could also call .backward() on the model’s output (reduced to a scalar, e.g. its sum) after each forward pass during inference. It will probably slow down inference (because you have an additional backprop step), but only by a constant factor, because backward() frees the graph after each call (so you don’t let the graph grow and grow). However, I really recommend upgrading and using the context managers.
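A rough sketch of the backward() workaround described above, in case it helps. It is written in current PyTorch syntax so it can be run as-is (in 0.3 you would wrap the inputs in Variable), and the model, shapes, and loop are all made-up placeholders:

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)  # hypothetical stand-in model
model.eval()

results = []
for _ in range(3):  # pretend these are three inference batches
    inputs = torch.randn(8, 4)
    outputs = model(inputs)

    # Keep only a copy of the predictions that is cut off from the graph.
    results.append(outputs.detach())

    # backward() needs a scalar, so reduce the output first. With the
    # default retain_graph=False, the graph's buffers are freed here.
    outputs.sum().backward()

    # The gradients themselves are not needed for inference; clear them.
    model.zero_grad()
```

The important part is that nothing still attached to the graph is kept around between iterations; the extra backward pass is pure overhead, which is why the context manager is the better option once you upgrade.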