[SOLVED] What memory is released by loss.backward()? Why is more memory used with model.eval()?

Hi,
As stated in the title: what memory is released by calling loss.backward(), and why is more memory used with model.eval()?
I know torch.no_grad may reduce the memory used during evaluation, but I couldn't update my PyTorch version yet (I am using PyTorch 0.4.0). According to the documentation, this function should exist in that version, but I cannot import it.
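For reference, this is the pattern I was hoping to use (a rough sketch with a toy model, assuming a version where torch.no_grad can actually be imported):

```python
import torch
import torch.nn as nn

# Toy model and data, just for illustration
model = nn.Linear(10, 1)
criterion = nn.MSELoss()
x = torch.randn(64, 10)
y = torch.randn(64, 1)

model.eval()
with torch.no_grad():    # no graph is built, so intermediate activations are freed right away
    out = model(x)
    loss = criterion(out, y)
print(loss.item())       # .item() keeps only a Python float, not a tensor tied to a graph
```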

Is there any other way to release memory the way loss.backward() does?
I found the following approach, but it seems to work only for parameters that do not require grad.
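(To be clear about what I mean, here is a rough sketch of that kind of approach with a toy model; it is not the exact code from that post:)

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)

# Freeze all parameters so no gradients are computed or stored for them
for param in model.parameters():
    param.requires_grad = False

x = torch.randn(64, 10)    # plain input tensor, does not require grad
out = model(x)             # since nothing requires grad, no graph is kept for this pass
```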

I found the answer in the thread below. It is because of Python's scoping rules: variables created in the training loop stay alive after it, so two graphs end up existing at the same time.
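To make that concrete, here is how I understand it now (toy model and tensor names are just for illustration):

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 1)
criterion = nn.MSELoss()
x = torch.randn(64, 10)
y = torch.randn(64, 1)

# Training step: backward() frees the intermediate buffers that were saved
# for computing the gradients.
train_loss = criterion(model(x), y)
train_loss.backward()

# Evaluation: model.eval() only switches layers like dropout/batchnorm, it does
# not turn off autograd. Without torch.no_grad(), the forward pass still builds
# a graph and keeps the saved activations alive for as long as `eval_loss` is
# referenced. Because Python has no block scoping, variables created inside a
# loop also survive after it, so tensors (and the graphs they hold on to) from
# training can still be alive while the evaluation graph is built -> higher
# peak memory.
model.eval()
eval_loss = criterion(model(x), y)

# Workaround without torch.no_grad(): keep only the Python number and drop the
# tensor that holds on to the graph.
eval_loss_value = eval_loss.item()
del eval_loss
```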

Thanks.