Hi,
From the question "RuntimeError when using a += b but not when doing a = a + b"
I got this answer:
The difference between a += b and a = a + b is that in the first case, b is added to a in place (so the content of a is changed to now contain a+b). In the second case, a brand new tensor is created that contains a+b, and then you assign this new tensor to the name a.
To be able to compute gradients, you sometimes need to keep the original value of a, and so we prevent the in-place operation from being done because otherwise we won't be able to compute gradients.
from @albanD
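To make sure I understand the quoted difference, here is a minimal sketch of the two cases (the shapes and values are just assumptions I picked):

```python
import torch

# Toy tensors; shapes and values are arbitrary assumptions.
a = torch.ones(3, requires_grad=True)
b = torch.ones(3)

# Out-of-place: a brand new tensor is created; the original `a` still exists.
c = a + b

# In-place on a leaf tensor that requires grad is refused by autograd:
try:
    a += b
except RuntimeError as err:
    print("RuntimeError:", err)
```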
However, I have some other questions:

For a = a + b, a brand new tensor is created that contains a+b, and then you assign this new tensor to the name a. So, where is the original tensor of a after assigning this new tensor to the name a?
Can PyTorch "remember" the original tensor of a for backpropagation?
if I do this:
def forward(self, x):
    x = self.pool(F.relu(self.conv1(x)))  # stage 1
    x = self.pool(F.relu(self.conv2(x)))  # stage 2
    x = x.view(1, 16 * 5 * 5)             # stage 3
    x = F.relu(self.fc1(x))               # stage 4
    x = F.relu(self.fc2(x))               # stage 5
    x = self.fc3(x)
    return x
Though I reuse the variable name x at every stage, PyTorch can remember the x of each stage in order to calculate gradients during backpropagation. Am I right?
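To check my understanding, here is a toy example (the values are my own assumption) where the name is reused but gradients still flow through every intermediate tensor:

```python
import torch

x = torch.ones(2, requires_grad=True)
y = x * 3   # intermediate tensor; kept alive by the autograd graph
y = y + 1   # `y` now names a new tensor, but the old one is still in the graph
y.sum().backward()
print(x.grad)  # d/dx of sum(3*x + 1) is 3 for each element
```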
thank you in advance!