The reason is that expand does not allocate any new memory; it only makes the tensor look as if it had been expanded. If you plan on modifying the resulting tensor, call .clone() on it first to avoid this kind of behaviour.
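For illustration, here is a minimal sketch (with arbitrary shapes) of how expand shares storage with the original tensor, and how .clone() gives you an independent copy that is safe to modify:

import torch

x = torch.ones(3, 1)
y = x.expand(3, 4)                     # no new memory: y is a view over x's storage
print(y.data_ptr() == x.data_ptr())    # True  -- same underlying storage
print(y.stride())                      # (1, 0) -- the expanded dim has stride 0

z = x.expand(3, 4).clone()             # .clone() materializes an independent copy
print(z.data_ptr() == x.data_ptr())    # False -- z owns its own memory
z[0, 0] = 42.0                         # safe: does not affect x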
Thank you so much for your explanation and instructions, the problem is solved. I had missed the documentation for expand. By the way, as @Oli said, given this memory-saving mechanism, does the following code look good?
for i in range(self.nstack):
    preds_instack = []
    # return 5 scales of feature maps
    hourglass_feature = self.hourglass[i](x)
    if i == 0:  # cache for smaller feature maps produced by hourglass block
        features_cache = [torch.zeros_like(hourglass_feature[scale]) for scale in range(5)]
        for s in range(5):  # channel attention before heatmap regression
            hourglass_feature[s] = self.channel_attention[i][s](hourglass_feature[s])
    else:  # residual connection across stacks
        for k in range(5):
            hourglass_feature_attention = self.channel_attention[i][k](hourglass_feature[k])
            hourglass_feature[k] = hourglass_feature_attention + features_cache[k]
    # feature maps before heatmap regression
    features_instack = self.features[i](hourglass_feature)
I cache the hourglass feature maps, then use and reset the cache in the next iteration. Even though I write the attention output back into the same location, hourglass_feature[s] = self.channel_attention[i][s](hourglass_feature[s]), the graph connections are correct and are exactly what I intended, which I verified by drawing the network with a visualization tool such as Netron.
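My understanding is that this works because the assignment only rebinds the list slot to the new output tensor; the original tensor is not modified in place, so autograd still treats it as its own node. A tiny sketch with a toy module (not my actual attention layers) of that behaviour:

import torch

attention = torch.nn.Linear(8, 8)                 # stand-in for a channel-attention block
feature = [torch.randn(2, 8, requires_grad=True)]
original = feature[0]

feature[0] = attention(feature[0])                # rebinds the list slot; `original` is untouched
print(feature[0] is original)                     # False -- the old tensor is still a graph node
feature[0].sum().backward()
print(original.grad.shape)                        # torch.Size([2, 8]) -- gradients reach the input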