The reason is that expand does not allocate any new memory; it only makes the tensor look as if it had been expanded. If you plan on modifying the resulting tensor, call .clone() on it first to avoid this kind of behaviour.
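For illustration, here is a minimal sketch (with arbitrary shapes) of how expand shares storage with the original tensor, and how .clone() gives you an independent copy that is safe to modify:

import torch

x = torch.ones(3, 1)
y = x.expand(3, 4)                     # no new memory: y is a view over x's storage
print(y.data_ptr() == x.data_ptr())    # True  -- same underlying storage
print(y.stride())                      # (1, 0) -- the expanded dim has stride 0

z = x.expand(3, 4).clone()             # .clone() materializes an independent copy
print(z.data_ptr() == x.data_ptr())    # False -- z owns its own memory
z[0, 0] = 42.0                         # safe: does not affect x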
Thank you so much for your explanation and instructions, the problem is solved. I had missed the documentation for expand. By the way, as @Oli said, given this memory-saving mechanism, does the following code look good?
for i in range(self.nstack):
    preds_instack = []
    # return 5 scales of feature maps
    hourglass_feature = self.hourglass[i](x)
    if i == 0:  # cache for smaller feature maps produced by hourglass block
        features_cache = [torch.zeros_like(hourglass_feature[scale]) for scale in range(5)]
        for s in range(5):  # channel attention before heatmap regression
            hourglass_feature[s] = self.channel_attention[i][s](hourglass_feature[s])
    else:  # residual connection across stacks
        for k in range(5):
            hourglass_feature_attention = self.channel_attention[i][k](hourglass_feature[k])
            hourglass_feature[k] = hourglass_feature_attention + features_cache[k]
    # feature maps before heatmap regression
    features_instack = self.features[i](hourglass_feature)
I cache the hourglass feature maps, then use and reset the cache in the next iteration. Even though I write the attention output back into the same location, hourglass_feature[s] = self.channel_attention[i][s](hourglass_feature[s]), the graph connections are correct and are exactly what I intended, which I verified by drawing the network with a visualization tool such as Netron.
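My understanding is that this works because the assignment only rebinds the list slot to the new output tensor; the original tensor is not modified in place, so autograd still treats it as its own node. A tiny sketch with a toy module (not my actual attention layers) of that behaviour:

import torch

attention = torch.nn.Linear(8, 8)                 # stand-in for a channel-attention block
feature = [torch.randn(2, 8, requires_grad=True)]
original = feature[0]

feature[0] = attention(feature[0])                # rebinds the list slot; `original` is untouched
print(feature[0] is original)                     # False -- the old tensor is still a graph node
feature[0].sum().backward()
print(original.grad.shape)                        # torch.Size([2, 8]) -- gradients reach the input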