GPU memory cost

      out = self.conv1(x)
      out = self.norm1(out)
      out = self.relu(out)

      out = self.conv2(out)
      out = self.norm2(out)
      out = self.relu(out)

      out = self.conv3(out)
      out = self.norm3(out)

With these settings:
  1. training of conv and norm is False
  2. track_running_stats of norm is True

When I step through the code in the PyCharm debugger, GPU memory does not increase from conv1 to conv3. But when I execute norm3, GPU memory increases.
If training of conv and norm is True, I tested that GPU memory increases after every statement I execute.

Every statement creates a new out, so why does GPU memory not increase?
And why does GPU memory increase when I execute norm3?

I see that requires_grad of the weight and bias in norm is False.

The code is in mmdet/models/backbones/ of mmdetection.


Keep in mind that the GPU API is asynchronous, so you might want to add a torch.cuda.synchronize() after each line to make sure it has finished executing before you read the memory usage.
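As a rough sketch of how you could take the readings, something like the helper below might work. The name allocated_mib is made up for illustration, and returning 0.0 on a machine without CUDA is my own assumption so that the snippet still runs there.

```python
import torch


def allocated_mib(device=None):
    """Return the MiB currently occupied by live tensors on the GPU.

    Hypothetical helper: it waits for all queued kernels to finish first,
    so the reading is taken after the previous line has really executed.
    """
    if not torch.cuda.is_available():
        return 0.0  # assumption: no GPU present, so there is nothing to measure
    torch.cuda.synchronize(device)  # block until pending GPU work is done
    return torch.cuda.memory_allocated(device) / 2**20
```

You could then put print(allocated_mib()) after self.conv1(x), self.norm1(out) and so on, instead of relying on what the debugger or nvidia-smi shows between steps.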

Also, since you reuse the out variable, the old Tensor that out was pointing to is no longer reachable and can be freed when you're not training (when training, it needs to be kept around to be able to compute the backward pass).
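To see whether autograd is keeping intermediates around, you can check whether the output has a grad_fn. This is only a minimal sketch: builds_graph and the small conv/norm/relu block are stand-ins I made up, not the mmdetection backbone, and "frozen" here means eval mode plus requires_grad set to False on all parameters, which is what your settings look like.

```python
import torch
import torch.nn as nn


def builds_graph(frozen):
    """Return True if autograd recorded a graph for the forward pass."""
    # Hypothetical stand-in for one conv -> norm -> relu stage of a backbone.
    block = nn.Sequential(
        nn.Conv2d(3, 8, kernel_size=3, padding=1),
        nn.BatchNorm2d(8),
        nn.ReLU(),
    )
    if frozen:
        block.eval()  # sets training=False on conv and norm
        for p in block.parameters():
            p.requires_grad_(False)
    out = block(torch.randn(2, 3, 16, 16))
    # grad_fn is set only when the result was recorded for backward,
    # in which case the graph holds on to the tensors it needs.
    return out.grad_fn is not None


print(builds_graph(frozen=False))  # True
print(builds_graph(frozen=True))   # False
```

When no graph is recorded, nothing else refers to the old Tensor after out is rebound, so its memory can go back to the caching allocator and be reused by the next op.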