Does deleting a tensor to free up GPU memory also deletes the corresponding portion of the computational graph?

coarsesand · August 26, 2019, 2:30am

Hi,

I have a pytorch module, whose intermediate variables occupy a lot of memory.
In order to prevent the “CUDA [from being] out of memory” I’ve deleted one of the intermediate tensors dist to free up GPU memory via del dist. I was wondering whether deleting this tensor in such a way also deletes / disfigures the corresponding portion of the computational graph that autograd is building up in the forward pass?

class E_step(nn.Module):

   def __init__(self, std):
      super(E_step, self).__init__()
      self.std = std

   def forward(self, pi, mean, features):
      # uses current parameter estimates to evaluate the responsibilities
      # 
      # INPUTS:
      # 1) pi:        (tensor: shape = [N, K])
      # 2) mean:      (tensor: shape = [N, K, F])
      # 3) features:  (tensor: shape = [N, D, F])
      # 
      # RETURNS:
      # 1) log_posteriors: (tensor: shape = [N, D, K])
      
      N, K, F = mean.shape
      N, D, F = features.shape
      
      dist = torch.empty([N, D, K], out = torch.cuda.FloatTensor())
      for k in range(K):
         dist[:,:,k] = torch.norm(mean[:, k, :] - features, p = 2, dim = 2)

      log_probs_plus_log_pi = (-0.5 * F * torch.cuda.FloatTensor([2 * math.pi * self.std**2] + dist).log() + pi.log()
      
      ############### deleting this tensor to free up GPU memory ###############
      del dist
      #################################################################
      
      log_probs_plus_log_pi = log_probs_plus_log_pi.transpose(0, 1)
      
      return log_probs_plus_log_pi - torch.logsumexp(log_probs_plus_log_pi, dim = 2, keepdim = True)

ptrblck · August 26, 2019, 10:51am

It looks like you are never using dist, so deleting it won’t affect the computation graph.
However, you could also just remove the computation in the first place, if you don’t need it for further computation.

coarsesand · August 26, 2019, 12:07pm

My bad, I missed out the dist when declaring the variable log_probs_plus_log_pi:

log_probs_plus_log_pi = (-0.5 * F * torch.cuda.FloatTensor([2 * math.pi * self.std**2] + dist).log() + pi.log()

Anyways, dist is definitely a part of the computational graph so what would happen if I deleted it from memory - would the backward pass be affected?

arjds · March 2, 2020, 6:14pm

any updates on this?