Hi PyTorch team,
I have written down an oversimplified version of my framework below.
When I call the “framework_eval_func” function, a single input is moved to the GPU in each iteration. The network then performs its computations on the GPU, and the evaluation loss is calculated from the output. What I expected to see was that the amount of GPU memory consumed stays the same regardless of the size of the dataset, because I always work with one example at a time on the GPU. However, when I increased the size of the dataset, I ran out of memory.
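For reference, this is roughly how I would verify that expectation (a minimal sketch using the standard torch.cuda allocator counters; the printing and its placement inside the loop are illustrative, not part of my actual framework):

```python
import torch

# Illustrative check: printing the allocator counters once per iteration
# should show a flat curve if memory use really is independent of dataset size.
alloc = torch.cuda.memory_allocated() / 1024**2
peak = torch.cuda.max_memory_allocated() / 1024**2
print(f"allocated: {alloc:.1f} MiB, peak: {peak:.1f} MiB")
```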
A few things I have already checked:
- Using “with torch.no_grad():” to avoid computing gradients
- Using “loss_vect.append(loss.cpu().item())” to avoid keeping references to tensors in GPU memory
- Using “x.detach()” to avoid building a computation graph (a condensed sketch of these three together follows below)
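In isolation, the three points above amount to this pattern (a minimal, self-contained sketch; the tiny model and the “eval_loss” stand-in are placeholders I made up for this post, not my real code):

```python
import torch
import torch.nn.functional as F

net = torch.nn.Linear(4, 1).cuda()        # placeholder model

def eval_loss(bx, by, net):               # stand-in for my real loss function
    return F.mse_loss(net(bx), by)

loss_vect = []
bx, by = torch.randn(1, 4), torch.randn(1, 1)
with torch.no_grad():                          # 1. no gradient bookkeeping
    loss = eval_loss(bx.cuda().detach(),       # 3. detached GPU copies
                     by.cuda().detach(), net)
loss_vect.append(loss.cpu().item())            # 2. plain Python float stored
```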
I am now out of ideas and wanted to know whether this is expected behavior.
```python
import numpy as np
import torch


class framework():

    def __init__(self, net):
        self.net = net.cuda()
        # ...

    def framework_eval_func(self, x, y):
        # Number of samples in the data
        N = x.shape[0]
        Ids = np.arange(N)
        # Shuffle the set of indices
        np.random.shuffle(Ids)
        # Evaluating the entire validation set
        loss_vect = []
        with torch.no_grad():
            for index in Ids:
                bx = x[index, :]  # the input
                by = y[index, :]  # the labels
                # Move it to the GPU:
                bx = bx.cuda()  # <~~~ a single example is moved
                by = by.cuda()  # <~~~ to the GPU at each iteration
                loss = eval_loss(bx.detach(), by.detach(), self.net)
                loss_vect.append(loss.cpu().item())
        loss_avg = np.mean(loss_vect)
        return loss_avg

    def framework_train_func(self, x, y):
        # ...
```
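For completeness, this is roughly how I call it (the model, tensor sizes, and “eval_loss” here are placeholders for this post; my real ones are much larger):

```python
import torch
import torch.nn.functional as F

def eval_loss(bx, by, net):          # stand-in for my real loss function
    return F.mse_loss(net(bx), by)

net = torch.nn.Linear(10, 1)
fw = framework(net)
x = torch.randn(50000, 10)           # CPU tensors; one sample is moved
y = torch.randn(50000, 1)            # to the GPU per iteration
print(fw.framework_eval_func(x, y))
```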