Hi PyTorch Team,
I have written down an over-simplified version of my framework below.
When I call the “framework_eval_func” function, each iteration moves a single input to the GPU, the network performs its computations there, and the evaluation loss is computed from the result. What I expected to see is that the amount of GPU memory consumed stays the same no matter the size of the dataset, since I always work with one example at a time on the GPU. However, when I increased the size of the dataset, I ran out of GPU memory.
A few things I have already checked:
- Using “with torch.no_grad():” to avoid computing gradients
- Using “loss_vect.append(loss.cpu().item())” to avoid keeping references to tensors in GPU memory
- Using “x.detach()” to avoid building a computation graph
I am now out of ideas and wanted to know whether this is expected behavior.
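To narrow this down, I have been logging the allocated GPU memory each iteration. A minimal sketch of the probe I use (the helper name gpu_mem_mb is my own, and it falls back to 0 when no GPU is present):

```python
import torch

def gpu_mem_mb():
    """Currently allocated CUDA memory in MiB (0.0 if no GPU is available)."""
    if not torch.cuda.is_available():
        return 0.0
    return torch.cuda.memory_allocated() / 1024**2

# Example: log allocated memory before and after moving a tensor to the GPU.
before = gpu_mem_mb()
t = torch.randn(1024, 1024)  # ~4 MiB of float32 data
if torch.cuda.is_available():
    t = t.cuda()
after = gpu_mem_mb()
print(f"allocated: {before:.1f} MiB -> {after:.1f} MiB")
```

Calling this inside the evaluation loop shows whether the allocated memory actually grows with the iteration count.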
import numpy as np
import torch

class framework():
    def __init__(self, net):
        self.net = net.cuda()
        ...

    def framework_eval_func(self, x, y):
        # Number of samples in the data
        N = x.shape[0]
        Ids = np.arange(N)
        # Shuffle the set of indices
        np.random.shuffle(Ids)
        # Evaluate the entire validation set
        loss_vect = []
        with torch.no_grad():
            for index in Ids:
                bx = x[index, :]  # the input
                by = y[index, :]  # the labels
                # move it to the gpu;
                bx = bx.cuda()  # <~~~ A single example is moved
                by = by.cuda()  # <~~~ to the GPU at each iteration
                loss = eval_loss(bx.detach(), by.detach(), self.net)
                loss_vect.append(loss.cpu().item())
        loss_avg = np.mean(loss_vect)
        return loss_avg
    def framework_train_func(self, x, y):
        ...
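For completeness, the evaluation loop above can be exercised end-to-end with a toy network. Note that eval_loss here is a hypothetical stand-in for my real loss function (a simple MSE), and the device fallback is only so the snippet runs on machines without a GPU:

```python
import numpy as np
import torch
import torch.nn as nn

# Hypothetical stand-in for my real eval_loss: MSE between prediction and label.
def eval_loss(bx, by, net):
    return nn.functional.mse_loss(net(bx), by)

def eval_one_at_a_time(net, x, y, device):
    """Evaluate sample-by-sample; keep only Python floats, never GPU tensors."""
    net = net.to(device)
    loss_vect = []
    with torch.no_grad():
        for index in np.random.permutation(x.shape[0]):
            bx = x[index].unsqueeze(0).to(device)  # one example on the device
            by = y[index].unsqueeze(0).to(device)
            loss = eval_loss(bx, by, net)
            loss_vect.append(loss.item())  # .item() drops the tensor reference
    return float(np.mean(loss_vect))

device = "cuda" if torch.cuda.is_available() else "cpu"
net = nn.Linear(4, 1)
x = torch.randn(8, 4)
y = torch.randn(8, 1)
avg = eval_one_at_a_time(net, x, y, device)
print(avg)
```

This mirrors the structure of framework_eval_func: one sample moved to the device per iteration, gradients disabled, and only a scalar appended per step.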