Hello,

I’m trying to understand why my loss function keeps increasing memory usage on the GPU. I have tried every solution I found on the internet (`del`, `gc.collect()`), but memory still increases… (my training loop is sketched at the end of this post).

```
import torch
from torch.autograd import Variable

# batch_size, input_size, num_classes, dtype, net, criterion,
# jacobian_rows_construct, ind_i, ind_j and similarities are defined globally.

def customized_loss(pred, target, x):
    '''
    This function takes the output of the model (pred), the target values (target),
    and the input of the model (x). There are two inner functions:
    - the first computes all Jacobians of pred w.r.t. x
    - the second computes the regularizer from the Jacobians found
    Args:
    - pred : batch_size x num_classes matrix, the output of the model for input x
    - target : batch_size x num_classes matrix, the target values for x
    - x : batch_size x input_size, the current batch
    Returns:
    - the customized loss, float
    '''
    def getJacobians(pred, x):
        '''
        Calculates the Jacobians of the model output (pred) w.r.t. each instance in x.
        Args:
        - pred : batch_size x num_classes matrix, the output of the model for input x
        - x : batch_size x input_size, the current batch
        Returns:
        - the Jacobians : batch_size x input_size x num_classes matrix
        '''
        jacobians = Variable(torch.zeros(batch_size, input_size, num_classes)).type(dtype)
        for x_ in range(batch_size):
            for jrc in range(len(jacobian_rows_construct)):
                pred[x_].backward(torch.Tensor([jacobian_rows_construct[jrc]]).type(dtype),
                                  create_graph=True)
                jacobians[x_, :, jrc] = x.grad[x_]
                x.grad.data.zero_()
                net.zero_grad()
        return jacobians

    def regularizer(jacobians):
        '''
        Calculates the regularizer using only tensor operations, based on the
        indices found beforehand and the similarity value for each index pair.
        Using gather we keep all the derivatives we need, then we compute all
        the norms, multiply by our similarity vector and sum all elements.
        Args:
        - jacobians : 3D matrix, contains all Jacobians for the current batch
        Returns:
        - the regularizer, float
        '''
        norms = torch.norm(jacobians.gather(1, ind_i) - jacobians.gather(1, ind_j), 2, 2)
        return (norms * similarities).sum()

    # calculate loss
    cost = criterion(pred, target)
    # calculate regularizer
    reg = regularizer(getJacobians(pred, x))
    # put them together and return
    return cost + reg
```

I think the problem comes from getJacobians but I can’t see where… In my training loop I put `del loss, outputs, batch_x, batch_y` right after `loss.backward()`.
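
For reference, here is a minimal, self-contained sketch of what my training loop looks like (the model, data and hyperparameters below are dummy placeholders, not my real code):

```
import gc
import torch
import torch.nn as nn

# Dummy stand-ins for my real model / data, just to show the loop structure.
batch_size, input_size, num_classes = 8, 4, 3
net = nn.Linear(input_size, num_classes)
criterion = nn.MSELoss()
optimizer = torch.optim.SGD(net.parameters(), lr=0.01)
loader = [(torch.randn(batch_size, input_size), torch.randn(batch_size, num_classes))
          for _ in range(5)]

for batch_x, batch_y in loader:
    batch_x.requires_grad_(True)        # x needs a grad so getJacobians can read x.grad
    optimizer.zero_grad()
    outputs = net(batch_x)
    loss = criterion(outputs, batch_y)  # customized_loss(outputs, batch_y, batch_x) in my real code
    loss.backward()
    optimizer.step()
    # what I already tried to free GPU memory:
    del loss, outputs, batch_x, batch_y
    gc.collect()
```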