Variable causing GPU memory to fill up

The loss_func below returns a variable for which requires_grad has to be explicitly set to True, or else I get an error message during .backward() saying there is no grad_fn for it.
But after a few epochs this variable causes a CUDA out-of-memory error, even though I keep the batch size very low.

class CustomLossfunc(nn.Module):

    def forward(self, inputs, targets):
        # loss_func is an external (scikit-learn) metric
        loss = torch.tensor(loss_func(torch.round(inputs), targets) - 1).requires_grad_(True).cuda()
        return -loss

Are you storing the loss tensor in a list or another container?
This would increase the memory usage. However, based on your description the tensor should not be attached to the computation graph, so that would be a separate, unrelated issue.
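
If so, a sketch like this (with placeholder model, criterion, optimizer and loader names) avoids keeping each iteration's graph alive:

losses = []
for data, target in loader:              # hypothetical DataLoader
    optimizer.zero_grad()
    output = model(data)                 # hypothetical model
    loss = criterion(output, target)
    loss.backward()
    optimizer.step()
    # appending loss itself would keep the whole computation graph of every
    # iteration in memory; .item() (or .detach()) stores just the value
    losses.append(loss.item())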

Could you remove the new tensor creation and just try to use:

loss = loss_func(torch.round(inputs), targets) - 1

Also, I’m not sure about the derivative of round, but I assume it should be zero almost everywhere?
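
A quick check (just a sketch) suggests PyTorch indeed treats the gradient of round as zero:

import torch

x = torch.tensor([0.4, 1.6, 2.5], requires_grad=True)
torch.round(x).sum().backward()
print(x.grad)  # tensor([0., 0., 0.]) -- round is flat almost everywhere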

@ptrblck thanks, I was doing it the way you suggested, but the variable returned by the function (a scikit-learn metric) does not have requires_grad set to True, so loss.backward() threw an exception saying there is no grad or grad_fn. That is why I added the wrapping; it resolved that error, but now it always runs into a GPU memory error after just 2 to 3 iterations of an epoch.

If you are using functions of other libraries, such as numpy or sklearn, you would have to implement the backward method manually, since Autograd cannot track these operations.
Wrapping the loss into a new tensor with requires_grad=True creates a new computation graph from this point, so that all preceding operations are detached (basically your model’s forward pass) and your model won’t get any valid gradients.
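
Here is a small sketch (with a hypothetical tiny model) showing the effect:

import torch
import torch.nn as nn

model = nn.Linear(4, 1)
out = model(torch.randn(2, 4))

# re-wrapping the loss creates a new leaf tensor; backward() stops there
loss = torch.tensor(out.sum().item(), requires_grad=True)
loss.backward()
print(model.weight.grad)  # None -- the model never receives any gradients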

You could write a custom autograd.Function with the manual backward method as explained here or you would have to use PyTorch methods only.
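
A minimal version of that pattern (essentially the MyReLU example from the linked tutorial, shown here as a sketch) would be:

import torch

class MyReLU(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input):
        # save the input so the backward pass can mask the negative positions
        ctx.save_for_backward(input)
        return input.clamp(min=0)

    @staticmethod
    def backward(ctx, grad_output):
        input, = ctx.saved_tensors
        grad_input = grad_output.clone()
        grad_input[input < 0] = 0   # manual gradient: zero where the input was negative
        return grad_input

x = torch.randn(5, requires_grad=True)
MyReLU.apply(x).sum().backward()   # use .apply instead of instantiating the class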

Is this the same for every function, or specific to ReLU? I want to call cohen_kappa_score from scikit-learn in the forward pass.

grad_input[input < 0] = 0

The MyReLU implementation is just an example in the tutorial and you could of course use the built-in nn.ReLU() module.
You would need to implement a custom autograd.Function e.g. if you are leaving PyTorch.
If you want to call the kappa method from sklearn, you would have to write the backward pass manually.
However, if I’m not mistaken, Cohen’s Kappa score is not continuous and not differentiable, is it?
(There might be approximations which are differentiable, so you should check out the current research.)
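
As a small illustration (a sketch with made-up numbers), the score is piecewise constant in the raw model outputs, so a useful gradient cannot come out of it directly:

import numpy as np
from sklearn.metrics import cohen_kappa_score

targets = np.array([0, 1, 1, 2])
raw = np.array([0.1, 0.9, 1.2, 1.8])   # hypothetical continuous model outputs

# rounding collapses a whole range of raw values onto the same labels,
# so small changes to the outputs leave the score untouched
print(cohen_kappa_score(np.round(raw).astype(int), targets))          # 1.0
print(cohen_kappa_score(np.round(raw + 0.01).astype(int), targets))   # still 1.0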