Hello, I am working on a model that uses huge tensors, and I am constantly battling memory consumption. I have written a custom autograd function that reduces the memory requirements of the forward pass for one operation. I am now looking at the backward pass, and I wonder: for tensors that I modified in place in the forward pass, can I also modify their gradients in place in the backward pass?
Can you share a minimal reproducible example?
Of course, here is a condensed example:
class LeakyReluLike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input1, input2, negative_slope, inplace):
        ctx.save_for_backward(input1)
        ctx.negative_slope = negative_slope
        cond = (input1 < 0.0)
        if inplace:
            input2_out = input2.masked_scatter_(cond, negative_slope * input2[cond])
            ctx.mark_dirty(input2)
        else:
            input2_out = input2.masked_scatter(cond, negative_slope * input2[cond])
        return input2_out

    @staticmethod
    def backward(ctx, g2):
        input1, = ctx.saved_tensors
        negative_slope = ctx.negative_slope
        cond = (input1 < 0.0)
        if False:  # the in-place branch I would like to use
            g2_I = g2.masked_scatter_(cond, negative_slope * g2[cond])
        else:
            g2_I = g2.masked_scatter(cond, negative_slope * g2[cond])
        return None, g2_I, None, None
This function is similar to a leaky ReLU in that it multiplies the second tensor by the negative slope wherever the first tensor is smaller than zero.
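To make the semantics concrete, here is a toy sketch of the same masked_scatter pattern on small tensors (the names mirror the example above; the non-inplace branch is shown):

```python
import torch

negative_slope = 0.1
input1 = torch.tensor([-1.0, 2.0, -3.0])   # small "condition" tensor
input2 = torch.tensor([10.0, 20.0, 30.0])  # stands in for the huge tensor
cond = input1 < 0.0

# Where input1 < 0, replace input2's entries with the scaled values;
# elsewhere input2 is kept as-is.
out = input2.masked_scatter(cond, negative_slope * input2[cond])
print(out)  # tensor([ 1., 20.,  3.])
```

The out-of-place `masked_scatter` leaves `input2` untouched, which is exactly why it costs an extra full-size allocation.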
input1 is small, but input2 is very big (multiple gigabytes!). I would like to modify g2 in place (assuming inplace=True), since I often get OOM errors at exactly this line in the backward step:

g2_I = g2.masked_scatter(cond, negative_slope * g2[cond])
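For what it's worth, the allocations on that line come from the temporary negative_slope * g2[cond] plus the out-of-place masked_scatter result. One possible workaround (an assumption on my part, not a verified answer to the autograd-safety question of mutating the incoming gradient) is to express the update as a single broadcasted in-place multiply, so the only temporary is a cond-shaped multiplier:

```python
import torch

negative_slope = 0.1
cond = torch.tensor([True, False, True, False]).unsqueeze(1)  # small mask (input1 < 0)
g2 = torch.ones(4, 3)                                         # stands in for the huge gradient
expected = torch.where(cond, negative_slope * g2, g2)

# The multiplier has cond's (small) shape and broadcasts over g2,
# so no g2-sized temporary is materialized.
mult = torch.where(cond, torch.tensor(negative_slope), torch.tensor(1.0))
g2.mul_(mult)
assert torch.allclose(g2, expected)
```

Note that ctx.mark_dirty only covers in-place changes in forward; whether autograd tolerates mutating g2 inside backward is the open question of this thread.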
I have another question related to the original: I have two autograd functions in which I would like to modify tensors in place in the backward step, but in one of them the saved tensor is not contiguous. Here is what I see:
ctx.save_for_backward(inp_copy.grad.contiguous(), jac_copy.grad.contiguous())
(...)
g_input, g_jac = ctx.saved_tensors
g_jac.is_contiguous()  # => False
What is going on? I don’t want to copy the tensor, it’s huge! If I set a breakpoint where I save the tensors, both are contiguous…
EDIT: resolved, it was a simple typo! I had overwritten
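For future readers hitting the same symptom, a typical way such a typo produces it: overwriting the tensor with a view (for example a transpose) before it is saved. A minimal illustration:

```python
import torch

g = torch.randn(3, 4)
assert g.is_contiguous()

g = g.t()  # an accidental overwrite with a view like this breaks contiguity
assert not g.is_contiguous()
```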
OK, another question related to in-place modifications and autograd Functions, but now during the forward function. I get the following error:
File "/scratch/lkurscheidt48/mnt_problems_temp/models/Memory_Saving_Forward_AD.py", line 481, in forward
    jac[cond].mul_(negative_slope)
RuntimeError: CUDA out of memory. Tried to allocate 4.66 GiB (GPU 3; 31.75 GiB total capacity; 21.17 GiB already allocated; 3.47 GiB free; 26.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON
The thing is, jac[cond].mul_(negative_slope) should not allocate! negative_slope is a float, and the whole thing happens inside the forward function of a custom autograd Function, so the autograd mechanism should not allocate anything either. Could it be that the reported line is out of sync with the line that actually allocates? I remember reading about something like that, but I can’t find it when I search for it now.
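One observation that may explain it without any line mismatch: in PyTorch, indexing with a boolean mask returns a copy, so jac[cond] itself allocates a new tensor of the selected elements (plausibly the 4.66 GiB in the traceback), and mul_ then mutates that copy rather than jac. A minimal demonstration:

```python
import torch

jac = torch.ones(4)
cond = torch.tensor([True, False, True, False])

sel = jac[cond]  # boolean indexing materializes a new tensor: a real allocation
sel.mul_(0.5)    # mutates the copy only...
print(jac)       # ...jac itself is unchanged: tensor([1., 1., 1., 1.])

# The __setitem__ form does write back (though it still allocates the selection):
jac[cond] *= 0.5
print(jac)       # tensor([0.5000, 1.0000, 0.5000, 1.0000])
```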