Hello, I am working on a model that uses huge tensors, and I am constantly battling memory consumption. I have written a custom autograd function that reduces the memory requirements of the forward pass for one operation. I am now looking at the backward pass, and I wonder: for tensors that I modified in place in the forward pass, can I also modify their gradients in place in the backward pass?
Can you share a minimal reproducible example?
Of course, here is a condensed example:
class LeakyReluLike(torch.autograd.Function):
    @staticmethod
    def forward(ctx, input1, input2, negative_slope, inplace):
        ctx.save_for_backward(input1)
        ctx.negative_slope = negative_slope
        cond = (input1 < 0.0)
        if inplace:
            input2_out = input2.masked_scatter_(cond, negative_slope * input2[cond])
            ctx.mark_dirty(input2)
        else:
            input2_out = input2.masked_scatter(cond, negative_slope * input2[cond])
        return input2_out

    @staticmethod
    def backward(ctx, g2):
        input1, = ctx.saved_tensors
        negative_slope = ctx.negative_slope
        cond = (input1 < 0.0)
        if False:  # the in-place branch I would like to use
            g2_I = g2.masked_scatter_(cond, negative_slope * g2[cond])
        else:
            g2_I = g2.masked_scatter(cond, negative_slope * g2[cond])
        return None, g2_I, None, None
This function is similar to a leaky ReLU in that it multiplies the second tensor by the negative slope wherever the first tensor is smaller than zero.
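To make the semantics concrete, here is a toy sketch of the same masked_scatter pattern on small tensors (the names mirror the example above; the non-inplace branch is shown):

```python
import torch

negative_slope = 0.1
input1 = torch.tensor([-1.0, 2.0, -3.0])   # small "condition" tensor
input2 = torch.tensor([10.0, 20.0, 30.0])  # stands in for the huge tensor
cond = input1 < 0.0

# Where input1 < 0, replace input2's entries with the scaled values;
# elsewhere input2 is kept as-is.
out = input2.masked_scatter(cond, negative_slope * input2[cond])
print(out)  # tensor([ 1., 20.,  3.])
```

The out-of-place `masked_scatter` leaves `input2` untouched, which is exactly why it costs an extra full-size allocation.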
input1 is small, but input2 is very big (multiple gigabytes!). I would like to modify g2 in place (assuming inplace=True), since I often get OOM errors at exactly this line in the backward step:

g2_I = g2.masked_scatter(cond, negative_slope * g2[cond])
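For what it's worth, the allocations on that line come from the temporary negative_slope * g2[cond] plus the out-of-place masked_scatter result. One possible workaround (an assumption on my part, not a verified answer to the autograd-safety question of mutating the incoming gradient) is to express the update as a single broadcasted in-place multiply, so the only temporary is a cond-shaped multiplier:

```python
import torch

negative_slope = 0.1
cond = torch.tensor([True, False, True, False]).unsqueeze(1)  # small mask (input1 < 0)
g2 = torch.ones(4, 3)                                         # stands in for the huge gradient
expected = torch.where(cond, negative_slope * g2, g2)

# The multiplier has cond's (small) shape and broadcasts over g2,
# so no g2-sized temporary is materialized.
mult = torch.where(cond, torch.tensor(negative_slope), torch.tensor(1.0))
g2.mul_(mult)
assert torch.allclose(g2, expected)
```

Note that ctx.mark_dirty only covers in-place changes in forward; whether autograd tolerates mutating g2 inside backward is the open question of this thread.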
I have another question related to the original: I have two autograd functions in which I would like to modify tensors in place in the backward step, but in one of them the saved tensor is not contiguous. Here is what I see:
ctx.save_for_backward(inp_copy.grad.contiguous(), jac_copy.grad.contiguous())
(...)
g_input, g_jac = ctx.saved_tensors
g_jac.is_contiguous()  # => False
What is going on? I don’t want to copy the tensor, it’s huge! If I set a breakpoint where I save the tensors, both are contiguous…
EDIT: resolved, it was a simple typo! I had overwritten
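For future readers hitting the same symptom, a typical way such a typo produces it: overwriting the tensor with a view (for example a transpose) before it is saved. A minimal illustration:

```python
import torch

g = torch.randn(3, 4)
assert g.is_contiguous()

g = g.t()  # an accidental overwrite with a view like this breaks contiguity
assert not g.is_contiguous()
```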
OK, another question related to in-place modifications and autograd Functions, but now during the forward function. I get the following error:
File "/scratch/lkurscheidt48/mnt_problems_temp/models/Memory_Saving_Forward_AD.py", line 481, in forward
    jac[cond].mul_(negative_slope)
RuntimeError: CUDA out of memory. Tried to allocate 4.66 GiB (GPU 3; 31.75 GiB total capacity; 21.17 GiB already allocated; 3.47 GiB free; 26.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON
The thing is, jac[cond].mul_(negative_slope) should not allocate! negative_slope is a float, and the whole thing happens inside the forward function of a custom autograd Function, so the autograd mechanism should not allocate anything either. Could it be that the reported line is out of sync with the line that actually allocates? I remember reading about something like that, but I can’t find it when I search for it now.
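One observation that may explain it without any line mismatch: in PyTorch, indexing with a boolean mask returns a copy, so jac[cond] itself allocates a new tensor of the selected elements (plausibly the 4.66 GiB in the traceback), and mul_ then mutates that copy rather than jac. A minimal demonstration:

```python
import torch

jac = torch.ones(4)
cond = torch.tensor([True, False, True, False])

sel = jac[cond]  # boolean indexing materializes a new tensor: a real allocation
sel.mul_(0.5)    # mutates the copy only...
print(jac)       # ...jac itself is unchanged: tensor([1., 1., 1., 1.])

# The __setitem__ form does write back (though it still allocates the selection):
jac[cond] *= 0.5
print(jac)       # tensor([0.5000, 1.0000, 0.5000, 1.0000])
```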