Custom Autograd Function - inplace grad modification?

Hello, I am working on a model that uses huge tensors, and I am constantly battling memory consumption. I have written a custom autograd function to reduce the memory requirements of one operation in the forward pass. I am now looking at the backward pass, and I wonder: for tensors that I modified in place in the forward pass, can I also modify their gradients in place in the backward pass?

Can you share a minimal reproducible example?

Of course, here is a condensed example:

import torch


class LeakyReluLike(torch.autograd.Function):

    @staticmethod
    def forward(ctx, input1, input2, negative_slope, inplace):
        ctx.save_for_backward(input1)
        ctx.negative_slope = negative_slope
        cond = (input1 < 0.0)
        if inplace:
            # scale input2 in place where input1 < 0 and tell autograd about it
            input2_out = input2.masked_scatter_(cond, negative_slope * input2[cond])
            ctx.mark_dirty(input2)
        else:
            # out-of-place variant: allocates a full-size copy of input2
            input2_out = input2.masked_scatter(cond, negative_slope * input2[cond])
        return input2_out

    @staticmethod
    def backward(ctx, g2):
        input1, = ctx.saved_tensors
        negative_slope = ctx.negative_slope
        cond = (input1 < 0.0)

        if False:
            # what I would like to do: modify the incoming gradient in place
            g2_I = g2.masked_scatter_(cond, negative_slope * g2[cond])
        else:
            # what I currently do: allocates a full-size copy of g2
            g2_I = g2.masked_scatter(cond, negative_slope * g2[cond])

        return None, g2_I, None, None

This function is similar to a leaky ReLU in that it multiplies the second tensor by the negative slope wherever the first tensor is smaller than zero. input1 is small, but input2 is very big (multiple gigabytes!). I would like to modify g2 in place (assuming inplace=True), since I often get OOM errors in the backward step at exactly this line: g2_I = g2.masked_scatter(cond, negative_slope * g2[cond]).
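For reference, this is roughly how I call it (toy shapes and a made-up negative_slope, just to show the calling convention; the clone is only there so the in-place path does not write into a leaf that requires grad):

import torch

input1 = torch.randn(4, 4)
input2 = torch.randn(4, 4, requires_grad=True)

# non-leaf copy so masked_scatter_ + mark_dirty is legal in the inplace path
work = input2.clone()
out = LeakyReluLike.apply(input1, work, 0.01, True)  # inplace=True
out.sum().backward()
print(input2.grad)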

I have another question related to the original:

I have two autograd functions where I would like to modify the gradient in place in the backward step. But in one of them the saved tensor is not contiguous? Here is what I see:

ctx.save_for_backward(inp_copy.grad.contiguous(), jac_copy.grad.contiguous())

(...)

g_input, g_jac = ctx.saved_tensors

g_jac.is_contiguous() => False

What’s going on? I don’t want to copy the tensor, it’s huge! If I set a breakpoint where I save the tensors, both are contiguous…
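To rule out .contiguous() itself, this is the kind of sanity check I ran on a toy tensor (nothing model-specific): .contiguous() should hand back the same tensor when it is already contiguous, and only copy otherwise.

import torch

x = torch.randn(3, 4)
y = x.t()                                    # transposed view, not contiguous
print(x.is_contiguous(), y.is_contiguous())  # True False
print(x.contiguous() is x)                   # True: no copy when already contiguous
print(y.contiguous().is_contiguous())        # True: only this one gets copied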

EDIT: resolved, it was a simple typo! I had overwritten g_jac!

OK, another question related to in-place modifications and autograd Functions, but now in the forward function.

I get the following error:

File "/scratch/lkurscheidt48/mnt_problems_temp/models/Memory_Saving_Forward_AD.py", line 481, in forward
    jac[cond].mul_(negative_slope)
RuntimeError: CUDA out of memory. Tried to allocate 4.66 GiB (GPU 3; 31.75 GiB total capacity; 21.17 GiB already allocated; 3.47 GiB free; 26.85 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CON

The thing is, jac[cond].mul_(negative_slope) should not allocate! negative_slope is a float, and the whole thing is inside the forward function of a custom autograd Function, so the autograd mechanism should not allocate anything either. Could it be that the reported line is not in sync with the actual error? I remember reading about something like this, but I can't find anything right now when I search for it.
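To narrow this down, I was going to probe the indexing step on its own with torch.cuda.memory_allocated() (a rough sketch with made-up shapes, assuming a CUDA device is available):

import torch

# made-up shapes, just to see whether the boolean indexing by itself allocates
jac = torch.randn(2048, 2048, device="cuda")
cond = torch.rand(2048, 2048, device="cuda") < 0.5

torch.cuda.synchronize()
before = torch.cuda.memory_allocated()
tmp = jac[cond]  # the indexing step in isolation
torch.cuda.synchronize()
after = torch.cuda.memory_allocated()
print(f"jac[cond] allocated {(after - before) / 2**20:.2f} MiB")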