Difference between apply and call for an autograd function

I created a simple autograd function, let’s call it F (based on torch.autograd.Function).

What’s the difference between calling

a = F.apply(args)

and instantiating, then calling, like this:

f = F()
a = f(args)

Both versions seem to be used in PyTorch code and in examples.


Hi,

The difference is that instantiating and then calling the Function is the "old style" (which is going to be deprecated in the future).
Using .apply is for the "new style" functions. You can tell the two apart easily: new-style functions are defined with only @staticmethods, while old-style ones have an __init__.


Ok, I see, thanks. Since I want my function to take a scalar parameter, if I want to use the new style I need to pass it as an argument to forward and save it for backward, then?

I did this :

class F(torch.autograd.Function):
    def __init__(self, gamma=0.1):
        super().__init__()
        self.gamma = gamma

    def forward(self, args):
        pass

    def backward(self, args):
        pass

Hi,
What you wrote is an old style function.

The equivalent new style would be:

class F_new(torch.autograd.Function):
    @staticmethod
    def forward(ctx, args, gamma):
        ctx.gamma = gamma
        pass

    @staticmethod
    def backward(ctx, grad_output):
        pass

# Using your old style Function from your code sample:
F(gamma)(inp)
# Using the new style Function:
F_new.apply(inp, gamma)
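To make the shape above concrete, here is a minimal runnable sketch of a new-style Function (a hypothetical `Scale` that multiplies its input by a scalar gamma; the name and the operation are illustrative, not from the thread):

```python
import torch

class Scale(torch.autograd.Function):
    # Hypothetical example: multiplies the input by a scalar gamma.
    @staticmethod
    def forward(ctx, inp, gamma):
        ctx.gamma = gamma  # plain Python values can be stored on the context
        return inp * gamma

    @staticmethod
    def backward(ctx, grad_output):
        # Return one gradient per forward input; gamma is a plain
        # float, so its "gradient" slot is None.
        return grad_output * ctx.gamma, None

x = torch.randn(3, requires_grad=True)
y = Scale.apply(x, 0.5)
y.sum().backward()
# x.grad is 0.5 everywhere, since d(0.5 * x)/dx = 0.5
```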

Thank you very much !

And so every time I call F.apply, a new "instance" will be created with its own context?

Can I save intermediary results with ctx.intermediary = intermediary?

If I call the function several times before doing a backward pass, will it overwrite the gamma attribute and the intermediary results I saved on ctx each time?

Each call to .apply gets its own context, so you can save everything you need in it without risk.
Note that if you need to save inputs or outputs, you should use ctx.save_for_backward.


Ok, why do we need to use save_for_backward for this? Is it just a convention, or does it perform additional checks?
But intermediary tensors are fine to save, right? I tried to save some with save_for_backward but it failed, so I saved them as attributes on self (ctx now).

Yes, save_for_backward is just for inputs and outputs; it lets autograd perform additional checks (making sure you don't create non-collectable reference cycles). For intermediary results, yes, you can save them as attributes on the context.


Thanks, it's much clearer now!

In which version did the new style appear?

If I remember correctly, it was introduced in 0.2.0 to support higher-order derivatives.


Hi @albanD,

It seems the staticmethod style is faster than the old-style function (ctx compared with self).
But what if I need to return some auxiliary output from the forward pass that does not need to receive a gradient?
In the old style, what I do is create another instance method separate from the forward pass.
A more concrete example:

class F(torch.autograd.Function):
    def __init__(self):
        super().__init__()

    def some_func(self, x2):
        # self.x is also needed here
        return z2

    def forward(self, x):
        self.x = x
        z1 = self.some_func(self.x)
        return y

    def backward(self, grad_y):
        return grad_x

y = F()(x)
z2 = F()(x2)
y.backward()

Is there a way this old-style Function can be rewritten into the new style (using ctx)?
Thanks a lot!

Hi,

This is not what Functions are made for: all arguments must be given as inputs to the forward function.
To get this behavior, you can simply wrap it in an nn.Module, which can do whatever you want.
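A sketch of that wrapping pattern, reusing the hypothetical gamma-scaling Function from earlier in the thread: the Module holds the extra state, and its forward just dispatches to `.apply`.

```python
import torch
import torch.nn as nn

class ScaleFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, gamma):
        ctx.gamma = gamma
        return inp * gamma

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.gamma, None

class ScaleModule(nn.Module):
    # The Module keeps gamma as ordinary state; the Function stays stateless.
    def __init__(self, gamma=0.1):
        super().__init__()
        self.gamma = gamma

    def forward(self, x):
        return ScaleFn.apply(x, self.gamma)

m = ScaleModule(gamma=2.0)
x = torch.ones(2, requires_grad=True)
out = m(x)
out.sum().backward()
```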

Yeah, my current solution uses an nn.Module, but compared with ctx it is much slower… The reason I use torch.autograd.Function is that I need to write my own derivative.

@albanD
Maybe my description was not very precise. Taking torch.autograd.Function as an example, what I am trying to do is:

class F(torch.autograd.Function):

    @staticmethod
    def some_func(z1, x2):
        # x2 is also needed here
        return z2

    @staticmethod
    def forward(ctx, x, x2):
        z1 = F.some_func3(x)
        y = z1 ** 2
        z2 = F.some_func(z1, x)

        u = F.some_func2(y)  # derivative wrt x
        ctx.save_for_backward(u)
        return y, z2

    @staticmethod
    def backward(ctx, grad_y, dummygrad):
        u, = ctx.saved_tensors
        return u, None

In this case I do not need x2's gradient, but I do need to output z2 in the forward pass so the intermediate values can be reused and some computation saved.

Well, my way of writing actually seems to work.
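For reference, here is a runnable sketch of this multi-output pattern with concrete stand-in operations (the `some_func*` helpers from the thread are replaced by simple arithmetic for illustration): one differentiable output, one auxiliary output marked non-differentiable, and `None` returned for the input that needs no gradient.

```python
import torch

class MultiOut(torch.autograd.Function):
    # Hypothetical stand-in ops: y = x**2 needs a gradient,
    # z2 is an auxiliary output that does not.
    @staticmethod
    def forward(ctx, x, x2):
        y = x ** 2
        z2 = x + x2  # auxiliary result, reusing forward-pass work
        ctx.save_for_backward(x)
        ctx.mark_non_differentiable(z2)  # tell autograd z2 has no gradient
        return y, z2

    @staticmethod
    def backward(ctx, grad_y, grad_z2):
        # One grad argument per output; grad_z2 is ignored.
        x, = ctx.saved_tensors
        # d(x**2)/dx = 2x; no gradient for x2, hence None.
        return grad_y * 2 * x, None

x = torch.tensor([3.0], requires_grad=True)
x2 = torch.tensor([1.0])
y, z2 = MultiOut.apply(x, x2)
y.backward()
# x.grad == 2 * 3.0 == 6.0
```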

Hi @albanD

Inside this forward function, can I use registered buffers?
I can't use self.register_buffer for sure, and ctx.register_buffer doesn't work either.
What is the correct way to define a registered buffer in forward?

Thanks!

Hi,

.register_buffer() is a method on nn.Module, not on autograd.Function.
These are two completely different constructs. And you won't need any buffers for a custom Function, since it isn't a class instance; you should just create the tensors you need during the forward.
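A minimal sketch of creating the tensor inside forward (a hypothetical `AddOne` Function, assuming the usual trick of deriving device and dtype from the input so the Function works on CPU and GPU alike):

```python
import torch

class AddOne(torch.autograd.Function):
    # Instead of a buffer, build the tensor inside forward,
    # on the same device/dtype as the input.
    @staticmethod
    def forward(ctx, inp):
        ones = torch.ones_like(inp)  # inherits device and dtype from inp
        # or explicitly: torch.tensor(1.0, device=inp.device, dtype=inp.dtype)
        return inp + ones

    @staticmethod
    def backward(ctx, grad_output):
        # d(inp + 1)/d(inp) = 1, so the gradient passes through unchanged.
        return grad_output

x = torch.zeros(2, requires_grad=True)
out = AddOne.apply(x)
out.sum().backward()
```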

Hi @albanD ,
If I just create a tensor, how will that tensor be moved to the GPU?
I don't want to use .to(device) because that won't work with DataParallel.

Are you saying that all the tensors defined inside the autograd.Function will automatically be moved to the device when I call model.cuda() or torch.nn.DataParallel(model)?

Example:
class Myclass(Function):
    @staticmethod
    def forward(ctx, input):
        mytensor = torch.tensor([1])

So for a model defined using the class above (Myclass), will moving it to the device (via model.cuda() or torch.nn.DataParallel(model)) also move all the tensors created in forward to the device?

I think I might have to create the tensor with register_buffer outside this forward function and then pass it to forward, so that my tensors work with torch.nn.DataParallel and are moved to the device.
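That pattern can be sketched as follows (a hypothetical `MyFn`/`MyModule` pair, assuming the buffer-and-pass approach described above): the buffer lives on the Module, so model.cuda() or DataParallel moves it, and the Function receives it as an ordinary argument.

```python
import torch
import torch.nn as nn

class MyFn(torch.autograd.Function):
    @staticmethod
    def forward(ctx, inp, offset):
        return inp + offset

    @staticmethod
    def backward(ctx, grad_output):
        # Gradient for inp passes through; the buffer gets None.
        return grad_output, None

class MyModule(nn.Module):
    # Register the tensor as a buffer on the Module so device moves
    # (model.cuda(), DataParallel) carry it along, then pass it into
    # the Function as a regular argument.
    def __init__(self):
        super().__init__()
        self.register_buffer("offset", torch.tensor([1.0]))

    def forward(self, x):
        return MyFn.apply(x, self.offset)

m = MyModule()
x = torch.zeros(3, requires_grad=True)
out = m(x)  # offset broadcasts over x
out.sum().backward()
```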