The difference is that instantiating and then calling the Function works with “old-style” functions (which are going to be deprecated in the future).
Using .apply is for “new-style” functions. You can tell the two apart easily: new-style functions are defined with only @staticmethod methods, while old-style ones have an __init__.
OK, I see, thanks. Since my function takes a scalar parameter, to use the new style I need to pass it as an argument to forward and save it for backward?
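import torch
class F_new(torch.autograd.Function):
    @staticmethod
    def forward(ctx, args, gamma):
        ctx.gamma = gamma  # plain Python scalar, stored on ctx
        return args * gamma  # example computation using gamma
    @staticmethod
    def backward(ctx, grad_output):
        return grad_output * ctx.gamma, None  # None: no gradient for gamma
# Using your old style Function from your code sample:
F(gamma)(inp)
# Using the new style Function:
F_new.apply(inp, gamma)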
So every time I call F.apply, a new “instance” is created with its own context?
Can I save intermediate results with ctx.intermediary = intermediary?
If I call the function several times before doing a backward pass, will each call overwrite the gamma attribute and the intermediate results I saved on ctx?
Each call to .apply gets its own context, so you can save everything you need in it without risk.
Note that if you need to save inputs or outputs, you should use ctx.save_for_backward().
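A minimal sketch of the points above, using a hypothetical `ScaledSquare` Function (y = gamma * x**2) that is not from the thread: the scalar is stored as a ctx attribute, the input tensor goes through save_for_backward, and two calls with different gamma values before a single backward show that each .apply call keeps its own context.

```python
import torch

class ScaledSquare(torch.autograd.Function):
    """Hypothetical example: y = gamma * x**2, with gamma a Python float."""

    @staticmethod
    def forward(ctx, x, gamma):
        ctx.gamma = gamma          # non-tensor value: a plain ctx attribute
        ctx.save_for_backward(x)   # input tensor: use save_for_backward
        return gamma * x ** 2

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        # One return value per forward input; gamma gets no gradient.
        return grad_out * 2 * ctx.gamma * x, None


x = torch.tensor([1.0, 2.0], requires_grad=True)
# Two calls before any backward: each .apply has its own ctx,
# so the second gamma does not overwrite the first one.
y1 = ScaledSquare.apply(x, 3.0)
y2 = ScaledSquare.apply(x, 10.0)
(y1.sum() + y2.sum()).backward()
print(x.grad)  # 2*3*x + 2*10*x = 26*x -> tensor([26., 52.])
```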
OK, why do we need save_for_backward for this? Is it just a convention, or does it perform additional checks?
But intermediate tensors are fine to save, right? I tried to save some with save_for_backward but it failed, so I saved them as attributes on self (now ctx).
Yes, save_for_backward is just for inputs and outputs; it allows additional checks (making sure that you don’t create non-collectable reference cycles). Intermediate results, yes, you can save as attributes of the context.
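A sketch of the split described in that answer, using a hypothetical `SinSquared` Function: the input goes through save_for_backward and the intermediate sin(x) is stored as a plain ctx attribute. As far as I know, more recent PyTorch documentation recommends passing every tensor needed in backward to save_for_backward, intermediates included (noting that saving intermediates may prevent double backward), so treat the attribute approach as the older advice.

```python
import torch

class SinSquared(torch.autograd.Function):
    """Hypothetical example: y = sin(x)**2, dy/dx = 2*sin(x)*cos(x)."""

    @staticmethod
    def forward(ctx, x):
        s = torch.sin(x)           # intermediate: neither input nor output
        ctx.save_for_backward(x)   # input: goes through save_for_backward
        ctx.sin_x = s              # intermediate: stored as a ctx attribute
        return s * s

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * 2 * ctx.sin_x * torch.cos(x)
```

Usage: `y = SinSquared.apply(x)`, then `y.sum().backward()` as usual.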
It seems the staticmethod version is faster than the old-style function (ctx compared with self).
But what if I need to return an auxiliary output from the forward pass that does not need to receive a gradient?
In the old style, I create another instance method, separate from the forward pass.
A more concrete example:
class F(torch.autograd.Function):
    def __init__(self):
        super().__init__()
    def some_func(self, x2):
        # self.x is also needed here
        return z2
    def forward(self, x):
        self.x = x
        z1 = self.some_func(self.x)
        # ... compute y
        return y
    def backward(self, grad_y):
        return grad_x
y = F()(x)
z2 = F()(x2)
Is there a way to rewrite this old-style code in the new style (using ‘ctx’)?
Thanks a lot!
This is not what Functions are made for. All arguments must be given as inputs to the forward function.
To get this behavior, you can simply create an nn.Module, which can do whatever you want.
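One way to read this suggestion, sketched with hypothetical names (`Square`, `AuxModule`, `last_x`): the nn.Module owns any state and the auxiliary method, while a custom Function is still used inside forward wherever a hand-written derivative is needed.

```python
import torch
import torch.nn as nn

class Square(torch.autograd.Function):
    """Hypothetical custom-derivative part: y = x**2."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_y):
        (x,) = ctx.saved_tensors
        return grad_y * 2 * x


class AuxModule(nn.Module):
    """Hypothetical module: custom-gradient forward plus a separate helper."""

    def forward(self, x):
        self.last_x = x.detach()   # state lives on the module, not the Function
        return Square.apply(x)

    def some_func(self, x2):
        # Can use state saved during forward; no gradient is tracked here.
        with torch.no_grad():
            return self.last_x + x2
```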
Yeah, my current solution uses nn.Module, but compared with ctx it is much slower. I use torch.autograd.Function because I need to write my own derivative.
Maybe my description was not very precise. Taking torch.autograd.Function as an example, what I am seeking to do is:
class F(torch.autograd.Function):
    @staticmethod
    def some_func(z1, x2):
        # x2 is also needed here
        return z2
    @staticmethod
    def forward(ctx, x, x2):
        z1 = F.some_func3(x)
        y = z1**2
        z2 = F.some_func(z1, x2)
        u = F.some_func2(y)  # derivative of y wrt x
        ctx.save_for_backward(u)
        return y, z2
    @staticmethod
    def backward(ctx, grad_y, grad_z2):
        u, = ctx.saved_tensors
        return grad_y * u, None  # chain rule; no gradient for x2
In this case, I do not need x2’s derivative, but I need to output z2 from the forward pass so that I can reuse the intermediate values and save computation.
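One way to express this in the new style, sketched with a hypothetical `SinSquaredAux` Function that is not the poster’s code: forward returns both outputs and marks the auxiliary one with ctx.mark_non_differentiable, and backward takes one gradient per output while returning None for x2. The concrete math (z1 = sin(x), y = z1**2, z2 = z1 * x2) is only a placeholder for some_func, some_func2 and some_func3.

```python
import torch

class SinSquaredAux(torch.autograd.Function):
    """Hypothetical example: y = sin(x)**2 with a custom derivative, plus an
    auxiliary output z2 = sin(x) * x2 that reuses the intermediate sin(x)."""

    @staticmethod
    def forward(ctx, x, x2):
        z1 = torch.sin(x)                    # shared intermediate, computed once
        y = z1 ** 2
        z2 = z1 * x2                         # auxiliary output, no gradient wanted
        ctx.dy_dx = 2 * z1 * torch.cos(x)    # precomputed derivative of y wrt x
        ctx.mark_non_differentiable(z2)
        return y, z2

    @staticmethod
    def backward(ctx, grad_y, grad_z2):
        # One incoming gradient per output; grad_z2 is ignored because z2
        # was marked non-differentiable. One return value per input.
        return grad_y * ctx.dy_dx, None
```

Usage: `y, z2 = SinSquaredAux.apply(x, x2)`; only `y` takes part in autograd.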
Can I register buffers inside this forward function?
I can’t use self.register_buffer, for sure, and ctx.register_buffer doesn’t work either.
What is the correct way to register a buffer in forward?
.register_buffer() is a function on nn.Modules, not autograd.Function.
These are two completely different constructs. You won’t need any buffer for a custom Function since it is not a class instance; just create the Tensor you need during forward.
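A minimal sketch of that advice, with a hypothetical `AddConstant` Function: the constant is built inside forward with the input’s device and dtype, so it ends up wherever the input is. Under DataParallel each replica receives its chunk of the input on its own GPU, so `input.device` already points at the right device; matching the input’s dtype is an assumption about how the tensor is used.

```python
import torch
from torch.autograd import Function

class AddConstant(Function):
    """Hypothetical example: adds a constant tensor built inside forward."""

    @staticmethod
    def forward(ctx, input):
        # No buffer needed: create the tensor here, on the input's device
        # and dtype, so it follows the input to whichever device it is on.
        mytensor = torch.tensor([1], device=input.device, dtype=input.dtype)
        return input + mytensor

    @staticmethod
    def backward(ctx, grad_output):
        # d(input + c)/d(input) = 1
        return grad_output
```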
Hi @albanD,
If I just create a tensor, how will it be moved to the GPU?
I don’t want to use .to(device) because that won’t work with DataParallel.
Are you saying that all the tensors defined in an autograd.Function will automatically be moved to the device when I call model.cuda() or wrap it in torch.nn.DataParallel(model)?
import torch
from torch.autograd import Function
class Myclass(Function):
    @staticmethod
    def forward(ctx, input):
        mytensor = torch.tensor([1])
So if a model that uses the class above (Myclass) is moved to a device (with model.cuda() or torch.nn.DataParallel(model)), will all the tensors created in forward be moved to that device too?
I think I may have to create the tensor with register_buffer outside this forward function and pass it to forward, so that it works with torch.nn.DataParallel and is moved to the device.
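A sketch of that alternative, with hypothetical names (`ScaleFn`, `Scale`, `scale`): the nn.Module registers the tensor as a buffer, so model.cuda(), .to() and DataParallel replication move it along with the model, and forward passes it into the Function as an ordinary input. Because the buffer does not require grad, backward returns None for it.

```python
import torch
import torch.nn as nn

class ScaleFn(torch.autograd.Function):
    """Hypothetical Function that receives the buffer as a normal input."""

    @staticmethod
    def forward(ctx, input, scale):
        ctx.save_for_backward(scale)   # scale is an input, so save_for_backward is fine
        return input * scale

    @staticmethod
    def backward(ctx, grad_output):
        (scale,) = ctx.saved_tensors
        return grad_output * scale, None   # no gradient for the buffer


class Scale(nn.Module):
    def __init__(self):
        super().__init__()
        # Buffers move with model.cuda()/.to() and are replicated by DataParallel.
        self.register_buffer("scale", torch.tensor([2.0]))

    def forward(self, input):
        return ScaleFn.apply(input, self.scale)
```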