I created a simple autograd function, let’s call it F (based on torch.autograd.Function).
What’s the difference between calling
a = F.apply(args)
and instantiating and then calling, like this:
f = F()
a = f(args)
Both versions seem to be used in PyTorch code and in examples.
The difference is that instantiating + calling the Function works with "old style" functions (which are going to be deprecated in the future).
.apply is for the "new style" functions. You can differentiate the two easily: new style functions are defined with only @staticmethod forward and backward methods, while old style ones have an __init__.
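For reference, here is a minimal new-style Function (`Double` is a hypothetical example computing y = 2x) showing the @staticmethod forward/backward pair and the .apply call:

```python
import torch

class Double(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x * 2

    @staticmethod
    def backward(ctx, grad_output):
        # d(2x)/dx = 2, so just scale the incoming gradient
        return grad_output * 2

x = torch.ones(3, requires_grad=True)
y = Double.apply(x)   # new style: no instance, call .apply on the class
y.sum().backward()
print(x.grad)         # tensor([2., 2., 2.])
```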
Ok, I see, thanks. Since I want to have a scalar parameter for my function, if I want to use the new style, I need to pass it as an argument to forward and save it for the backward pass then?
I did this:

```python
def __init__(self, gamma=0.1):
    self.gamma = gamma

def forward(self, args):
    ...

def backward(self, args):
    ...
```
What you wrote is an old style function. The equivalent new style would be:

```python
@staticmethod
def forward(ctx, args, gamma):
    ctx.gamma = gamma
    ...

@staticmethod
def backward(ctx, args):
    ...
```

```python
# Using your old style Function from your code sample:
a = F(gamma=0.1)(args)

# Using the new style Function:
a = F.apply(args, 0.1)
```
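To make the pattern concrete, here is a runnable sketch of a new-style Function with a scalar parameter, assuming it computes y = gamma * x (the `Scale` name and the body are hypothetical):

```python
import torch

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, gamma):
        ctx.gamma = gamma          # plain Python scalar: a ctx attribute is fine
        return x * gamma

    @staticmethod
    def backward(ctx, grad_output):
        # One gradient per forward input; gamma is non-differentiable -> None
        return grad_output * ctx.gamma, None

x = torch.ones(2, requires_grad=True)
y = Scale.apply(x, 0.1)
y.sum().backward()
print(x.grad)   # tensor([0.1000, 0.1000])
```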
Thank you very much!
And so every time I call F.apply, a new "instance" will be created with its own context?
Can I save intermediary results with ctx.intermediary = intermediary?
If I call the function several times before doing a backward pass, will it override the gamma attribute each time, along with the intermediary results I saved on ctx?
Each call to .apply will have a different context, so you can save everything you need in it without risk.
Note that if you need to save inputs or outputs, you should use the save_for_backward method.
Ok, why do we need to use save_for_backward for this? Is it just a convention? Or does it perform additional checks?
But intermediary tensors are fine to save, right? I tried to save some with save_for_backward but it failed, so I saved them as attributes on self (ctx now).
save_for_backward is just for inputs and outputs; it allows additional checks to be performed (making sure that you don't create non-collectable reference cycles). For intermediary results, you can save them as attributes of the context, yes.
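A sketch of both rules together: the input goes through ctx.save_for_backward, while an intermediary tensor is stashed as a plain ctx attribute (`SinTimesX` is a hypothetical example computing y = sin(x) * x). It also shows that each .apply call gets its own fresh ctx, so two forwards before a backward do not clobber each other's saved values:

```python
import math
import torch

class SinTimesX(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        s = torch.sin(x)            # intermediary result
        ctx.save_for_backward(x)    # input: use save_for_backward
        ctx.s = s                   # intermediary: a ctx attribute is fine
        return s * x

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        # d(sin(x) * x)/dx = sin(x) + x * cos(x)
        return grad_output * (ctx.s + x * torch.cos(x))

a = torch.tensor([math.pi / 2], requires_grad=True)
b = torch.tensor([0.0], requires_grad=True)
ya = SinTimesX.apply(a)
yb = SinTimesX.apply(b)   # does not overwrite the ctx of the first call
ya.backward()
yb.backward()
print(a.grad)   # ~tensor([1.])  (sin(pi/2) + (pi/2)*cos(pi/2))
print(b.grad)   # tensor([0.])
```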
Thanks, it's much clearer now!
In which version did the new style appear?
If I remember correctly, it was in 0.2.0, to support higher order derivatives.
Hi @albanD,
It seems the staticmethod version is faster than the old style function (ctx compared with self).
But what if I need to output some auxiliary output from the forward pass, which does not need to receive a gradient?
In the old style, what I do is create another instance method which is separate from the forward pass.
A more concrete example:
```python
def some_func(self, x2):
    # self.x is also needed here
    ...

def forward(self, x):
    self.x = x
    z1 = self.some_func(self.x)
    ...

def backward(self, grad_y):
    ...

y = F()(x)
z2 = F()(x2)
```
Is there a way that this old style code can be rewritten into the new style (using ctx)?
Thanks a lot!
This is not what Functions are made for. All arguments must be given as inputs to the forward function.
To get this behavior, you can simply create an nn.Module, which can do whatever you want.
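One way to follow this advice is to keep the differentiable core in an autograd.Function and let an nn.Module produce the auxiliary output. This is only a sketch with hypothetical names (`Core`, `WithAux`) and hypothetical math (y = x^2, z2 = y + x2):

```python
import torch
from torch import nn

class Core(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return x ** 2

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return grad_output * 2 * x

class WithAux(nn.Module):
    def forward(self, x, x2):
        y = Core.apply(x)        # differentiable main output
        z2 = y.detach() + x2     # auxiliary output, receives no gradient
        return y, z2

m = WithAux()
x = torch.tensor([3.0], requires_grad=True)
y, z2 = m(x, torch.tensor([1.0]))
y.backward()
print(x.grad)   # tensor([6.])
print(z2)       # tensor([10.])
```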
Yeah, my current solution is using nn.Module, but compared with ctx, it is much slower… The reason I use torch.autograd.Function is that I need to write my own derivative.
Maybe my description was not very precise. Taking torch.autograd.Function as an example, what I am seeking to do is:
```python
@staticmethod
def some_func(z1, x2):
    # x2 is also needed here
    ...

@staticmethod
def forward(ctx, x, x2):
    z1 = F.some_func3(x)
    y = z1 ** 2
    z2 = F.some_func(z1, x2)
    u = F.some_func2(y)  # derivative wrt x
    ctx.save_for_backward(u)
    return y, z2

@staticmethod
def backward(ctx, grad_y, dummy_grad):
    u, = ctx.saved_tensors
    return grad_y * u, None
```
In this case, I do not need x2's derivative, but I do need to output z2 from the forward pass to save some computation cost on intermediate values.
Well, my way of writing it actually seems to work.
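A runnable sketch of this multi-output pattern, with concrete stand-ins for some_func/some_func2/some_func3 (the math here is hypothetical: z1 = exp(x), so y = exp(2x) and dy/dx = 2*exp(2x)):

```python
import torch

class F(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, x2):
        z1 = torch.exp(x)       # stand-in for some_func3
        y = z1 ** 2
        z2 = z1 + x2            # auxiliary output, reuses z1
        u = 2 * z1 ** 2         # dy/dx, computed while z1 is at hand
        ctx.save_for_backward(u)
        return y, z2

    @staticmethod
    def backward(ctx, grad_y, dummy_grad):
        u, = ctx.saved_tensors
        # One gradient per input (x, x2); x2 gets None, dummy_grad is ignored
        return grad_y * u, None

x = torch.tensor([0.0], requires_grad=True)
y, z2 = F.apply(x, torch.tensor([1.0]))
y.backward()
print(x.grad)   # tensor([2.])  (2*exp(0)^2)
print(z2)       # tensor([2.])
```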
Inside this forward function, can I use registered buffers?
I can't use self.register_buffer for sure, and ctx.register_buffer doesn't work either.
What is the correct way to define a registered buffer in forward?
.register_buffer() is a method on nn.Module, not on autograd.Function.
These are two completely different constructs. And you won't need any buffers for a custom Function, as it is not a class instance; you should just create the Tensors you need during the forward.
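A sketch of that advice: allocate tensors inside forward and derive their device/dtype from the input, so they always land on the right device without any buffer (`AddOne` is a hypothetical example computing y = x + 1):

```python
import torch

class AddOne(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        one = x.new_full(x.shape, 1.0)   # same device/dtype as x
        return x + one

    @staticmethod
    def backward(ctx, grad_output):
        # Addition of a constant: gradient passes through unchanged
        return grad_output

x = torch.zeros(3, requires_grad=True)
y = AddOne.apply(x)
y.sum().backward()
print(y)        # tensor([1., 1., 1.], grad_fn=...)
print(x.grad)   # tensor([1., 1., 1.])
```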
Hi @albanD,
If I just create a tensor, how will that tensor be moved to the GPU?
I don't want to use .to(device) because that won't work with DataParallel.
Are you saying that all the tensors defined inside autograd.Function will automatically be moved to the device when I do model.cuda() or torch.nn.DataParallel(model)?
```python
mytensor = torch.tensor(...)
```
So for a model defined using the above class (Myclass), will taking it to the device (by model.cuda() or torch.nn.DataParallel(model)) move all the tensors created in forward to the device?
I think I might have to create a tensor using register_buffer outside this forward function and then pass it to forward, so that my tensors work with torch.nn.DataParallel and can be moved to the device.
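That workaround can be sketched as follows: register the tensor as a buffer on the Module (so model.cuda() or DataParallel moves it) and pass it into .apply as a regular input (the `Shift`/`MyModule` names and the y = x + offset math are hypothetical):

```python
import torch
from torch import nn

class Shift(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, offset):
        return x + offset

    @staticmethod
    def backward(ctx, grad_output):
        return grad_output, None   # no gradient for the buffer

class MyModule(nn.Module):
    def __init__(self):
        super().__init__()
        self.register_buffer("offset", torch.ones(3))

    def forward(self, x):
        # self.offset has already been moved with the module at this point
        return Shift.apply(x, self.offset)

m = MyModule()
x = torch.zeros(3, requires_grad=True)
y = m(x)
y.sum().backward()
print(y)        # tensor([1., 1., 1.], grad_fn=...)
print(x.grad)   # tensor([1., 1., 1.])
```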