sbaio
(Sbaio)
April 6, 2020, 1:16pm
1
Hi,
I have a situation that looks like this:
model = resnet50()
classifier = nn.Linear(1000, 20)
x = model(input)
out1 = classifier(x)
out2 = classifier(x.detach()) # Computation to avoid
where the classifier is expensive, and I would like to avoid recomputing out2 = classifier(x.detach()).
Can we use out1 to backward as if x were detached?
Thank you
albanD
(Alban D)
April 6, 2020, 3:28pm
2
Hi,
You mean that you want to get the gradients for the elements in the classifier but not the ones in model?
If so, you can use autograd.grad to specify which gradients you want: grads = autograd.grad(loss, classifier.parameters()).
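A minimal runnable sketch of this, with invented module sizes standing in for the ones in the question:

```python
import torch
from torch import nn

# Illustrative stand-ins for the setup in the question: a small
# "model" and a linear classifier (shapes are made up for the sketch).
model = nn.Linear(8, 4)
classifier = nn.Linear(4, 2)

x = model(torch.randn(3, 8))
loss = classifier(x).sum()

# autograd.grad computes gradients only for the inputs you pass it;
# nothing is accumulated into .grad, so model's parameters are
# left untouched even though the graph runs through them.
grads = torch.autograd.grad(loss, classifier.parameters())

print(len(grads))         # one gradient per classifier parameter
print(model.weight.grad)  # None: model received no gradients
```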
sbaio
(Sbaio)
April 6, 2020, 4:01pm
3
Thank you for the quick response. Actually, my example oversimplified my case; let me give a better one.
I'm not trying to compute the gradient of the loss w.r.t. my classifier parameters only.
x1, x2 = model(x)
out = heavy_module(x1, x2)
loss = Loss1(out)
loss.backward()
out_d = heavy_module(x1.detach(), x2)
loss_d = Loss2(out_d)
loss_d.backward()
Can we avoid the second heavy_module computation while still having access to both out and out_d for the different losses?
Thank you
albanD
(Alban D)
April 6, 2020, 4:04pm
4
You won't be able to get two different Tensors from a single forward.
Why can't you reuse out to compute loss_d?
sbaio
(Sbaio)
April 6, 2020, 4:15pm
5
If I use out to compute loss_d, then x1 won't be detached, and Loss2 will propagate gradients to the parameters that produced x1 (which I want to avoid).
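For concreteness, a small sketch (modules and shapes invented here, with heavy_module reduced to a single-input layer) showing that backwarding through the undetached out does push gradients into the module that produced x1:

```python
import torch
from torch import nn

# Invented stand-ins: "model" produces x1, and a linear layer plays
# the role of heavy_module, taking a single input for brevity.
model = nn.Linear(4, 4)
heavy_module = nn.Linear(4, 2)

x1 = model(torch.randn(3, 4))
out = heavy_module(x1)

# Backward through the undetached out: gradients reach model too,
# which is exactly what Loss2 must not do here.
out.sum().backward()
print(model.weight.grad is not None)  # True
```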
albanD
(Alban D)
April 6, 2020, 5:58pm
6
Hi,
You would need two completely different graphs here, I'm afraid, so you won't be able to do that without redoing the forward.
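So the pattern ends up as two forwards through heavy_module. A runnable sketch under invented modules and shapes (note retain_graph=True on the first backward, since the second backward still goes through x2's part of the graph):

```python
import torch
from torch import nn

class Heavy(nn.Module):
    """Invented stand-in for heavy_module, taking two inputs."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(8, 2)

    def forward(self, a, b):
        return self.fc(torch.cat([a, b], dim=-1))

# Invented stand-ins for the parts of model that produce x1 and x2.
branch1 = nn.Linear(4, 4)
branch2 = nn.Linear(4, 4)
heavy_module = Heavy()

inp = torch.randn(3, 4)
x1, x2 = branch1(inp), branch2(inp)

# First graph: both x1 and x2 attached.
out = heavy_module(x1, x2)
out.sum().backward(retain_graph=True)  # stands in for Loss1
g1 = branch1.weight.grad.clone()

# Second graph: a second forward is unavoidable, with x1 detached.
out_d = heavy_module(x1.detach(), x2)
out_d.sum().backward()                 # stands in for Loss2

# branch1's gradient is unchanged by the second backward, because
# x1 was detached; branch2 accumulated gradients from both losses.
print(torch.equal(branch1.weight.grad, g1))  # True
```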