I was wondering, how does the .clone() function interact with backpropagation?
For example, I have two pieces of code and I don’t understand what the difference between them is. Can someone explain the difference to me (maybe so I understand why clone is even needed for backprop to work “properly”, whatever “properly” means)?
E.g. 1:
import torch
from torch.autograd import Variable
x = Variable(torch.rand(2, 1), requires_grad=False)  # data set
w = Variable(torch.rand(2, 2), requires_grad=True)   # first layer
v = Variable(torch.rand(2, 2), requires_grad=True)   # last layer
w[:, 1] = v[:, 0]
y = torch.matmul(v, torch.matmul(w, x))
vs
import torch
from torch.autograd import Variable
x = Variable(torch.rand(2, 1), requires_grad=False)  # data set
w = Variable(torch.rand(2, 2), requires_grad=True)   # first layer
v = Variable(torch.rand(2, 2), requires_grad=True)   # last layer
w[:, 1] = v[:, 0]
y = torch.matmul(v, torch.matmul(w.clone(), x))
Why do we need clone in the second one? And what does “backprop working properly” actually mean?
RuntimeError: a leaf Variable that requires grad has been used in an in-place operation.
If you want to perform in-place operations (like w[:, 1] = v[:, 0]), you can clone the variable before the in-place operation. The clone is not a leaf variable anymore, so it can be modified in place.
Using in-place operations on Variables is tricky in many cases. A lot of frameworks don’t support them at all and just perform copies instead. PyTorch does support in-place operations, but because other operations may need the original content of a Variable to compute the backward pass, you can’t modify it in place, or you will get wrong gradients.
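As a minimal sketch of the clone-before-in-place pattern (written against the current tensor API, where requires_grad is set on plain tensors instead of Variable; the variable names just mirror the snippets above):

import torch

x = torch.rand(2, 1)                        # data set, no grad needed
w = torch.rand(2, 2, requires_grad=True)    # first layer (leaf)
v = torch.rand(2, 2, requires_grad=True)    # last layer (leaf)

# Clone first, then modify the clone in place: the leaves w and v stay
# untouched, and the clone + slice assignment are recorded by autograd.
w_mod = w.clone()
w_mod[:, 1] = v[:, 0]

y = torch.matmul(v, torch.matmul(w_mod, x))
y.sum().backward()

print(w.grad)  # column 1 is zero: that copy was overwritten before it was used
print(v.grad)  # includes the contribution that flows through w_mod[:, 1]

Without the clone, the slice assignment would hit the leaf w directly and raise the RuntimeError quoted above.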
If I get a copy of a Variable and extract a slice of the copy, does the gradient only backprop through the sliced elements? What if this Variable holds the model parameters?
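To check that intuition, a small experiment (hypothetical names, plain tensors with requires_grad) shows that only the elements that actually reach the output receive a nonzero gradient:

import torch

p = torch.rand(4, 4, requires_grad=True)  # imagine this is a model parameter
p_copy = p.clone()                        # the copy stays in the autograd graph
loss = p_copy[:2, :2].sum()               # only the top-left 2x2 slice is used
loss.backward()

print(p.grad)  # ones inside the 2x2 slice, zeros everywhere else

So if p were a model parameter, an optimizer step driven by this loss would only change the sliced entries.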
Thank you for the help! One follow-up question: is there a way to slice a matrix batch-wise?
For example:
output = model(input), where output has a shape of 10 x 1 x 32 x 32 (batch size x channel x height x width)
The slicing operation is different for each sample:
sample 1: output[0, :, :16, :16]
sample 2: output[1, :, :15, :15]
sample 3: output[2, :, :26, :26] … something like that.
Does PyTorch have a way to do the above operation efficiently, without using a for loop?
I guess you can use advanced indexing, but only if the sizes of the different slices match (note that every window below is 15 x 15, just at different offsets; see the sketch after this list), e.g.:
sample 1: output[0, :, 1:16, 1:16]
sample 2: output[1, :, :15, :15]
sample 3: output[2, :, 11:26, 11:26]
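A minimal sketch of that idea, assuming every sample takes a 15 x 15 window starting at a per-sample offset (the offsets here are made up for illustration):

import torch

output = torch.rand(10, 1, 32, 32, requires_grad=True)  # batch x channel x H x W
starts = torch.randint(0, 32 - 15 + 1, (10,))            # one window offset per sample

win = torch.arange(15)                                    # 0..14
rows = (starts[:, None] + win)[:, None, :, None]          # shape (10, 1, 15, 1)
cols = (starts[:, None] + win)[:, None, None, :]          # shape (10, 1, 1, 15)
batch = torch.arange(10)[:, None, None, None]             # shape (10, 1, 1, 1)
chan = torch.arange(1)[None, :, None, None]               # shape (1, 1, 1, 1)

windows = output[batch, chan, rows, cols]                 # shape (10, 1, 15, 15)
print(windows.shape)

Gradients flow back through this indexing as usual; truly different window sizes would still need a loop (or padding/masking to a common size).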