Copy slices of tensor values from previous version of network to current version

Hi,

So I’m trying to make a neural network that copies over trainable weights from an ‘ancestor’ network with a nearly identical architecture. This is how I currently do it:

list(self.parameters())[layer_number * 2][:-1, :] = params[layer_number * 2]

Here ‘params’ is a list of the trainable parameters of the ancestor network.
params = list(self.parameters())

My issue is that when I print out the parameters of the NEW network, I see grad_fn=<CopySlices> on the tensors that were copied over:
[Parameter containing:
tensor([[-0.0237,  0.1563,  0.0842, -0.3308, -0.3521],
        [ 0.1558, -0.0448, -0.4337, -0.3235, -0.3658],
        [-0.1654,  0.1849,  0.1011,  0.2428,  0.3796]], grad_fn=<CopySlices>),
Parameter containing:
tensor([-0.4441, -0.3973, -0.4269], grad_fn=<CopySlices>),
Parameter containing: ...

The parts of the new network with completely new values (nothing copied over at all) show ‘requires_grad=True’:
Parameter containing:
tensor([ 0.1209,  0.0306,  0.5345, -0.3023], requires_grad=True),
Parameter containing:
tensor([[ 0.0771, -0.1399, -0.1571,  0.2052],
        [-0.1313,  0.0706, -0.0783, -0.3379],
        [-0.1491,  0.3440,  0.4240,  0.0006],
        [-0.1941, -0.0320, -0.3423, -0.4040],
        [-0.1626,  0.3006, -0.4289, -0.1277]], requires_grad=True),
Parameter containing:
tensor([ 0.1358, -0.0119,  0.4847,  0.1065, -0.2739], requires_grad=True) ...

Does this mean that the copied-over values in the new network are untrainable? What can I do to make the copied-over values also show ‘requires_grad=True’ (and not grad_fn=<CopySlices>)? I also want to keep the ancestor weights out of the new network’s computation graph entirely; all I want is to transfer over the ancestor values and nothing else. Please help.

Try wrapping the parameter assignment in a with torch.no_grad() block and check the new parameters again.
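
Something along these lines should work (a minimal, self-contained sketch with made-up layer sizes; ancestor and new_layer are just placeholders for your modules):

import torch
import torch.nn as nn

# Hypothetical setup: the ancestor layer has one output unit fewer
# than the corresponding layer of the new network.
ancestor = nn.Linear(5, 3)
new_layer = nn.Linear(5, 4)
params = list(ancestor.parameters())

with torch.no_grad():
    # The copy is not recorded by autograd, so the parameter stays a
    # leaf tensor and keeps requires_grad=True (no grad_fn afterwards).
    list(new_layer.parameters())[0][:-1, :] = params[0]

print(list(new_layer.parameters())[0])  # Parameter containing: ... requires_grad=True

Since nothing is recorded under torch.no_grad(), the ancestor weights also won’t end up in the new network’s computation graph; only their values are transferred.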