Is nn.utils.vector_to_parameters() not differentiable?

Hi guys,

I'm having some difficulty backpropagating through the function nn.utils.vector_to_parameters().

Say I have a parameter mu, perform some operations on it, and finally want to insert the result as the weights and biases of a network (net).
Then I pass a batch of data x through the network, compute a loss, and backpropagate the error back to mu (the original, flattened parameter).

Here is a minimal code example:

import torch
import torch.nn as nn
import torch.optim as optim

x = torch.ones((1, 8)) # input example

net = nn.Sequential(nn.Linear(8, 8))
size = sum(p.numel() for p in net.parameters() if p.requires_grad)
mu = nn.Parameter(torch.ones((size,))*0.05, requires_grad=True)
# perform some operations on mu

nn.utils.vector_to_parameters(mu, net.parameters())

y = net(x)
optimizer = optim.SGD([mu], lr=1e-3)
loss = y.sum()
loss.backward()
optimizer.step()

print(mu)
print(mu.grad)


After the update step, mu remains the same as before, i.e. it is not being optimized. Also, mu.grad is None.
Is nn.utils.vector_to_parameters(mu, net.parameters()) stopping the gradient flow, and if so, is there an alternative way to insert mu as an “external”, flattened parameter into the weights of the network?

Thank you all!

This method uses the .data attribute internally:

param.data = vec[pointer:pointer + num_param].view_as(param).data

and thus Autograd won't track this copy; the assignment happens outside of the computation graph, so no gradient can flow back to mu.
You could check if e.g. torch.nn.utils.parametrize might work, as described in this post.
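
For illustration, here is a minimal sketch of how such a parametrization could look for the example above. The SliceOfVector helper is hypothetical (not taken from the linked post): each registered parametrization ignores the stored tensor and rebuilds the weight or bias from a slice of the shared mu, so Autograd can track the dependence on mu:

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

x = torch.ones((1, 8))
net = nn.Sequential(nn.Linear(8, 8))
size = sum(p.numel() for p in net.parameters() if p.requires_grad)
mu = nn.Parameter(torch.ones(size) * 0.05)

class SliceOfVector(nn.Module):
    # Hypothetical parametrization: build a parameter from a slice of the shared vector mu.
    def __init__(self, mu, offset, shape):
        super().__init__()
        self.mu = mu          # shared nn.Parameter (deduplicated by .parameters())
        self.offset = offset
        self.shape = shape

    def forward(self, original):
        # Ignore the stored tensor and rebuild the parameter from mu,
        # keeping the operation on the autograd graph.
        numel = original.numel()
        return self.mu[self.offset:self.offset + numel].view(self.shape)

# Register one parametrization per parameter, consuming mu slice by slice
# in the same order as net.parameters().
offset = 0
for name, param in list(net.named_parameters()):
    module_name, _, param_name = name.rpartition(".")
    module = net.get_submodule(module_name) if module_name else net
    parametrize.register_parametrization(
        module, param_name, SliceOfVector(mu, offset, param.shape))
    offset += param.numel()

optimizer = torch.optim.SGD([mu], lr=1e-3)
loss = net(x).sum()
loss.backward()
optimizer.step()
print(mu.grad)  # now populated

Note that the original weight and bias tensors are kept under net[0].parametrizations but receive no gradient, since the forward above never uses them.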


Thanks, this solved the problem!

I also found another solution (with maybe a little less overhead) here: Hypernetwork implementation - #5 by ID56.
Inserting the weights and biases manually via torch.nn.functional.linear also works!
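
For completeness, here is a minimal sketch of that second approach for the single-layer example above (the layout of mu, weight first and then bias, is my assumption; the linked post may arrange it differently). Slicing mu and calling torch.nn.functional.linear directly keeps the whole forward pass on the autograd graph:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.ones((1, 8))
in_features, out_features = 8, 8
size = out_features * in_features + out_features
mu = nn.Parameter(torch.ones(size) * 0.05)

# Slice mu into weight and bias; these views stay connected to mu in autograd.
weight = mu[:out_features * in_features].view(out_features, in_features)
bias = mu[out_features * in_features:]

optimizer = torch.optim.SGD([mu], lr=1e-3)
y = F.linear(x, weight, bias)   # same computation as the nn.Linear(8, 8) forward pass
loss = y.sum()
loss.backward()
optimizer.step()
print(mu.grad)  # gradient now reaches mu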