Is nn.utils.vector_to_parameters() not differentiable?

Hi guys,

I'm having some difficulty backpropagating through the function nn.utils.vector_to_parameters().

Say I have a parameter mu, perform some operations on it, and finally want to insert the result as the weights and biases of a network (net).
Then I pass a batch of data x through the network, compute a loss, and backpropagate the error back to mu (the original, flattened parameter).

Here is a minimal code example:

import torch
import torch.nn as nn
import torch.optim as optim

x = torch.ones((1, 8)) # input example

net = nn.Sequential(nn.Linear(8, 8))
size = sum(p.numel() for p in net.parameters() if p.requires_grad)
mu = nn.Parameter(torch.ones((size,))*0.05, requires_grad=True)
# perform some operations on mu

nn.utils.vector_to_parameters(mu, net.parameters())

y = net(x)
optimizer = optim.SGD([mu], lr=1e-3)
loss = y.sum()
loss.backward()
optimizer.step()

print(mu)
print(mu.grad)


After the update step, mu remains the same as before, i.e. it is not being optimized. Also, mu.grad is None.
Is nn.utils.vector_to_parameters(mu, net.parameters()) stopping the gradient flow, and if so, is there an alternative way to insert mu as an “external”, flattened parameter into the weights of the network?

Thank you all!

This method uses the .data attribute internally:

param.data = vec[pointer:pointer + num_param].view_as(param).data

and thus Autograd won't track this copy; the assignment happens outside of the computation graph, so no gradient can flow back to mu.
You could check if e.g. torch.nn.utils.parametrize might work, as described in this post.
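
For illustration, here is a minimal sketch of how such a parametrization could look for the example above. The SliceOfVector helper is hypothetical (not taken from the linked post): each registered parametrization ignores the stored tensor and rebuilds the weight or bias from a slice of the shared mu, so Autograd can track the dependence on mu:

import torch
import torch.nn as nn
import torch.nn.utils.parametrize as parametrize

x = torch.ones((1, 8))
net = nn.Sequential(nn.Linear(8, 8))
size = sum(p.numel() for p in net.parameters() if p.requires_grad)
mu = nn.Parameter(torch.ones(size) * 0.05)

class SliceOfVector(nn.Module):
    # Hypothetical parametrization: build a parameter from a slice of the shared vector mu.
    def __init__(self, mu, offset, shape):
        super().__init__()
        self.mu = mu          # shared nn.Parameter (deduplicated by .parameters())
        self.offset = offset
        self.shape = shape

    def forward(self, original):
        # Ignore the stored tensor and rebuild the parameter from mu,
        # keeping the operation on the autograd graph.
        numel = original.numel()
        return self.mu[self.offset:self.offset + numel].view(self.shape)

# Register one parametrization per parameter, consuming mu slice by slice
# in the same order as net.parameters().
offset = 0
for name, param in list(net.named_parameters()):
    module_name, _, param_name = name.rpartition(".")
    module = net.get_submodule(module_name) if module_name else net
    parametrize.register_parametrization(
        module, param_name, SliceOfVector(mu, offset, param.shape))
    offset += param.numel()

optimizer = torch.optim.SGD([mu], lr=1e-3)
loss = net(x).sum()
loss.backward()
optimizer.step()
print(mu.grad)  # now populated

Note that the original weight and bias tensors are kept under net[0].parametrizations but receive no gradient, since the forward above never uses them.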


Thanks, this solved the problem!

I also found another solution (with maybe a little less overhead) here: Hypernetwork implementation - #5 by ID56.
Inserting the weights and biases manually via torch.nn.functional.linear also works!
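
For completeness, here is a minimal sketch of that second approach for the single-layer example above (the layout of mu, weight first and then bias, is my assumption; the linked post may arrange it differently). Slicing mu and calling torch.nn.functional.linear directly keeps the whole forward pass on the autograd graph:

import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.ones((1, 8))
in_features, out_features = 8, 8
size = out_features * in_features + out_features
mu = nn.Parameter(torch.ones(size) * 0.05)

# Slice mu into weight and bias; these views stay connected to mu in autograd.
weight = mu[:out_features * in_features].view(out_features, in_features)
bias = mu[out_features * in_features:]

optimizer = torch.optim.SGD([mu], lr=1e-3)
y = F.linear(x, weight, bias)   # same computation as the nn.Linear(8, 8) forward pass
loss = y.sum()
loss.backward()
optimizer.step()
print(mu.grad)  # gradient now reaches mu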