I am trying to use a hypernetwork to fine-tune a pretrained model on a new set of speakers (new data points).

The pretrained model consists of Conv2d layers and a TCN. Using a hypernetwork, I am trying to predict the weights of the Conv2d layers. The pretrained model is frozen, and we only update the parameters of the Primary Network. The Primary Network is the network whose parameters are the random embeddings supplied to the Hypernetwork, plus the weights, biases, and other parameters of the two-layer Hypernetwork itself.

I suspect the loss is not backpropagating because of the assign_weights function we use to overwrite the weights of the original model. Since the model is built with nn.Conv2d, I can find no other way to supply the predicted weights to the parameters of the pretrained model.
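To illustrate my suspicion, here is a minimal stand-alone repro (toy modules with illustrative names, not our actual model) showing that writing a predicted tensor into param.data cuts the autograd graph:

```python
import torch
import torch.nn as nn

# Toy stand-ins: `predictor` plays the role of the hypernetwork,
# `target` plays the role of one frozen Conv2d of the pretrained model.
predictor = nn.Linear(1, 4)
target = nn.Conv2d(1, 1, kernel_size=2, bias=False)

z = torch.randn(1)                          # a random embedding
predicted = predictor(z).view(1, 1, 2, 2)   # predicted conv weights

target.weight.data = predicted              # assignment via .data detaches the graph
loss = target(torch.randn(1, 1, 4, 4)).sum()
loss.backward()

print(predictor.weight.grad)                # None -> no gradient reaches the predictor
```

Backward here only populates target.weight.grad; the predictor that produced the weights receives nothing, which matches what we see at full scale.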

The code containing the Primary Network and the embeddings:

```
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.nn.parameter import Parameter
from hypernet.hypernetwork_modules import HyperNetwork
from lipreading.utils import load_model, CheckpointSaver
from kd.model import lrwmodel


class Embedding(nn.Module):
    def __init__(self, z_num, z_dim):
        super(Embedding, self).__init__()
        self.z_list = nn.ParameterList()
        self.z_num = z_num  # kernel size
        self.z_dim = z_dim  # dimension of hypernetwork layer... nothing to do with primary network
        h, k = self.z_num  # kernel size is stored inside h and k, where h is height and k is width
        # Initializing an embedding for each index in a filter. If the filter is
        # 1x1 the embedding is 1 x 64; if it is 4x4, the embedding is 16 x 64.
        for i in range(h):
            for j in range(k):
                self.z_list.append(Parameter(torch.fmod(torch.randn(self.z_dim).cuda(), 2)))

    def forward(self, hyper_net):
        ww = []
        _in, _out = self.z_num
        c = 0
        for i in range(_out):
            w = []
            for j in range(_in):
                h_out = hyper_net(self.z_list[c])
                c += 1
                w.append(h_out)
            temp_w = torch.cat(w, dim=1)
            ww.append(temp_w)
        temp_ww = torch.cat(ww, dim=0)
        return temp_ww


class PrimaryNetwork(nn.Module):
    def __init__(self, z_dim=64, residual_hypernet=False, load_checkpoint_path=None, alpha=0.3):
        super(PrimaryNetwork, self).__init__()
        device = 'cuda'
        self.alpha = alpha
        self.residual_hypernet = residual_hypernet
        self.z_dim = z_dim
        self.hope = HyperNetwork(z_dim=self.z_dim)
        self.hope = self.hope.to(device)
        # Represents the in and out channels of the network. Multiples of (64, 64) since that's the model size.
        self.zs_size = [[1, 1], [1, 2], [2, 2], [2, 4], [4, 4], [4, 8]]
        self.filter_size = [[64, 64], [64, 128], [128, 128], [128, 256], [256, 256], [256, 512]]
        # List that contains the embeddings to be given to the hypernetwork.
        # We will preferably have to save this list somewhere.
        self.zs = nn.ModuleList()
        for i in range(len(self.zs_size)):
            self.zs.append(Embedding(self.zs_size[i], self.z_dim))
        self.model = lrwmodel()
        self.model = self.model.to(device)
        if load_checkpoint_path:
            self.model = load_model(load_checkpoint_path, self.model, allow_size_mismatch=False)
            print("Loaded checkpoint and model from --", load_checkpoint_path)
        # self.final = nn.Linear(500, 500)
        # print("Residual Hypernet is ", residual_hypernet)
        # self.model.eval()
        # for param in self.model.parameters():
        #     param.requires_grad = False
        self.weights_list = []

    def assign_weights(self, weights_list):
        # print("length of weight ----", len(weights_list))
        layer_name = 'mini_resnet'
        i = 0
        for name, param in self.model.named_parameters():
            if layer_name in name and 'weight' in name and len(list(param.shape)) == 4 and list(param.shape)[1] != 1:
                if self.residual_hypernet:
                    param.data = self.alpha * weights_list[i] + param.data
                else:
                    param.data = weights_list[i]
                i += 1

    # PLAN FOR FORWARD:
    # Get the embeddings and send them to the Hypernetwork, which returns the
    # weights. Collect the weights in a list and assign them all together.
    def forward(self, x, lengths):
        self.weights_list = []
        for i in range(6):
            w1 = self.zs[i](self.hope)
            self.weights_list.append(w1)
        self.assign_weights(self.weights_list)
        with torch.no_grad():
            _, x = self.model(x, lengths)
        return x
```

There might be a problem with the way we assign weights at each forward call, due to which the loss is not backpropagating through the Primary Network parameters. In addition, the torch.no_grad() wrapper around the model call in forward would stop any gradient from flowing back even if the predicted weights were attached to the graph.

I would like to understand whether there are workarounds, such as a gradient copy or a smarter weight-update scheme, that keep the gradient flow intact.
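For contrast, here is a minimal sketch (toy modules with illustrative names, not our actual code) where the predicted weight is passed directly to the functional convolution instead of through .data; in this form the gradient does reach the weight predictor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

predictor = nn.Linear(1, 4)                 # toy stand-in for the hypernetwork
z = torch.randn(1)
predicted = predictor(z).view(1, 1, 2, 2)   # predicted 1x1x2x2 conv weight

# Passing the predicted tensor straight into F.conv2d keeps it in the
# autograd graph, so backward() reaches the predictor's parameters.
out = F.conv2d(torch.randn(1, 1, 4, 4), predicted)
out.sum().backward()

print(predictor.weight.grad is not None)    # True
```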

Most people using hypernetworks do so while training a model from scratch, but we want to apply one to an already-pretrained model. The simplest way would be to pass the predicted weights through torch.nn.functional.conv2d, but that would force us to change the entire architecture and would make loading the pretrained checkpoint difficult.
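One possible middle ground I came across, which I have not yet verified for our setup (it assumes a recent PyTorch with torch.func), is torch.func.functional_call: it runs an existing nn.Module with a substituted parameter dict, so neither the architecture nor the checkpoint loading has to change. A toy sketch with illustrative names:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

conv = nn.Conv2d(1, 1, kernel_size=2, bias=False)   # stands in for a frozen pretrained conv
predictor = nn.Linear(1, 4)                          # stands in for the hypernetwork

predicted = predictor(torch.randn(1)).view(1, 1, 2, 2)

# The module's own parameters stay untouched; "weight" is overridden only for
# this call, and the override stays connected to the autograd graph.
out = functional_call(conv, {"weight": predicted}, (torch.randn(1, 1, 4, 4),))
out.sum().backward()

print(predictor.weight.grad is not None)             # True
```

For a nested model, the dict keys would presumably be the fully qualified names from named_parameters() (e.g. something like 'mini_resnet....weight' in our case), which would fit the same name-matching loop we already use in assign_weights.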

Looking forward to the help.

Thanks.