Copying weight betweeen last conv layer and first dense

teja · April 7, 2020, 10:22am

I am trying to copy the weights matrix between last conv layer and first dense layer to a new architecture. In the original network, the output shape of the last conv layer was 256x6x6 and the number of nodes in the first dense layer were 4096. Now I have a new architecture in which the output shape of the last conv layer is 200x6x6 and number of nodes in the first dense layer are same i.e. 4096.

The shape of the original weight matrix is 4096, 9216 (4096 nodes in the dense layer and 256x6x6=9216) and the shape of the new matrix is 4096, 7200 (4096 nodes in the dense layer and 200x6x6=7200).

I tried to copy the weight like this, but I not able to figure it out whether I am copying the weights of those specific 200 filters.The code I tries is here

if isinstance(m0, nn.Linear): # m0 is the original network layer from module.modules()
              filterindex=[] # filled with indexes of the filters to be copied (200 out of 256)
              print(m0.weight.data.shape[0]) # gives 4096
              weights=torch.ones(4096, 7200) # create new tensor of size 4096x7200
              for filter_index in filterindex:
                for f in range(0, m0.weight.data.shape[0]): # iterate through all 4096 rows
                  weights[f][filter_index*36:36] = m0.weight.data[f][filter_index*36:36]
              print("data[0]",m0.weight.data[0][:36]) # print 36 values
              m1.weight.data = weights.clone().cuda() # set new weights to m1 (new network)

Basically the output of the original ntework last conv layer will be 256x6x6 after flattening it will be 9216. So, I want to copy only those weights of filter indexes which are present in filterindex list, 7200 values out of 9216. Each filter will have 6x6x1 values= 36.

Any suggestion on these, how can I achieve it.

ptrblck · April 8, 2020, 7:19am

The indexing looks wrong:

[filter_index*36:36]

This will copy everything from filter_index*36 to the fixed index 36.
Could you try to use [filter_index*36:(filter_index+1)*36] instead and compare the layers afterwards?

teja · April 9, 2020, 12:59pm

Thank you so much @ptrblck for point out the error in the code. Along with your suggestion, I also change the index number in the left hand side to make it work, plus change the line to

m1.weight.data[f][count*36:(count+1)*36] = m0.weight.data[f][filter_index*36:(filter_index+1)*36].clone()

I have one more question, as far as the structure of the weight matrix is considered, will it copy the actual weight what I am targeting (weights of remaining filters between last conv and dense layer)?

ptrblck · April 9, 2020, 8:00pm

I assume m0 contains the flattened conv filter weights?
If so, the function might work, but I would need to see the shapes to be sure.
Could you post a minimal code snippet to run your code?
I.e. a dummy conv layer you would like to copy to the linear layer?

teja · April 11, 2020, 6:59pm

I assume m0 contains the flattened conv filter weights? YES

Shapes are OK.

Right now I do not have a dummy working code, will post later
Again thanks @ptrblck for the valuable suggestions.