Replicate features from two feature maps and do backprop using Variable

Hi, I want to replicate features coming from two parallel layers at the top, pass them through a linear layer at the bottom, and still use the result as a Variable for the backward pass.

import numpy as np
import torch
import torch.nn as nn

transition_matrix = np.eye(10, 10)
transition_matrix[4, :] = 1
transition_matrix[5, :] = 1
transition_matrix[6, :] = 1
N = np.count_nonzero(transition_matrix)
# N and transition_matrix are predefined

Let's say fm1 and fm2 are two feature maps, each of size (5, 10, 256) for a batch size of 5.

final_fm = torch.zeros(5, N, 256*2)
count = 0
for i in range(transition_matrix.shape[0]):
    for j in range(transition_matrix.shape[1]):
        if transition_matrix[i, j] > 0:
            final_fm[:, count, :256] = fm1[:, i, :]
            final_fm[:, count, 256:] = fm2[:, j, :]
            count += 1

l_layer = nn.Linear(N * 256 * 2, 20)
out = l_layer(final_fm.view(5, -1))  # generates a 5x20 output
loss = loss_function(out, gt)

I then want to do a backward pass:

loss.backward()

Replace final_fm = torch.zeros(5, N, 256*2)
with final_fm = Variable(torch.zeros(5, N, 256*2)).
Then this should work.

Thanks for the reply.
I tried final_fm = Variable(torch.zeros(5, N, 256*2), requires_grad=True).
That gives me an error about in-place modification; apparently you are not allowed to do that. If I set requires_grad=False, the forward pass is fine, but I am concerned about the gradients. Will this allow gradients to flow back properly during the backward pass?

requires_grad should be False. Don't worry about the gradients; I've tested this before.
If you don't believe it, you can register hooks on fm1 and fm2 and print out the gradients.
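For example, something like this (just a sketch, reusing fm1, fm2, and loss from your snippet) prints the gradient that reaches each feature map during the backward pass:

def print_grad(name):
    def hook(grad):
        print(name, grad.abs().sum())
    return hook

fm1.register_hook(print_grad('fm1'))
fm2.register_hook(print_grad('fm2'))
loss.backward()  # hooks fire here; non-zero values mean gradients are flowing back

register_hook must be called before backward, and each hook receives the gradient with respect to that Variable.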

Thanks a lot. I will try it and let you know; it is part of a very big network. I didn't understand the part about registering hooks.
Thanks

@ruotianluo, thanks a lot for the help. It works. The only problem I have now is that it is very slow. Do you have any idea how to speed it up?

I think it's slow because you are building final_fm with that nested Python loop. Loops in Python are slow.
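For reference, the nested loop can usually be replaced by a single gather-and-concatenate over the non-zero indices of transition_matrix. A sketch reusing the names from your snippet (on the old Variable API the index tensors may also need to be wrapped in Variable):

idx_i, idx_j = np.nonzero(transition_matrix)  # (i, j) pairs in the same row-major order as the loop
idx_i = torch.from_numpy(idx_i)               # LongTensor of length N
idx_j = torch.from_numpy(idx_j)

# select the needed rows of each feature map in one call and concatenate along the
# feature dimension; the result is (5, N, 512), the same thing the loop builds
final_fm = torch.cat([fm1.index_select(1, idx_i),
                      fm2.index_select(1, idx_j)], dim=2)

Since index_select and cat are differentiable and there is no in-place write into a preallocated buffer, gradients still flow back to fm1 and fm2.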