Hi all. I have 3 different convolution blocks, each with 64 channels. I would like to make an element-wise summation with trainable weights for each of the convolution blocks, i.e.
let conv_1, conv_2 and conv_3 be the convolution blocks.
conv_final = lambda_1 * conv_1 + lambda_2 * conv_2 + lambda_3 * conv_3 (+ here means element-wise summation)
I want to train these lambdas while I train my whole CNN using the loss function. Can somebody please help me implement this?
Such things are hard to optimize. Consider the scalar equation
x1*w1 + x2*w2 = y. Even if you restrict the weights to a convex combination (0 < w1 < 1, w2 = 1 - w1), you can fit (x1, x2) for ANY choice of w1; for example, x1 = x2 = y satisfies the equation regardless of w1. The problem remains when x1, x2 are vectors during training: without some additional constraint, gradient descent will find a w1 that works for the initial x1, x2 vectors and may stick to it.
Ensembles solve this with multiple losses, for x1 = y, x2 = y, and x1*w1 + x2*w2 = y, i.e. the parts must predict y independently, and the ensemble is usually trained later, with the contributing models possibly frozen.
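To make the multi-loss idea concrete, here is a minimal sketch; `ensemble_loss` and `criterion` are hypothetical names, not from the original post:

```python
import torch
import torch.nn as nn

def ensemble_loss(criterion, x1, x2, w, y):
    # Each part is penalized against the target on its own, plus a loss
    # on the convex combination, so no part can hide behind the weights.
    combined = w * x1 + (1 - w) * x2
    return criterion(x1, y) + criterion(x2, y) + criterion(combined, y)
```

With the individual terms present, each part is pushed toward predicting y by itself, and w only arbitrates between already-useful predictors.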
That being said, the weighted sum itself is pretty straightforward. Assuming global per-channel combination weights, create a parameter:
self.raw_lambda = nn.Parameter(torch.zeros(NumParts, NumChannels, 1, 1))  # for conv2d: the trailing 1x1 dims broadcast over H, W
In forward(), transform it with softmax (so no lambda can ever become zero). Note that `lambda` is a reserved word in Python, so use a different name:
lam = torch.softmax(self.raw_lambda, 0)
Now, just use arithmetic ops and slices along the first dimension, e.g.:
conv_final = lam[0] * conv_1 + lam[1] * conv_2 + lam[2] * conv_3
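Putting the pieces together, a self-contained sketch might look like this; `WeightedSum`, `n_parts` and `n_channels` are names I made up for illustration:

```python
import torch
import torch.nn as nn

class WeightedSum(nn.Module):
    """Trainable per-channel convex combination of several conv outputs."""

    def __init__(self, n_parts=3, n_channels=64):
        super().__init__()
        # One raw weight per (part, channel); the trailing 1x1 dims
        # broadcast over the spatial dims of conv2d outputs (N, C, H, W).
        self.raw_lambda = nn.Parameter(torch.zeros(n_parts, n_channels, 1, 1))

    def forward(self, conv_1, conv_2, conv_3):
        # Softmax over the parts dimension keeps every lambda positive
        # and makes the three weights sum to 1 for each channel.
        lam = torch.softmax(self.raw_lambda, 0)
        return lam[0] * conv_1 + lam[1] * conv_2 + lam[2] * conv_3
```

Because `raw_lambda` is an `nn.Parameter`, it shows up in `model.parameters()` and is trained by the same optimizer and loss as the rest of the CNN.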