Weighted Element Wise Summation

ashfaque · June 13, 2020, 11:34am

Hi all. I have 3 different convolution blocks, each with 64 output channels. I would like to compute an element-wise sum of their outputs with a trainable weight for each block, i.e.

Let conv_1, conv_2 and conv_3 be the outputs of the convolution blocks.

conv_final = lambda_1 * conv_1 + lambda_2 * conv_2 + lambda_3 * conv_3 (+ here means element-wise summation)

I want to train these lambdas while I train my whole CNN using the loss function. Can somebody please help me implement this?

googlebot · June 13, 2020, 5:44pm

Such things are hard to optimize. Consider the scalar equation x1*w1 + x2*w2 = y: even when restricting the weights to a convex combination (0 < w1 < 1, w2 = 1 - w1), you can find (x1, x2) that fit y for ANY w1. This problem remains when x1, x2 are vectors in training; without some additional constraint, gradient descent will find a w1 that is good for the initial x1, x2 vectors and may stick to it.
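To make the scalar argument concrete, here is a minimal sketch in plain Python. The helper name fit_inputs and the chosen numbers are only illustrative assumptions; the point is that for every w1 in (0, 1) there are inputs that reproduce y exactly, so the fit alone does not pin down w1.

```python
def fit_inputs(y, w1, x1):
    """For a target y and any weight 0 < w1 < 1, pick x2 so that
    w1*x1 + (1 - w1)*x2 == y (hypothetical helper, for illustration only)."""
    x2 = (y - w1 * x1) / (1.0 - w1)
    return x1, x2

y = 3.0
residuals = []
for w1 in (0.1, 0.5, 0.9):
    x1, x2 = fit_inputs(y, w1, x1=1.0)
    # Residual of the convex combination against the target.
    residuals.append(w1 * x1 + (1.0 - w1) * x2 - y)

# Every residual is zero up to floating-point rounding.
print(residuals)
```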

Ensembles solve this with multiple losses, for x1 = y, x2 = y and x1*w1 + x2*w2 = y, i.e. the parts must predict y independently, and the ensemble is usually trained later, with the contributing models possibly frozen.
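One possible reading of the multiple-loss idea, as a rough sketch: add a loss term for each part on its own plus one for the combination. The function name ensemble_style_loss, the use of MSE, the sigmoid parameterization of w1 and the random tensors are all assumptions made for illustration, and the sketch trains everything jointly rather than freezing the parts.

```python
import torch
import torch.nn.functional as F

def ensemble_style_loss(x1, x2, w1, y):
    """Sum of three losses: each part alone, plus their convex combination."""
    combined = w1 * x1 + (1.0 - w1) * x2
    return F.mse_loss(x1, y) + F.mse_loss(x2, y) + F.mse_loss(combined, y)

torch.manual_seed(0)
y = torch.randn(8)                        # stand-in target
x1 = torch.randn(8, requires_grad=True)   # stand-in output of part 1
x2 = torch.randn(8, requires_grad=True)   # stand-in output of part 2
raw_w = torch.zeros(1, requires_grad=True)
w1 = torch.sigmoid(raw_w)                 # keeps 0 < w1 < 1

loss = ensemble_style_loss(x1, x2, w1, y)
loss.backward()                           # gradients reach x1, x2 and raw_w
```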

That being said, doing weighted sum itself is pretty straightforward. Assuming global per-channel combination weights, create a parameter:

self.raw_lambda = nn.Parameter(torch.zeros(NumParts, NumChannels, 1, 1)) #for conv2d outputs of shape (N, C, H, W); here NumParts=3, NumChannels=64

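In forward(), transform it with a softmax over the parts dimension (you never want any lambda to become zero, and this also makes the weights for each channel sum to 1). Note that lambda is a reserved word in Python, so the variable needs another name:

lam = torch.softmax(self.raw_lambda, 0)

Now, just use arithmetic ops and slices, e.g.:

conv_final = lam[0] * conv_1 + lam[1] * conv_2 + lam[2] * conv_3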
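Putting the pieces together, a minimal self-contained sketch could look like the module below. The class name WeightedSum, the argument names and the random stand-in tensors are assumptions for illustration; only the parameter shape and the softmax over the parts dimension come from the steps above.

```python
import torch
import torch.nn as nn

class WeightedSum(nn.Module):
    """Per-channel, softmax-normalized weighted sum of several conv outputs."""

    def __init__(self, num_parts, num_channels):
        super().__init__()
        # One raw weight per (part, channel); zeros give equal weights at init.
        self.raw_lambda = nn.Parameter(torch.zeros(num_parts, num_channels, 1, 1))

    def forward(self, *parts):
        # Softmax over the parts dimension: weights are positive and sum to 1.
        lam = torch.softmax(self.raw_lambda, dim=0)   # (P, C, 1, 1)
        stacked = torch.stack(parts, dim=0)           # (P, N, C, H, W)
        # unsqueeze(1) adds a batch axis so lam broadcasts over N, H and W.
        return (lam.unsqueeze(1) * stacked).sum(dim=0)

# Stand-ins for the outputs of the three convolution blocks.
conv_1, conv_2, conv_3 = (torch.randn(2, 64, 8, 8) for _ in range(3))
combine = WeightedSum(num_parts=3, num_channels=64)
conv_final = combine(conv_1, conv_2, conv_3)
conv_final.sum().backward()   # raw_lambda receives gradients like any other parameter
```

Because raw_lambda is registered as an nn.Parameter, it shows up in model.parameters() and is updated by the same optimizer as the rest of the CNN.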