Linear projection in residual networks


If F(X) and X have different dimensions in H(X) = F(X) + X, X must be linearly projected. Is the matrix W_s used for this a parameter to be learned, or is it a predefined (fixed) matrix?

The linear projection looks like this:
y = F(x, {W_i}) + W_s x


Yes, W_s is learned: whatever parameters the shortcut path has are trainable (an identity shortcut has none).
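
To make that concrete, here is a minimal sketch in PyTorch, not the reference ResNet code. The class name `ProjectionBlock`, the layer sizes, and the use of a 1x1 convolution as W_s are assumptions for illustration (a 1x1 convolution is a common way to realize the projection). It shows that the shortcut weight is an ordinary parameter that receives a gradient.

```python
import torch
from torch import nn


class ProjectionBlock(nn.Module):
    """Hypothetical residual block computing y = F(x, {W_i}) + W_s x."""

    def __init__(self, in_channels: int, out_channels: int, stride: int = 1):
        super().__init__()
        # Residual branch F(x, {W_i}): two 3x3 convolutions.
        self.residual = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, 3, stride=stride, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
            nn.ReLU(),
            nn.Conv2d(out_channels, out_channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(out_channels),
        )
        # Shortcut W_s: a learned 1x1 convolution when shapes differ,
        # otherwise a parameter-free identity.
        if stride != 1 or in_channels != out_channels:
            self.shortcut = nn.Conv2d(in_channels, out_channels, 1, stride=stride, bias=False)
        else:
            self.shortcut = nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.relu(self.residual(x) + self.shortcut(x))


block = ProjectionBlock(16, 32, stride=2)
x = torch.randn(4, 16, 8, 8)
y = block(x)          # the shortcut maps 16 channels to 32 and halves H and W
y.sum().backward()    # backpropagate a dummy scalar loss

w_s = block.shortcut.weight
print(tuple(y.shape), w_s.requires_grad, w_s.grad is not None)
```

Because `block.shortcut` is a registered submodule, its weight is returned by `block.parameters()` and is updated by the optimizer like any other layer. When input and output shapes already match, the sketch falls back to `nn.Identity()`, which has nothing to train.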

The actual implementation of the shortcut path in ResNet is here:

Thanks for your kind response!