Where f is a simple elementwise non-linearity, such as the ReLU function, applied before summation.
I can easily do this by setting groups=C_in and handling each output channel one by one. However, this gives a memory error for large C_in. Is there a more efficient way to do this?
I need to apply the non-linearity to the result of each individual convolution (weight*image[i]) before summation, that is, f(weight*image[i]). This isn’t the same as f(weights)*image[i], which you suggest.
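To make the operation concrete, here is a minimal sketch of f(weight*image[i]) summed over input channels, using a grouped convolution over a few input channels at a time so the full C_in x C_out intermediate is never held at once. The function name nonlinear_sum_conv2d and the chunk parameter are made up for illustration and are not part of PyTorch; it assumes x has shape (B, C_in, H, W) and weight has shape (C_out, C_in, kH, kW).

```python
import torch
import torch.nn.functional as F


def nonlinear_sum_conv2d(x, weight, f=torch.relu, chunk=8, padding=0):
    """Compute out[b, o] = sum_i f(conv2d(x[b, i], weight[o, i])).

    Hypothetical helper, not a PyTorch API.
    x:      (B, C_in, H, W)
    weight: (C_out, C_in, kH, kW)
    chunk:  number of input channels handled per grouped convolution
    """
    c_out, c_in, kh, kw = weight.shape
    out = None
    for start in range(0, c_in, chunk):
        x_part = x[:, start:start + chunk]
        w_part = weight[:, start:start + chunk]
        k = x_part.shape[1]  # the last chunk may be smaller
        # With groups=k, output channel i * c_out + o sees only input channel i.
        w_grouped = w_part.permute(1, 0, 2, 3).reshape(k * c_out, 1, kh, kw)
        y = F.conv2d(x_part, w_grouped, padding=padding, groups=k)
        # (B, k * C_out, H', W') -> (B, k, C_out, H', W')
        y = y.reshape(y.shape[0], k, c_out, y.shape[2], y.shape[3])
        # Non-linearity on each single-channel result, then sum over channels.
        part = f(y).sum(dim=1)
        out = part if out is None else out + part
    return out


if __name__ == "__main__":
    x = torch.randn(2, 16, 32, 32)
    w = torch.randn(4, 16, 3, 3)
    print(nonlinear_sum_conv2d(x, w, chunk=4, padding=1).shape)
```

A smaller chunk lowers the peak size of the intermediate tensor at the cost of more convolution calls. Note that when gradients are needed, autograd still keeps each chunk's intermediate for the backward pass, so this mainly helps at inference time unless it is combined with something like torch.utils.checkpoint.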
Or, which is a bit more involved, writing your own Conv2DSpec layer by copying the existing convolution in THTensorConv.cpp and then copying each Conv2D call in the library (naming the copies Conv2DSpec or similar).