I’m currently studying different approximation schemes for neural network (NN) propagation.
Suppose I have input feature maps (IFMs) of shape 100 X 3 X 28 X 28 and kernels of shape 32 X 3 X 3 X 3.
Because of the approximate computing, I’d like to adapt the IFMs separately for each output feature map (OFM).
This means I have inputs of shape 100 X 32 X 3 X 28 X 28, and I need to convolve (conv2) each of the 32 adapted input sets with its corresponding 3 X 3 X 3 kernel, one by one. Is there a way to parallelize this process? Right now I use a loop implementation, which is too slow.
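To make the computation concrete, here is a minimal sketch. It assumes PyTorch, which the post does not name, and all names in it (`conv_loop`, `conv_grouped`, `inputs`, `kernels`) are hypothetical. It also uses a batch of 8 instead of 100 to keep the demo quick. `conv_loop` is one plausible form of the per-OFM loop described above, not necessarily the actual implementation. `conv_grouped` shows one possible way to do the same work in a single call: fold the OFM axis into the channel axis and use a grouped convolution with `groups=32`, so that each group of 3 channels is convolved only with its own kernel.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Shapes follow the post, except the batch size (8 instead of 100, an assumption for speed).
N, OFM, C, H, W = 8, 32, 3, 28, 28
inputs = torch.randn(N, OFM, C, H, W)   # one adapted IFM set per OFM
kernels = torch.randn(OFM, C, 3, 3)     # one 3 X 3 X 3 kernel per OFM


def conv_loop(inputs, kernels):
    """Reference: convolve each adapted input set with its own kernel, one by one."""
    outs = []
    for k in range(kernels.shape[0]):
        # inputs[:, k] has shape (N, C, H, W); kernels[k:k + 1] has shape (1, C, 3, 3)
        outs.append(F.conv2d(inputs[:, k], kernels[k:k + 1]))
    return torch.cat(outs, dim=1)       # (N, OFM, H - 2, W - 2)


def conv_grouped(inputs, kernels):
    """Same result in one call, using a grouped convolution."""
    n, ofm, c, h, w = inputs.shape
    # Fold the OFM axis into channels: group k owns channels k*c .. k*c + c - 1.
    x = inputs.reshape(n, ofm * c, h, w)
    # With groups=ofm, output channel k only sees the input channels of group k.
    return F.conv2d(x, kernels, groups=ofm)


out_loop = conv_loop(inputs, kernels)
out_grouped = conv_grouped(inputs, kernels)
print(out_loop.shape, out_grouped.shape)
print(torch.allclose(out_loop, out_grouped, atol=1e-4))
```

Whether the grouped version is actually faster depends on the backend and hardware, so it is worth timing both on the real shapes. Any padding or stride used in the loop would need to be passed to the grouped call as well.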
Thank you in advance!