this is strictly about inference rather than training
let’s say i want to produce a stack of 50 filtered images from a single 1-channel input image. is it faster to convolve with one stack of 50 filters, or to convolve with (e.g.) 10 stacks of 5 filters and then
right now i’m doing the former: building a convolutional unit with 50 filters and then convolving it with the image. i profiled it and i’m surprised by how slow it is: ~3s for one inference pass. does that seem right?
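for a rough sense of scale, here is a back-of-envelope count of multiply-accumulates for a direct (non-FFT) convolution. this is only a sketch: the 512x512 image size and the 100x100 kernel size are assumptions for illustration, `conv_macs` is a hypothetical helper, and it assumes one output value per input pixel. it suggests that one stack of 50 and 10 stacks of 5 do the same amount of arithmetic, so any speed difference would have to come from memory or launch overhead rather than the math itself.

```python
def conv_macs(height, width, kernel_h, kernel_w, n_filters):
    """Multiply-accumulates for a direct 2D convolution of a 1-channel image,
    assuming one output value per input pixel ('same'-size output)."""
    return height * width * kernel_h * kernel_w * n_filters

# Assumed sizes, not from the original post: 512x512 image, 100x100 kernels.
one_stack = conv_macs(512, 512, 100, 100, 50)       # a single stack of 50 filters
ten_stacks = 10 * conv_macs(512, 512, 100, 100, 5)  # 10 stacks of 5 filters

print(f"one stack of 50:   {one_stack:.2e} MACs")
print(f"ten stacks of 5:   {ten_stacks:.2e} MACs")
```

under those assumed sizes the total is on the order of 10^11 multiply-accumulates either way, so a multi-second pass on a CPU would not be shocking.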
just in case anyone XYs me: i have a very specific reason for doing this, and i do in fact need 50 filtered images.
Edit: I should probably mention that the kernels are pretty wide as well, roughly 50x50 to 100x100.
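with kernels that wide, one commonly used alternative to direct convolution is to do it in the frequency domain, where the cost per filter no longer grows with the kernel area. below is a minimal NumPy sketch of that idea, not a drop-in for any particular framework: `fft_filter_stack` is a hypothetical helper, the demo sizes are made up, and it returns the "full" linear convolution (output larger than the input). note that most deep learning layers actually compute cross-correlation, so the kernels would need flipping along both axes to match them.

```python
import numpy as np

def fft_filter_stack(image, kernels):
    """Full linear convolution of one 2D image with a stack of 2D kernels via FFT.

    image:   array of shape (H, W)
    kernels: array of shape (N, kh, kw)
    returns: array of shape (N, H + kh - 1, W + kw - 1)
    """
    h, w = image.shape
    _, kh, kw = kernels.shape
    # Zero-pad to the full output size so circular convolution equals linear convolution.
    out_shape = (h + kh - 1, w + kw - 1)
    image_f = np.fft.rfft2(image, s=out_shape)  # computed once, shared by all filters
    kernels_f = np.fft.rfft2(kernels, s=out_shape, axes=(-2, -1))
    # Pointwise product in the frequency domain, broadcast over the filter axis.
    return np.fft.irfft2(image_f[None, :, :] * kernels_f, s=out_shape, axes=(-2, -1))

# Demo with assumed sizes (not from the original post).
rng = np.random.default_rng(0)
demo_image = rng.standard_normal((128, 128))
demo_kernels = rng.standard_normal((50, 64, 64))
demo_out = fft_filter_stack(demo_image, demo_kernels)
print(demo_out.shape)
```

the image transform is computed once and reused for all 50 filters, which is the main saving here; whether this beats a direct convolution in practice depends on the image size and hardware, so it is worth profiling both.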