I’m building a model that does the following: (i) breaks an image up into small square windows, (ii) applies a network to each window, transforming each into an output window of the same size, and (iii) reassembles the output windows into an output image of the same size as the input image. I’m wondering if there’s an efficient way to do the last step without loops.
The stack of windows output by the network has size (batch * num_windows, num_channels, window_side, window_side), and the output image should have size (batch, num_channels, image_side, image_side).
The naive approach of using just a single .view() doesn't work here, but I imagine there's a combination of transposes and views that will do the job. Any tips?
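For concreteness, here's a sketch of the kind of thing I think should work, assuming the windows were extracted in row-major (raster) order and that image_side is divisible by window_side; the variable names are just illustrative:

```python
import torch

batch, num_channels, image_side, window_side = 2, 3, 8, 4
nw = image_side // window_side          # windows per side
num_windows = nw * nw

# Stand-in for the network's output: a stack of windows with shape
# (batch * num_windows, num_channels, window_side, window_side).
windows = torch.randn(batch * num_windows, num_channels,
                      window_side, window_side)

# Reassemble: split the window axis back into (row, col) grid positions,
# move the channel axis forward, interleave grid and within-window axes,
# then merge each (grid, within-window) pair into one spatial axis.
image = (windows
         .view(batch, nw, nw, num_channels, window_side, window_side)
         .permute(0, 3, 1, 4, 2, 5)     # (batch, C, nw, w, nw, w)
         .reshape(batch, num_channels, image_side, image_side))

print(image.shape)  # torch.Size([2, 3, 8, 8])
```

The .reshape at the end is needed (rather than .view) because the permute makes the tensor non-contiguous.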