The image is represented as a stack of layers that are equidistant in depth (depth layers), and instead of feeding the whole image (every pixel) I want to feed R random pixels (from 0 to HxW). So I have an array of shape (N, D, C, R, 1), where N is the batch size, D is the number of layers, C is the number of channels, and R is the number of random pixels. I want to combine the layers into a final image of shape (N, 1, C, R, 1). What conv is best to use for this network?
Hi Pam!
Sampling random pixels seems like a bad idea because doing so
throws away the important two-dimensional structure that your image
has.
If for memory or computation-time reasons you need to reduce the
amount of data in your images, I would recommend down-sampling
in a way that preserves that two-dimensional structure. This might
be as easy as using PyTorch's AvgPool2d or MaxPool2d (across
the HxW dimensions).
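For concreteness, here is a minimal sketch of that down-sampling
(the sizes and the pooling factor of 4 are made-up illustrations):

```python
import torch
import torch.nn as nn

N, D, C, H, W = 4, 8, 3, 64, 64              # illustrative sizes
x = torch.randn(N, D, C, H, W)

# AvgPool2d pools over the last two dimensions, so fold N and D
# together into a single batch dimension first.
pool = nn.AvgPool2d(kernel_size=4)           # or nn.MaxPool2d(4)
x_pooled = pool(x.reshape(N * D, C, H, W))
x_pooled = x_pooled.reshape(N, D, C, H // 4, W // 4)
print(x_pooled.shape)                        # torch.Size([4, 8, 3, 16, 16])
```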
If you do that, I would then recommend applying Conv3d across the
depth and the down-sampled H and W dimensions.
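Something along these lines might work (again, the sizes and kernel
shape are illustrative assumptions; a full-depth kernel is just one
way to combine the D layers into a single layer):

```python
import torch
import torch.nn as nn

N, D, C, H, W = 4, 8, 3, 16, 16              # down-sampled sizes from above
x = torch.randn(N, D, C, H, W)

# Conv3d expects (N, C, D, H, W), so move channels in front of depth.
x = x.permute(0, 2, 1, 3, 4)                 # -> (N, C, D, H, W)

# A full-depth kernel collapses the D layers into one; 3x3 in H and W
# with padding 1 keeps the spatial size unchanged.
conv = nn.Conv3d(in_channels=C, out_channels=C,
                 kernel_size=(D, 3, 3), padding=(0, 1, 1))
y = conv(x)
print(y.shape)                               # torch.Size([4, 3, 1, 16, 16])
```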
To answer your original question, if you randomly sample pixels
independently from each depth layer, I don’t think any convolution
would make sense – there would be no “nearest-neighbor” spatial
structure left for the convolution to be applied to.
If instead you sample pixels from each depth layer using the same
randomly-chosen locations in HxW, you could conceivably use Conv1d,
applied across the depth dimension (because you have preserved the
nearest-neighbor structure when you move from one depth layer to the
next while keeping the location in your R dimension fixed). See the
sketch below.
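As a sketch of that idea (the layer sizes and the choice of a single
full-depth Conv1d are assumptions for illustration), you could fold
the R sampled locations into the batch dimension:

```python
import torch
import torch.nn as nn

N, D, C, R = 4, 8, 3, 1000                   # illustrative sizes
x = torch.randn(N, D, C, R, 1)               # your (N, D, C, R, 1) tensor

# Conv1d expects (batch, channels, length); treat depth D as the
# length and fold the R sampled locations into the batch dimension.
x = x.squeeze(-1).permute(0, 3, 2, 1)        # -> (N, R, C, D)
x = x.reshape(N * R, C, D)

# A full-depth kernel combines all D layers into a single output layer.
conv = nn.Conv1d(in_channels=C, out_channels=C, kernel_size=D)
y = conv(x)                                  # -> (N * R, C, 1)

# Restore the original layout with a collapsed depth dimension.
y = y.reshape(N, R, C, 1).permute(0, 3, 2, 1).unsqueeze(-1)
print(y.shape)                               # torch.Size([4, 1, 3, 1000, 1])
```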
Best.
K. Frank