Hello,
I am working on a regression project in which I have to predict a vector of 512 values from 3D images. Each image has shape (86, 110, 78).
I am unsure whether or not to use Maxpool3D layers. If I use max pooling, I have to pad the images to shape (88, 112, 80) so that every dimension is divisible by 8; since there are 3 pooling layers, the final feature map will then have shape (11, 14, 10).
Then, using a fully connected layer (with tanh activation), I will reduce the output to 512 values.
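To make the shape arithmetic concrete, here is a minimal sketch in plain Python. It assumes each pooling layer uses kernel size 2 and stride 2 with no padding of its own; the helper names `pool_out` and `after_pools` are only for illustration.

```python
from math import prod

def pool_out(n, kernel=2, stride=2):
    # Output length of one unpadded pooling step along a single axis
    # (assumed kernel=2, stride=2, which reduces to floor(n / 2)).
    return (n - kernel) // stride + 1

def after_pools(shape, n_pools=3):
    # Apply the same pooling step n_pools times to every axis.
    for _ in range(n_pools):
        shape = tuple(pool_out(n) for n in shape)
    return shape

unpadded = after_pools((86, 110, 78))  # (10, 13, 9): border voxels are dropped
padded = after_pools((88, 112, 80))    # (11, 14, 10)

# Values per channel that the fully connected layer would see after flattening.
features_per_channel = prod(padded)    # 1540
```

Without the padding, floor division silently discards the last voxel whenever a dimension is odd, which is why I pad to (88, 112, 80) first.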
Are there better ways to do this without pooling? I know that strides can be used for downsampling, but I am not familiar with how to do that,
especially while keeping the residual connections/identity mappings in mind.
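As far as I understand it, the output length of a convolution along one axis is floor((n + 2*padding - kernel) / stride) + 1. Below is a rough sketch of what strided downsampling would do to my shapes. The settings are my own assumptions, not something I have tested in a network: kernel 3, stride 2, padding 1 for the main branch, and a kernel 1, stride 2 convolution on the shortcut so the residual sum has matching shapes. `conv_out` and `downsample` are hypothetical helper names.

```python
def conv_out(n, kernel, stride, padding):
    # Standard output-length formula for a convolution with dilation 1.
    return (n + 2 * padding - kernel) // stride + 1

def downsample(shape, kernel, stride, padding, n_layers=3):
    # Track the spatial shape after each of n_layers strided convolutions.
    shapes = [shape]
    for _ in range(n_layers):
        shape = tuple(conv_out(n, kernel, stride, padding) for n in shape)
        shapes.append(shape)
    return shapes

# Main branch: assumed kernel=3, stride=2, padding=1.
main_branch = downsample((86, 110, 78), kernel=3, stride=2, padding=1)

# Shortcut branch: assumed kernel=1, stride=2, padding=0.
shortcut = downsample((86, 110, 78), kernel=1, stride=2, padding=0)

# The two branches can only be added if their shapes agree at every stage.
shapes_match = main_branch == shortcut
```

With these settings the unpadded (86, 110, 78) input goes to (43, 55, 39), then (22, 28, 20), then (11, 14, 10) on both branches, so the images would not need to be padded by hand. The convolutions still add zeros at the borders through their own padding, though.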
Also, even if I zero-pad the images, the padding will change the zero-mean, unit-variance normalization of the spatial values.
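A quick back-of-the-envelope check of how large that effect is, assuming the normalization is done per image over all voxels with the population variance (that is my assumption about the setup):

```python
from math import prod, sqrt

original_shape = (86, 110, 78)
padded_shape = (88, 112, 80)

n_orig = prod(original_shape)  # voxels before padding
n_pad = prod(padded_shape)     # voxels after padding

# For a zero-mean, unit-variance image: sum(x) = 0 and sum(x**2) = n_orig.
# Zero-padding adds voxels but leaves both sums unchanged.
mean_after = 0.0 / n_pad       # the mean stays at zero
var_after = n_orig / n_pad     # the variance shrinks by the voxel ratio
std_after = sqrt(var_after)
```

So under this assumption the mean is unaffected and only the variance drops, by the ratio of the voxel counts (737880 / 788480, roughly 0.94).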