Hi,
I’m trying to build a network that takes an image as input
and outputs depth maps of it from different view angles,
like here: SketchModeling
So far I have:
o Encoder (down half of a U-Net)
o Decoder (up half of a U-Net)
The encoder takes the image, and for each view there is a separate decoder that takes the encoder output and the skip connections as input:
out, skip = encoder(image)
view1_out = view1_decoder(out,skip)
view2_out = view2_decoder(out,skip)
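
To make the setup concrete, here is a minimal runnable sketch of one shared encoder feeding one decoder per view. It assumes PyTorch; the layer sizes, the two-level depth, and the names `Encoder`, `Decoder` and `MultiViewNet` are placeholders for illustration, not the actual network.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):
    """Down half of a tiny U-Net: returns the bottleneck and the skip tensors."""

    def __init__(self, in_ch=3, base=16):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(nn.Conv2d(base, base * 2, 3, padding=1), nn.ReLU())
        self.pool = nn.MaxPool2d(2)

    def forward(self, x):
        s1 = self.down1(x)              # (N, base, H, W)
        s2 = self.down2(self.pool(s1))  # (N, 2*base, H/2, W/2)
        out = self.pool(s2)             # (N, 2*base, H/4, W/4)
        return out, [s1, s2]


class Decoder(nn.Module):
    """Up half of a tiny U-Net: upsamples and concatenates the skips."""

    def __init__(self, base=16, out_ch=64):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.conv2 = nn.Sequential(nn.Conv2d(base * 4, base, 3, padding=1), nn.ReLU())
        self.conv1 = nn.Sequential(nn.Conv2d(base * 2, out_ch, 3, padding=1), nn.ReLU())

    def forward(self, out, skip):
        s1, s2 = skip
        x = self.conv2(torch.cat([self.up(out), s2], dim=1))  # 2*base + 2*base channels in
        x = self.conv1(torch.cat([self.up(x), s1], dim=1))    # base + base channels in
        return x                                              # (N, out_ch, H, W)


class MultiViewNet(nn.Module):
    """One shared encoder, one decoder per output view."""

    def __init__(self, num_views=2):
        super().__init__()
        self.encoder = Encoder()
        self.decoders = nn.ModuleList([Decoder() for _ in range(num_views)])

    def forward(self, image):
        out, skip = self.encoder(image)  # computed once, reused by every view
        return [decoder(out, skip) for decoder in self.decoders]


views = MultiViewNet(num_views=2)(torch.randn(1, 3, 64, 64))
print([tuple(v.shape) for v in views])
```

Keeping the decoders in an `nn.ModuleList` registers their parameters with the parent module, so a single optimizer sees the encoder and all view decoders.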
My question now is: since each decoder has several outputs, do I join them as channels of a single final conv, or use an extra Conv2d for each output?
My current outputs are:
o binary mask (1 channel, 0 or 1, float)
o depth image (1 channel, 0 - 1, float)
o depth in EXR format (1 channel, 0 - 10, float)
o normal map (3 channels, 0 - 1, float)
I could join them together as channels:
decoder_out = 64
final_out = 6  # all output channels joined: 1 + 1 + 1 + 3
final_conv = Conv2d(decoder_out, final_out, kernel_size=1)
Or have an extra Conv2d for each one of them:
final_convs = ModuleList()
final_convs.append(Conv2d(decoder_out, 1, kernel_size=1))
…
final_convs.append(Conv2d(decoder_out, 3, kernel_size=1))
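
For comparison, here is a minimal sketch of both variants side by side, assuming PyTorch. The dictionary keys (`mask`, `depth`, `depth_exr`, `normal`) and the helper names `joined_heads` and `separate_heads` are made up for illustration; the channel counts come from the output list above. Each output channel of a 1x1 conv has its own weights and bias, so the sketch copies the joined weights into the separate convs and checks that the two variants then give matching outputs.

```python
import torch
import torch.nn as nn

decoder_out = 64
# Channel counts from the output list; the key names are hypothetical.
out_channels = {"mask": 1, "depth": 1, "depth_exr": 1, "normal": 3}

# Variant 1: one joined 1x1 conv, split back into the individual outputs.
final_conv = nn.Conv2d(decoder_out, sum(out_channels.values()), kernel_size=1)


def joined_heads(features):
    chunks = torch.split(final_conv(features), list(out_channels.values()), dim=1)
    return dict(zip(out_channels, chunks))


# Variant 2: one 1x1 conv per output.
final_convs = nn.ModuleDict(
    {name: nn.Conv2d(decoder_out, ch, kernel_size=1) for name, ch in out_channels.items()}
)


def separate_heads(features):
    return {name: conv(features) for name, conv in final_convs.items()}


# Copy the matching slices of the joined weights into the separate convs,
# so both variants start from identical parameters.
with torch.no_grad():
    start = 0
    for name, ch in out_channels.items():
        final_convs[name].weight.copy_(final_conv.weight[start:start + ch])
        final_convs[name].bias.copy_(final_conv.bias[start:start + ch])
        start += ch

features = torch.randn(2, decoder_out, 32, 32)  # stand-in for one decoder's output
joined, separate = joined_heads(features), separate_heads(features)
for name in out_channels:
    same = torch.allclose(joined[name], separate[name], atol=1e-5)
    print(name, tuple(joined[name].shape), same)
```

In this sketch the difference between the variants is only in how the parameters are organized. Any per-output activation or loss (for example a sigmoid on the mask) can be applied to the split chunks in the same way in either case.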
Which approach would be better for training and gradients?
Greetings and happy holidays,
nolan