UNET multiple outputs: channels vs separate layers


i’m trying to build a network which takes an image as input
and outputs depth maps of that input from several different view angles.

like here: SketchModeling

so far i have:
o Encoder (Down half of an Unet)
o Decoder (Up half of an Unet)

so the encoder takes the image, and for each view there is a decoder which gets the encoder output and the skip connections as input.

out, skip = encoder(image)
view1_out = view1_decoder(out,skip)
view2_out = view2_decoder(out,skip)
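A minimal sketch of this shared-encoder, one-decoder-per-view layout. The modules here (`TinyEncoder`, `TinyDecoder`, `MultiViewNet`) are hypothetical stand-ins with made-up layer sizes, not the actual network from this post; only the wiring (one encoder pass, every decoder fed the same `out` and `skip`) mirrors the snippet above.

```python
import torch
import torch.nn as nn


class TinyEncoder(nn.Module):
    """Hypothetical stand-in for the down half of a U-Net."""

    def __init__(self, in_ch=3, base=8):
        super().__init__()
        self.down1 = nn.Sequential(nn.Conv2d(in_ch, base, 3, padding=1), nn.ReLU())
        self.down2 = nn.Sequential(
            nn.Conv2d(base, base * 2, 3, stride=2, padding=1), nn.ReLU()
        )

    def forward(self, x):
        skip = self.down1(x)    # full-resolution features, reused as the skip connection
        out = self.down2(skip)  # half-resolution bottleneck features
        return out, skip


class TinyDecoder(nn.Module):
    """Hypothetical stand-in for the up half of a U-Net."""

    def __init__(self, base=8, out_ch=6):
        super().__init__()
        self.up = nn.ConvTranspose2d(base * 2, base, kernel_size=2, stride=2)
        self.fuse = nn.Sequential(nn.Conv2d(base * 2, base, 3, padding=1), nn.ReLU())
        self.final = nn.Conv2d(base, out_ch, kernel_size=1)

    def forward(self, out, skip):
        x = self.up(out)                 # upsample back to full resolution
        x = torch.cat([x, skip], dim=1)  # concatenate the skip connection
        return self.final(self.fuse(x))


class MultiViewNet(nn.Module):
    """One shared encoder, one decoder per output view."""

    def __init__(self, num_views=2):
        super().__init__()
        self.encoder = TinyEncoder()
        self.decoders = nn.ModuleList([TinyDecoder() for _ in range(num_views)])

    def forward(self, image):
        out, skip = self.encoder(image)  # encoder runs once per image
        return [decoder(out, skip) for decoder in self.decoders]


views = MultiViewNet(num_views=2)(torch.rand(1, 3, 32, 32))
print([tuple(v.shape) for v in views])
```

Keeping the decoders in an `nn.ModuleList` registers their parameters with the parent module, so a single optimizer over `model.parameters()` covers the encoder and every view decoder.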

my question now is: if i have multiple outputs per decoder, do i produce them as channels of a single conv2d or with an extra conv2d per output?

my current outputs are:

o binary mask (1 channel, 0 or 1, float)
o depth image (1 channel, 0 - 1, float)
o depth exr-format (1 channel, 0 - 10, float)
o normal map (3 channels, 0 - 1, float)

i could join them together as channels:

decoder_out = 64
final_out = 6  # all output channels joined: 1 + 1 + 1 + 3

final_conv = Conv2d(decoder_out, final_out, kernel_size=1)

or have an extra conv2d for each one of them:

final_convs = ModuleList()
# 1-channel head, e.g. the binary mask
final_convs.append(Conv2d(decoder_out, 1, kernel_size=1))

# 3-channel head for the normal map
final_convs.append(Conv2d(decoder_out, 3, kernel_size=1))
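To make the two options concrete, here is a rough sketch of both on a random feature map. The `channels` dict, its key names, and the tensor sizes are assumptions made for illustration; the channel counts come from the output list above.

```python
import torch
import torch.nn as nn

decoder_out = 64
features = torch.rand(2, decoder_out, 16, 16)  # stand-in for the decoder features

# channel count per output, taken from the list of outputs in this post
channels = {"mask": 1, "depth": 1, "depth_exr": 1, "normal": 3}

# Option A: one joined 1x1 conv, then split the channels back into named maps
final_conv = nn.Conv2d(decoder_out, sum(channels.values()), kernel_size=1)
joined = final_conv(features)
parts = torch.split(joined, list(channels.values()), dim=1)
outputs_a = dict(zip(channels, parts))

# Option B: one 1x1 conv per output
final_convs = nn.ModuleList(
    [nn.Conv2d(decoder_out, c, kernel_size=1) for c in channels.values()]
)
outputs_b = {name: conv(features) for name, conv in zip(channels, final_convs)}

for name in channels:
    print(name, tuple(outputs_a[name].shape), tuple(outputs_b[name].shape))
```

Either way each output ends up as its own tensor, so a different activation or loss can still be applied per output (for example a sigmoid on the mask only).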

which approach would be better for training and grads?

greetings and happy holidays

Splitting the Conv2d on your final output layer into separate layers for different output channels would just be semantics and would not change the math (however, it would slow down the forward pass).

Let me prove it:

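import torch
import torch.nn.functional as F

# example sizes (assumed here so the snippet runs); out_channels must be 2 so
# that the two single-channel slices below cover every output channel, and
# kernel=3 matches padding=1
batch_size, in_channels, out_channels, kernel = 2, 3, 2, 3

# make an image and a copy
x = torch.rand(batch_size, in_channels, 4, 4)
y = x.clone()

# instantiate the weights
weights = torch.rand(out_channels, in_channels, kernel, kernel)

# initial path, 1 conv2d to rule them all
output = F.conv2d(x, weights, padding=1)

# split path: first and last output channel computed separately
output1 = F.conv2d(x, weights[:1, :, :, :], padding=1)
output2 = F.conv2d(y, weights[-1:, :, :, :], padding=1)
output1_2 = torch.cat([output1, output2], dim=1)

# confirm they are identical
print(torch.allclose(output, output1_2))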
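The same check can be sketched with `nn.Conv2d` modules and 1x1 kernels, which is closer to the final layer in the question. The head names and sizes below are made up for illustration: the weights and biases of two separate heads are stacked into one joined conv, and the outputs are compared.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
decoder_out = 64
features = torch.rand(2, decoder_out, 8, 8)  # stand-in for the decoder features

# two separate heads (hypothetical names): 1-channel mask, 3-channel normal map
mask_head = nn.Conv2d(decoder_out, 1, kernel_size=1)
normal_head = nn.Conv2d(decoder_out, 3, kernel_size=1)

# one joined head whose filters are the stacked filters of the two heads
joined_head = nn.Conv2d(decoder_out, 4, kernel_size=1)
with torch.no_grad():
    joined_head.weight.copy_(torch.cat([mask_head.weight, normal_head.weight], dim=0))
    joined_head.bias.copy_(torch.cat([mask_head.bias, normal_head.bias], dim=0))

split_out = torch.cat([mask_head(features), normal_head(features)], dim=1)
joined_out = joined_head(features)

# equal up to floating point rounding
same = torch.allclose(joined_out, split_out, atol=1e-5)
print(same)
```

Each output channel has its own filter and bias in both layouts, so a filter only receives gradient from the loss terms that involve its own channel, whichever way the layers are grouped.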


thanks for the reply. that's what i thought, but i was not sure.

in the project mentioned above, they calculate the loss and also check the result against a discriminator.

so far i calculate the loss like this:

# target and output both have shape (B, 6, H, W)
gen_criterion = L1Loss()
disc_criterion = BCELoss()
alpha = 1.0
beta = 0.01

the discriminator currently implemented takes the stack of view outputs and returns
a value in the range 0-1

train disc

output = generator(input).detach()

out_real = discriminator(target)
out_fake = discriminator(output)

loss_real = disc_criterion(out_real, torch.ones((B)))
loss_fake = disc_criterion(out_fake, torch.zeros((B)))
disc_loss = loss_real + loss_fake


train gen

output = generator(input)
out_fake = discriminator(output)
loss_gen = gen_criterion(output ,target)
loss_disc = disc_criterion(out_fake, torch.ones((B)))

total_loss = loss_gen * alpha + loss_disc * beta
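For reference, a self-contained sketch of both steps with optimizers added. The `generator` and `discriminator` here are tiny hypothetical stand-ins, and the optimizer choice and learning rate are assumptions; the labels use `torch.ones_like` / `torch.zeros_like` so they always match the shape and device of the discriminator output.

```python
import torch
import torch.nn as nn

B, H, W = 4, 16, 16

# hypothetical tiny stand-ins for the real networks
generator = nn.Conv2d(3, 6, kernel_size=1)
discriminator = nn.Sequential(
    nn.Conv2d(6, 1, kernel_size=1),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Sigmoid(),  # BCELoss expects probabilities in [0, 1]
)

gen_criterion = nn.L1Loss()
disc_criterion = nn.BCELoss()
alpha, beta = 1.0, 0.01
gen_opt = torch.optim.Adam(generator.parameters(), lr=2e-4)    # assumed settings
disc_opt = torch.optim.Adam(discriminator.parameters(), lr=2e-4)

image = torch.rand(B, 3, H, W)   # dummy input batch
target = torch.rand(B, 6, H, W)  # dummy ground-truth maps

# train disc: real targets should score 1, detached generator outputs 0
disc_opt.zero_grad()
fake = generator(image).detach()  # no gradient flows into the generator here
out_real = discriminator(target)
out_fake = discriminator(fake)
loss_real = disc_criterion(out_real, torch.ones_like(out_real))
loss_fake = disc_criterion(out_fake, torch.zeros_like(out_fake))
disc_loss = loss_real + loss_fake
disc_loss.backward()
disc_opt.step()

# train gen: reconstruction loss plus "fool the discriminator" loss
gen_opt.zero_grad()
output = generator(image)
out_fake = discriminator(output)
loss_gen = gen_criterion(output, target)
loss_disc = disc_criterion(out_fake, torch.ones_like(out_fake))
total_loss = alpha * loss_gen + beta * loss_disc
total_loss.backward()  # also fills discriminator grads; they are cleared by the next disc_opt.zero_grad()
gen_opt.step()

print(disc_loss.item(), total_loss.item())
```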


typical GAN setup, but i don't think that's the right way.
in the other project they somehow subtract the inverted negative loss of the discriminator
from the loss of the generator to reward it for ‘REAL’ outputs.

so if the output is a total fake, use the full generator loss, but if it seems real, decrease the loss.

total fake = 0
seems real = 1

total_loss = loss_gen - loss_disc

they used tensorflow, but how would i implement this in pytorch?
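One possible reading of "reward the generator for real-looking outputs", not checked against the other project's code: subtract the mean log of the discriminator score from the generator loss. The values below are made-up placeholders. Under this reading the result is the same number as the `BCELoss` term with all-ones labels used above, because `BCELoss` with a target of 1 reduces to `-log(p)`.

```python
import torch
import torch.nn as nn

# placeholder discriminator scores D(G(x)): 0 = total fake, 1 = seems real
out_fake = torch.tensor([0.05, 0.5, 0.95])
loss_gen = torch.tensor(0.3)  # placeholder L1 reconstruction loss
beta = 0.01

# reward is <= 0 and approaches 0 as the outputs look more real
reward = torch.log(out_fake).mean()
total_loss = loss_gen - beta * reward

# the same quantity written with BCELoss against all-ones labels
bce = nn.BCELoss()(out_fake, torch.ones_like(out_fake))
same = torch.allclose(total_loss, loss_gen + beta * bce)
print(same)
```

So if the other project does something along these lines, the `loss_gen * alpha + loss_disc * beta` form already has that effect: the adversarial term shrinks toward zero as the discriminator score approaches 1.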