I’m currently working on a VAE that does regression in a semi-supervised setting: there is labeled and unlabeled data, and the goal is to predict acceptable labels for the unlabeled data.
The data (images) is normalized to the [0, 1] range, and Adam is used as the optimizer.
The architecture is:
1) Encoder with convolutions and ReLUs
2) Three fully connected layers with the bottleneck in the middle
3a) Decoder with ConvTranspose, BatchNorm, ReLU and Sigmoid layers
3b) Regressor with one linear layer, dropout and Softplus
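To make the architecture concrete, here is a minimal PyTorch sketch of the setup described above; all channel counts, layer sizes, and the assumed 1×32×32 input shape are placeholders I chose for illustration, not my actual dimensions:

```python
import torch
import torch.nn as nn

class SemiSupervisedVAE(nn.Module):
    """VAE with a decoder branch (3a) and a regressor branch (3b); sizes are illustrative."""
    def __init__(self, latent_dim=16, hidden_dim=128):
        super().__init__()
        # 1) Encoder: convolutions + ReLUs (assumes 1x32x32 inputs)
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # -> 16x16x16
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # -> 32x8x8
            nn.Flatten(),
        )
        # 2) Three fully connected layers with the bottleneck in the middle
        self.fc_in = nn.Linear(32 * 8 * 8, hidden_dim)
        self.fc_mu = nn.Linear(hidden_dim, latent_dim)      # bottleneck (mean)
        self.fc_logvar = nn.Linear(hidden_dim, latent_dim)  # bottleneck (log-variance)
        self.fc_out = nn.Linear(latent_dim, hidden_dim)     # the FC layer behind the bottleneck
        # 3a) Decoder: ConvTranspose + BatchNorm + ReLU, Sigmoid output
        self.decoder = nn.Sequential(
            nn.Linear(hidden_dim, 32 * 8 * 8),
            nn.Unflatten(1, (32, 8, 8)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1),
            nn.BatchNorm2d(16), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1),
            nn.Sigmoid(),
        )
        # 3b) Regressor: one linear layer, dropout, Softplus
        self.regressor = nn.Sequential(
            nn.Dropout(0.5), nn.Linear(hidden_dim, 1), nn.Softplus()
        )

    def forward(self, x):
        h = torch.relu(self.fc_in(self.encoder(x)))
        mu, logvar = self.fc_mu(h), self.fc_logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        h_dec = torch.relu(self.fc_out(z))  # both branches are fed from this layer
        return self.decoder(h_dec), self.regressor(h_dec), mu, logvar
```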
My questions are:
- Does it make sense to keep 3b separate from the decoder and to feed it from the last fully connected layer (i.e. the one behind the bottleneck, not the bottleneck itself)? I also tried writing the regression result into an additional image layer of the decoder and taking the average of that layer as the prediction.
- Does it make sense to simply add the regressor's MSE loss to the VAE's BCE+KLD loss, scaled by some factor, or should MSE also be used as the decoder's reconstruction loss in this case?
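To make this concrete, this is roughly how I'm combining the terms at the moment; the weights `alpha` and `beta` are hypothetical knobs, and the MSE term is only added for labeled batches:

```python
import torch
import torch.nn.functional as F

def vae_regression_loss(recon, x, mu, logvar, pred=None, y=None,
                        alpha=1.0, beta=1.0):
    """BCE + KLD VAE loss, plus a scaled MSE regression term on labeled data.

    F.mse_loss(recon, x, reduction="sum") would be the alternative
    reconstruction term for non-binary greyscale targets.
    """
    # Reconstruction loss (current setup: BCE on [0, 1]-normalized images)
    bce = F.binary_cross_entropy(recon, x, reduction="sum")
    # KL divergence between q(z|x) and the standard normal prior
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    loss = bce + beta * kld
    if pred is not None and y is not None:  # labels available for this batch
        loss = loss + alpha * F.mse_loss(pred, y, reduction="sum")
    return loss
```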
- Are there any suggestions or tips for this architecture? At the moment it looks strange to me: the MSE loss becomes relatively small compared to the BCE loss, which does not really go down, presumably because the data is not binary but spread over the whole greyscale range.
Thanks in advance.