I am currently working on an architecture that improves performance on semi-supervised problems by training two models simultaneously: UNet 3D and Swin-Unet 3D, a relatively new model. When I pair UNet 3D with any other model, training works without problems. However, when I pair UNet 3D with Swin-Unet 3D, the predictions contain only the background class from the very first iteration, which gives a score of zero on validation. Since Swin-Unet 3D is transformer-based, I assume it needs pre-training, but no pre-trained weights are available yet because the model is so new. Is there any way to improve the prediction quality?
Thank you for your help.