I would like to keep the encoder and decoder as separate classes, but I would also like to use MaxPool and MaxUnpool.
After the network has trained, when I try to explore the latent space by feeding random vectors into the trained decoder, I get an error (no indices for the MaxUnpool). Removing these layers means that the network does not perform as well.
I have tried replacing them with AvgPool2d, but this doesn’t improve the situation. So the question is: is there an alternative to MaxPool/MaxUnpool, or is there a way to use these without passing the return_indices=True switch?
Can I save the indices from the training model, so that I can reload them in the trained model?
As for alternatives to pooling, you can use a stride in the Conv layers or a different kind of pooling, but I do not think it is a good idea to drop a method just because of a syntax error!
As for MaxPool and MaxUnpool, something I should mention is that you have to provide the indices to MaxUnpool. When you use MaxPool with return_indices=True in your encoder, it returns a tuple (values, indices); you have to keep the indices and pass them to your decoder.
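To make the indices hand-off concrete, here is a minimal sketch. The class names, layer sizes and the 28x28 input are my own assumptions for illustration, not taken from the original poster's network.

```python
import torch
import torch.nn as nn


class Encoder(nn.Module):  # hypothetical encoder, sizes chosen for illustration
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        # return_indices=True makes the layer return (values, indices)
        self.pool = nn.MaxPool2d(kernel_size=2, stride=2, return_indices=True)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        x, indices = self.pool(x)
        return x, indices


class Decoder(nn.Module):  # hypothetical decoder mirroring the encoder
    def __init__(self):
        super().__init__()
        self.unpool = nn.MaxUnpool2d(kernel_size=2, stride=2)
        self.deconv = nn.ConvTranspose2d(8, 1, kernel_size=3, padding=1)

    def forward(self, z, indices):
        # MaxUnpool2d needs the indices produced by the matching MaxPool2d
        x = self.unpool(z, indices)
        return self.deconv(x)


encoder, decoder = Encoder(), Decoder()
x = torch.randn(4, 1, 28, 28)   # dummy batch
z, indices = encoder(x)         # z: (4, 8, 14, 14)
recon = decoder(z, indices)     # recon: (4, 1, 28, 28)
```

The two classes stay separate, but the decoder's forward always needs a second argument that only the encoder can produce.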
I think this post explains the situation very well.
About point 1, there is no way, as the definition of MaxUnpool requires the indices.
About the second point, you can get the indices. You need to define a class on top of the pretrained model and change the line in the forward function so that it returns the indices too. It won’t hurt the weights, as pooling layers do not have any parameters. (You already have the encoder, so it is possible.)
Concerning point 3, ConvTranspose2d may be able to help you, but it still needs training and may not perform well.
As you have access to the encoder and its max-pooling layers, the easiest way is still to use the indices.
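Something along these lines is what I mean by a class on top of the pretrained model. This is only a sketch: PretrainedEncoder stands in for your trained encoder, and its attribute names (conv, pool) are assumptions.

```python
import torch
import torch.nn as nn


class PretrainedEncoder(nn.Module):  # hypothetical stand-in for the trained encoder
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)  # trained without return_indices

    def forward(self, x):
        return self.pool(torch.relu(self.conv(x)))


class EncoderWithIndices(nn.Module):
    """Reuses the trained conv weights but also returns the pooling indices."""

    def __init__(self, pretrained):
        super().__init__()
        self.conv = pretrained.conv  # same module object, so same weights
        # Pooling has no parameters, so swapping in a new layer loses nothing
        self.pool = nn.MaxPool2d(2, return_indices=True)

    def forward(self, x):
        x = torch.relu(self.conv(x))
        values, indices = self.pool(x)
        return values, indices


pretrained = PretrainedEncoder()
wrapped = EncoderWithIndices(pretrained)

x = torch.randn(2, 1, 28, 28)
with torch.no_grad():
    reference = pretrained(x)
    values, indices = wrapped(x)
```

The pooled values match the original encoder's output; the only difference is that the indices are now available too.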
You cannot use MaxPool2d and MaxUnpool2d in a VAE or CVAE if you want to explore the latent space ‘z’ in the decoder module independently of the encoder, because there is no way of generating the indices tensors independently for each input into the decoder module.
This is because the indices tensors are different for each input into the encoder module.
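A small sketch of why the indices cannot be fixed in advance. The two 4x4 inputs are made up for illustration: they produce different indices, so the same latent values get unpooled to different positions.

```python
import torch
import torch.nn as nn

pool = nn.MaxPool2d(2, return_indices=True)
unpool = nn.MaxUnpool2d(2)

# Input a: values increase, so each 2x2 window's max is at its bottom-right
a = torch.arange(16.0).view(1, 1, 4, 4)
# Input b: values decrease, so each 2x2 window's max is at its top-left
b = -a

_, idx_a = pool(a)
_, idx_b = pool(b)

# The same "latent" tensor, unpooled with each set of indices
z = torch.ones(1, 1, 2, 2)
out_a = unpool(z, idx_a)
out_b = unpool(z, idx_b)

print(idx_a.flatten().tolist())  # flat positions within the 4x4 plane
print(idx_b.flatten().tolist())
print(torch.equal(out_a, out_b))
```

For a random z sampled from the prior there is no encoder input, and so nothing that tells the decoder which positions to fill.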
Can you please share the resources that led you to the idea of using pooling/unpooling in autoencoders?
I have read a few well-known papers in this area and they use Conv2d and ConvTranspose2d for the exact reason we are discussing. As you mentioned, we may need to treat the decoder independently from the encoder, so the only reasonable choice is a transposed convolution, as has been used in UNet too.
Yes, there may be some other problems with the network you defined, because in almost all of the VAE papers Conv2d and ConvTranspose2d have been used to change the channel size and the spatial dimensions at the same time, and this works fine if the training procedure goes well.
The reason is that max pooling is a simple, fixed operation, whereas a Conv2d layer can learn weights that put more attention on particular features. I think UNet is the big example that works very well in different tasks.
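For reference, a rough sketch of the strided Conv2d / ConvTranspose2d layout, where the decoder needs nothing except z. The layer sizes, latent_dim=16 and the 28x28 input are my assumptions. This is a plain autoencoder skeleton; a real VAE encoder would output a mean and a log-variance instead of a single vector.

```python
import torch
import torch.nn as nn


class ConvEncoder(nn.Module):  # hypothetical, downsamples with stride instead of pooling
    def __init__(self, latent_dim=16):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 8, kernel_size=3, stride=2, padding=1),   # 28 -> 14
            nn.ReLU(),
            nn.Conv2d(8, 16, kernel_size=3, stride=2, padding=1),  # 14 -> 7
            nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(16 * 7 * 7, latent_dim)

    def forward(self, x):
        return self.fc(self.net(x))


class ConvDecoder(nn.Module):  # hypothetical, upsamples with transposed convolutions
    def __init__(self, latent_dim=16):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 16 * 7 * 7)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(16, 8, kernel_size=3, stride=2,
                               padding=1, output_padding=1),       # 7 -> 14
            nn.ReLU(),
            nn.ConvTranspose2d(8, 1, kernel_size=3, stride=2,
                               padding=1, output_padding=1),       # 14 -> 28
            nn.Sigmoid(),
        )

    def forward(self, z):
        x = self.fc(z).view(-1, 16, 7, 7)
        return self.net(x)


encoder, decoder = ConvEncoder(), ConvDecoder()

x = torch.randn(4, 1, 28, 28)
z = encoder(x)                  # (4, 16)
recon = decoder(z)              # (4, 1, 28, 28)

# Exploring the latent space: the decoder runs on random vectors alone
samples = decoder(torch.randn(5, 16))
```

Since no indices are involved, the decoder can be saved, loaded and sampled from without the encoder.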