I’m trying to replicate a network presented in a paper. It’s an encoder-decoder network similar to U-Net.

The only information provided for each step is the number of input and output channels of each convolution layer, which is usually the case for networks presented in papers. This is easy to implement, but a size mismatch always arises when combining encoder and decoder feature maps at the decoder stage, as with the skip connections (feed-forward operations) in U-Net.
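A common source of such mismatches in U-Net-style networks is downsampling an odd-sized dimension: pooling floors the size, so upsampling by 2 cannot recover the original length. The sketch below is plain Python and assumes 2×2×2 max pooling with stride 2 followed by ×2 upsampling; the paper's actual down/upsampling layers are not stated here, so treat the layer choice as hypothetical.

```python
def pool_out(size, kernel=2, stride=2, padding=0):
    # Output length of one dimension for nn.MaxPool3d with ceil_mode=False
    # and dilation=1: floor((size + 2*padding - kernel) / stride) + 1.
    return (size + 2 * padding - kernel) // stride + 1

def upsample_out(size, scale=2):
    # nn.Upsample(scale_factor=2), or ConvTranspose3d(kernel_size=2, stride=2)
    # with no padding, multiplies the length by exactly 2.
    return size * scale

def round_trip(size):
    # Encoder size -> pooled -> upsampled, i.e. the size the decoder has
    # when it reaches the matching skip connection.
    return upsample_out(pool_out(size))

for depth in (38, 19, 33):
    print(depth, "->", pool_out(depth), "->", round_trip(depth))
# 38 -> 19 -> 38
# 19 -> 9 -> 18
# 33 -> 16 -> 32
```

Even sizes survive the round trip; odd sizes come back one voxel short, which is why inputs whose dimensions are divisible by 2 to the power of the number of pooling stages avoid this class of mismatch.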

For example, c1.shape = [1, 64, 38, 64, 64] and c2.shape = [1, 64, 32, 64, 64], so the depth dimensions differ (38 vs. 32).
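Rather than tuning parameters by trial and error, the size after each layer can be computed with the output-size formula from the PyTorch nn.Conv3d documentation, applied one dimension at a time (here the depth D of an [N, C, D, H, W] tensor). The layer lists below are hypothetical; they only show one combination that would produce exactly this 38 → 32 gap, namely three unpadded 3-wide convolutions along depth, each of which removes 2.

```python
def conv_out(size, kernel, stride=1, padding=0, dilation=1):
    # Per-dimension output length from the nn.Conv3d documentation:
    # floor((in + 2*padding - dilation*(kernel - 1) - 1) / stride + 1)
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

def trace(size, layers):
    # Apply a sequence of (kernel, stride, padding) settings to one dimension
    # and record the size after each layer.
    sizes = [size]
    for kernel, stride, padding in layers:
        size = conv_out(size, kernel, stride, padding)
        sizes.append(size)
    return sizes

# Hypothetical: three 3-wide convolutions along depth with no padding.
print(trace(38, [(3, 1, 0)] * 3))  # [38, 36, 34, 32]
# The same three layers with padding=1 keep the depth unchanged.
print(trace(38, [(3, 1, 1)] * 3))  # [38, 38, 38, 38]
```

Tracing each encoder and decoder path this way shows which layer introduces the difference before any parameter is touched.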

To add them, they must have the same shape (to concatenate them with torch.cat, every dimension except the concatenation dimension must match), so what I end up doing is changing parameters such as the stride, padding, and kernel_size until I get the desired output size.

Is this the correct or recommended way of dealing with such a situation? I’m asking because a different kernel size and stride would surely give different results, but without these parameters I cannot replicate the network and get the system to work.

The kernel size is often given in papers. If it’s not, it would be quite hard to replicate the results since, as you already explained, there are often many possible combinations that produce the desired output shape.
Do the authors mention a reference implementation, e.g. using the same configuration as VGG16?

The authors only mention using two kernel sizes, [1,3,3] and [3,3,3]. With the given parameters I was getting a size mismatch at some layers.
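With stride 1, both kernel sizes the authors mention can be made size-preserving by choosing the padding instead of changing the kernel. A minimal sketch, assuming stride 1 and (D, H, W) ordering of the kernel: [1,3,3] needs padding [0,1,1] and [3,3,3] needs [1,1,1]. If every convolution preserves size, any remaining mismatch has to come from the downsampling or upsampling layers.

```python
def conv_out(size, kernel, stride=1, padding=0):
    # nn.Conv3d output length for one dimension (dilation=1).
    return (size + 2 * padding - (kernel - 1) - 1) // stride + 1

def same_padding(kernel_size):
    # For odd kernels with stride 1, padding = (k - 1) // 2 in each dimension
    # keeps the output length equal to the input length.
    return tuple((k - 1) // 2 for k in kernel_size)

shape = (38, 64, 64)  # (D, H, W) taken from c1.shape above
for kernel in [(1, 3, 3), (3, 3, 3)]:
    pad = same_padding(kernel)
    out = tuple(conv_out(s, k, 1, p) for s, k, p in zip(shape, kernel, pad))
    print(kernel, "padding", pad, "->", out)
# (1, 3, 3) padding (0, 1, 1) -> (38, 64, 64)
# (3, 3, 3) padding (1, 1, 1) -> (38, 64, 64)
```

In PyTorch this corresponds to, e.g., nn.Conv3d(in_ch, out_ch, kernel_size=(1, 3, 3), padding=(0, 1, 1)); recent versions also accept padding='same' for stride-1 convolutions.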

I also found three implementations of this network on GitHub: 1, 2, and 3. All of them use different parameters at different stages.

Like them, I also tried changing parameters to get the network to work; at one layer, the only way I could get the sizes to match was by changing the kernel from [3,3,3] to [9,3,3].

This practice, together with all these different parameters for the same network on GitHub, is the reason I asked this question.

I emailed the author of the paper three weeks ago but have yet to receive a response. I may have to scrap this network and go with another one.

My question was more about good practices or recommended workarounds when such a situation arises. If a size mismatch like this occurs in the future, what is the fix that deviates least from the original published work? I understand I cannot change kernel sizes if I am replicating a published network, but what about padding and stride?
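Two workarounds that leave the paper's kernel sizes and strides untouched are to center-crop the encoder feature map down to the decoder's size (what the original U-Net paper does for its skip connections) or to zero-pad the decoder feature map up to the encoder's size. Neither is necessarily what these authors did. The sketch below computes both from the shapes in the question, in plain Python so it can be checked without tensors; the function names are hypothetical helpers, not a library API.

```python
def crop_slices(src_shape, target_shape):
    # Center-crop slices that shrink src_shape to target_shape in every
    # spatial dimension (N and C are left untouched).
    # Assumes src is at least as large as target in each spatial dimension.
    slices = [slice(None), slice(None)]
    for s, t in zip(src_shape[2:], target_shape[2:]):
        start = (s - t) // 2
        slices.append(slice(start, start + t))
    return tuple(slices)

def pad_amounts(src_shape, target_shape):
    # Padding that grows src_shape to target_shape, in the order expected by
    # torch.nn.functional.pad: last dimension first, (left, right) per dim.
    # Assumes target is at least as large as src in each spatial dimension.
    pads = []
    for s, t in reversed(list(zip(src_shape[2:], target_shape[2:]))):
        diff = t - s
        pads.extend([diff // 2, diff - diff // 2])
    return tuple(pads)

c1 = (1, 64, 38, 64, 64)  # encoder feature map (skip connection)
c2 = (1, 64, 32, 64, 64)  # decoder feature map
print(crop_slices(c1, c2))  # crop c1 down to c2: depth slice(3, 35)
print(pad_amounts(c2, c1))  # pad c2 up to c1: (0, 0, 0, 0, 3, 3)
```

With hypothetical tensors c1_t and c2_t, cropping would be torch.cat([c1_t[crop_slices(c1_t.shape, c2_t.shape)], c2_t], dim=1), and padding would be torch.nn.functional.pad(c2_t, pad_amounts(c2_t.shape, c1_t.shape)). Adjusting padding inside the convolutions, or cropping/padding the skip connection, only affects border voxels, whereas changing a stride changes the downsampling factor of the whole network, so it arguably deviates more from the published design.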