Architecture inconsistency in zoo models across different frameworks

Hi, I am comparing a few zoo model implementations in DL4J with the PyTorch zoo models, and found that the padding in the convolution layers does not match most of the time.

For ResNet50 and SqueezeNet:

In DL4J, the conv layers apply no padding ([0, 0]), while in PyTorch they use padding [1, 1]. This produces different outputs in those layers.

In DL4J, the conv layers apply a bias, while in PyTorch they do not.
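For reference, here is a minimal sketch of how the torchvision side can be checked (assuming torchvision >= 0.13, which uses the `weights=` argument):

```python
import torch.nn as nn
import torchvision.models as models

# Print padding, stride, and bias settings for every conv layer in
# torchvision's ResNet50, to compare against the DL4J configuration.
model = models.resnet50(weights=None)
for name, module in model.named_modules():
    if isinstance(module, nn.Conv2d):
        print(f"{name}: padding={module.padding}, stride={module.stride}, "
              f"bias={module.bias is not None}")
```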

Why are there such inconsistencies in the network structure across frameworks?

I guess PyTorch might have used the original Caffe implementation as the base.
As you can see in the prototxt, the conv layers do not use a bias, since it would be cancelled out by the following batchnorm layers.
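To illustrate the cancellation (a minimal sketch, not the actual Caffe or torchvision code): BatchNorm subtracts the per-channel mean and then adds its own learnable shift, so any constant bias from the preceding conv would be absorbed and can be disabled without losing expressiveness.

```python
import torch
import torch.nn as nn

# The conv + batchnorm pattern used in ResNet-style blocks: BatchNorm
# subtracts the per-channel mean and then adds its own learnable shift
# (beta), so any constant bias added by the conv would be cancelled out.
conv = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False)
bn = nn.BatchNorm2d(64)

x = torch.randn(1, 64, 56, 56)
y = bn(conv(x))
print(conv.bias)  # None -- the bias term is disabled as redundant
```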


Alright! But what about the inconsistency in padding?

I don’t know which reference DL4J used. Did you compare the Caffe model with the torchvision implementation, and if so, did you find any mismatches?


See:
DL4J ResNet50: deeplearning4j/deeplearning4j/deeplearning4j-zoo/src/main/java/org/deeplearning4j/zoo/model/ResNet50.java at master · deeplearning4j/deeplearning4j · GitHub

Torchvision ResNet50:
vision/torchvision/models/resnet.py at main · pytorch/vision · GitHub

I got the layer information and printed it to the console. As you can see in the image:

- There is a stride mismatch in the second convolution operation (Torch uses 1 while DL4J uses 2!). This causes the output volume to shrink much more rapidly in DL4J (a small sketch of the effect follows below).
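To make the effect concrete, here is a small sketch (with an assumed 56x56 feature map and 64 channels, just for illustration) showing how the stride alone changes the output volume:

```python
import torch
import torch.nn as nn

# Same 3x3 conv on a 56x56 feature map: stride 1 (as in torchvision)
# preserves the spatial size, stride 2 (as reported for DL4J) halves it.
x = torch.randn(1, 64, 56, 56)
conv_s1 = nn.Conv2d(64, 64, kernel_size=3, stride=1, padding=1, bias=False)
conv_s2 = nn.Conv2d(64, 64, kernel_size=3, stride=2, padding=1, bias=False)
print(conv_s1(x).shape)  # torch.Size([1, 64, 56, 56])
print(conv_s2(x).shape)  # torch.Size([1, 64, 28, 28])
```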

As I said, I’m not familiar with the DL4J implementation, so you would need to ask the authors on their discussion board or on GitHub.

If you think that the torchvision model doesn’t reflect the expected implementation from the paper, please let us know and we’ll look into it.


Okay. Thanks!! :slight_smile: