VGG16 ImageNet weights in PyTorch are not the same as VGG16 in Keras

I am trying to convert a PyTorch model to Keras. I am using the VGG16 pretrained model with two dense layers on top of it. I noticed a very big gap between the PyTorch and Keras results, and while debugging I found that the VGG16 pretrained model itself gives very different results in PyTorch and Keras (with the same input image).

Here is my code:

PyTorch code

import torch
from torchvision import models
from torchsummary import summary

# Load the pretrained VGG16 and freeze its parameters
vgg16 = models.vgg16(pretrained=True)
vgg16.eval()
for param in vgg16.parameters():
    param.requires_grad = False

summary(vgg16, (3, 224, 224))
----------------------------------------------------------------
        Layer (type)               Output Shape         Param #
================================================================
            Conv2d-1         [-1, 64, 224, 224]           1,792
              ReLU-2         [-1, 64, 224, 224]               0
            Conv2d-3         [-1, 64, 224, 224]          36,928
              ReLU-4         [-1, 64, 224, 224]               0
         MaxPool2d-5         [-1, 64, 112, 112]               0
            Conv2d-6        [-1, 128, 112, 112]          73,856
              ReLU-7        [-1, 128, 112, 112]               0
            Conv2d-8        [-1, 128, 112, 112]         147,584
              ReLU-9        [-1, 128, 112, 112]               0
        MaxPool2d-10          [-1, 128, 56, 56]               0
           Conv2d-11          [-1, 256, 56, 56]         295,168
             ReLU-12          [-1, 256, 56, 56]               0
           Conv2d-13          [-1, 256, 56, 56]         590,080
             ReLU-14          [-1, 256, 56, 56]               0
           Conv2d-15          [-1, 256, 56, 56]         590,080
             ReLU-16          [-1, 256, 56, 56]               0
        MaxPool2d-17          [-1, 256, 28, 28]               0
           Conv2d-18          [-1, 512, 28, 28]       1,180,160
             ReLU-19          [-1, 512, 28, 28]               0
           Conv2d-20          [-1, 512, 28, 28]       2,359,808
             ReLU-21          [-1, 512, 28, 28]               0
           Conv2d-22          [-1, 512, 28, 28]       2,359,808
             ReLU-23          [-1, 512, 28, 28]               0
        MaxPool2d-24          [-1, 512, 14, 14]               0
           Conv2d-25          [-1, 512, 14, 14]       2,359,808
             ReLU-26          [-1, 512, 14, 14]               0
           Conv2d-27          [-1, 512, 14, 14]       2,359,808
             ReLU-28          [-1, 512, 14, 14]               0
           Conv2d-29          [-1, 512, 14, 14]       2,359,808
             ReLU-30          [-1, 512, 14, 14]               0
        MaxPool2d-31            [-1, 512, 7, 7]               0
AdaptiveAvgPool2d-32            [-1, 512, 7, 7]               0
           Linear-33                 [-1, 4096]     102,764,544
             ReLU-34                 [-1, 4096]               0
          Dropout-35                 [-1, 4096]               0
           Linear-36                 [-1, 4096]      16,781,312
             ReLU-37                 [-1, 4096]               0
          Dropout-38                 [-1, 4096]               0
           Linear-39                 [-1, 1000]       4,097,000
================================================================
Total params: 138,357,544
Trainable params: 0
Non-trainable params: 138,357,544
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 218.78
Params size (MB): 527.79
Estimated Total Size (MB): 747.15
----------------------------------------------------------------

from PIL import Image
from torchvision import transforms

img = Image.open(img_path)
img = img.convert("RGB")

transform = transforms.Compose([transforms.Resize((224, 224)),
                                transforms.ToTensor()
                                ])

img = transform(img)       # ToTensor already returns a tensor
img = img.unsqueeze(0)     # img shape (1, 3, 224, 224)

torch_out = vgg16(img)

Keras code

from tensorflow.keras.applications import VGG16

vgg_keras = VGG16(weights='imagenet', include_top=True, input_shape=(224, 224, 3))
vgg_keras.trainable = False
vgg_keras.summary()

Model: "vgg16"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000   
=================================================================
Total params: 138,357,544
Trainable params: 0
Non-trainable params: 138,357,544
_________________________________________________________________
import numpy as np
import PIL
from PIL import Image

img = Image.open(img_path)
img = img.convert("RGB")

img = img.resize((224, 224), resample=PIL.Image.BILINEAR)
img = np.asarray(img) / 255.0
img = np.expand_dims(img, 0)      # img shape (1, 224, 224, 3)

keras_out = np.array(vgg_keras(img))

I can't print torch_out and keras_out here because they are very big vectors, but they do have very different values.
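
For reference, a compact way to compare the two outputs instead of printing the full vectors (a small sketch, assuming both have shape (1, 1000)):

import numpy as np

t = torch_out.detach().numpy()
k = keras_out
print(np.abs(t - k).max(), np.abs(t - k).mean())   # element-wise gap
print(t.argmax(), k.argmax())                      # predicted class ids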

So I don't know how this happens with the same image and the same preprocessing! How can I make the PyTorch and Keras VGG16 models have the same weights, or just copy the weights of the PyTorch VGG16 into the Keras VGG16 model?

Please, if anyone can help, it would be appreciated.

torchvision models are trained using this script, which normalizes the tensors via:

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

while this is not the case in your script.
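
Adding it to the pipeline from the first post would look like this (a minimal sketch):

from torchvision import transforms

transform = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),                          # scales pixels to [0, 1]
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])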

Thanks @ptrblck for your reply. I added this normalization transform, but nothing changed; the VGG16 from PyTorch and the one from Keras still return very different values for the same image.

I want to use the PyTorch weights in the Keras model. How can I share the weights between the two models?

Depending on the input, the different output values might be expected.
Both models were trained in different frameworks and most likely with (slightly) different hyperparameters, so the parameters differ.
The reported validation accuracy should nevertheless be reproducible using the ImageNet dataset.

There might be tools to transform the weights between TF and PyTorch, but you could also use a manual approach as given in this post.
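
A manual transfer could look roughly like the sketch below (assuming tf.keras and the layer order shown in the two summaries above). The extra reshape for fc1 accounts for PyTorch flattening the 512x7x7 feature map in (C, H, W) order while Keras flattens it in (H, W, C) order:

import numpy as np
import torch
from torchvision import models
from tensorflow.keras.applications import VGG16

vgg16 = models.vgg16(pretrained=True).eval()
vgg_keras = VGG16(weights=None, include_top=True, input_shape=(224, 224, 3))

# Collect the PyTorch conv/linear layers in order
pt_layers = [m for m in vgg16.modules()
             if isinstance(m, (torch.nn.Conv2d, torch.nn.Linear))]
# Keras layers that hold weights, in the same order
k_layers = [l for l in vgg_keras.layers if l.get_weights()]

for pt, k in zip(pt_layers, k_layers):
    w = pt.weight.detach().numpy()
    b = pt.bias.detach().numpy()
    if isinstance(pt, torch.nn.Conv2d):
        # PyTorch: (out, in, kH, kW) -> Keras: (kH, kW, in, out)
        w = w.transpose(2, 3, 1, 0)
    elif k.name == 'fc1':
        # Permute from (C, H, W) to (H, W, C) flatten order, then transpose
        w = w.reshape(4096, 512, 7, 7).transpose(0, 2, 3, 1).reshape(4096, 25088)
        w = w.T
    else:
        # PyTorch Linear: (out, in) -> Keras Dense: (in, out)
        w = w.T
    k.set_weights([w, b])

With the weights copied this way, both models should produce (nearly) identical outputs, provided they are fed the same normalized input (channels-first for PyTorch, channels-last for Keras).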

Thanks @ptrblck. I am a little confused and would like your advice; maybe I have a different use case.

I trained a model using PyTorch and it gave me very good results, but I need the production model to be TensorFlow. So I tried to convert the model using the pytorch2keras repo, but the resulting model gave different results. I then tried to build my Keras model from scratch. It's a simple model: just the VGG16 pretrained model and two dense layers. After building it, I again got different results from the PyTorch model, and the model overfits on the same data. While debugging I found that the VGG16 weights differ between PyTorch and Keras, so I am trying to share the weights between the two models.

That is all I did. Do you have any advice on how to solve this problem?

If you are using constant inputs to compare both models, the difference might be large, since the models contain different parameters and might even use different preprocessing steps.

I guess that either the preprocessing steps are different or the weight transfer didn’t work properly.
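
For reference, the canonical preprocessing does differ between the two pretrained models: keras.applications' VGG16 expects caffe-style preprocessing (RGB to BGR plus per-channel mean subtraction) via preprocess_input, not scaling to [0, 1]. A sketch of both pipelines:

import numpy as np
from PIL import Image
from torchvision import transforms
from tensorflow.keras.applications.vgg16 import preprocess_input

img = Image.open(img_path).convert("RGB").resize((224, 224))

# torchvision: scale to [0, 1], then ImageNet mean/std normalization
torch_in = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])(img).unsqueeze(0)

# keras.applications: RGB -> BGR, subtract ImageNet channel means
keras_in = preprocess_input(np.asarray(img, dtype=np.float32)[None])

Note that if the weights are copied over from PyTorch as above, the Keras copy then needs the torchvision-style normalized input (in channels-last layout), not preprocess_input.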

I don’t quite understand this. Is the Keras model