PyTorch vs TensorFlow Convolution

Hi,

I am trying to implement a single convolutional layer (taken as the first layer of SqueezeNet) in both PyTorch and TensorFlow, so that both produce the same result when I feed in the same picture.

Below is my code:

from __future__ import print_function
import torch
import torch.nn as nn
import tensorflow as tf
import numpy as np
import pickle as pkl
from modified_squeezenet import SqueezeNet
from keras.models import Model
from keras.layers import Input, Convolution2D

TF_CONV1_KERNEL = None
TF_CONV1_BIAS = None

def get_torch_image(image_number):
  '''
  Unpickle the specified image and return the tensor
  '''
  with open('imagenet_sample_0.pkl', 'rb') as f:
    data = pkl.load(f)
    # get the 3D array representation of the image
    X = data['X'][image_number]
    # Get integer representation of the label
    y = data['y'][image_number]
    # Print the word label
    print('Current image:', data['labels'][y])
    # construct mean image
    mean_image = np.zeros((227, 227, 3))
    mean_image[:, :, 0] = 103.939 # blue
    mean_image[:, :, 1] = 116.779 # green
    mean_image[:, :, 2] = 123.68 # red
    X_prep = X - mean_image
    # Convert the H x W x D array to a D x H x W tensor
    X_prep = torch.from_numpy(X_prep.transpose(2, 0, 1))
    # Add a batch dimension: shape becomes 1 x 3 x 227 x 227
    X_prep.unsqueeze_(0)
    X_prep.requires_grad = True
    # get the integer representation of the label
    return (X_prep.float(), y)

def init_conv1_kernel(shape, dtype=None):
  return TF_CONV1_KERNEL

def init_conv1_bias(shape, dtype=None):
  return TF_CONV1_BIAS

def get_tf_net():
  img_input = Input(shape=(227,227,3))
  x = Convolution2D(64, (3, 3), strides=(2, 2), padding='valid',
                    name='conv1', kernel_initializer=init_conv1_kernel,
                    bias_initializer=init_conv1_bias)(img_input)
  model = Model(img_input, x)
  return model

def run_tf_model(tf_net, prep_img):
  # prepare input for the TF net:
  # convert 1 x D x H x W => 1 x H x W x D
  prep_img = prep_img.detach().numpy().transpose(0, 2, 3, 1).astype(np.float64)

  init_g = tf.global_variables_initializer()
  with tf.Session() as sess:
    sess.run(init_g)
    result = sess.run(tf_net.output, feed_dict={tf_net.input:prep_img})
    return result


if __name__ == "__main__":
  # Specify image
  target_example = 1 # 0 = monarch; 1 = llama; 2 = airedale
  prep_img, target_class = get_torch_image(target_example)
  tf_model = SqueezeNet("relu")
  tf_weights = tf_model.get_weights()

  TF_CONV1_KERNEL = tf_weights[0]
  TF_CONV1_BIAS = tf_weights[1]

  # get the simple tensorflow net
  tf_net = get_tf_net()

  # pytorch
  x = nn.Conv2d(3, 64, kernel_size=3, stride=2)
  # TF kernels are laid out H x W x in_ch x out_ch; PyTorch expects
  # out_ch x in_ch x H x W, so (3, 2, 0, 1) is the transposition from TF to PyTorch
  x.weight = nn.Parameter(torch.from_numpy(TF_CONV1_KERNEL.transpose(3, 2, 0, 1)))
  x.bias = nn.Parameter(torch.from_numpy(TF_CONV1_BIAS))
  
  # FINAL COMPARISON
  pytorch_result = x(prep_img)
  tf_result = run_tf_model(tf_net, prep_img)
  print(pytorch_result.detach().numpy().transpose(0, 2, 3, 1))
  print(tf_result)

The results I am getting are different, and I am not sure why. The same weights are applied to both the PyTorch and the TensorFlow model, transposed from one layout to the other to account for the differences between the frameworks. Any suggestions for what I am doing wrong? I am still very much a novice in both frameworks, so apologies for any badly designed or inefficient code.

Thanks in advance.

Hello! I know there is some difference in how PyTorch and TensorFlow pad their convolutions, but that doesn’t seem to be the issue here. Can you give us the printouts of the results? Is the difference very small, with each value off by something like 0.00001, or is it totally out of whack?

It seems you already thought of the difference in how the input is represented in PyTorch and TensorFlow (converting 1 x D x H x W to 1 x H x W x D). It might help to load, or rather create in numpy, a very small fake image instead, just 5x5 in height and width, to see exactly what is happening.
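For reference, here is a minimal sketch of the PyTorch half of that experiment (the fake image and the all-ones kernel are made up purely for illustration); feeding the same fake array and kernel into a Keras Convolution2D should give a printout you can compare entry by entry:

import numpy as np
import torch
import torch.nn as nn

# 5x5 single-channel fake image with easily traceable values
fake = np.arange(25, dtype=np.float32).reshape(1, 5, 5, 1)  # NHWC, as TF expects
kernel = np.ones((3, 3, 1, 1), dtype=np.float32)            # TF layout: H, W, in, out

conv = nn.Conv2d(1, 1, kernel_size=3, stride=2, bias=False)
conv.weight.data = torch.tensor(kernel.transpose(3, 2, 0, 1))  # -> out, in, H, W

out = conv(torch.tensor(fake.transpose(0, 3, 1, 2)))  # NHWC -> NCHW
print(out.detach().numpy().transpose(0, 2, 3, 1))     # back to NHWC for comparison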


Thanks for your quick reply! The output is pretty out of whack.

PyTorch result:
[[[[-1.55727930e+01 6.61370039e+00 4.36168327e+01 … -6.25202560e+01
-3.68952179e+00 2.40714240e+00]
[-1.62554455e+01 7.00090361e+00 4.45125008e+01 … -6.14532433e+01
-3.93618011e+00 1.52107084e+00]
[-1.56152411e+01 8.09883404e+00 4.66250916e+01 … -6.18155365e+01
-3.79196930e+00 1.27307308e+00]

[-1.28283596e+01 9.92133713e+00 4.42730675e+01 … -5.85974274e+01
-4.47753334e+00 -2.33572006e+00]
[-1.29492254e+01 9.98927498e+00 4.47625046e+01 … -5.75144043e+01
-4.23507118e+00 -2.50432777e+00]
[-1.29025307e+01 1.04603415e+01 4.31288528e+01 … -5.65784950e+01
-5.23016644e+00 -1.43917739e+00]]

[[-1.81284542e+01 6.69566584e+00 4.51744881e+01 … -6.74558411e+01
-4.09748411e+00 3.80656385e+00]
[-1.69139614e+01 7.42382812e+00 4.62523918e+01 … -6.46885986e+01
-4.12603045e+00 5.13678980e+00]
[-1.61353226e+01 6.77596855e+00 4.64864159e+01 … -6.28667564e+01
-3.20863509e+00 2.91260934e+00]

[-1.30580931e+01 9.37584877e+00 4.25258522e+01 … -5.39457588e+01
-4.40542269e+00 -7.44404018e-01]
[-1.35420046e+01 8.23383141e+00 4.46373672e+01 … -5.46562843e+01
-4.25847816e+00 -7.47495651e-01]
[-1.44313879e+01 9.59775257e+00 4.42936440e+01 … -5.61043854e+01
-4.35188198e+00 -4.76904720e-01]]

[[-1.94012337e+01 8.61811733e+00 4.86200485e+01 … -7.11752090e+01
-3.94082189e+00 1.81865060e+00]
[-1.88735142e+01 8.37971401e+00 4.74069405e+01 … -7.03029938e+01
-4.67382908e+00 1.20623779e+00]
[-1.72195396e+01 6.08268976e+00 4.71951714e+01 … -6.74618149e+01
-2.36527705e+00 3.57859421e+00]

[-1.53524723e+01 7.79507494e+00 3.99389267e+01 … -5.71286507e+01
-2.56354046e+00 1.54735291e+00]
[-1.53006153e+01 7.52504778e+00 4.11591492e+01 … -5.58768501e+01
-2.99580860e+00 5.00386417e-01]
[-1.52324238e+01 8.16348171e+00 4.18217010e+01 … -5.52956238e+01
-3.33055973e+00 1.45770371e+00]]

[[-3.95771561e+01 2.25910244e+01 1.21415298e+02 … -1.51980209e+02
-6.98058891e+00 7.85283375e+00]
[-3.94587402e+01 2.41157417e+01 1.21488655e+02 … -1.56029785e+02
-8.40237141e+00 5.80845928e+00]
[-3.65274239e+01 2.46826572e+01 1.20590828e+02 … -1.52416183e+02
-6.41088676e+00 4.28248215e+00]

[-5.33665276e+00 7.19608605e-01 -5.96260643e+01 … 8.87334976e+01
-1.41563797e+01 -2.72411518e+01]
[-3.66656721e-04 -1.24967871e+01 -3.46930962e+01 … 3.56165390e+01
-1.30187769e+01 2.77730703e+00]
[ 2.65501442e+01 5.53524113e+00 -9.37129364e+01 … 9.93023453e+01
-1.02354088e+01 -2.51134968e+01]]

[[-3.87404022e+01 2.05795860e+01 1.23607132e+02 … -1.50887085e+02
-7.83956671e+00 8.54336548e+00]
[-3.85462494e+01 2.25298386e+01 1.25826469e+02 … -1.53816299e+02
-6.85490179e+00 7.11485195e+00]
[-3.71422806e+01 2.41298370e+01 1.23497955e+02 … -1.53054825e+02
-7.31813765e+00 6.59007502e+00]

[ 1.60658703e+01 2.13380361e+00 -8.33904877e+01 … 1.14979324e+02
7.65574217e+00 -3.57480774e+01]
[ 1.92091408e+01 -8.28013897e+00 -6.79464035e+01 … 1.14296410e+02
-1.95303402e+01 -1.98741550e+01]
[ 1.45242786e+01 -1.89359035e+01 -8.69550552e+01 … 7.34474716e+01
-3.82573938e+00 2.19409275e+00]]

[[-3.88492966e+01 2.17711983e+01 1.22294746e+02 … -1.51570969e+02
-8.94998169e+00 7.86662531e+00]
[-3.83859596e+01 2.24098835e+01 1.24778725e+02 … -1.51290833e+02
-6.70756340e+00 7.64165878e+00]
[-3.62384071e+01 2.36939659e+01 1.23182175e+02 … -1.50518539e+02
-6.68640041e+00 9.41369820e+00]

[ 1.47001305e+01 -2.15699253e+01 -6.95455246e+01 … 6.40939713e+01
1.30296583e+01 -4.06633425e+00]
[ 3.88580894e+00 -2.36505947e+01 -8.05076523e+01 … 7.09654388e+01
3.65040359e+01 3.88569856e+00]
[ 2.37450743e+00 -1.84645958e+01 -7.89351654e+01 … 8.26435165e+01
-1.86660156e+01 -8.00209332e+00]]]]

TensorFlow result:

[[[[-8.69342232e+00 -4.46133842e+01 -4.11044598e+00 … -1.40601139e+01
-2.76327324e+00 -7.33619750e-01]
[-8.76875496e+00 -4.38285599e+01 -4.52047634e+00 … -1.21240826e+01
-4.52868342e-01 -4.85117912e+00]
[-3.16826677e+00 -4.33830757e+01 -2.89958030e-01 … -1.21671791e+01
-1.35957611e+00 -2.13636208e+00]

[ 2.93169546e+00 -4.30884819e+01 2.92113400e+00 … -1.24575596e+01
-3.64714503e-01 1.49559355e+00]
[ 2.20407248e+00 -4.20166359e+01 3.60060787e+00 … -1.30751553e+01
-2.07907557e-01 8.31942976e-01]
[ 6.61060929e-01 -4.18161583e+01 1.82578230e+00 … -1.31756878e+01
2.40392590e+00 -4.45742637e-01]]

[[-7.41690683e+00 -4.55650864e+01 -5.22514534e+00 … -1.43556290e+01
1.62857163e+00 -2.67087078e+00]
[-7.36708498e+00 -4.59332275e+01 -4.48502970e+00 … -1.38072262e+01
-2.25672603e-01 -1.33661842e+00]
[-5.73058271e+00 -4.50346756e+01 -1.89988089e+00 … -1.25236511e+01
-4.41305995e-01 -2.76651287e+00]

[ 7.31217718e+00 -3.93325424e+01 4.69743490e+00 … -1.17109795e+01
1.21233380e+00 2.71890259e+00]
[ 4.79426527e+00 -3.79848976e+01 6.33238983e+00 … -1.35663080e+01
4.55395818e-01 6.30443990e-01]
[ 3.11787081e+00 -3.92297249e+01 3.73610640e+00 … -1.36973915e+01
-6.89558864e-01 2.05143213e+00]]

[[-9.84649563e+00 -4.99152412e+01 -6.55583477e+00 … -1.51340437e+01
-1.18556011e+00 -3.41272831e+00]
[-6.99694395e+00 -4.89191284e+01 -4.55101347e+00 … -1.48093748e+01
-1.50882709e+00 -2.20925045e+00]
[-8.66922283e+00 -4.73839149e+01 -3.73310804e+00 … -1.54749346e+01
-7.22140074e-02 -2.81260395e+00]

[-2.57114363e+00 -4.07101402e+01 -4.36568308e+00 … -1.07023830e+01
-2.07685566e+00 5.59012353e-01]
[-3.43358183e+00 -3.94832306e+01 -2.55874109e+00 … -1.12538576e+01
-1.15153778e+00 -2.41037083e+00]
[-1.21432684e-01 -3.94448166e+01 -1.65169764e+00 … -1.09796534e+01
-9.95543361e-01 -2.13927299e-01]]

[[-2.27701616e+00 -1.12056534e+02 -2.71443224e+00 … -3.32335014e+01
-2.26287651e+00 -2.04962254e+00]
[-4.70457125e+00 -1.16180176e+02 -5.46409464e+00 … -3.20664177e+01
4.79503751e-01 -4.76655293e+00]
[-3.61207247e+00 -1.17574646e+02 -2.63787127e+00 … -3.61133766e+01
-5.41910052e-01 -2.34788609e+00]

[ 3.17560425e+01 1.02779106e+02 3.05767479e+01 … -1.56130552e+00
2.42486324e+01 -4.81192932e+01]
[ 3.23448410e+01 7.65223770e+01 5.38471069e+01 … 1.22197971e+01
-2.41868353e+00 2.01450157e+01]
[ 1.09505539e+02 4.30392570e+01 5.18104210e+01 … 1.88628922e+01
-4.10199623e+01 8.42294235e+01]]

[[-1.18653905e+00 -1.12090675e+02 1.42196798e+00 … -3.24247437e+01
-6.61200285e-02 -4.00063801e+00]
[ 1.60937890e-01 -1.15288605e+02 9.52351987e-01 … -3.43898773e+01
-2.41448593e+00 -9.69057381e-02]
[ 1.15533984e+00 -1.19076508e+02 4.45890874e-01 … -3.44546585e+01
-1.72078884e+00 -7.69213736e-01]

[-2.76143646e+01 8.25905914e+01 -1.23820171e+01 … -1.11927576e+01
1.05137939e+01 -6.30510445e+01]
[-3.78569269e+00 9.05888519e+01 2.42531643e+01 … 4.81149817e+00
6.85166779e+01 -6.71146545e+01]
[-8.93066692e+00 9.16900177e+01 1.09996519e+01 … -3.70835915e+01
1.91957741e+01 2.72852554e+01]]

[[ 1.91258776e+00 -1.12579613e+02 1.05847406e+00 … -3.12985153e+01
-1.81035411e+00 -1.66326427e+00]
[ 1.86406291e+00 -1.14508347e+02 2.45568037e+00 … -3.39790993e+01
5.43172956e-01 -2.64378834e+00]
[ 4.37815619e+00 -1.16999229e+02 2.99845552e+00 … -3.66793518e+01
-2.43606186e+00 4.48140621e+00]

[ 1.01585503e+01 5.27431793e+01 1.75840988e+01 … 1.97002945e+01
-7.98774490e+01 4.16897316e+01]
[ 1.25890836e-01 7.43166656e+01 -5.24134493e+00 … 6.40070248e+00
-7.61140060e+01 3.54184952e+01]
[ 2.31189346e+01 6.34546432e+01 -2.38203793e+01 … 9.63987579e+01
-4.76676941e+00 -1.75961056e+01]]]]

As can already be seen from the first entry (roughly -15 vs -8), the results are pretty different. I’ll try it again with a small fake image like you suggested.


Thanks again for your reply! I was finally able to figure out my issue. As you suggested, trying it out on a small example really helped me understand what was happening!

My issue was that I had to change this:
x.weight = nn.Parameter(torch.from_numpy(TF_CONV1_KERNEL.transpose(3, 2, 0, 1)))
x.bias = nn.Parameter(torch.from_numpy(TF_CONV1_BIAS))

To this:
x.weight.data = torch.tensor(TF_CONV1_KERNEL.transpose(3, 2, 0, 1), dtype=torch.float)
x.bias.data = torch.tensor(TF_CONV1_BIAS, dtype=torch.float)

In other words, you need to explicitly specify the dtype when assigning a tensor back to a convolutional layer in PyTorch.
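For anyone hitting the same thing, a quick way to see the difference (the random w below is just a stand-in for real weights): torch.from_numpy keeps whatever dtype the numpy array has, so a float64 array silently yields a float64 parameter, while torch.tensor(..., dtype=torch.float) forces float32 to match the float input:

import numpy as np
import torch
import torch.nn as nn

w = np.random.randn(64, 3, 3, 3)              # numpy defaults to float64
conv = nn.Conv2d(3, 64, kernel_size=3, stride=2)

conv.weight = nn.Parameter(torch.from_numpy(w))
print(conv.weight.dtype)                      # torch.float64, inherited from numpy

conv.weight.data = torch.tensor(w, dtype=torch.float)
print(conv.weight.dtype)                      # torch.float32, matches the input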


Thank you for sharing your solution. Did you end up with exactly the same outputs from PyTorch and TensorFlow?
I am also converting my TensorFlow code to PyTorch, and I am comparing the outputs layer by layer. The problem is that the output of the first convolution is just a little bit different from the TensorFlow output, but after several layers this gap increases and ends in garbage output.
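Tiny first-layer differences are normal, since floating-point summation order differs between implementations, but they should stay near float32 precision; if the gap grows layer after layer, something structural (padding, layout, dtype) is usually off. Here is a minimal sketch of such a per-layer comparison (the helper name and example arguments are placeholders):

import numpy as np

def compare_layer(tf_out, torch_out, name, atol=1e-4):
    # PyTorch activations arrive as N x C x H x W; move channels last to match TF
    torch_out = torch_out.transpose(0, 2, 3, 1)
    max_diff = np.abs(tf_out - torch_out).max()
    print(name, 'max abs diff:', max_diff,
          'allclose:', np.allclose(tf_out, torch_out, atol=atol))

# e.g. compare_layer(tf_conv1_out, torch_conv1_out.detach().numpy(), 'conv1')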