TF to PT weight transfer doesn't get the same eval results

Hey there,
I'm trying to transfer weights between PyTorch and TensorFlow and vice versa. The problem I'm running into is that, even with the same weights in all layers, the two models give different evaluation results, and I can't understand why.

This is my TensorFlow model:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10
input_shape = (28, 28, 1)
tf.random.set_seed(2)
model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(4, kernel_size=(3, 3), name="conv2d"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(name="flatten"),
        layers.Dense(num_classes, activation="softmax", name="dense"),
    ]
)

And this is my PyTorch model:

import torch
import torch.nn as nn

torch.manual_seed(2)

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv2d = nn.Conv2d(1, 4, kernel_size=(3, 3))
        self.hidden = nn.MaxPool2d((2, 2))
        self.dense = nn.Linear(676, 10)
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        in_size = x.size(0)
        x = self.hidden(self.conv2d(x))
        x = x.view(in_size, -1)
        x = self.dense(x)
        return self.softmax(x)

I also tested it after training the TensorFlow model, but even after the weight transfer the results are completely different:

TensorFlow acc: Test accuracy: 0.9228000044822693
PyTorch acc: Test set: Average loss: 0.0013, Accuracy: 972/10000 (10%)

These are also the weights after training:
TensorFlow:
[array([[[[ 0.46365955, -0.12814473,  0.14183734, -0.5937301 ]],
         [[ 0.35576758,  0.5139991 ,  0.6756656 , -0.65329105]],
         [[ 0.48599064,  0.57487786, -0.11121935, -0.6914515 ]]],

        [[[ 0.27835783,  0.43169427,  0.33743912, -0.7600264 ]],
         [[ 0.547942  ,  1.0308589 ,  0.489442  , -0.67891765]],
         [[ 0.5344402 ,  0.6332875 ,  0.6738143 , -0.43144146]]],

        [[[-0.05558264,  0.7325994 ,  0.28004876, -0.17341942]],
         [[-0.1670951 ,  0.7585311 ,  0.54995114, -0.36898494]],
         [[ 0.3589517 ,  0.14311437,  0.41515586, -0.5163795 ]]]],
       dtype=float32),
 array([ 0.27913818, -0.39000487, -0.23639436, -0.2614709 ], dtype=float32)]

PyTorch:
Parameter containing:
tensor([[[[ 0.4637,  0.3558,  0.4860],
          [ 0.2784,  0.5479,  0.5344],
          [-0.0556, -0.1671,  0.3590]]],

        [[[-0.1281,  0.5140,  0.5749],
          [ 0.4317,  1.0309,  0.6333],
          [ 0.7326,  0.7585,  0.1431]]],

        [[[ 0.1418,  0.6757, -0.1112],
          [ 0.3374,  0.4894,  0.6738],
          [ 0.2800,  0.5500,  0.4152]]],

        [[[-0.5937, -0.6533, -0.6915],
          [-0.7600, -0.6789, -0.4314],
          [-0.1734, -0.3690, -0.5164]]]], requires_grad=True)

Parameter containing:
tensor([ 0.2791, -0.3900, -0.2364, -0.2615], requires_grad=True)

Can someone shed some light on what is creating this big difference?

I don't know how you've loaded the parameters from TF to PyTorch, but I would recommend checking the layers in isolation and making sure that e.g. the memory layout, flipping, etc. are right; a rough sketch of such a check follows below.
In your PyTorch model you're using nn.Softmax as the last activation, which is usually wrong. I'm guessing you are working on a multi-class classification using nn.CrossEntropyLoss; if so, remove the softmax, as this criterion expects raw logits.
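As a minimal sketch of what such an isolation check could look like (a standalone, randomly initialized conv layer rather than your trained model, so the shapes and names here are only for illustration):

import numpy as np
from tensorflow import keras
import torch
import torch.nn as nn

# One 1-in / 4-out conv layer on each side.
tf_conv = keras.layers.Conv2D(4, kernel_size=(3, 3))
pt_conv = nn.Conv2d(1, 4, kernel_size=(3, 3))

# One shared input; Keras uses NHWC, PyTorch uses NCHW.
x = np.random.randn(1, 28, 28, 1).astype(np.float32)
y_tf = tf_conv(x).numpy()  # the call also builds the Keras layer

# Keras stores conv kernels as (H, W, in, out); PyTorch expects (out, in, H, W).
kernel, bias = tf_conv.get_weights()
with torch.no_grad():
    pt_conv.weight.copy_(torch.from_numpy(kernel.transpose(3, 2, 0, 1)))
    pt_conv.bias.copy_(torch.from_numpy(bias))

y_pt = pt_conv(torch.from_numpy(x).permute(0, 3, 1, 2))
y_pt = y_pt.detach().permute(0, 2, 3, 1).numpy()  # back to NHWC for the comparison

print(np.abs(y_tf - y_pt).max())  # should be ~1e-6 if the transfer is right

If the conv layer matches, I would look closely at the flatten step next: Keras flattens an NHWC activation in (H, W, C) order, while x.view(in_size, -1) flattens an NCHW activation in (C, H, W) order, so the features arriving at the dense layer are permuted relative to each other and a plain (1, 0) transpose of the dense kernel won't line them up.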

Hi @ptrblck, thanks for your answer. I'm actually new and still learning these concepts.
I removed the softmax activation from the PyTorch net and it didn't change anything.
This is the code I'm using to transfer weights from Keras to PyTorch; it's code that I found on GitHub:

from collections import OrderedDict

import numpy as np
import torch
from tensorflow import keras

def keras_to_pyt(km, pm):
    weight_dict = OrderedDict()
    for layer in km.layers:
        if type(layer) is keras.layers.Conv2D:
            # Keras conv kernels are (H, W, in, out); PyTorch expects (out, in, H, W).
            weight_dict[layer.get_config()['name'] + '.weight'] = np.transpose(
                layer.get_weights()[0], (3, 2, 0, 1))
            weight_dict[layer.get_config()['name'] + '.bias'] = layer.get_weights()[1]
        elif type(layer) is keras.layers.Dense:
            # Keras dense kernels are (in, out); PyTorch expects (out, in).
            weight_dict[layer.get_config()['name'] + '.weight'] = np.transpose(
                layer.get_weights()[0], (1, 0))
            weight_dict[layer.get_config()['name'] + '.bias'] = layer.get_weights()[1]
    pyt_state_dict = pm.state_dict()
    for key in pyt_state_dict.keys():
        if "flatten" in key:
            # The flatten layer has no weights or biases.
            continue
        key1 = remove_prefix(key, "layers.")  # helper that strips a leading "layers." prefix
        pyt_state_dict[key] = torch.from_numpy(weight_dict[key1])
    pm.load_state_dict(pyt_state_dict)
    return pm
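A minimal way to exercise this function and compare the two models on one shared input (a sketch, assuming the model and Network definitions from above, both still ending in a softmax so the outputs are directly comparable):

pt_model = keras_to_pyt(model, Network())
pt_model.eval()

x = np.random.randn(1, 28, 28, 1).astype(np.float32)
out_tf = model(x).numpy()                                   # Keras takes NHWC input
out_pt = pt_model(torch.from_numpy(x).permute(0, 3, 1, 2))  # PyTorch takes NCHW input

# With a correct transfer these should agree to within float noise;
# a large difference points at the transfer itself, not the training.
print(np.abs(out_tf - out_pt.detach().numpy()).max())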