Hey there,
I'm trying to transfer weights between PyTorch and TensorFlow, in both directions. The problem is that even with the same weights in every layer, the two models give different results at evaluation, and I can't understand why.
This is my TensorFlow model:
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

num_classes = 10
input_shape = (28, 28, 1)  # height, width, channels (channels-last)
tf.random.set_seed(2)

model = keras.Sequential(
    [
        keras.Input(shape=input_shape),
        layers.Conv2D(4, kernel_size=(3, 3), name="conv2d"),
        layers.MaxPooling2D(pool_size=(2, 2)),
        layers.Flatten(name="flatten"),
        layers.Dense(num_classes, activation="softmax", name="dense"),
    ]
)
And this is my PyTorch model:
import torch
import torch.nn as nn

torch.manual_seed(2)

class Network(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv2d = nn.Conv2d(1, 4, kernel_size=(3, 3))
        self.hidden = nn.MaxPool2d((2, 2))
        self.dense = nn.Linear(676, 10)  # 4 channels * 13 * 13 after conv and pooling
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        in_size = x.size(0)
        x = self.hidden(self.conv2d(x))
        x = x.view(in_size, -1)  # flatten to (batch, 676)
        x = self.dense(x)
        return self.softmax(x)
I also tested this after training the TensorFlow model, but even after transferring the weights the results are completely different:
Tensorflow acc: Test accuracy: 0.9228000044822693
Pytorch acc : Test set: Average loss: 0.0013, Accuracy: 972/10000 (10%)
These are the Conv2D kernel and bias in each framework after training:
TensorFlow:
[array([[[[ 0.46365955, -0.12814473, 0.14183734, -0.5937301 ]],
[[ 0.35576758, 0.5139991 , 0.6756656 , -0.65329105]],
[[ 0.48599064, 0.57487786, -0.11121935, -0.6914515 ]]],
[[[ 0.27835783, 0.43169427, 0.33743912, -0.7600264 ]],
[[ 0.547942 , 1.0308589 , 0.489442 , -0.67891765]],
[[ 0.5344402 , 0.6332875 , 0.6738143 , -0.43144146]]],
[[[-0.05558264, 0.7325994 , 0.28004876, -0.17341942]],
[[-0.1670951 , 0.7585311 , 0.54995114, -0.36898494]],
[[ 0.3589517 , 0.14311437, 0.41515586, -0.5163795 ]]]],
dtype=float32),
array([ 0.27913818, -0.39000487, -0.23639436, -0.2614709 ], dtype=float32)]
PyTorch:
Parameter containing:
tensor([[[[ 0.4637, 0.3558, 0.4860],
[ 0.2784, 0.5479, 0.5344],
[-0.0556, -0.1671, 0.3590]]],
[[[-0.1281, 0.5140, 0.5749],
[ 0.4317, 1.0309, 0.6333],
[ 0.7326, 0.7585, 0.1431]]],
[[[ 0.1418, 0.6757, -0.1112],
[ 0.3374, 0.4894, 0.6738],
[ 0.2800, 0.5500, 0.4152]]],
[[[-0.5937, -0.6533, -0.6915],
[-0.7600, -0.6789, -0.4314],
[-0.1734, -0.3690, -0.5164]]]], requires_grad=True)
Parameter containing:
tensor([ 0.2791, -0.3900, -0.2364, -0.2615], requires_grad=True)
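For reference, a minimal NumPy sketch of the layout conversion that a transfer like this involves. It assumes Keras stores the Conv2D kernel as (height, width, in_channels, out_channels) and flattens channels-last, while PyTorch stores the Conv2d weight as (out_channels, in_channels, height, width) and flattens channels-first. The helper names are hypothetical and not taken from the code above, and the check runs on random data rather than on the trained weights.

```python
import numpy as np

def conv_kernel_tf_to_torch(kernel):
    """Keras Conv2D kernel (H, W, in, out) -> PyTorch Conv2d weight (out, in, H, W)."""
    return np.transpose(kernel, (3, 2, 0, 1))

def dense_kernel_tf_to_torch(kernel, height, width, channels):
    """Keras Dense kernel (H*W*C, units) that follows a channels-last Flatten
    -> PyTorch Linear weight (units, C*H*W) that follows a channels-first flatten."""
    units = kernel.shape[1]
    k = kernel.reshape(height, width, channels, units)
    k = np.transpose(k, (3, 2, 0, 1))  # (units, C, H, W)
    return k.reshape(units, channels * height * width)

# Sanity check on random data (not the trained weights shown above).
rng = np.random.default_rng(0)

conv_tf = rng.normal(size=(3, 3, 1, 4))
conv_torch = conv_kernel_tf_to_torch(conv_tf)  # shape (4, 1, 3, 3)

feat_hwc = rng.normal(size=(13, 13, 4))       # pooled feature map, channels-last
feat_chw = np.transpose(feat_hwc, (2, 0, 1))  # same values, channels-first
dense_tf = rng.normal(size=(676, 10))

out_tf = feat_hwc.reshape(-1) @ dense_tf      # what Keras Flatten + Dense computes (no bias)
dense_torch = dense_kernel_tf_to_torch(dense_tf, 13, 13, 4)
out_torch = dense_torch @ feat_chw.reshape(-1)

# Reordered weights reproduce the Keras output; a plain transpose does not.
matches = np.allclose(out_tf, out_torch)
naive_matches = np.allclose(out_tf, dense_tf.T @ feat_chw.reshape(-1))
print(matches, naive_matches)
```

Biases are one-dimensional in both frameworks and can be copied unchanged. Whether this flatten-order difference explains the accuracy gap above is not confirmed here.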
Can someone shed some light on what is causing this big difference?