Negative loss after transferring weights from TensorFlow to PyTorch

Hey everyone,
I’m working on a translation between both frameworks and I have already managed to transfer the weights between them. However, I found one problem: although I get the same accuracy on predictions, the loss is different, and not just different, it is negative on the PyTorch side while it is positive on the TensorFlow side. Can anyone help me figure out what’s wrong?

If this is a loss that should not have negative values, quite likely something is up with the inputs to the loss.

There are many subtle differences between TF and PyTorch. I usually try to start with a fixed input and then successively make sure both computations have every intermediate result exactly the same. (Or changed in an expected way, like the ordering of dimensions, which can differ between the two, too.) This is a bit tricky when randomness is involved but usually gives good results.
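
For example, a minimal way to do that comparison (the array names here are placeholders for activations dumped from each framework): bring both to the same layout and compare with a tolerance rather than exact equality.

import numpy as np

# Hypothetical example: tf_act and pt_act stand for the same intermediate
# activation dumped from each framework (e.g. via .numpy() on the TF side
# and .detach().numpy() on the PyTorch side). Here TF uses an HWC layout
# and PyTorch a CHW layout, so transpose before comparing.
tf_act = np.random.rand(3, 3, 2).astype(np.float32)   # placeholder data
pt_act = np.transpose(tf_act, (2, 0, 1))              # placeholder data

# Compare with a tolerance; float32 convolutions typically differ
# at around 1e-6 between the two frameworks.
diff = np.abs(tf_act - np.transpose(pt_act, (1, 2, 0))).max()
print(diff, np.allclose(tf_act, np.transpose(pt_act, (1, 2, 0)), atol=1e-5))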

Best regards

Thomas

@tom thanks for the reply.
So first of all, about the input: I’m using the same images for both frameworks, and the only thing I do is change the shape of the image to suit each framework.
I had some problems at the beginning with the conv2d layers, and now the intermediate results are really close to being the same, but they differ on the order of ~1e-5 to 1e-6, values that I attribute to rounding in the calculations, or could I be mistaken?

Yes, so I was missing an “up to numerical accuracy” in my original reply.
But is the output similarly close then, with just the loss wreaking havoc, or do the intermediates stop being really close?

Best regards

Thomas

Thanks again for the reply @tom. I think the best way to explain my problem is to give a little example of what’s going on.
So I made a small test with only 1 conv2d layer and 1 linear layer. The input shape of the models is a 1-channel 5x5 image, resized from the MNIST handwritten digit dataset, with the respective targets. Every layer is simple: ‘valid’ padding in TensorFlow, (1,1) stride and a (3x3) kernel with only 2 output filters for the conv2d, and no activation functions except the softmax in the linear layer. The exact configuration is not important; at least every other configuration I gave the layers didn’t drop the accuracy, the only problem being the loss of the predictions.
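
A minimal sketch of what the PyTorch side of that test looks like (the names and exact layer arguments are an assumption based on the description above, not the exact code):

import torch
import torch.nn as nn

# 1x5x5 input -> Conv2d 3x3, stride 1, no padding ('valid'), 2 filters
# -> flatten (2*3*3 = 18 features) -> Linear to 10 classes -> softmax.
class TinyNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 2, kernel_size=3, stride=1)
        self.fc = nn.Linear(2 * 3 * 3, 10)

    def forward(self, x):
        x = self.conv(x)        # (N, 2, 3, 3)
        x = x.flatten(1)        # (N, 18)
        return torch.softmax(self.fc(x), dim=1)

out = TinyNet()(torch.randn(1, 1, 5, 5))
print(out.shape, out.sum())     # (1, 10) probabilities summing to ~1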

The output after softmax from TensorFlow is the following:

tf.Tensor(
[1.1379382e-02 1.9076133e-05 4.2098956e-03 4.5368117e-03 5.9396513e-03
 7.0751053e-03 8.0966354e-05 9.4762051e-01 1.0554336e-02 8.5842358e-03], shape=(10,), dtype=float32)

And the PyTorch output is:

tensor([[1.1379386298e-02, 1.9076149329e-05, 4.2099035345e-03, 4.5368201099e-03,
         5.9396508150e-03, 7.0751085877e-03, 8.0966427049e-05, 9.4762051105e-01,
         1.0554354638e-02, 8.5842274129e-03]])

After running the predictions over the whole dataset, these are the results:

  1. Tensorflow: 313/313 - 1s - loss: 1.2292 - accuracy: 0.5802 - 638ms/epoch - 2ms/step
  2. Pytorch: Test set: Average loss: -0.4224, Accuracy: 5802/10000 (58%)

Just in case you want to have a look at an intermediate step, these are the results after the conv2d:

  1. Tensorflow
tf.Tensor(
[[[-2.04286    -0.03277442]
  [-1.8308406  -0.661551  ]
  [-1.6251079   0.14818816]]

 [[ 0.37748477 -0.05223539]
  [-0.0944322   3.0911071 ]
  [ 0.30482826 -0.00778666]]

 [[-0.10668626 -0.02989659]
  [-0.69553125  1.2320076 ]
  [-0.20329499 -0.3670036 ]]], shape=(3, 3, 2), dtype=float32)
  2. Pytorch
tensor([[[[-2.0428609848, -1.8308403492, -1.6251083612],
          [ 0.3774845302, -0.0944317281,  0.3048284352],
          [-0.1066871583, -0.6955307126, -0.2032952011]],

         [[-0.0327757746, -0.6615512371,  0.1481878012],
          [-0.0522354990,  3.0911066532, -0.0077868253],
          [-0.0298968107,  1.2320076227, -0.3670037389]]]])

So the output is close enough to consider it the same.
But are you sure you are using the same loss?
In particular, the popular CrossEntropyLoss in PyTorch wants logits as inputs, not probabilities.
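
A tiny illustration of the difference (made-up numbers, just to show the mechanics): F.nll_loss expects log-probabilities, so feeding it raw softmax probabilities just returns -probs[target], a small negative number, which would explain the -0.4224.

import torch
import torch.nn.functional as F

logits = torch.tensor([[2.0, -1.0, 0.5]])   # made-up logits
target = torch.tensor([0])
probs = torch.softmax(logits, dim=1)

print(F.nll_loss(probs, target))             # -probs[0, 0]: small and negative
print(F.nll_loss(torch.log(probs), target))  # proper NLL on log-probabilities
print(F.cross_entropy(logits, target))       # same value, computed from logits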

Best regards

Thomas

Well, I was using nll_loss in PyTorch; I changed it to the CrossEntropyLoss function, and in Keras I use sparse_categorical_crossentropy.
The results are now positive, but they are not the same: with PyTorch I got 1.9976 and with Keras 1.2292. Is this normal, or could something else be wrong?

Thank you once again :wink: @tom

It should be a lot closer if they were the same function. I think you would want to drop the softmax and use CrossEntropyLoss.
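
Roughly like this (a sketch with placeholder tensors, not your exact code): CrossEntropyLoss applies log-softmax itself, so feeding it already-softmaxed probabilities is a double softmax and inflates the loss, which would explain the 1.9976 vs 1.2292.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
logits = torch.randn(4, 10)               # stand-in for model(images), no softmax
targets = torch.randint(0, 10, (4,))

# On raw logits this matches Keras' sparse_categorical_crossentropy
# (which is applied to softmax outputs), up to float32 rounding.
print(criterion(logits, targets))
# On softmax outputs the softmax is applied twice and the value is inflated.
print(criterion(torch.softmax(logits, dim=1), targets))

# Compute probabilities only when you need predictions/metrics.
print(torch.softmax(logits, dim=1).argmax(dim=1))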

Best regards

Thomas

It worked! Everything gives the same values. Thank you very much.