When trying to send an image through SqueezeNet loaded from the PyTorch models, I get a different output from when I send the same image through a SqueezeNet in TensorFlow. Currently, I am thinking that it has something to do with how the weights for the various layers are initialized, but I am not sure. Any suggestions?
If you are starting from randomly initialized weights, then it is completely natural to have different outputs.
If you load a pre-trained model, then the weights for both models (TF and PyTorch) should be the same, so in that case, you should not get different outputs.
Another thing to consider: the two pre-trained models might require certain input normalization, for example, pixel values in the range [0, 1] or [-1, 1] instead of [0, 255]. They might also require centering the color channels of the input using certain mean values. So, if you are using a pre-trained model, you need to figure out which normalization steps should be applied to the input images.
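For instance, torchvision's pre-trained models expect inputs scaled to [0, 1] and then normalized with the ImageNet per-channel means and stds (the values below are the standard torchvision ones; the normalization your TF SqueezeNet expects may well differ, so check its documentation):

```python
import torch

def preprocess(img_uint8):
    """img_uint8: (3, H, W) tensor with values in [0, 255]."""
    x = img_uint8.float() / 255.0  # scale to [0, 1]
    # ImageNet channel means/stds used by torchvision's pre-trained models
    mean = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
    std = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)
    return (x - mean) / std  # per-channel normalization

x = preprocess(torch.randint(0, 256, (3, 224, 224), dtype=torch.uint8))
```

If the two implementations apply different preprocessing, the same raw image effectively becomes two different inputs, which alone can explain diverging outputs.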
Thanks for your quick reply! With regards to what you said above, is it possible that the normalization that you have to apply would have to be different for each implementation (TF and PyTorch) even though they are implementing the same version of the same model (SqueezeNet 1.1 in this case)?
So, are you using some pre-trained models? If so, then yes, it depends on how those models were trained. They could have been trained by different groups, and therefore their weights might be different.
Yes, both of the models are pre-trained. So if I understand correctly: if they were trained by the same group, they should have the same weights, but if they were trained by different groups, their weights would be different, and so would the resulting output for the same input?
Yes, what you said could be correct (but not necessarily). For example, I can train one model in PyTorch and then transfer the learned weights into a TensorFlow model, in which case the two would match. But if I train two models separately, then obviously the weights could be different.
Are the outputs reasonable, or are they completely off?
Thanks again for your quick reply. I just tried doing this (but instead transferring the weights from the TF model to PyTorch), and the outputs are pretty far off. But that shouldn’t happen, because now the models and the weights are supposedly the same. Any ideas what else could be causing this?
I remember once I transferred the weights from TF to PyTorch for a project that I first did in TF and then switched to PyTorch. One thing you need to pay attention to is that convolution weights are laid out slightly differently in TF and PyTorch. Therefore, you have to transpose (permute the axes of) the weights, or something like that.
First, try to establish what transformation needs to be applied to the convolution weights so that the two frameworks produce the same results. It is easiest to work this out with a single small convolution layer. Then you can transform the weights of the entire model.
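A minimal sketch of such a check, assuming the TF kernel is a NumPy array in TF's (H, W, in, out) layout, while PyTorch expects (out, in, H, W). Both frameworks compute cross-correlation, so only the axis order needs to change:

```python
import numpy as np
import torch
import torch.nn.functional as F

rng = np.random.default_rng(0)

# A made-up kernel in TF's layout: (H, W, in_channels, out_channels)
tf_weight = rng.standard_normal((3, 3, 2, 4)).astype(np.float32)

# Permute to PyTorch's layout: (out_channels, in_channels, H, W)
pt_weight = torch.from_numpy(tf_weight.transpose(3, 2, 0, 1))

x = torch.from_numpy(rng.standard_normal((1, 2, 5, 5)).astype(np.float32))
out = F.conv2d(x, pt_weight)  # "valid" padding, stride 1

# Reference for one output element, computed directly from the TF-layout
# kernel: out[0, k, i, j] = sum_{h,w,c} x[0, c, i+h, j+w] * tf_weight[h, w, c, k]
k, i, j = 1, 0, 0
ref = sum(
    x[0, c, i + h, j + w] * tf_weight[h, w, c, k]
    for h in range(3) for w in range(3) for c in range(2)
)
assert torch.allclose(out[0, k, i, j], ref, atol=1e-4)
```

Once this small test passes, you can apply the same permutation to every convolution layer in the model.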
Thanks again for your many suggestions. I was finally able to figure out my issue. As you said, I had to permute the weights when transferring from TensorFlow. The transformation I applied was transpose(3, 2, 0, 1), which maps TF's (H, W, in, out) layout to PyTorch's (out, in, H, W). However, this alone was not enough; I also had to be very careful to specify the dtype of the tensor when assigning it to the PyTorch layer, as follows:
pytorch_layer.weight.data = torch.tensor(tf_weight, dtype=torch.float)
where tf_weight is the transposed weight from the corresponding TensorFlow layer.
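Putting it together for one layer, here is a sketch (the layer sizes are made up, and tf_weight / tf_bias stand in for the arrays you would extract from the TF model, e.g. via a Keras layer's get_weights()):

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-ins for arrays extracted from the TF model; note they may arrive
# as float64, which is why the explicit dtype below matters.
tf_weight = np.random.randn(3, 3, 16, 32)  # TF layout: (H, W, in, out)
tf_bias = np.random.randn(32)

pytorch_layer = nn.Conv2d(16, 32, kernel_size=3)

# Permute to (out, in, H, W) and force float32, PyTorch's default dtype
pytorch_layer.weight.data = torch.tensor(
    tf_weight.transpose(3, 2, 0, 1), dtype=torch.float
)
pytorch_layer.bias.data = torch.tensor(tf_bias, dtype=torch.float)
```

Without the explicit dtype, the copied weights would be float64 and would not interoperate cleanly with the rest of the float32 model.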