Hi, I’m trying to convert a TensorFlow pretrained model to PyTorch.
The network is quite simple: just one 1D conv layer and one 1D batchnorm layer.
The output matches after the 1D conv layer. However, I cannot replicate the performance/output after the batchnorm layer.
More specifically, there seems to be a bug in the original TensorFlow code: the author appears to have forgotten to add the batchnorm layer’s running mean/variance to the ‘trainable variables’, so they are never actually updated in the model he provided. On top of that, the is_train flag is always set to True, even during evaluation.
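If I understand batchnorm semantics correctly, that would explain why the stale running stats don’t hurt the TF model: in training mode the layer normalizes with the current batch’s statistics, and the running buffers are never read. Here is a minimal PyTorch sketch of that behavior (the layer size and input shape are made up for illustration):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
# Corrupt the running stats to mimic buffers that were never updated.
bn.running_mean.fill_(123.0)
bn.running_var.fill_(456.0)

x = torch.randn(8, 4, 16)

bn.train()          # like is_train=True: normalize with batch statistics
out_train = bn(x)

# Manual normalization with the batch's own statistics
# (biased variance, which is what BN uses for normalization).
mean = x.mean(dim=(0, 2), keepdim=True)
var = x.var(dim=(0, 2), unbiased=False, keepdim=True)
manual = (x - mean) / torch.sqrt(var + bn.eps)
```

Despite the garbage in the running buffers, `out_train` matches the manual batch-statistics normalization, so the buffers are irrelevant as long as the layer stays in training mode.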
However, when I load the model in PyTorch and set the running mean to all zeros and the running variance to all ones, the output is different no matter whether I set the model to train or eval.
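For reference, this is roughly what I’m doing on the PyTorch side (layer size and input shape are placeholders). With zeroed stats, eval mode reduces BN to an affine transform of the input; to actually mimic the TF code, which effectively always runs in training mode, I normalize with batch statistics via the functional API:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
bn = nn.BatchNorm1d(4)
bn.running_mean.zero_()    # running mean -> all zeros
bn.running_var.fill_(1.0)  # running variance -> all ones

x = torch.randn(8, 4, 16)

bn.eval()
out_eval = bn(x)  # eval uses the zero/one stats: (x - 0) / sqrt(1 + eps)

# Mimic the TF code (is_train always True): normalize with batch statistics
# even at inference time, ignoring the running buffers entirely.
out_like_tf = F.batch_norm(x, None, None, bn.weight, bn.bias,
                           training=True, eps=bn.eps)
```

With the default weight=1 and bias=0, `out_eval` is just `x / sqrt(1 + eps)`, which is why resetting the buffers alone cannot reproduce the TF output.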
Nonetheless, the overall performance of the TensorFlow model is great. If I remove all the BN layers, I get the same output in my PyTorch code; but when I add batchnorm back and retrain the network in PyTorch, the performance is bad, about 20 to 30 percent lower.
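For the “remove all the BN layers” check, I simply swap the batchnorm module for an identity (the module and layer names below are hypothetical stand-ins for the converted network):

```python
import torch
import torch.nn as nn

class Net(nn.Module):  # hypothetical stand-in for the converted network
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv1d(4, 8, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm1d(8)

    def forward(self, x):
        return self.bn(self.conv(x))

net = Net().eval()
net.bn = nn.Identity()  # BN removed: forward now returns the raw conv output

x = torch.randn(2, 4, 16)
with torch.no_grad():
    out = net(x)
```

With `bn` replaced by `nn.Identity()`, the forward pass is the conv output alone, which is the configuration where my PyTorch output matches TF.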
Please check the original TensorFlow code for details.
And here’s my code for the problem above.