PyTorch model with trained Keras weights / bias gives worse performance

Greetings everyone,

I'm currently trying to implement a model originally developed with DeepTrack for particle tracking on video recordings. DeepTrack is built on top of Keras, so the model uses layers defined in Keras. The original model's architecture is as follows (the input image size is 64x64 but may be subject to change):

_________________________________________________________________
 Layer (type)                    Output Shape              Param #
=================================================================
 input_1 (InputLayer)            [(None, 64, 64, 1)]       0

 conv2d (Conv2D)                 (None, 64, 64, 16)        160

 activation (Activation)         (None, 64, 64, 16)        0

 max_pooling2d (MaxPooling2D)    (None, 32, 32, 16)        0

 conv2d_1 (Conv2D)               (None, 32, 32, 32)        4640

 activation_1 (Activation)       (None, 32, 32, 32)        0

 max_pooling2d_1 (MaxPooling2D)  (None, 16, 16, 32)        0

 conv2d_2 (Conv2D)               (None, 16, 16, 64)        18496

 activation_2 (Activation)       (None, 16, 16, 64)        0

 max_pooling2d_2 (MaxPooling2D)  (None, 8, 8, 64)          0

 flatten (Flatten)               (None, 4096)              0

 dense (Dense)                   (None, 32)                131104

 activation_3 (Activation)       (None, 32)                0

 dense_1 (Dense)                 (None, 32)                1056

 activation_4 (Activation)       (None, 32)                0

 dense_2 (Dense)                 (None, 2)                 66

=================================================================
Total params: 155,522
Trainable params: 155,522
Non-trainable params: 0
_________________________________________________________________

Note: each “Activation” layer is ReLU; the convolution layers use “same” padding and a (3, 3) kernel.
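For reference, a minimal Keras definition consistent with this summary would look like the following (a sketch; the original DeepTrack code may build the model differently):

from tensorflow import keras
from tensorflow.keras import layers

def build_keras_model(image_size=64):
    inputs = keras.Input(shape=(image_size, image_size, 1))
    x = inputs
    for filters in [16, 32, 64]:
        x = layers.Conv2D(filters, (3, 3), padding="same")(x)
        x = layers.Activation("relu")(x)
        x = layers.MaxPooling2D((2, 2))(x)
    x = layers.Flatten()(x)
    for units in [32, 32]:
        x = layers.Dense(units)(x)
        x = layers.Activation("relu")(x)
    outputs = layers.Dense(2)(x)   # (x, y) particle center
    return keras.Model(inputs, outputs)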

I generate synthetic data using the DeepTrack data generator, which allows me to create images of particles as recorded with a microscope. An example is the following:

[image: example of a synthetic particle]

So the model is trained to predict the center coordinates of these particles as labels. After training this model on a randomly generated dataset, I obtain really good predictions on my validation sample. For example, the following image shows the accuracy level I achieve with this model:

[image: validation accuracy of the trained Keras model]

Now my problem is that I want to translate this model to PyTorch (first) by loading the weights and biases of my Keras model into a PyTorch equivalent. Once I've verified that the PyTorch model works fine, I will move to Brevitas to quantize it and then create an IP for an FPGA using the FINN compiler. The model I built is the following (it should be equivalent to the Keras one):

import torch.nn as nn

IMAGE_SIZE = 64  # input images are 64x64 (see the Keras summary above)
convolution_sizes = [16, 32, 64]
dense_sizes = [32, 32]

class SPTModel(nn.Module):
    def __init__(self, in_channels: int, image_size: int, convolution_sizes: list, dense_sizes: list, output_size: int) -> None:
        super(SPTModel, self).__init__()
        self.layers = nn.Sequential()
        in_size = in_channels
        for idx, size in enumerate(convolution_sizes, 1):
            self.layers.add_module(
                name=f"conv_{idx}",
                module=nn.Conv2d(in_size, size, kernel_size=(3, 3), padding="same")
            )
            self.layers.add_module(
                name=f"conv_{idx}_relu",
                module=nn.ReLU()
            )
            self.layers.add_module(
                name=f"maxpool_{idx}",
                module=nn.MaxPool2d((2, 2))
            )
            in_size = size
        self.layers.add_module(name="flatten", module=nn.Flatten())

        # each conv block halves the spatial size via max pooling, so after
        # len(convolution_sizes) blocks the flattened feature size is
        # convolution_sizes[-1] * (image_size // 2**len(convolution_sizes))**2
        # (for image_size=64 and 3 blocks: 64 * 8 * 8 = 4096, as in the Keras summary)

        maxpool_final_size = image_size // (2 ** len(convolution_sizes))
        in_size = convolution_sizes[-1] * maxpool_final_size * maxpool_final_size
        for idx, size in enumerate(dense_sizes, 1):
            self.layers.add_module(
                name=f"linear_{idx}",
                module=nn.Linear(in_size, size)
            )
            self.layers.add_module(
                name=f"linear_{idx}_relu",
                module=nn.ReLU()
            )
            in_size = size
        self.layers.add_module(
            name="linear_output",
            module=nn.Linear(dense_sizes[-1], output_size)
        )

    def forward(self, x):
        x = self.layers(x)
        return x

model_torch = SPTModel(in_channels=1, image_size=IMAGE_SIZE, convolution_sizes=convolution_sizes, dense_sizes=dense_sizes, output_size=2)
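As a sanity check, the output shape and total parameter count can be compared against the Keras summary (note that PyTorch expects NCHW input, unlike Keras' NHWC):

import torch

out = model_torch(torch.randn(1, 1, 64, 64))              # NCHW input
print(out.shape)                                          # torch.Size([1, 2])
print(sum(p.numel() for p in model_torch.parameters()))   # 155522, matches Keras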

Initially I tried to train this same model from scratch on the same dataset, but the accuracy was poor every time. After a while I decided to give up on that and instead tried to load the weights and biases of my Keras model into the PyTorch one as follows:

import numpy as np
import torch

keras_weights = {w.name: w for w in model.weights}
torch_weights = model_torch.state_dict()
new_weights = {}

# Keras lists weights as (kernel, bias) per layer and the state_dict as
# (weight, bias), so zipping the keys pairs them up in order
for k_layer, t_layer in zip(keras_weights.keys(), torch_weights.keys()):
    if "conv" in k_layer and "kernel:0" in k_layer:
        if "conv" in t_layer and "weight" in t_layer:
            # Keras conv kernels are (kh, kw, in, out); PyTorch expects (out, in, kh, kw)
            new_weights[t_layer] = torch.Tensor(
                np.moveaxis(np.array(keras_weights[k_layer]), [-1, -2], [0, 1])
            ).to(device)
    elif "dense" in k_layer and "kernel:0" in k_layer:
        if "linear" in t_layer and "weight" in t_layer:
            # Keras dense kernels are (in, out); PyTorch expects (out, in)
            new_weights[t_layer] = torch.Tensor(
                np.transpose(np.array(keras_weights[k_layer]))
            ).to(device)
    else:
        # biases carry over without any transposition
        new_weights[t_layer] = torch.Tensor(np.array(keras_weights[k_layer])).to(device)

model_torch.load_state_dict(new_weights)
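One thing I'm not sure about here: Keras' Flatten sees the feature map in channels-last (H, W, C) order, while PyTorch's nn.Flatten sees it in channels-first (C, H, W) order, so the first linear layer's input dimension may need reordering on top of the plain transpose. A sketch of what that could look like (the 8x8x64 shape comes from the summary above; "dense/kernel:0" and "layers.linear_1.weight" are the key names I would expect from these two models):

# reorder the first dense kernel from Keras' NHWC flatten order to NCHW
H, W, C = 8, 8, 64                                # shape of the last feature map
k = np.array(keras_weights["dense/kernel:0"])     # (4096, 32), rows in NHWC order
k = k.reshape(H, W, C, -1).transpose(2, 0, 1, 3)  # -> (C, H, W, 32)
k = k.reshape(C * H * W, -1)                      # re-flatten in NCHW order
new_weights["layers.linear_1.weight"] = torch.Tensor(k.T).to(device)
model_torch.load_state_dict(new_weights)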

After doing this, I tried my validation dataset again to see if this somehow fixed the problem. Instead, what I find is that the performance is still poor:

[image: validation performance of the converted PyTorch model]

What’s going on here?

Some extra information (a PyTorch training loop matching these settings is sketched after this list):

  • loss function used with the Keras model: MSE
  • optimizer used with the Keras model: Adam (lr = 0.001)
  • batch size: 1
  • number of epochs: 128
  • the particle position in each image is randomly generated, and the data generator provided by DeepTrack allows continuously generating new pseudo-random particles
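For completeness, a minimal PyTorch training loop under these same settings could look like this (a sketch; a train_loader yielding NCHW image / position pairs with batch size 1 is assumed):

import torch
import torch.nn as nn

criterion = nn.MSELoss()                                  # same loss as the Keras model
optimizer = torch.optim.Adam(model_torch.parameters(), lr=0.001)

for epoch in range(128):
    for images, positions in train_loader:                # hypothetical DataLoader, batch size 1
        optimizer.zero_grad()
        preds = model_torch(images)                       # images: (1, 1, 64, 64), NCHW layout
        loss = criterion(preds, positions)                # positions: (1, 2) particle centers
        loss.backward()
        optimizer.step()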

I would recommend scaling the use case down a bit and making sure that single layers yield the same outputs first. I.e., I see that you are already permuting the filters (which would be expected), but these small unit tests could still show where the model output diverges.
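For example, a per-layer check could look like this (a sketch, assuming model is the Keras model and both models run on the CPU):

import numpy as np
import torch
from tensorflow import keras

# feed the same random input through the first conv layer of both models
x = np.random.rand(1, 64, 64, 1).astype(np.float32)           # NHWC for Keras

keras_conv = keras.Model(model.input, model.layers[1].output) # sub-model up to conv2d
y_keras = keras_conv(x).numpy()                               # (1, 64, 64, 16)

with torch.no_grad():
    x_t = torch.from_numpy(x.transpose(0, 3, 1, 2))           # NHWC -> NCHW
    y_torch = model_torch.layers[0](x_t).cpu().numpy()        # conv_1 only
y_torch = y_torch.transpose(0, 2, 3, 1)                       # NCHW -> NHWC

print(np.allclose(y_keras, y_torch, atol=1e-5))               # should print True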

Hi @ptrblck, thanks for the reply. In the end I tried training the PyTorch model again, making sure that the hyperparameters were the same as those I used for the Keras model, and that did the trick.