Basic PyTorch to Keras Model Converter (for CV models)

Hi,

I created a very basic model converter that converts PyTorch models to Keras. It first exports the model to ONNX, then uses the ONNX API and IR to build the Keras model, adding layers iteratively.
The motivation was to allow PyTorch models to be exported to EdgeTPU. When a model is exported from ONNX directly to Keras, transpose operations are added to each layer, which prevents the model from being exported to the EdgeTPU. This project was built to overcome that issue.
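
To illustrate the layout mismatch behind those transposes: PyTorch stores a Conv2d kernel as (out_channels, in_channels, kH, kW), while a Keras Conv2D kernel is (kH, kW, in_channels, out_channels) for groups=1. One common way to avoid runtime transposes is to permute the weights once at conversion time. The sketch below uses NumPy only; `pt_conv_weight_to_keras` is a hypothetical helper for illustration, not a function from pt2keras.

```python
import numpy as np


def pt_conv_weight_to_keras(weight: np.ndarray) -> np.ndarray:
    """Hypothetical helper: permute a PyTorch Conv2d kernel
    (out, in, kH, kW) into the Keras Conv2D layout (kH, kW, in, out)."""
    if weight.ndim != 4:
        raise ValueError(f'expected a 4-D kernel, got shape {weight.shape}')
    return np.transpose(weight, (2, 3, 1, 0))


# Kernel shaped like a 3 -> 32 channel, 3x3 convolution
pt_weight = np.random.rand(32, 3, 3, 3).astype(np.float32)
keras_weight = pt_conv_weight_to_keras(pt_weight)
print(keras_weight.shape)  # (3, 3, 3, 32)
```

Because the permutation happens once on the stored weights, no transpose layer needs to appear in the converted graph for this step.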

Custom ONNX operation converters can also be added or overridden using the @converter decorator. Tests for each layer are automated if the output layer and output are returned as a tuple. Alternatively, users can write their own tests to override the default ones.
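
For readers unfamiliar with decorator-based registries, here is a minimal, self-contained sketch of the general pattern. It is not the pt2keras API: the `converter` signature, the `override` flag, the `CONVERTERS` dictionary, and the dictionary-based node format are all assumptions made for illustration. Check the repository for the real decorator.

```python
from typing import Callable, Dict, List

# Hypothetical registry mapping an ONNX operator name to a conversion function
CONVERTERS: Dict[str, Callable[[dict], str]] = {}


def converter(op_type: str, override: bool = False):
    """Register the decorated function as the converter for `op_type`."""
    def decorator(func):
        if op_type in CONVERTERS and not override:
            raise KeyError(f'converter for {op_type!r} already registered')
        CONVERTERS[op_type] = func
        return func
    return decorator


@converter('Sigmoid')
def convert_sigmoid(node: dict) -> str:
    # A real converter would build a Keras layer; a string stands in here
    return 'keras Activation(sigmoid)'


@converter('Sigmoid', override=True)
def convert_sigmoid_custom(node: dict) -> str:
    # Replaces the converter registered above
    return 'custom sigmoid layer'


def convert_graph(nodes: List[dict]) -> List[str]:
    """Walk the nodes in order and dispatch each one to its converter."""
    layers = []
    for node in nodes:
        op_type = node['op_type']
        if op_type not in CONVERTERS:
            raise NotImplementedError(f'no converter for {op_type!r}')
        layers.append(CONVERTERS[op_type](node))
    return layers


print(convert_graph([{'op_type': 'Sigmoid'}]))  # ['custom sigmoid layer']
```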

Project link: GitHub - JWLee89/pt2keras: A PyTorch To Keras Model Converter

I used some code from onnx2keras (GitHub - gmalivenko/onnx2keras: Convert ONNX model graph to Keras model format) when writing some of the converters. Right now, I am cleaning up the code and thinking of ways to improve transparency.

Right now, I have tested the conversion for the following models:

  • EfficientNet
  • MobileNetV2
  • ResNet
  • AlexNet
  • Inception_v3 (warning: the converted model shows a relatively large output difference; the network output does not fall within atol=1e-4)
  • VGG
  • GoogLeNet
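
The atol=1e-4 check mentioned for Inception_v3 can be expressed with NumPy. The sketch below is an assumption about how such a comparison might look, not the project's actual test code; `outputs_match` is a hypothetical helper and the arrays are made-up values.

```python
import numpy as np


def outputs_match(pt_output, keras_output, atol=1e-4):
    """Hypothetical helper: return (ok, max_abs_diff) for two output arrays."""
    pt_output = np.asarray(pt_output, dtype=np.float64)
    keras_output = np.asarray(keras_output, dtype=np.float64)
    max_abs_diff = float(np.max(np.abs(pt_output - keras_output)))
    # rtol=0 so that only the absolute tolerance is applied
    ok = bool(np.allclose(pt_output, keras_output, rtol=0.0, atol=atol))
    return ok, max_abs_diff


reference = np.array([0.10, 0.20, 0.30])
close = reference + 5e-5  # within atol=1e-4
far = reference + 5e-3    # outside atol=1e-4

print(outputs_match(reference, close)[0])  # True
print(outputs_match(reference, far)[0])    # False
```

Reporting the maximum absolute difference alongside the pass/fail flag makes it easier to see how far a model such as Inception_v3 is from the tolerance.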

Note that I only wrote converters for the basic operations found in common computer vision models. Many more ONNX operators still need support, and I may write more converters as the need arises. For now, the current set of converters is more than enough for my use case.

Sample demos are included in the demo folder. Below is a simple use case:

from copy import deepcopy

import numpy as np
import tensorflow as tf
import torch
import torch.nn as nn

from pt2keras import Pt2Keras


class DummyModel(nn.Module):
    """
    Model will be converted to EdgeTPU.
    """

    def __init__(self):
        super().__init__()
        # These can all be found using named_modules() or children()
        self.conv = nn.Sequential(
            nn.Conv2d(3, 32, (3, 3), stride=(1, 1), padding=1, groups=1, bias=True),
            nn.Sigmoid(),
            # Stride-1 convolution (spatial size unchanged); the stride-2 layers below downsample
            nn.Conv2d(32, 64, (3, 3), stride=(1, 1), padding=1, dilation=(1, 1), groups=1, bias=True),
            nn.Sigmoid(),
            nn.Conv2d(64, 128, (1, 1), stride=(2, 2), padding=1, groups=1, dilation=(1, 1), bias=False),
            # nn.Conv2d(3, 32, (3, 3), stride=(2, 2), padding=(1, 1), groups=1, bias=True),
            nn.Sigmoid(),
            nn.Conv2d(128, 256, (3, 3), stride=(2, 2), padding=2, groups=1, bias=True),
            nn.Sigmoid(),
            nn.Conv2d(256, 512, (3, 3), stride=(2, 2), padding=2, groups=1, bias=True),
            nn.Sigmoid(),
            nn.Conv2d(512, 1024, (2, 2), stride=(2, 2), padding=0, groups=1, bias=True),
            nn.Sigmoid(),
            nn.Conv2d(1024, 256, (2, 2), stride=(1, 1), padding=0, groups=1, bias=True),
            nn.Sigmoid(),
            nn.Conv2d(256, 128, (2, 2), stride=(1, 1), padding=0, groups=1, bias=True),
        )

    def forward(self, X):
        output = self.conv(X)
        output = torch.flatten(output, start_dim=1)
        return output


if __name__ == '__main__':
    shape = (1, 3, 64, 64)
    model = DummyModel()

    converter = Pt2Keras()

    # Generate dummy inputs (the Keras input is a copy of the PyTorch input)
    x_pt = torch.randn(shape)
    x_keras = tf.convert_to_tensor(deepcopy(x_pt.numpy()))

    # input dimensions for PyTorch are BCHW, whereas TF / Keras default is BHWC
    if len(x_keras.shape) == 4:
        x_keras = tf.transpose(x_keras, (0, 2, 3, 1))

    print(f'pt shape: {x_pt.shape}, x_keras.shape: {x_keras.shape}')

    keras_model: tf.keras.Model = converter.convert(model, shape)

    # Compare the outputs. If the Keras output is 4-dimensional (BHWC),
    # transpose it back to the PyTorch layout (BCHW) first.
    pt_output = model(x_pt).cpu().detach().numpy()
    keras_output = keras_model(x_keras).numpy()
    if len(keras_output.shape) == 4:
        keras_output = keras_output.transpose(0, 3, 1, 2)

    # Mean absolute difference over all elements; taking the absolute value
    # keeps positive and negative errors from cancelling out
    average_diff = np.mean(np.abs(pt_output - keras_output))
    print(f'pytorch: {pt_output.shape}')
    print(f'keras: {keras_output.shape}')
    # The differences will be precision errors from multiplication / division
    print(f'Mean absolute diff: {average_diff}')
    print(f'Pytorch output tensor: {pt_output}')
    print(f'Keras output tensor: {keras_output}')

Please let me know what you think. Any suggestions or advice are welcome.
The code is still messy because the project was originally intended as a quick-and-dirty solution to sidestep the EdgeTPU export issue and get some PyTorch models running on the TPU as soon as possible.