Convert pre-trained model from Tensorflow to libtorch for inference

jb892 · March 10, 2020, 2:28am

Hi,

I’m working on a project which converts a pre-trained model from TF to libtorch. But I have experienced some issue with the final performance. It’s a huge drop. After some debugging work, I found the major problem is caused by the Conv2d layers in the network. For example, the output of the layer before the first Conv2d layer is the same as in Tensorflow. However, the output tensor of Conv2d is completely different from that in TF. I’m so confused. I have put some example code below.

Tensor loadTensorWeightFromH5(std::string name)
{
    Tensor output = readH5ToTensor(name);
    if (output.dim() == 4)
        // Convert TF weight [kernel_h, kernel_w, channel_in, channel_out] 
        // to [channel_out, channel_in, kernel_h, kernel_w]
        output = output.permute({ 3, 2, 0, 1 }); 
    return output;
}

struct Net : nn::Module
{
    Net()
        :mConv0(nn::Conv2dOptions(3, 32, 1).padding(0).stride(1)),
        mBN0(nn::BatchNormOptions(32))
    {
        register_module("Conv0", mConv0);
        register_module("BN0", mBN0);

        mConv0->weight = loadTensorWeightFromH5("Conv0_weights:0");
        mConv0->bias = loadTensorWeightFromH5("Conv0_bias:0");
        mBN0->weight = loadTensorWeightFromH5("BN0_gamma:0");
        mBN0->bias = loadTensorWeightFromH5("BN0_beta:0");
        mBN0->running_mean = loadTensorWeightFromH5("BN0_moving_mean:0");
        mBN0->running_var = loadTensorWeightFromH5("BN0_moving_variance:0");
    }

    Tensor forword(Tensor & input)
    {
        // Convert NHWC to NCHW
        auto x = input.permute({ 0, 3, 1, 2 });

        x = mConv0->forward(x);

        x = mBN0->forward(x);

        x = torch::relu(x);

        // Convert NCHW to NHWC
        x = x.permute({ 0, 2, 3, 1 });

        return x;
    }

    nn::Conv2d mConv0;
    nn::BatchNorm mBN0;
}

Could anyone help?

yf225 · March 10, 2020, 2:32pm

If we run the same code in PyTorch Python API, does it give the same output as libtorch?

jb892 · March 11, 2020, 8:02am

Nope. That’s a good point! I will give it a try, thx!

jb892 · March 12, 2020, 11:00am

Hi after some research, I finally found what I did wrong here. It’s nothing to do with the Conv2D layer. It’s the batch norm layer cause the problem. Since the only thing, I wanna do here is inferencing. I need to set batch norm layers to evaluation mode for testing. Which looks like bn->eval(). It solved my problem.