PyTorch slower than TensorFlow/Keras during training

I’m porting a TensorFlow project to PyTorch, and after running both I noticed that TensorFlow is faster than PyTorch during training, which seems weird to me.

I isolated the two projects in two separate notebooks, each containing all the classes and functions needed to train the model.
Keep in mind that this is just a snippet of a bigger project (one that uses genetic algorithms), which is why the CustomModel/TFModelConvert module needs to receive a list of arbitrary ResBlock modules.

In the TensorFlow notebook, you will see that training for 5 epochs took 58 seconds.
In the PyTorch notebook, training took 122 seconds with the same layers and the same number of epochs.

My question is: what am I doing wrong, or what can I change, so that PyTorch is not slower than TensorFlow?

TensorFlow notebook: Google Colab
PyTorch notebook: Google Colab

You are recreating randomly initialized layers in each forward pass in ResBlock:

class ResBlock(nn.Module):
    def __init__(self):
        super(ResBlock, self).__init__()
        self.layers = None

    def forward(self, x):
        net = x.clone()

        net = self.layers(net)

        with torch.no_grad():
            # Calculating the kernel size needed to downsample
            out_channels = get_out_channels(self.layers)
            k_size = (x.shape[-2] - net.shape[-2] + 1, x.shape[-1] - net.shape[-1] + 1)

        # Apply a convolution
        conv = nn.Conv2d(x.shape[1], out_channels, k_size).to(DEVICE)

        return net + conv(x)

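which looks wrong unless you don’t want to train conv: a layer created inside forward gets new random weights on every call, and its parameters are never registered with the module, so an optimizer would never update them.
Initialize the layers in the nn.Module.__init__ method and use them in the forward method.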
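As a rough sketch of that pattern (not your actual model): the shortcut convolution below is created once in __init__ and only used in forward. The constructor arguments in_channels, out_channels and k_size are hypothetical and assume those values are known up front, which your current code does not do since it derives them from the input at run time.

```python
import torch
import torch.nn as nn


class ResBlock(nn.Module):
    # in_channels, out_channels and k_size are hypothetical constructor
    # arguments; the original code derives them from the input at run time.
    def __init__(self, layers, in_channels, out_channels, k_size):
        super().__init__()
        self.layers = layers
        # Created once, so the weights are registered as parameters
        # and stay the same between forward passes.
        self.shortcut = nn.Conv2d(in_channels, out_channels, k_size)

    def forward(self, x):
        return self.layers(x) + self.shortcut(x)


layers = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
# Two unpadded 3x3 convolutions shrink each spatial dim by 4,
# so the shortcut needs a 5x5 kernel to produce the same output size.
block = ResBlock(layers, in_channels=3, out_channels=8, k_size=(5, 5))
out = block(torch.randn(2, 3, 32, 32))
print(out.shape)
```

Because the convolution is an attribute of the module, block.parameters() includes its weight and bias, and block.to(device) moves it together with the rest of the model.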

conv doesn’t need to be trained. It previously had a dirac_ initialization because I only want to downsample the input without increasing the number of parameters. I could use max pooling, but then I would need to create a zero-padded tensor, which would be expensive.

But even if I wanted to train conv, how would I initialize it in __init__ when k_size has to be calculated from the input shape? Is there a way to do this in PyTorch?

To test, I just removed everything from the forward function and wrote this:

def forward(self, x):
    return self.layers(x)

The elapsed time didn’t get much better.

Since conv is not trained at all, you could use adaptive pooling layers instead. I don’t understand how your current approach of applying new random filters in every forward pass could work.
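A minimal sketch of the adaptive pooling idea, assuming the input and the output of self.layers have the same number of channels (adaptive pooling only changes the spatial size, so a channel mismatch would still need separate handling). The layer stack here is made up for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class ResBlock(nn.Module):
    def __init__(self, layers):
        super().__init__()
        self.layers = layers

    def forward(self, x):
        net = self.layers(x)
        # Pool the input down to the spatial size of the branch output.
        # Assumes x and net have the same number of channels.
        shortcut = F.adaptive_max_pool2d(x, net.shape[-2:])
        return net + shortcut


# Hypothetical layer stack that keeps the channel count at 8.
layers = nn.Sequential(nn.Conv2d(8, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
block = ResBlock(layers)
out = block(torch.randn(2, 8, 32, 32))
print(out.shape)
```

The target size is read from net at run time, so no kernel size has to be computed and no layer is created inside forward.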
In any case, since you still see a slowdown you would need to profile the code using the native profiler or e.g. Nsight Systems and check where the bottlenecks are.
Generally I would also recommend reading through the Performance Guide.
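For the native profiler, something along these lines should give a first overview of where the time goes. The tiny model, the random input and the SGD optimizer are placeholders, not your code; replace them with a few real training iterations.

```python
import torch
import torch.nn as nn
from torch.profiler import profile, ProfilerActivity

# Placeholder model and data, only here to make the sketch runnable.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 8, 3))
x = torch.randn(4, 3, 32, 32)

activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    model, x = model.cuda(), x.cuda()
    activities.append(ProfilerActivity.CUDA)

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Profile a few training steps and collect per-operator statistics.
with profile(activities=activities, record_shapes=True) as prof:
    for _ in range(3):
        optimizer.zero_grad()
        loss = model(x).mean()
        loss.backward()
        optimizer.step()

# Print the operators that took the most time.
report = prof.key_averages().table(sort_by="cpu_time_total", row_limit=10)
print(report)
```

Operators that dominate the table (for example host-to-device copies or layers being rebuilt on every step) are usually the first things to look at.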

I have already changed the conv to a pooling layer since the goal is to downsample only.
Thanks for the article! I changed a few things and the timing improved a bit.

In a PyTorch group, someone reminded me that PyTorch builds its computation graph dynamically, while TensorFlow still uses static computation graphs in the background and can therefore apply optimizations during training. This is probably why TensorFlow is slightly faster after the first epoch.