Performance comparison of Keras and PyTorch

I converted my code from Keras to PyTorch. It is a regression task with 3D CNNs.
I used just one thread in Keras, but in PyTorch I am using 16 threads.
Could this be the reason for the increase in my loss?
I am using the SGD optimizer and the MSE loss function.

No, the loss shouldn’t depend on the number of threads (inter-op/intra-op).
I would recommend checking the model architecture as well as the initialization of all random parameters, as they differ between the frameworks.

If the loss is still diverging in PyTorch, you could load the Keras parameters into your PyTorch model and make sure to get the same output.
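A minimal sketch of that check for a single fully connected layer, assuming the Keras weights have been exported as NumPy arrays (for example via `layer.get_weights()`). The arrays below are random stand-ins, and `keras_kernel` and `keras_bias` are hypothetical names. Keras stores a Dense kernel as (in_features, out_features), while `nn.Linear` stores its weight as (out_features, in_features), so the kernel needs a transpose:

```python
import numpy as np
import torch
import torch.nn as nn

# Stand-ins for the arrays a Keras Dense layer would return from get_weights();
# real values would come from the trained Keras model.
rng = np.random.default_rng(0)
keras_kernel = rng.standard_normal((8, 4)).astype(np.float32)  # (in, out)
keras_bias = rng.standard_normal(4).astype(np.float32)         # (out,)

fc = nn.Linear(8, 4)
with torch.no_grad():
    fc.weight.copy_(torch.from_numpy(keras_kernel).t())  # transpose to (out, in)
    fc.bias.copy_(torch.from_numpy(keras_bias))

# Feed the same input through both formulations and compare.
x = rng.standard_normal((2, 8)).astype(np.float32)
expected = x @ keras_kernel + keras_bias  # what a linear Dense layer computes
actual = fc(torch.from_numpy(x)).detach().numpy()
print(np.allclose(expected, actual, atol=1e-5))
```

Conv3D kernels need a similar permutation: Keras uses (depth, height, width, in_channels, filters), while `nn.Conv3d` uses (out_channels, in_channels, depth, height, width). For the comparison, both models should be in inference mode so that Dropout and BatchNorm behave deterministically.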

Thank you so much @ptrblck. I learned PyTorch from your comments here.

Actually, my model in Keras (for stereoscopic video processing) looks like this:

some code
combined = concatenate([left.output, right.output])
combined = Conv3D(128, (3, 3, 3), activation='relu', strides=1, kernel_initializer='he_uniform', padding='same')(combined)
combined = BatchNormalization()(combined)
combined = Conv3D(256, (3, 3, 3), activation='relu', strides=1, kernel_initializer='he_uniform', padding='same')(combined)
combined = BatchNormalization()(combined)
combined = MaxPooling3D(pool_size=(2, 2, 2))(combined)
combined = Flatten()(combined)

# apply a FC layer and then a regression prediction on the
# combined outputs
z = Dense(128, activation="relu")(combined)
z = BatchNormalization()(z)
z = Dropout(0.5)(z)
z = Dense(32, activation="relu")(z)
z = Dense(1, activation="linear")(z)

model = Model(inputs=[left.input, right.input], outputs=z)
return model

What I did not carry over to PyTorch is kernel_initializer='he_uniform'.
I don't know how to set it in PyTorch.
Do you think it could be the source of my large loss?

It could be the reason for an initially different loss value, but I'm not sure it would also explain a loss that is the same at the beginning and diverges later.

You can use the torch.nn.init methods via model.apply, e.g. as shown here.
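A minimal sketch of that approach, assuming the goal is to mirror the Keras model posted above: `he_uniform` on the Conv3D layers, and the Keras defaults (`glorot_uniform` kernels, zero biases) elsewhere. The function name `init_like_keras` and the tiny model are hypothetical, chosen only to show the call:

```python
import math
import torch.nn as nn

def init_like_keras(m):
    # he_uniform draws from U(-limit, limit) with limit = sqrt(6 / fan_in),
    # which is what kaiming_uniform_ does with fan_in mode and ReLU gain.
    if isinstance(m, nn.Conv3d):
        nn.init.kaiming_uniform_(m.weight, mode="fan_in", nonlinearity="relu")
    # Dense layers without an explicit initializer use glorot_uniform in Keras.
    elif isinstance(m, nn.Linear):
        nn.init.xavier_uniform_(m.weight)
    else:
        return
    if m.bias is not None:
        nn.init.zeros_(m.bias)  # Keras initializes biases to zero by default

# Tiny stand-in model (hypothetical sizes).
model = nn.Sequential(
    nn.Conv3d(4, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 2 * 2 * 2, 1),
)
model.apply(init_like_keras)  # apply() visits every submodule recursively

fan_in = 4 * 3 * 3 * 3  # in_channels * kernel volume
limit = math.sqrt(6.0 / fan_in)
print(model[0].weight.abs().max().item(), limit)
```

Calling `model.apply` after the model is constructed keeps the initialization logic in one place instead of inside `__init__`.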

I did the initialization, but my PyTorch code still does not match the performance of the Keras code.
My PyTorch class is below:

import torch
import torch.nn as nn
import torch.nn.functional as F


class combineNet(nn.Module):

    def __init__(self, modelA, modelB):
        super(combineNet, self).__init__()
        self.modelA = modelA
        self.modelB = modelB

        # conv
        self.conv1 = nn.Conv3d(2048, 128, kernel_size=(3, 3, 3), stride=1, padding=(0, 1, 2))
        self.BN1 = nn.BatchNorm3d(128, eps=0.001, momentum=0.99)
        self.conv2 = nn.Conv3d(128, 256, kernel_size=(3, 3, 3), stride=1, padding=(2, 1, 0))
        self.BN2 = nn.BatchNorm3d(256, eps=0.001, momentum=0.99)
        self.MP = nn.MaxPool3d(2)
        self.fc1 = nn.Linear(1024, 256, bias=False)
        self.BN = nn.BatchNorm1d(256, eps=0.001, momentum=0.99)
        self.dr = nn.Dropout(0.5)
        self.classifier1 = nn.Linear(256, 128, bias=False)
        self.classifier2 = nn.Linear(128, 1, bias=False)

        for m in self.modules():
            if isinstance(m, nn.Conv3d):
                nn.init.kaiming_normal_(m.weight, mode='fan_out')
            elif isinstance(m, nn.BatchNorm3d) or isinstance(m, nn.BatchNorm1d):
                ...

    def forward(self, x1, x2):
        x1 = self.modelA(x1)
        x2 = self.modelB(x2)
        x = torch.cat((x1, x2), 1)  # dim=1
        x = x.view(x.size(0), 2048, 4, 4, 2)
        x = F.relu(self.conv1(x))
        x = self.BN1(x)
        x = F.relu(self.conv2(x))
        x = self.BN2(x)
        x = self.MP(x)
        x = x.view(x.size(0), -1)
        x = self.BN(F.relu(self.fc1(x)))  # self.pool
        x = self.dr(x)
        x = F.relu(self.classifier1(x))
        x = self.classifier2(x)
        return x

Any help is appreciated.

If adapting the initialization didn't help, you could try to load the Keras parameters into the PyTorch model and make sure the same (random) input produces the same outputs.
If that’s the case, the difference should come from the preprocessing.
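When running that comparison, one convention worth checking is the BatchNorm `momentum` argument, which is defined in opposite directions in the two frameworks. Keras updates its running statistics as `running = momentum * running + (1 - momentum) * batch_stat`, while PyTorch uses `running = (1 - momentum) * running + momentum * batch_stat`. Under that reading, the Keras default `momentum=0.99` corresponds to `momentum=0.01` in PyTorch. A small sketch with toy data:

```python
import torch
import torch.nn as nn

x = torch.tensor([[9.0], [11.0]])  # toy batch with mean 10

# Counterpart of Keras BatchNormalization(momentum=0.99, epsilon=0.001)
bn_slow = nn.BatchNorm1d(1, eps=0.001, momentum=0.01)
# Same numeric value as the Keras default, but the opposite meaning in PyTorch
bn_fast = nn.BatchNorm1d(1, eps=0.001, momentum=0.99)

bn_slow.train()
bn_fast.train()
bn_slow(x)  # one forward pass in training mode updates the running statistics
bn_fast(x)

print(bn_slow.running_mean.item())  # moves 1% of the way toward the batch mean
print(bn_fast.running_mean.item())  # moves 99% of the way toward the batch mean
```

With `momentum=0.99` in PyTorch, the running statistics used in `eval()` mode follow the most recent batch almost entirely, which can make evaluation results noisy.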

My problem still persists!

Actually, the loss in Keras decreases steadily to its minimum (about 0.2).
But in PyTorch the loss starts at about 5.0, decreases to about 0.8, and stays at 0.8±0.1 forever…
I varied the learning rate from 0.1 down to 0.0000001 and got the same results!

The model and the initializations are the same. The loss function and SGD are the same.
The number of trainable parameters is the same for both Keras and PyTorch.
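A minimal sketch for counting trainable parameters on the PyTorch side, so the number can be compared with the trainable parameter count reported by `model.summary()` in Keras. The helper name `count_trainable` and the small model are hypothetical stand-ins. Matching counts do not guarantee matching layers, since settings such as padding or BatchNorm momentum do not change the count:

```python
import torch.nn as nn

def count_trainable(model):
    # Sum the element counts of every parameter that receives gradients.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Toy stand-in for the real network.
toy = nn.Sequential(
    nn.Linear(10, 5),              # 10 * 5 weights + 5 biases
    nn.BatchNorm1d(5),             # 5 scale + 5 shift parameters
    nn.Linear(5, 1, bias=False),   # 5 weights, no bias
)
print(count_trainable(toy))
```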

I think something is really wrong with PyTorch.
Any help…

I have another question related to my loss.
My batch size is 128, and my target and model outputs both have size 128.
I want to optimize the model with all 128 entries, not their mean or sum values.
How can I do that?

I used reduction='none', but I don't know how to apply this loss.
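A minimal sketch of one way to use `reduction='none'`, with a hypothetical toy model and random data standing in for the real ones. `backward()` needs either a scalar loss or an explicit gradient for each element; passing a tensor of ones weights all 128 per-sample losses equally, which gives the same gradients as `per_sample.sum().backward()`. The `squeeze` matters if the model returns shape (128, 1) while the target has shape (128,), since the two would otherwise broadcast to (128, 128):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)  # toy stand-in for the real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.MSELoss(reduction="none")

x = torch.randn(128, 4)
target = torch.randn(128)

optimizer.zero_grad()
output = model(x).squeeze(1)            # (128, 1) -> (128,)
per_sample = criterion(output, target)  # one loss value per sample
# Backpropagate through all 128 entries with equal weight.
per_sample.backward(torch.ones_like(per_sample))
optimizer.step()

print(per_sample.shape)
```

Replacing the tensor of ones with other weights would give a per-sample weighted loss.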

Initialization solved most of my loss problem.
Thank you @ptrblck