Torch model subtly different than Keras model

Hello folks,

New to Torch and trying to translate a Keras model to Torch. I’m finding a slight difference in model accuracy that makes me think I haven’t done the model translation properly. Here is the code for the two models:

Keras

# imports assume standalone Keras 2.x; with tf.keras, import from tensorflow.keras instead
from keras.initializers import he_uniform
from keras.layers import Dense, Dropout, Input
from keras.models import Model
from keras.optimizers import Adam


def keras_model(input_tensor_length):
    he_uniform_initializer = he_uniform()
    input_tensor = Input(shape=(1, input_tensor_length))

    dense = Dense(
        units=100, activation="relu", kernel_initializer=he_uniform_initializer
    )(input_tensor)

    dense = Dropout(0.5)(dense)

    dense = Dense(
        units=50, activation="relu", kernel_initializer=he_uniform_initializer
    )(dense)

    dense = Dropout(0.5)(dense)

    dense = Dense(
        units=25, activation="relu", kernel_initializer=he_uniform_initializer
    )(dense)

    predictions = Dense(1, activation="linear")(dense)

    sigm = Dense(1, activation="sigmoid")(predictions)

    model = Model(inputs=input_tensor, outputs=sigm)

    model.compile(
        optimizer=Adam(lr=0.0001), loss="binary_crossentropy", metrics=["mae", "acc"]
    )

    return model

Torch

import torch.nn as nn


class model(nn.Module):
    def __init__(self, input_dim):
        super(model, self).__init__()

        self.mod = nn.Sequential(
            nn.Linear(input_dim, 100),
            nn.Linear(100, 50),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(50, 25),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(25, 1),
            nn.Sigmoid()
        )

    def forward(self, x):
        y = self.mod(x)
        return y

Both are trained using BCE on 0/1 targets. Binary accuracy is computed on both validation sets using the function below, and the resulting accuracies are Keras: 0.697, Torch: 0.689:

import numpy as np


def binary_accuracy(y_true, y_pred):
    # Count predictions that match the targets after rounding at 0.5.
    n_correct = np.sum(y_true == np.round(y_pred))
    return n_correct / len(y_true)
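
For example, with some made-up numbers just to show the 0.5 rounding threshold:

import numpy as np

y_true = np.array([1, 0, 1])
y_pred = np.array([0.8, 0.3, 0.4])      # rounds to [1, 0, 0]
print(binary_accuracy(y_true, y_pred))  # 2/3 ≈ 0.667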

In your Keras model, you are using a ReLU and Dropout after the first linear layer, which is missing in the PyTorch model.
Also, you are using an additional Dropout layer after the nn.Linear(50, 25) layer in PyTorch, which is not present in Keras. Finally, it seems you are using two Dense layers as the output, where the first one uses a "linear" activation while the latter uses a sigmoid.

Thanks for your response! I really appreciate it.

If I’m understanding you correctly, it should be something like this:

self.mod = nn.Sequential(
    nn.Linear(input_dim, 100),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(100, 50),
    nn.ReLU(),
    nn.Dropout(0.5),
    nn.Linear(50, 25),
    nn.ReLU(),
    nn.Linear(25, 1),
    nn.Linear(1, 1),
    nn.Sigmoid()
)
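
(A side note of my own, not from the replies: two Linear layers with nothing between them compose into a single affine map, so the nn.Linear(25, 1) → nn.Linear(1, 1) pair is a faithful translation of the two output Dense layers, but it could be collapsed into a single nn.Linear(25, 1) without changing what the model can represent. A quick check:)

import torch
import torch.nn as nn

a, b = nn.Linear(25, 1), nn.Linear(1, 1)

# Fold both layers into one equivalent affine layer by hand.
merged = nn.Linear(25, 1)
with torch.no_grad():
    merged.weight.copy_(b.weight @ a.weight)       # combined weight: (1, 25)
    merged.bias.copy_(b.weight @ a.bias + b.bias)  # combined bias: (1,)

x = torch.randn(4, 25)
print(torch.allclose(b(a(x)), merged(x), atol=1e-6))  # True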

So from this I am learning that the number of units in the Keras Dense definition is the second argument in the Torch nn.Linear definition. In essence:

  • Dense(units=100, activation="relu", kernel_initializer=he_uniform_initializer)
  • nn.Linear(input_dim, 100)
  • nn.ReLU()
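
For example, a quick shape check with a made-up input width of 8:

import torch
import torch.nn as nn

# nn.Linear(8, 100) + nn.ReLU() plays the role of
# Dense(units=100, activation="relu") for 8-wide inputs.
layer = nn.Sequential(nn.Linear(8, 100), nn.ReLU())
print(layer(torch.randn(4, 8)).shape)  # torch.Size([4, 100])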

I’m still not certain how to treat the Input() definition from Keras. Is it a linear layer like I have it?

  • input_tensor = Input(shape=(1, input_tensor_length))
    

Also, do you think the he_uniform_initialization matters?

Yes, the units specified in Dense correspond to the out_features in PyTorch.

You don’t need to specify input layers or any placeholders in PyTorch and can directly pass the input tensors to the model.
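
For example (a minimal sketch, assuming the class above takes input_dim as an argument and using a made-up feature width of 128):

import torch

net = model(input_dim=128)  # hypothetical input width
x = torch.randn(32, 128)    # a batch of 32 samples; no Input() placeholder needed
out = net(x)                # out.shape == torch.Size([32, 1])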

Yes, initialization can matter, but it also depends on the model.
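
If you want to match Keras's he_uniform exactly, one option (my sketch, assuming the model class above): PyTorch's kaiming_uniform_ with nonlinearity="relu" draws from the same U(-sqrt(6/fan_in), sqrt(6/fan_in)) distribution:

import torch.nn as nn

def init_he_uniform(module):
    # He/Kaiming uniform on every Linear weight; Keras Dense also
    # defaults its bias to zeros.
    if isinstance(module, nn.Linear):
        nn.init.kaiming_uniform_(module.weight, nonlinearity="relu")
        nn.init.zeros_(module.bias)

net = model(input_dim=128)  # hypothetical width again
net.apply(init_he_uniform)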

Thanks so much for your help! I really appreciate it