FFNN PyTorch vs Keras implementation

I am trying to replicate in PyTorch the behavior of a simple FFNN created with Keras.

This is the Keras version:

    initializer = keras.initializers.random_uniform(seed=1)

    model = Sequential([
        Dense(512, activation="relu", input_shape=input_shape, kernel_initializer=initializer),
        Dense(512, activation="relu", kernel_initializer=initializer),
        Dense(num_output, activation="softmax")
    ])

    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=0.01),
        loss="categorical_crossentropy",
        # TODO: change metric
        metrics=[keras.metrics.AUC()]
    )

and the model is trained as follows:

    self.model.fit(self.X, self.y, batch_size=batch_size, epochs=epochs, verbose=verbose)

Now, my PyTorch implementation (it’s my first PyTorch NN) is the following:

    import torch.nn as nn

    class FFNN(nn.Module):
        def __init__(self, input_shape, num_classes):
            super(FFNN, self).__init__()

            self.net = nn.Sequential(
                nn.Linear(input_shape[0], 512),
                nn.ReLU(),
                nn.Linear(512, 512),
                nn.ReLU(),
                nn.Linear(512, num_classes),
                nn.Softmax(dim=1))
            self.net.apply(self.init_weights)

        def forward(self, X):
            return self.net(X)

        def init_weights(self, m):
            # mirror Keras' RandomUniform default range of (-0.05, 0.05)
            if type(m) == nn.Linear:
                m.weight.data.uniform_(-0.05, 0.05)

and it is trained as follows:

    y_idx = get_num_from_1hot(y)

    # convert to tensor
    X_tr = Variable(torch.from_numpy(X).float(), requires_grad=False)
    y_tr = Variable(torch.from_numpy(y_idx), requires_grad=False)

    # Loss and Optimizer
    optimizer = torch.optim.Adam(self.model.parameters(), lr=0.01)
    loss_func = torch.nn.CrossEntropyLoss()

    for i in range(epochs):
        y_pred = self.model(X_tr)

        loss = loss_func(y_pred, y_tr)

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
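
where get_num_from_1hot converts the one-hot targets to integer class indices, since CrossEntropyLoss expects indices rather than one-hot vectors. A minimal sketch of what it is assumed to do, given a 2D one-hot NumPy array:

    import numpy as np

    def get_num_from_1hot(y):
        # each one-hot row -> the index of its single 1 entry
        return np.argmax(y, axis=1)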

Unfortunately, the PyTorch version seems to get stuck during training: the test accuracy plateaus at one very specific value for many epochs, sometimes until the end of training.
Can you spot any problem with my implementation?

I think you should check your PyTorch version using:

    import torch
    print(torch.__version__)

The latest version as of 2020 is 1.6.0.

Variable is deprecated in recent PyTorch versions. Instead, you can create tensors with autograd enabled directly:

    autograd_tensor = torch.randn((2, 3, 4), requires_grad=True)

See https://pytorch.org/docs/stable/autograd.html#variable-deprecated.
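
Applied to your snippet, a minimal sketch of the Variable-free conversion (assuming X and y_idx are NumPy arrays, as in your code):

    # tensors track gradients themselves; no Variable wrapper is needed
    X_tr = torch.from_numpy(X).float()     # inputs: gradient tracking not required
    y_tr = torch.from_numpy(y_idx).long()  # CrossEntropyLoss expects int64 class indices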

I think you should go through https://pytorch.org/tutorials/ and examples like https://pytorch.org/tutorials/beginner/blitz/cifar10_tutorial.html to see it working.

I am using torch 1.4.0.

Thanks for the links, I will definitely follow up on them.