Basic MNIST Keras model to PyTorch implementation

Hi all,

I need to implement the model structure below, defined in Keras, in PyTorch.

    # MNIST model
    layers = [
        Conv2D(64, (3, 3), padding='valid', input_shape=(28, 28, 1)),
        Activation('relu'),
        Conv2D(64, (3, 3)),
        Activation('relu'),
        MaxPooling2D(pool_size=(2, 2)),
        Dropout(0.5),
        Flatten(),
        Dense(128),
        Activation('relu'),
        Dropout(0.5),
        Dense(10),
        Activation('softmax')
    ]

I tried the model below in PyTorch, but I would like to double-check whether I did it correctly:

import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet_dropout(nn.Module):
    def __init__(self):
        super(LeNet_dropout, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, padding=0, bias=True)
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=0, bias=True)
        self.dropout = nn.Dropout(0.5)
        self.fc1 = nn.Linear(12*12*64, 128)
        self.fc2 = nn.Linear(128, 10)


    def last_hidden_layer_output(self, X):
        X = F.relu(self.conv1(X))
        X = self.dropout(F.max_pool2d(F.relu(self.conv2(X)),2))
        X = X.view(-1, 12*12*64)
        X = self.dropout(F.relu(self.fc1(X)))
        return X

    def forward(self, X):
        X = F.relu(self.conv1(X))
        X = self.dropout(F.max_pool2d(F.relu(self.conv2(X)),2))
        X = X.view(-1, 12*12*64)
        X = self.dropout(F.relu(self.fc1(X)))
        X = torch.softmax(F.relu(self.fc2(X)), dim=1)
        return X
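
For reference, the 12*12*64 input size of fc1 should match the Keras model: the two valid 3x3 convolutions take 28 → 26 → 24, and the 2x2 max pooling halves that to 12. A quick sanity-check sketch (using the class defined above):

import torch
import torch.nn.functional as F

model = LeNet_dropout()
x = torch.randn(1, 1, 28, 28)  # dummy MNIST-sized input
with torch.no_grad():
    y = F.relu(model.conv1(x))                   # -> (1, 64, 26, 26)
    y = F.max_pool2d(F.relu(model.conv2(y)), 2)  # -> (1, 64, 12, 12)
print(y.shape)  # torch.Size([1, 64, 12, 12]), i.e. 12*12*64 = 9216 features after flattening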

Would you please comment and correct me if I did anything wrong?

Also, I am not sure which loss function to use to train this model, because there is a softmax at the output. I would appreciate it if you could also explain exactly how to train it. I think I cannot use nn.CrossEntropyLoss as below:

output = Model(data)
loss = nn.CrossEntropyLoss(output,target)

And one last thing: I will also need to use the output of the last hidden layer activation. That's why I also included a method called last_hidden_layer_output. Is there another way to handle this?

nn.CrossEntropyLoss can be used for multi-class classification and expects raw logits as the model output, so you should remove the last torch.softmax activation.
Also, remove the last F.relu and return the output of self.fc2(x) directly.
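
With the model returning raw logits, a minimal training step could look roughly like this (a sketch; the Adam optimizer and the train_loader name are assumptions, not from the original post):

criterion = nn.CrossEntropyLoss()  # expects raw logits and integer class targets
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # assumed optimizer choice

model.train()
for data, target in train_loader:
    optimizer.zero_grad()
    output = model(data)              # raw logits, no softmax in forward
    loss = criterion(output, target)  # log_softmax + NLL applied internally
    loss.backward()
    optimizer.step()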

You could use forward hooks as described here, or you could keep this method and reuse it in forward to avoid duplicated code:

def forward(self, x):
    x = self.last_hidden_layer_output(x)
    x = self.fc2(x)
    return x
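
If you go the hook route, a minimal sketch could look like this (assuming eval mode, where dropout is a no-op; since F.relu and dropout are applied functionally after self.fc1, the hook captures fc1's pre-activation output and the ReLU is re-applied afterwards):

activations = {}

def save_output(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

handle = model.fc1.register_forward_hook(save_output("fc1"))
_ = model(data)                               # forward pass fills activations["fc1"]
last_hidden = torch.relu(activations["fc1"])  # re-apply the ReLU that follows fc1
handle.remove()                               # remove the hook when done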

Thank you ptrblck.

First of all, I needed to use softmax at the last layer.

The reason I used F.relu at the last layer is that when I calculate the loss as below, I get NaN values after some epochs.

output = model(data)
loss = F.nll_loss(torch.log(output), target)

So I used ReLU at the last layer and changed the loss calculation as below:

eps = 1e-7
output = model(data)
loss = F.nll_loss(torch.log(output+eps), target)

If I remove the ReLU from the last layer, how can I guarantee that the output term inside torch.log stays non-negative?

Why would that be the case?

F.nll_loss or nn.NLLLoss expects log probabilities as the model output, so you should use:

output = F.log_softmax(x, dim=1)
loss = F.nll_loss(output, target)

while nn.CrossEntropyLoss expects raw logits and will apply F.log_softmax + F.nll_loss internally.
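
For reference, the NaN values most likely came from torch.log(softmax(...)) underflowing to log(0) = -inf, while F.log_softmax computes the same quantity in a numerically stable way. A toy illustration (made-up logits):

import torch
import torch.nn.functional as F

logits = torch.tensor([[1000.0, 0.0]])  # logits with a large gap

unstable = torch.log(torch.softmax(logits, dim=1))  # softmax underflows to 0 -> log gives -inf
stable = F.log_softmax(logits, dim=1)               # stays finite
print(unstable)  # [[0., -inf]]
print(stable)    # [[0., -1000.]]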


Hi,

Thank you ptrblck.

I changed my model to:

class Model(nn.Module):
    def __init__(self):
        super(Model,self).__init__()
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=64, kernel_size=3, padding=0, bias=True)
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, padding=0, bias=True)
        self.dropout = nn.Dropout(0.5)
        self.fc1 = nn.Linear(12*12*64, 128)
        self.fc2 = nn.Linear(128, 10)

    def last_hidden_layer_output(self, X):
        X = F.relu(self.conv1(X))
        X = self.dropout(F.max_pool2d(F.relu(self.conv2(X)),2))
        X = X.view(-1, 12*12*64)
        X = self.dropout(F.relu(self.fc1(X)))
        return X

    def forward(self, X):
        X = self.last_hidden_layer_output(X)
        X = self.fc2(X)
        return X

I calculate loss like this:

output = model(data)
loss = F.nll_loss(F.log_softmax(output, dim=1), target)

I got predictions like this:

predictions = F.softmax(model(data), dim=1).data.argmax(1, keepdim=True)

I get the last hidden layer output like this:

model.last_hidden_layer_output(data).data

Your code looks generally good. 🙂

Some small suggestions:

  • Don’t use the .data attribute, as it could yield unwanted side effects
  • You can get the predictions directly via torch.argmax(output, dim=1) without applying the softmax, since the maximum logit will also have the maximum probability. However, you can of course apply the softmax if you want to see the probabilities (just don't pass it to the criterion when calculating the loss).
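
Putting those suggestions together, the inference and feature-extraction part might look like this (a sketch, assuming model, data, and the torch / F imports from above):

model.eval()  # disable dropout for evaluation
with torch.no_grad():
    output = model(data)                           # raw logits
    predictions = torch.argmax(output, dim=1)      # no softmax or .data needed
    probabilities = F.softmax(output, dim=1)       # only if you want to inspect them
    hidden = model.last_hidden_layer_output(data)  # last hidden layer activations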