Pytorch equivalent of Keras

I’m trying to convert CNN model code from Keras to Pytorch.

Here is the original Keras model:

input_shape = (28, 28, 1)
model = Sequential()
model.add(Conv2D(28, kernel_size=(3,3), input_shape=input_shape))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten()) # Flattening the 2D arrays for fully connected layers
model.add(Dense(128, activation=tf.nn.relu))
model.add(Dropout(0.2))
model.add(Dense(10,activation=tf.nn.softmax))

And this is my PyTorch model. I am not sure whether I am doing it right, as I am new to CNNs and PyTorch. I couldn’t figure out what the output channels in PyTorch should be to correspond with this Keras model. Any comment would be appreciated.

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.hidden1 = nn.Conv2d(28, 28, kernel_size=(3, 3))
        self.hidden2 = nn.MaxPool2d(2, 2)
        self.hidden3 = nn.Linear(128, 10)  # equivalent to Dense in keras
        self.hidden4 = nn.Dropout(0.2)
        self.hidden5 = nn.Linear(10)

    def forward(self, x):
        x = self.hidden2(self.hidden1(x))
        x = self.hidden2(F.relu(self.hidden3(x)))
        x = self.hidden2(F.relu(self.hidden5(x)))
        x = x.view(-1, 128)
        return x

The in_channels in Pytorch’s nn.Conv2d correspond to the number of channels in your input.
Based on the input shape, it looks like you have 1 channel and a spatial size of 28x28.
Your first conv layer expects 28 input channels, which won’t work, so you should change it to 1.

Also, the Dense layers in Keras only take the number of output units.
For nn.Linear you would have to provide the number of in_features first, which can be calculated from your layers and input shape, or just by printing out the shape of the activation in your forward method.
Let’s walk through your layers:

  • After the first conv layer, your output will have the shape [batch_size, 28, 26, 26]. The 28 is given by the number of kernels your conv layer is using. Since you are not using any padding and leave the stride and dilation as 1, a kernel size of 3 will crop one pixel on each side in both spatial dimensions. Therefore you’ll end up with 28 activation maps of spatial size 26x26.
  • The max pooling layer will halve your spatial size, so that you’ll end up with [batch_size, 28, 13, 13].
  • The linear layer should therefore take 28*13*13=4732 input features.
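
If you want to verify these numbers yourself, a quick sanity check (a minimal sketch using the same layer settings as the model below) could look like this:

import torch
import torch.nn as nn

x = torch.randn(1, 1, 28, 28)            # [batch_size, channels, height, width]
x = nn.Conv2d(1, 28, kernel_size=3)(x)   # no padding: the 3x3 kernel crops one pixel per side
print(x.shape)                           # torch.Size([1, 28, 26, 26])
x = nn.MaxPool2d(2)(x)                   # halves the spatial size
print(x.shape)                           # torch.Size([1, 28, 13, 13])
print(x[0].numel())                      # 4732 input features for the linear layer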

Here is your revised code:

import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self):
        super(NeuralNet, self).__init__()
        self.conv = nn.Conv2d(1, 28, kernel_size=3)
        self.pool = nn.MaxPool2d(2)
        self.hidden = nn.Linear(28*13*13, 128)
        self.drop = nn.Dropout(0.2)
        self.out = nn.Linear(128, 10)
        self.act = nn.ReLU()

    def forward(self, x):
        x = self.act(self.conv(x)) # [batch_size, 28, 26, 26]
        x = self.pool(x) # [batch_size, 28, 13, 13]
        x = x.view(x.size(0), -1) # [batch_size, 28*13*13=4732]
        x = self.act(self.hidden(x)) # [batch_size, 128]
        x = self.drop(x)
        x = self.out(x) # [batch_size, 10]
        return x


model = NeuralNet()

batch_size, C, H, W = 1, 1, 28, 28
x = torch.randn(batch_size, C, H, W)
output = model(x)

@ptrblck thank you very much indeed for the clear explanation.

Why did you use x = self.act(self.conv(x)) in the forward function? How does this work? And is self.out a fully connected layer?

I just defined the nn.ReLU module as self.act and am reusing it on each layer’s output.
You could alternatively remove self.act and use F.relu instead.
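
For instance, a minimal sketch of the same model written with F.relu (the class name is just for illustration):

import torch.nn as nn
import torch.nn.functional as F

class NeuralNetFunctional(nn.Module):
    def __init__(self):
        super(NeuralNetFunctional, self).__init__()
        self.conv = nn.Conv2d(1, 28, kernel_size=3)
        self.pool = nn.MaxPool2d(2)
        self.hidden = nn.Linear(28*13*13, 128)
        self.drop = nn.Dropout(0.2)
        self.out = nn.Linear(128, 10)

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))    # F.relu instead of an nn.ReLU module
        x = x.view(x.size(0), -1)
        x = self.drop(F.relu(self.hidden(x)))
        return self.out(x)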

Yes, self.out is a linear (fully-connected) layer.

mnist_model = Sequential()

mnist_model.add(Conv2D(32, kernel_size=(3,3), activation='relu', input_shape=(28,28,1)))
mnist_model.add(MaxPool2D(pool_size=(2,2)))
mnist_model.add(Dropout(0.25))

mnist_model.add(Conv2D(64, kernel_size=(3,3), activation='relu'))
mnist_model.add(MaxPool2D(pool_size=(2,2)))
mnist_model.add(Dropout(0.25))

mnist_model.add(Flatten())
mnist_model.add(Dense(256, activation='relu'))
mnist_model.add(Dropout(0.5))
mnist_model.add(Dense(10, activation='softmax'))

What do I do in this case, where one convolution layer is followed by another?
Below is my implementation. Is the architecture the same as that described by the above TensorFlow code?

class NeuralNet(nn.Module):
  
    def __init__(self):
        super(NeuralNet, self).__init__()

        # spatial size: (28,28)    # number of channels = 1
        self.conv1 = nn.Conv2d(1, 32, kernel_size=(3,3))
        self.pool1 = nn.MaxPool2d(2)
        self.drop1 = nn.Dropout(0.25)

        # spatial size: (13, 13)    # number of channels = 32
        self.conv2 = nn.Conv2d(32, 32*64, kernel_size=(3,3))
        self.pool2 = nn.MaxPool2d(2)
        self.drop2 = nn.Dropout(0.25)

        # spatial size: (5,5)    # number of channels = 32*64 (applies 64 different filters on each channel?)
        # therefore number of nodes for the hidden layer 32*64*5*5
        self.linear3 = nn.Linear(32*64*5*5, 256)
        self.drop3 = nn.Dropout(0.5)

        self.linear4 = nn.Linear(256,10)

        self.relu = nn.ReLU()
        self.softmax = nn.Softmax()
        
    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.pool1(x)
        x = self.drop1(x)
        
        x = self.relu(self.conv2(x))
        x = self.pool2(x)
        x = self.drop2(x)
        
        x = x.view(-1, self.num_flat_features(x))
        x = self.relu(self.linear3(x))
        x = self.drop3(x)
        
        x = x.view(-1, self.num_flat_features(x))
        x = self.linear4(x)
        
        return x
      
    def num_flat_features(self, x):
        size = x.size()[1:]  # all dimensions except the batch dimension
        num_features = 1
        for s in size:
            num_features *= s
        return num_features
        
        
model = NeuralNet()

batch_size, C, H, W = 1, 1, 28, 28
x = torch.randn(batch_size, C, H, W)
output = model(x) 

I think there are some corrections:

The 4th line in your Keras model says the output should have 64 channels, but in your PyTorch version you are declaring 32*64 channels; we need to fix that. In PyTorch we just declare the number of channels for the input and the number of channels for the output of each layer, and it takes care of the spatial sizes.

Also, we only need to flatten the activation once, before the first linear layer; we don’t need to do it again every time it passes through another linear layer.

class NeuralNet(nn.Module):
  
    def __init__(self):
        super(NeuralNet, self).__init__()

        self.conv1 = nn.Conv2d(1, 32, kernel_size=(3,3))
        self.pool1 = nn.MaxPool2d(2)
        self.drop1 = nn.Dropout(0.25)

        self.conv2 = nn.Conv2d(32, 64, kernel_size=(3,3))
        self.pool2 = nn.MaxPool2d(2)
        self.drop2 = nn.Dropout(0.25)

        self.linear3 = nn.Linear(64*5*5, 256)
        self.drop3 = nn.Dropout(0.5)

        self.linear4 = nn.Linear(256,10)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax()
        
    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.pool1(x)
        x = self.drop1(x)
        
        x = self.relu(self.conv2(x))
        x = self.pool2(x)
        x = self.drop2(x)
        
        x = x.view(x.size(0), -1)
        x = self.relu(self.linear3(x))
        x = self.drop3(x)
        x = self.linear4(x)
        return x
              
model = NeuralNet()

batch_size, C, H, W = 1, 1, 28, 28
x = torch.randn(batch_size, C, H, W)
output = model(x)

Every Conv2D layer mainly takes 3 parameters as input, in the respective order: (in_channels, out_channels, kernel_size), where the out_channels acts as the in_channels for the next layer.

As rightly mentioned, you’ve defined 64 out_channels, whereas in the PyTorch implementation you are using 32*64 channels as output (which should not be the case).

Before using a Dense layer (a Linear layer in the case of PyTorch), you have to flatten the output and feed the flattened input to the Linear layer. If x is the input to be fed to the Linear layer, you have to reshape it in the PyTorch implementation as:

x = x.view(batch_size, -1),

where batch_size is the number of images in the batch, e.g. as loaded by torch.utils.data.DataLoader.

class NeuralNet(nn.Module):

    def __init__(self):
        super(NeuralNet, self).__init__()

        self.conv1 = nn.Conv2d(1, 32, kernel_size=(3,3))
        self.conv2 = nn.Conv2d(32, 64, kernel_size=(3,3))
        self.pool = nn.MaxPool2d(2)
        self.drop = nn.Dropout(0.25)
        self.fc1 = nn.Linear(64*5*5, 256)
        self.fc2 = nn.Linear(256, 10)
        self.relu = nn.ReLU()
        self.softmax = nn.Softmax(dim=1)

    def forward(self, x):
        x = self.relu(self.conv1(x))
        x = self.drop(self.pool(x))

        x = self.relu(self.conv2(x))
        x = self.drop(self.pool(x))

        x = x.view(x.shape[0], -1)
        x = self.relu(self.fc1(x))
        x = self.drop(x)

        x = self.fc2(x)
        x = self.softmax(x)
        return x

model = NeuralNet()

batch_size, C, H, W = 1, 1, 28, 28
x = torch.randn(batch_size, C, H, W)
output = model(x)

The above code should work fine!


Hi @Prateek_Gupta
Don’t you need a Flatten layer before the Linear layers?

Sam

Can someone please tell me how to convert this Keras model to PyTorch? I am a bit confused.

shape = (64, 64, 3)

classifier = Sequential()
classifier.add(Conv2D(32, (3, 3), activation='relu', input_shape=shape))
classifier.add(Conv2D(32, (3, 3), activation='relu'))
classifier.add(MaxPool2D(pool_size=(2, 2)))
classifier.add(Dropout(0.05))

classifier.add(Conv2D(64, (3, 3), activation='relu'))
classifier.add(Conv2D(64, (3, 3), activation='relu'))
classifier.add(MaxPool2D(pool_size=(2, 2)))
classifier.add(Dropout(0.10))

classifier.add(Flatten())

classifier.add(Dense(128, activation='relu'))
classifier.add(Dropout(0.15))
classifier.add(Dense(64, activation='relu'))

classifier.add(Dense(32, activation='relu'))
classifier.add(Dropout(0.20))

classifier.add(Dense(7, activation='softmax'))
classifier.summary()

This tutorial gives you an example of how to create a custom model.
As described there, you can initialize the modules in the __init__ method and use them in the forward.
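
For your model, a minimal sketch following that pattern could look like the code below (the shapes assume the stated 64x64x3 input; the final softmax is left out assuming nn.CrossEntropyLoss, as discussed further down):

import torch
import torch.nn as nn
import torch.nn.functional as F

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(3, 32, 3)    # 64x64 -> 62x62
        self.conv2 = nn.Conv2d(32, 32, 3)   # -> 60x60, pooled to 30x30
        self.drop1 = nn.Dropout(0.05)
        self.conv3 = nn.Conv2d(32, 64, 3)   # -> 28x28
        self.conv4 = nn.Conv2d(64, 64, 3)   # -> 26x26, pooled to 13x13
        self.drop2 = nn.Dropout(0.10)
        self.fc1 = nn.Linear(64*13*13, 128)
        self.drop3 = nn.Dropout(0.15)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 32)
        self.drop4 = nn.Dropout(0.20)
        self.out = nn.Linear(32, 7)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = self.drop1(F.max_pool2d(x, 2))
        x = F.relu(self.conv3(x))
        x = F.relu(self.conv4(x))
        x = self.drop2(F.max_pool2d(x, 2))
        x = x.view(x.size(0), -1)
        x = self.drop3(F.relu(self.fc1(x)))
        x = F.relu(self.fc2(x))
        x = self.drop4(F.relu(self.fc3(x)))
        return self.out(x)                  # raw logits for nn.CrossEntropyLoss

model = Classifier()
output = model(torch.randn(1, 3, 64, 64))   # [1, 7]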

I am just noticing that in Keras we don’t have to tell the Dense (linear) layer its input size, while in the PyTorch version there seems to be no alternative other than specifying the input size when instantiating the linear layer. Is that really the case, or are there alternative ways that are not covered here?

Unfortunately, that is how PyTorch works. You have to specify the input size.

There is a pip package (whose name I am forgetting now) that has this feature where the input dimension is not needed. Plus, it has other things that are missing in PyTorch (like GlobalAveragePooling2D etc.). I will try to mention the name if I recall it (or you can, if you find it).
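
As a side note, global average pooling itself can at least be emulated in PyTorch with nn.AdaptiveAvgPool2d; a minimal sketch:

import torch
import torch.nn as nn

gap = nn.AdaptiveAvgPool2d(1)     # output spatial size 1x1, like GlobalAveragePooling2D
x = torch.randn(8, 64, 13, 13)
out = gap(x).flatten(1)           # [8, 64]: one averaged value per channel
print(out.shape)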


Can we implement the same without using a class, the way it is written in Keras?
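
A minimal sketch using nn.Sequential, which works much like Keras’ Sequential for purely feed-forward models (here applied to the first model in this thread):

import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 28, kernel_size=3),
    nn.MaxPool2d(2),
    nn.Flatten(),              # flattens everything except the batch dimension
    nn.Linear(28*13*13, 128),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(128, 10),        # raw logits; the softmax is left out, see the next posts
)

output = model(torch.randn(1, 1, 28, 28))  # [1, 10]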

The interpretation of activation=tf.nn.softmax is missing.

I’m not sure if you want to correct this code, but if so, note that the softmax activation is left out on purpose: in a multi-class classification use case you would use raw logits with nn.CrossEntropyLoss (or log probabilities with nn.NLLLoss), and thus would not apply a softmax activation.
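
A small sketch illustrating the equivalence, with random data just for demonstration:

import torch
import torch.nn as nn
import torch.nn.functional as F

logits = torch.randn(4, 10)                # raw model outputs, no softmax applied
target = torch.randint(0, 10, (4,))

loss_ce = nn.CrossEntropyLoss()(logits, target)                # expects raw logits
loss_nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)  # expects log probabilities
print(torch.allclose(loss_ce, loss_nll))   # True: both formulations match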


Hi @eduardo4jesus,

You’re looking for the nn.LazyLinear module; more info can be found in the docs here.
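
A minimal sketch of how nn.LazyLinear behaves:

import torch
import torch.nn as nn

layer = nn.LazyLinear(128)    # only out_features; in_features is inferred
x = torch.randn(8, 4732)
out = layer(x)                # the first forward pass materializes the weight
print(layer.weight.shape)     # torch.Size([128, 4732])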

An alternative would be to use the Pytorch Symbolic library, which provides an API for defining models similarly to Keras. When using it you get symbolic tensors, which you can use to get the size of an intermediate output.

Usage looks like this:

from torch import nn
from pytorch_symbolic import Input, SymbolicModel

inputs = Input(shape=(100,))
outputs = nn.Linear(inputs.features, 100)(inputs)
print(outputs.shape)

model = SymbolicModel(inputs, outputs)
model.summary()
torch.Size([1, 100])
_________________________________________________
     Layer      Output shape   Params   Parent   
=================================================
1    Input_1    (None, 100)    0                 
2*   Linear_1   (None, 100)    10100    1        
=================================================
Total params: 10100
Trainable params: 10100
Non-trainable params: 0
_________________________________________________

Disclaimer: I am the author of Pytorch Symbolic.