CNN Calculated Padded Error

I’m trying to replicate a Keras model which starts out like this:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_2 (InputLayer)         (None, 720, 1, 1)         0         
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 720, 1, 128)       1152      
_________________________________________________________________

The input has shape (batch_size, 720, 1, 1), and a Conv2D layer with 128 filters and a kernel size of 8 is applied to it. To replicate this in PyTorch, I have:

import torch

a = torch.randn(32,720, 1, 1)
print('a:', a.size()) # a: torch.Size([32, 720, 1, 1])

torch.nn.Conv2d(720, 128, kernel_size=8, stride=1)(a)

But I'm getting the following error:

RuntimeError: Calculated padded input size per channel: (1 x 1). Kernel size: (8 x 8). Kernel size can’t greater than actual input size at /pytorch/aten/src/THNN/generic/SpatialConvolutionMM.c:48

Any ideas what I'm doing wrong, and why this works in Keras but not in PyTorch?

Could you post the Keras code?
It doesn't really make sense to use a kernel size of 8 on an input whose spatial dimensions are 1x1.
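To make the size argument concrete, here is a minimal sketch of the output-size formula given in the PyTorch Conv2d documentation, applied to one spatial dimension. The helper name conv_out_size is made up for illustration and is not a PyTorch function.

```python
def conv_out_size(size, kernel, stride=1, padding=0, dilation=1):
    """Output length along one spatial dimension of a convolution."""
    return (size + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1

# Input from the question: 720 lands in the channel slot, so H = W = 1.
print(conv_out_size(1, 8))    # -6: the kernel never fits, hence the error

# With 720 as the height instead, an 8-tap kernel fits.
print(conv_out_size(720, 8))  # 713
```

A result below 1 means there is no valid position for the kernel, which is what the RuntimeError is reporting.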

Sure. Here’s the first part of the Keras model.

x = keras.layers.Input(x_train.shape[1:])
conv1 = keras.layers.Conv2D(128, 8, 1, border_mode='same')(x)
conv1 = keras.layers.normalization.BatchNormalization()(conv1)
conv1 = keras.layers.Activation('relu')(conv1)

@ptrblck: The full Keras code and preprocessing can be viewed here (GitHub).

Your Keras model probably uses NHWC ordering for the convolutions, while PyTorch uses NCHW.
This means your input should be transposed so that the channel dimension comes second. Also, it looks like you're doing a 1-d convolution, so you probably want to change the kernel size to (8, 1):

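a = torch.randn(32, 1, 720, 1)  # NCHW: N=32, C=1, H=720, W=1
conv = torch.nn.Conv2d(1, 128, kernel_size=(8, 1), stride=1)  # 8-tap kernel along H only
print(conv(a).size())  # torch.Size([32, 128, 713, 1])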

Here N=32, C=1, H=720, W=1, where N is the batch size, C is the number of channels, and H and W are the spatial dimensions of the input. In the output, C becomes 128 and H shrinks to 713 (720 - 8 + 1) because no padding is applied.
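The Keras layer in the question uses border_mode='same', so its output keeps the height at 720. A rough sketch of one way to get the same shape in PyTorch is to permute the NHWC tensor to NCHW and pad the height explicitly before the convolution. The 3-before / 4-after split is an assumption about how 'same' padding distributes the 7 extra rows for an even kernel (extra row at the end); check it against your Keras backend if exact numerical agreement matters.

```python
import torch
import torch.nn.functional as F

x_nhwc = torch.randn(32, 720, 1, 1)  # Keras layout: (N, H, W, C)
x = x_nhwc.permute(0, 3, 1, 2)       # PyTorch layout: (N, C, H, W) = (32, 1, 720, 1)

conv = torch.nn.Conv2d(1, 128, kernel_size=(8, 1), stride=1)

# Assumption: 7 = kernel - 1 rows of zero padding along H, 3 before and 4 after.
# For a 4-D tensor, F.pad takes (W_left, W_right, H_top, H_bottom).
x_padded = F.pad(x, (0, 0, 3, 4))
out = conv(x_padded)

n_params = sum(p.numel() for p in conv.parameters())
print(out.shape)  # torch.Size([32, 128, 720, 1])
print(n_params)   # 1152 = 8 * 1 * 1 * 128 weights + 128 biases
```

The parameter count matches the 1152 shown in the Keras summary, which suggests the Keras layer is also using an 8x1 kernel on a single input channel.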

Makes sense - thanks!