Problems implementing a neural network: from Keras to PyTorch

Hello everyone! I’m writing this post because I am used to Keras and am having trouble implementing a neural network in PyTorch.

I am working on time series forecasting, and my input has shape (6, 3, 1). I would like to implement the following network:

model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(3, 1)))
model.add(Dense(50, activation='relu'))

I tried with:

MyNet = nn.Sequential(
        nn.Linear(50, 1)

but it doesn’t work. I believe the problem is related to Conv1d. Can you please explain my mistakes?

Thank you!


One thing I am sure is wrong: your Keras code has 64 filters in the Conv layer, but I cannot see this number in the PyTorch equivalent.
I think you might need to replace it with nn.Conv1d(in_channels=6, out_channels=64, kernel_size=2).

In PyTorch, you only need to specify the in/out channels (the number of input and output filters); the remaining shapes are inferred by PyTorch from the input.

Another issue is that you need to add nn.ReLU() between the two linear layers. In PyTorch, activation functions are used as separate layers.
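To make the two points above concrete, here is a minimal sketch. The random input and its [batch, channels, length] layout with a single channel are assumptions for illustration only; they are not taken from the original data.

```python
import torch
import torch.nn as nn

# Hypothetical input: 6 samples, 1 channel, 3 time steps -> [batch, channels, length]
x = torch.randn(6, 1, 3)

block = nn.Sequential(
    # Only channel counts and kernel size are given; the length is not declared
    nn.Conv1d(in_channels=1, out_channels=64, kernel_size=2),
    # The activation is its own module instead of an argument of the layer
    nn.ReLU(),
)

out = block(x)
print(out.shape)  # output length is inferred from the input: 3 - 2 + 1 = 2
```

The same pattern applies between linear layers: put nn.ReLU() as a separate entry in nn.Sequential.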

Furthermore, I am not sure about your data, but you might need to check which dimension corresponds to features or temporal/spatial dims.
For this, you can follow this thread:
Understanding Convolution 1D output and Input - PyTorch Forums

The last 4 replies discuss a numerical example.




According to Understanding Convolution 1D output and Input, my input shape [6, 3, 1] corresponds to [batch_size, in_channel, len]. In addition, out_channel defines the number of kernels.

As a consequence, MyNet should become:

MyNet = nn.Sequential(
        nn.Conv1d(in_channels = 3, out_channels = 64, kernel_size = 2),
        nn.Linear(50, 1)

Is it correct?

It doesn’t work, I get the following error: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size

Please, let me know what I am doing wrong!

You only have 6 samples?
What I mean is that you need to decide which dimension corresponds to the temporal dimension. Based on your explanation, it seems each of your 6 samples has only 1 time step with 3 features. In that case your input has length 1, but you have defined kernel size = 2, which is not valid.
Can you elaborate on your data? If in_channel=3 and len=1, then the only valid kernel size is 1.
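A small sketch of that constraint, using random tensors as stand-ins for the real data (an assumption): without padding, nn.Conv1d produces an output of length len - kernel_size + 1, so the kernel cannot be longer than the sequence.

```python
import torch
import torch.nn as nn

conv = nn.Conv1d(in_channels=3, out_channels=64, kernel_size=2)

# Input read as [batch=6, channels=3, length=1]: kernel (2) is longer than the sequence (1)
x_bad = torch.randn(6, 3, 1)
try:
    conv(x_bad)
    failed = False
except RuntimeError as err:
    failed = True
    print("RuntimeError:", err)

# With a longer sequence the same layer works; output length = 4 - 2 + 1 = 3
x_ok = torch.randn(6, 3, 4)
out = conv(x_ok)
print(out.shape)
```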

In the thread I linked, notice the permute call, which moves the temporal dimension to the last position.

Besides what @Nikronic already explained, I think TF uses the channels_last format by default, so I assume the input shape corresponds to [batch_size=6, seq_len=3, channels=1] and has to be permuted to fit the expected input for nn.Conv1d as [batch_size, channels, seq_len].

Also, you are still missing another nn.ReLU after the conv layer.
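As a rough sketch of that permutation (the tensor values are placeholders, not the real series): a channels-last tensor of shape [6, 3, 1] is permuted to [6, 1, 3] before it reaches nn.Conv1d, and nn.ReLU follows the conv layer.

```python
import torch
import torch.nn as nn

# Keras-style layout: [batch_size=6, seq_len=3, channels=1] (placeholder values)
x_keras = torch.arange(18, dtype=torch.float32).reshape(6, 3, 1)

# Swap the last two dimensions to get [batch_size, channels, seq_len]
x_torch = x_keras.permute(0, 2, 1)

conv = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=64, kernel_size=2),
    nn.ReLU(),
)
out = conv(x_torch)
print(x_torch.shape, out.shape)
```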


@ptrblck, you are right that the input shape corresponds to [batch_size = 6, seq_len = 3, channels = 1]! @Nikronic, yes, I only have 6 samples. I permuted the input to get dimensions (6,1,3) = [batch_size, channels, seq_len]. And yes, I was missing another nn.ReLU() after the Conv1d layer.

I modified MyNet as follows:

f = nn.Sequential(
        nn.Linear(50, 1)

Again, it does not work. I get the following error: Given groups=1, weight of size [64, 1, 2], expected input[6, 6, 3] to have 1 channels, but got 6 channels instead
Can you please give me any clue about what is wrong?

The error says your input data is not [6, 1, 3] as you mentioned. Your network architecture is correct for that input shape, but f is actually receiving an input with shape [6, 6, 3]. Can you show how you use your inputs?
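For reference, a sketch of what a complete network for a [6, 1, 3] input could look like. The nn.Flatten layer and the size of the first linear layer (64 * 2) are my assumptions, chosen so the shapes line up; they are not taken from the posts above. The last lines feed a [6, 6, 3] tensor to show where the channel error comes from.

```python
import torch
import torch.nn as nn

f = nn.Sequential(
    nn.Conv1d(in_channels=1, out_channels=64, kernel_size=2),  # [6, 1, 3] -> [6, 64, 2]
    nn.ReLU(),
    nn.Flatten(),            # assumption: [6, 64, 2] -> [6, 128]
    nn.Linear(64 * 2, 50),   # assumption: sized to the flattened conv output
    nn.ReLU(),
    nn.Linear(50, 1),        # one forecast value per sample
)

x = torch.randn(6, 1, 3)
y_hat = f(x)
print(y_hat.shape)

# A tensor whose second dimension is not 1 triggers the channel mismatch error
try:
    f(torch.randn(6, 6, 3))
    channel_error = False
except RuntimeError as err:
    channel_error = True
    print("RuntimeError:", err)
```

Printing x.shape right before the model call is usually the quickest way to find where the extra dimension comes from.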

I start with X being the input and y being the ground truth:

for i in range(len(X)):
  print(X[i], y[i])

(6, 3, 1)
[[10]
 [20]
 [30]] 40
[[20]
 [30]
 [40]] 50
[[30]
 [40]
 [50]] 60
[[40]
 [50]
 [60]] 70
[[50]
 [60]
 [70]] 80
[[60]
 [70]
 [80]] 90

where [6,3,1]=[batch_size, seq_len, channels]. Then I permute the dimensions of X:

x_train = torch.FloatTensor(X).permute(0,2,1) 

torch.Size([6, 1, 3])
tensor([[[10., 20., 30.]],

        [[20., 30., 40.]],

        [[30., 40., 50.]],

        [[40., 50., 60.]],

        [[50., 60., 70.]],

        [[60., 70., 80.]]])


train = data.TensorDataset(x_train, y_train)
trainloader = data.DataLoader(train, batch_size=len(x_train), shuffle=False)

Is there something I’m missing?

Actually, no! Everything looks fine to me. I even ran your code and it works fine:

x = torch.tensor([[[10., 20., 30.]],

                [[20., 30., 40.]],

                [[30., 40., 50.]],

                [[40., 50., 60.]],

                [[50., 60., 70.]],

                [[60., 70., 80.]]])
y = torch.ones(6, 1)

train = data.TensorDataset(x, y)
trainloader = data.DataLoader(train, batch_size=len(x), shuffle=False)

model = nn.Sequential(
        nn.Linear(50, 1)

for batch in trainloader:
    x_batch, y_batch = batch

Maybe you missed something, like the way you feed x to the model after batching? Can you show your train loop?

PS. When you set batch_size=len(your whole data), you get only 1 batch, which contains a tensor of shape [6, 1, 3]. You need to set batch_size=1 if you want 6 tensors of shape [1, 1, 3].
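A quick sketch of the difference, with random tensors standing in for the real data (an assumption):

```python
import torch
from torch.utils import data

x = torch.randn(6, 1, 3)  # placeholder inputs
y = torch.ones(6, 1)      # placeholder targets
train = data.TensorDataset(x, y)

# One batch holding the whole dataset vs. one sample per batch
full = data.DataLoader(train, batch_size=len(x), shuffle=False)
single = data.DataLoader(train, batch_size=1, shuffle=False)

full_shapes = [tuple(xb.shape) for xb, _ in full]
single_shapes = [tuple(xb.shape) for xb, _ in single]
print(full_shapes)    # a single entry for the whole dataset
print(single_shapes)  # one entry per sample
```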

When it comes to the train loop, first I define the following:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')
class Learner(pl.LightningModule):
    def __init__(self, model:nn.Module):
        super().__init__()  # must run before submodules are assigned
        self.model = model
    def forward(self, x):
        return self.model(x)
    def training_step(self, batch, batch_idx):
        x, y = batch      
        y_hat = self.model(x)   
        loss = nn.MSELoss()(y_hat, y)
        logs = {'train_loss': loss}
        return {'loss': loss, 'log': logs}   
    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=0.005)

    def train_dataloader(self):
        return trainloader

Then I define the model in the following way:

f = nn.Sequential(
        nn.Linear(50, 1)

model = NeuralDE(f, sensitivity='adjoint', solver='dopri5').to(device)

and finally:

learn = Learner(model)
trainer = pl.Trainer(min_epochs=200, max_epochs=300)

Ok, you are using PyTorch Lightning, not plain PyTorch. PyTorch Lightning uses a different approach to achieve the same goal.

Also, you are passing f, which so far I assumed was the entire model, to another nn module called NeuralDE. NeuralDE apparently wraps the model in a different structure, and I don’t know how it works. For instance, if you just define model = f, then your code should work, I think.

I believe the problem is related to NeuralDE, because I defined model = f as you suggested and it worked! Also, I succeeded in implementing a neural network with 2 inputs.

It turns out I have to look more closely at the NeuralDE nn module!

Thank you for your time and patience!

I’m getting this error:

The size of tensor a (200) must match the size of tensor b (3) at non-singleton dimension 1

What does it mean?

Please post the full stack trace for the error, and the line that causes it.

But I think it’s related to a mathematical operation where a dimension mismatch is happening, e.g. a matrix multiplication of two matrices whose dimensions do not match.
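As far as I know, this particular message usually comes from an element-wise operation (add, multiply, and so on) on two tensors whose shapes cannot be broadcast together. A minimal sketch with made-up shapes (6, 200) and (6, 3), chosen only to mirror the numbers in the error:

```python
import torch

a = torch.ones(6, 200)
b = torch.ones(6, 3)

# Dimension 1 is 200 vs 3, and neither is 1, so broadcasting fails
try:
    a + b
    mismatch = False
    message = ""
except RuntimeError as err:
    mismatch = True
    message = str(err)
    print(message)

# A size-1 ("singleton") dimension is fine: it is broadcast to 200
c = torch.ones(6, 1)
ok = a + c
print(ok.shape)
```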

The thing is that I am trying to implement a Neural ODE (NODE) in PyTorch. I’m following this example:
If you know a simple implementation of NODE, please share it with me!

Regarding the error:

f = nn.Sequential(
        nn.Linear(50, ph)

model = NeuralDE(f, 
                   s_span=torch.linspace(0, 1, 10))

learningRate = 0.01
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)
for epoch in range(180):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:
        #if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  #(epoch + 1, i + 1, running_loss / 2000))
                  (epoch + 1, i + 1, running_loss / 200))
            running_loss = 0.0

print('Finished Training')

RuntimeError                              Traceback (most recent call last)
<ipython-input-27-15bbd244efe3> in <module>()
     11         # forward + backward + optimize
---> 12         outputs = model(inputs)
     13         loss = criterion(outputs, labels)
     14         loss.backward()

6 frames
/usr/local/lib/python3.6/dist-packages/torchdiffeq/_impl/ in rk4_alt_step_func(func, t, dt, y, k1)
     98     if k1 is None:
     99         k1 = func(t, y)
--> 100     k2 = func(t + dt * _one_third, y + dt * k1 * _one_third)
    101     k3 = func(t + dt * _two_thirds, y + dt * (k2 - k1 * _one_third))
    102     k4 = func(t + dt, y + dt * (k1 - k2 + k3))

RuntimeError: The size of tensor a (3) must match the size of tensor b (200) at non-singleton dimension 1

If you know any simple implementation of NODE, please share it. Thank you!
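Not a full NODE implementation, but here is a minimal sketch in plain PyTorch of why the function handed to an ODE solver has to return a tensor with the same shape as its input: each solver step in the traceback computes something like y + dt * func(t, y). VectorField and euler_integrate are hypothetical names written for this illustration, and the fixed-step Euler loop is a stand-in; none of it is the torchdyn or torchdiffeq API.

```python
import torch
import torch.nn as nn

class VectorField(nn.Module):
    """Hypothetical dy/dt = f(t, y); the output must have the same shape as y."""
    def __init__(self, dim: int, hidden: int = 50):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden),
            nn.Tanh(),
            nn.Linear(hidden, dim),  # map back to `dim` so y + dt * f(t, y) is valid
        )

    def forward(self, t, y):
        return self.net(y)

def euler_integrate(func, y0, t):
    """Fixed-step explicit Euler over the time grid t; returns the final state."""
    y = y0
    for t0, t1 in zip(t[:-1], t[1:]):
        y = y + (t1 - t0) * func(t0, y)
    return y

y0 = torch.randn(6, 3)        # 6 samples, state of size 3 (placeholder values)
t = torch.linspace(0, 1, 10)  # same span as s_span in the post above
y1 = euler_integrate(VectorField(dim=3), y0, t)
print(y1.shape)
```

If the last layer of the vector field mapped to a different size than its input, the addition inside the loop would fail with the same kind of size mismatch as in the traceback.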