Problems with implementing a neural network. From Keras to Pytorch

Mirage · October 22, 2020, 3:39pm

Hello everyone! I’m writing this post because, coming from Keras, I am having trouble implementing a neural network in PyTorch.

My input has shape (6, 3, 1) (I am working on time series forecasting), and I would like to implement the following network:

model = Sequential()
model.add(Conv1D(filters=64, kernel_size=2, activation='relu', input_shape=(3, 1)))
model.add(MaxPooling1D(pool_size=2))
model.add(Flatten())
model.add(Dense(50, activation='relu'))
model.add(Dense(1))

I tried with:

MyNet = nn.Sequential(
        nn.Conv1d(3,1,2),
        nn.MaxPool1d(2,stride=None),
        nn.Flatten(start_dim=1),
        nn.Linear(64,50),
        nn.Linear(50, 1)
        )

but it doesn’t work. I believe the problem is related to Conv1d. Can you please explain my mistakes to me?

Thank you!

Nikronic · October 22, 2020, 5:01pm

Hi,

One thing I am sure is wrong: your Keras code has 64 filters in the Conv layer, but I cannot see this number in the PyTorch equivalent.
I think you might need to replace it with nn.Conv1d(in_channels=6, out_channels=64, kernel_size=2).

In PyTorch, a conv layer only needs its in/out channels (the number of input and output filters); the input length does not have to be declared, PyTorch handles it at call time.

Another issue is that you need to add nn.ReLU() between the two linear layers. Activation functions in PyTorch are used as separate layers.
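A minimal sketch of these two points, assuming the [batch_size, channels, length] layout that nn.Conv1d expects (the tensor sizes are made up for illustration): the layer is built from channel counts only, the same layer accepts inputs of different lengths, and ReLU is applied as its own module.

```python
import torch
from torch import nn

# Only channel counts and kernel size are declared, not the sequence length.
conv = nn.Conv1d(in_channels=1, out_channels=64, kernel_size=2)
relu = nn.ReLU()  # activations are standalone modules

short = torch.randn(6, 1, 3)    # illustrative: 6 samples, 1 channel, length 3
longer = torch.randn(6, 1, 10)  # same layer, longer sequence

out_short = relu(conv(short))
out_long = relu(conv(longer))
print(out_short.shape)  # torch.Size([6, 64, 2])
print(out_long.shape)   # torch.Size([6, 64, 9])
```

With stride 1 and no padding, the output length is the input length minus kernel_size plus 1.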

Furthermore, I am not sure about your data, but you might need to check which dimension corresponds to features or temporal/spatial dims.
For this, you can follow this thread:
Understanding Convolution 1D output and Input - PyTorch Forums

The last 4 replies discuss a numerical example.

Bests

Mirage · October 23, 2020, 4:12pm

Hi,

According to Understanding Convolution 1D output and Input, my input shape [6, 3, 1] corresponds to [batch_size, in_channel, len]. In addition, out_channel defines the number of kernels.

As a consequence, MyNet should become:

MyNet = nn.Sequential(
        nn.Conv1d(in_channels = 3, out_channels = 64, kernel_size = 2),
        nn.MaxPool1d(2),
        nn.Flatten(start_dim=1),
        nn.Linear(64,50),
        nn.ReLU(),
        nn.Linear(50, 1)
        )

Is it correct?

It doesn’t work, I get the following error: Calculated padded input size per channel: (1). Kernel size: (2). Kernel size can't be greater than actual input size

Please, let me know what I am doing wrong!

Nikronic · October 23, 2020, 7:54pm

You only have 6 samples?
What I mean is that you need to decide which dimension corresponds to the temporal dim. Based on your explanation, it seems you have only 1 time step with 3 features for each of your 6 samples. In this case, your input data has a length of 1, but you have defined kernel size = 2, so it’s not valid.
Can you elaborate on your data? Because if in_channel=3 and len=1, then you can only use kernel size = 1.

In the reference I provided, notice the permute, where the temporal dimension is moved to the last dimension.
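To make the kernel size point concrete, here is a small sketch that reads the [6, 3, 1] tensor as [batch_size, in_channels, len], as assumed above. The random data is only a placeholder for the real input.

```python
import torch
from torch import nn

x = torch.randn(6, 3, 1)  # read as [batch_size=6, in_channels=3, len=1]

# kernel_size=2 cannot slide over a sequence of length 1.
try:
    nn.Conv1d(in_channels=3, out_channels=64, kernel_size=2)(x)
    failed = False
except RuntimeError as err:
    failed = True
    print(err)

# kernel_size=1 is the only size that fits a length of 1 without padding.
out = nn.Conv1d(in_channels=3, out_channels=64, kernel_size=1)(x)
print(out.shape)  # torch.Size([6, 64, 1])
```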

ptrblck · October 24, 2020, 9:26am

Besides what @Nikronic already explained, I think TF uses the channels_last format by default, so I assume the input shape corresponds to [batch_size=6, seq_len=3, channels=1] and has to be permuted to fit the expected input for nn.Conv1d as [batch_size, channels, seq_len].

Also, you are still missing another nn.ReLU after the conv layer.
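The permutation described above can be sketched as follows, using two made-up samples in the channels-last layout [batch_size, seq_len, channels]:

```python
import torch

# Hypothetical samples in the Keras-style layout [batch_size, seq_len, channels].
x_keras = torch.tensor([[[10.], [20.], [30.]],
                        [[20.], [30.], [40.]]])
print(x_keras.shape)  # torch.Size([2, 3, 1])

# Swap the last two dimensions to get [batch_size, channels, seq_len].
x_torch = x_keras.permute(0, 2, 1)
print(x_torch.shape)  # torch.Size([2, 1, 3])
print(x_torch[0])     # tensor([[10., 20., 30.]])
```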

Mirage · October 24, 2020, 3:12pm

@ptrblck, you are right that the input shape corresponds to [batch_size = 6, seq_len = 3, channels = 1]! @Nikronic, yes, I only have 6 samples. I permuted the input to get dimensions (6,1,3) = [batch_size, channels, seq_len]. Yes, I was missing another nn.ReLU() after the Conv1d layer.

I modified MyNet as follows:

f = nn.Sequential(
        nn.Conv1d(1,64,2),
        nn.ReLU(),
        nn.MaxPool1d(2),
        nn.Flatten(start_dim=1),
        nn.Linear(64,50),
        nn.ReLU(),
        nn.Linear(50, 1)
        )

Again, it does not work. I get the following error: Given groups=1, weight of size [64, 1, 2], expected input[6, 6, 3] to have 1 channels, but got 6 channels instead
Can you please give me any clue about what is wrong?

Nikronic · October 24, 2020, 6:05pm

It says your input data is not [6, 1, 3] as you have mentioned. Your network architecture is correct for this input data. Literally, f is getting an input with shape [6, 6, 3]. Can you show how you use your inputs?
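One way to check this is to push a tensor through the layers one at a time and print the shape after each. The sketch below uses random data in place of the real input and also feeds a [6, 6, 3] tensor to show that it is rejected.

```python
import torch
from torch import nn

f = nn.Sequential(
    nn.Conv1d(1, 64, 2), nn.ReLU(), nn.MaxPool1d(2),
    nn.Flatten(start_dim=1), nn.Linear(64, 50), nn.ReLU(), nn.Linear(50, 1))

x = torch.randn(6, 1, 3)  # [batch_size, channels, seq_len]
shapes = []
for layer in f:
    x = layer(x)
    shapes.append(tuple(x.shape))
    print(type(layer).__name__, tuple(x.shape))

# An input with 6 channels does not match the conv weight of size [64, 1, 2].
try:
    f(torch.randn(6, 6, 3))
    raised = False
except RuntimeError:
    raised = True
print(raised)  # True
```

The conv layer shortens the sequence from 3 to 2, the pooling layer from 2 to 1, so Flatten yields 64 features, which matches nn.Linear(64, 50).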

Mirage · October 24, 2020, 6:38pm

I start with X being the input and y being the ground truth:

print(X.shape)
for i in range(len(X)):
  print(X[i], y[i])

(6, 3, 1)
[[10]
 [20]
 [30]] 40
[[20]
 [30]
 [40]] 50
[[30]
 [40]
 [50]] 60
[[40]
 [50]
 [60]] 70
[[50]
 [60]
 [70]] 80
[[60]
 [70]
 [80]] 90

where [6,3,1]=[batch_size, seq_len, channels]. Then I permute the dimensions of X:

x_train = torch.FloatTensor(X).permute(0,2,1) 
print(x_train.shape)
print(x_train)

torch.Size([6, 1, 3])
tensor([[[10., 20., 30.]],

        [[20., 30., 40.]],

        [[30., 40., 50.]],

        [[40., 50., 60.]],

        [[50., 60., 70.]],

        [[60., 70., 80.]]])

Then:

train = data.TensorDataset(x_train, y_train)
trainloader = data.DataLoader(train, batch_size=len(x_train), shuffle=False)

Is there something I’m missing?

Nikronic · October 24, 2020, 9:40pm

Actually, no! Everything looks fine to me. I even ran your code and it works fine:


import torch
from torch import nn
from torch.utils import data

x = torch.tensor([[[10., 20., 30.]],

                [[20., 30., 40.]],

                [[30., 40., 50.]],

                [[40., 50., 60.]],

                [[50., 60., 70.]],

                [[60., 70., 80.]]])
y = torch.ones(6, 1)

train = data.TensorDataset(x, y)
trainloader = data.DataLoader(train, batch_size=len(x), shuffle=False)

model = nn.Sequential(
        nn.Conv1d(1,64,2),
        nn.ReLU(),
        nn.MaxPool1d(2),
        nn.Flatten(start_dim=1),
        nn.Linear(64,50),
        nn.ReLU(),
        nn.Linear(50, 1)
        )

for batch in trainloader:
    x_batch, y_batch = batch
    print(x_batch.shape)
    print(model(x_batch).shape)

You may have missed something, like the way you feed x to the model after batching. Can you show your train loop?

PS. When you set batch_size=len(your whole data), you will get only 1 batch, which contains a tensor of shape [6, 1, 3]. You need to set batch_size=1 if you want 6 tensors of shape [1, 1, 3].
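A short sketch of the batch_size remark, with random tensors standing in for the real data:

```python
import torch
from torch.utils import data

x = torch.randn(6, 1, 3)
y = torch.ones(6, 1)
train = data.TensorDataset(x, y)

# One batch holding the whole dataset.
full = [xb.shape for xb, _ in data.DataLoader(train, batch_size=len(x))]
# Six batches of one sample each.
single = [xb.shape for xb, _ in data.DataLoader(train, batch_size=1)]

print(full)         # [torch.Size([6, 1, 3])]
print(len(single))  # 6
print(single[0])    # torch.Size([1, 1, 3])
```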

Mirage · October 25, 2020, 8:15am

When it comes to the train loop, first I define the following:

device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

class Learner(pl.LightningModule):
    def __init__(self, model:nn.Module):
        super().__init__()
        self.model = model
    
    def forward(self, x):
        return self.model(x)
    
    def training_step(self, batch, batch_idx):
        x, y = batch      
        y_hat = self.model(x)   
        loss = nn.MSELoss()(y_hat, y)
        logs = {'train_loss': loss}
        return {'loss': loss, 'log': logs}   
    
    def configure_optimizers(self):
        return torch.optim.Adam(self.model.parameters(), lr=0.005)

    def train_dataloader(self):
        return trainloader

Then I define the model in the following way:

f = nn.Sequential(
        nn.Conv1d(1,64,2),
        nn.ReLU(),
        nn.MaxPool1d(2),
        nn.Flatten(start_dim=1),
        nn.Linear(64,50),
        nn.ReLU(),
        nn.Linear(50, 1)
        )

model = NeuralDE(f, sensitivity='adjoint', solver='dopri5').to(device)

and finally:

learn = Learner(model)
trainer = pl.Trainer(min_epochs=200, max_epochs=300)
trainer.fit(learn)

Nikronic · October 25, 2020, 8:59am

OK, you are using PyTorch Lightning, not plain PyTorch. PyTorch Lightning uses a different approach to achieve the same goal.

Also, you are passing f, which so far I assumed was the entire model, to another nn module called NeuralDE. NeuralDE apparently uses a different structure for the model, and I don’t know how it works. For instance, if you just define model = f, then your code should work, I think.

Mirage · October 25, 2020, 4:57pm

I believe the problem is related to NeuralDE, because I defined model = f as you suggested and it worked! Also, I succeeded in implementing a neural network with 2 inputs.

It turns out I have to look more deeply into the NeuralDE nn module!

Thank you for your time and patience!

Mirage · October 26, 2020, 4:02pm

I’m getting this error:

The size of tensor a (200) must match the size of tensor b (3) at non-singleton dimension 1

What does it mean?

Nikronic · October 26, 2020, 6:34pm

Please print the full stack trace for the error, and the line that causes it.

But I think it’s related to a mathematical operation in which a dimension mismatch occurs, e.g. a matrix multiplication of two matrices whose dimensions do not match.
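As an illustration of this kind of mismatch, the sketch below adds two tensors whose sizes differ along dimension 1. The shapes (6, 200) and (6, 3) are made up to mirror the numbers in the error message and are not taken from the actual model.

```python
import torch

a = torch.ones(6, 200)
b = torch.ones(6, 3)

# Element-wise ops need matching sizes, or a size of 1 that can be broadcast.
try:
    a + b
    message = None
except RuntimeError as err:
    message = str(err)
print(message)

# A size-1 dimension broadcasts without error.
ok = torch.ones(6, 1) + b
print(ok.shape)  # torch.Size([6, 3])
```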

Mirage · October 29, 2020, 9:54am

The thing is that I am trying to implement a Neural ODE (NODE) in PyTorch. I’m following this example: https://github.com/DiffEqML/torchdyn/blob/master/tutorials/02_classification.ipynb
If you know a simple implementation of NODE, please share it with me!

Regarding the error:

f = nn.Sequential(
        nn.Conv1d(3,64,kernel_size=2),
        nn.ReLU(),
        nn.MaxPool1d(2),
        nn.Flatten(start_dim=1),
        nn.Linear(576,50),
        nn.ReLU(),
        nn.Linear(50, ph)
        )

model = NeuralDE(f, 
                   solver='rk4',
                   sensitivity='autograd',
                   s_span=torch.linspace(0, 1, 10))

learningRate = 0.01
criterion = torch.nn.MSELoss()
optimizer = torch.optim.SGD(model.parameters(), lr=learningRate)

for epoch in range(180):  # loop over the dataset multiple times

    running_loss = 0.0
    for i, data in enumerate(trainloader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 200 == 199:
        #if i % 2000 == 1999:    # print every 2000 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  #(epoch + 1, i + 1, running_loss / 2000))
                  (epoch + 1, i + 1, running_loss / 200))
            running_loss = 0.0

print('Finished Training')

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-27-15bbd244efe3> in <module>()
     10 
     11         # forward + backward + optimize
---> 12         outputs = model(inputs)
     13         loss = criterion(outputs, labels)
     14         loss.backward()

6 frames
/usr/local/lib/python3.6/dist-packages/torchdiffeq/_impl/rk_common.py in rk4_alt_step_func(func, t, dt, y, k1)
     98     if k1 is None:
     99         k1 = func(t, y)
--> 100     k2 = func(t + dt * _one_third, y + dt * k1 * _one_third)
    101     k3 = func(t + dt * _two_thirds, y + dt * (k2 - k1 * _one_third))
    102     k4 = func(t + dt, y + dt * (k1 - k2 + k3))

RuntimeError: The size of tensor a (3) must match the size of tensor b (200) at non-singleton dimension 1

If you know any simple implementation of NODE, please share it. Thank you!
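The traceback line k2 = func(t + dt * _one_third, y + dt * k1 * _one_third) adds the state y to the network output k1, so the two need compatible shapes. Below is a rough sketch of that constraint in plain PyTorch. It does not use torchdyn or torchdiffeq, it swaps rk4 for a fixed-step explicit Euler update, and the network and the helper euler_integrate are hypothetical names chosen for illustration.

```python
import torch
from torch import nn

# Hypothetical vector field: maps a state of size 3 back to size 3,
# so that y + dt * f(y) is a valid element-wise update.
f = nn.Sequential(nn.Linear(3, 50), nn.Tanh(), nn.Linear(50, 3))

def euler_integrate(func, y0, t_span):
    """Fixed-step explicit Euler: y <- y + dt * func(y)."""
    y = y0
    for t0, t1 in zip(t_span[:-1], t_span[1:]):
        y = y + (t1 - t0) * func(y)
    return y

y0 = torch.randn(6, 3)  # placeholder initial states
y1 = euler_integrate(f, y0, torch.linspace(0, 1, 10))
print(y1.shape)  # torch.Size([6, 3])
```

Under this reading, a network that ends in a layer with a different output size than its input (such as nn.Linear(50, ph) applied to an input whose dimension 1 has size 3) would fail at the first update step.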