Swapaxes to meet input for lstm throws an error

LSTM is expecting (seq_len, batch, input_size) but when I do that PyTorch throws an error:

ValueError: Expected input batch_size (20) to match target batch_size (27).

20 is seq_len and 27 is batch_size.
Input shape:

torch.Size([20, 27, 87])

(seq_len, batch, input_size) accordingly.
My model:

class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.lstm1 = nn.LSTM(input_size=87, hidden_size=256)
        self.lstm2 = nn.LSTM(input_size=256, hidden_size=128)
        self.lstm3 = nn.LSTM(input_size=128, hidden_size=64)
        self.lstm4 = nn.LSTM(input_size=64, hidden_size=32)
        self.fc1 = nn.Linear(in_features=32, out_features=128)
        self.fc2 = nn.Linear(in_features=128, out_features=64)
        self.fc3 = nn.Linear(in_features=64, out_features=32)
        self.fc4 = nn.Linear(in_features=32, out_features=3)

    def forward(self, x):
        x = torch.tanh(self.lstm1(x)[0])
        x = torch.tanh(self.lstm2(x)[0])
        x = torch.tanh(self.lstm3(x)[0])
        x = torch.tanh(self.lstm4(x)[0])
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

There are two questions:

  1. What’s wrong with seq_len in my case?
  2. Is it OK to use swapaxes to meet the requirements, or does it mix up important things:
X_train = X_train.swapaxes(1,0)
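
On question 2, a minimal sketch of what swapaxes does to the data. The array below is a made-up stand-in for X_train with the shapes from this post, not the real MFCC features. It suggests that swapaxes only reorders the axes and keeps every sample intact, whereas reshape to the same target shape would mix samples.

```python
import numpy as np

# Hypothetical stand-in for X_train: 27 samples, 20 steps, 87 features.
X_train = np.arange(27 * 20 * 87, dtype=np.float32).reshape(27, 20, 87)

# swapaxes only reorders the axes:
# (batch, seq_len, input_size) -> (seq_len, batch, input_size).
swapped = X_train.swapaxes(1, 0)
print(swapped.shape)  # (20, 27, 87)

# Sample 5 is still intact, it is just addressed through axis 1 now.
same_sample = np.array_equal(swapped[:, 5, :], X_train[5])

# reshape to the same target shape keeps the memory order instead,
# so the values of different samples end up interleaved.
reshaped = X_train.reshape(20, 27, 87)
mixed_sample = np.array_equal(reshaped[:, 5, :], X_train[5])

print(same_sample, mixed_sample)  # True False
```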

I think it depends on your use case.
Currently you are feeding a tensor with the shape [seq_len, batch_size, hidden_size] into your linear layer.
Usually the input to a linear layer should be [batch_size, *, in_features]. The * means any number of additional dimensions, on which the same linear operation is performed.
In this case, your output won’t differ, but will be shaped [seq_len, batch_size, out_features].
Since your target is most likely in the shape [batch_size, *], you’ll get an error trying to calculate the loss.
You could permute the output or the activation in your model to set the batch dimension as dim0.
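
A small sketch of the point above, using a random tensor as a stand-in for the LSTM activation (the sizes 20, 27 and 32 are taken from the question, everything else is illustrative). It shows that nn.Linear only transforms the last dimension and that permute can move the batch dimension to dim0 afterwards.

```python
import torch
import torch.nn as nn

seq_len, batch_size, hidden_size = 20, 27, 32  # sizes from the question

# Stand-in for the LSTM output, shaped [seq_len, batch_size, hidden_size].
activation = torch.randn(seq_len, batch_size, hidden_size)

fc = nn.Linear(in_features=hidden_size, out_features=3)

# nn.Linear transforms only the last dimension; leading dims pass through.
out = fc(activation)
print(out.shape)  # torch.Size([20, 27, 3])

# Move the batch dimension to dim0: [batch_size, seq_len, out_features].
out_batch_first = out.permute(1, 0, 2)
print(out_batch_first.shape)  # torch.Size([27, 20, 3])
```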

In case your target is only a single class for the whole sequence, probably you would like to get the activation of the last step from your LSTM.
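
Taking the activation of the last step could look roughly like this. It is a sketch with a single small LSTM and random input, assuming the default [seq_len, batch, input_size] layout; the layer sizes are only examples.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=87, hidden_size=32)  # expects [seq_len, batch, input_size]
x = torch.randn(20, 27, 87)

# nn.LSTM returns the per-step outputs plus the final hidden and cell states.
output, (h_n, c_n) = lstm(x)

# Keep only the last time step: one vector per sample in the batch.
last_step = output[-1]
print(last_step.shape)  # torch.Size([27, 32])

# For a single-layer, unidirectional LSTM this matches the final hidden state.
same = torch.allclose(last_step, h_n[0])
```

The resulting [batch, hidden_size] tensor can then go into the linear layers, so the final output has one row of class scores per sample.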

Changed to dim0 labels, got error:

ValueError: Expected target size (27, 3), got torch.Size([27])

Manual:

Target: (N) where each value is 0 ≤ targets[i] ≤ C−1, or

(N, d1, d2, ..., dK) with K ≥ 2 in the case of K-dimensional loss.

So the manual allows [batch_size, *] for the target.
The question is how it should be encoded. I used the following data to encode the labels, and the nn works:

y_train_torch.shape
torch.Size([27, 3])
tensor([[1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [1, 0, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 1, 0],
        [0, 0, 1],
        [0, 0, 1],
        [0, 0, 1],
        [0, 0, 1],
        [0, 0, 1],
        [0, 0, 1],
        [0, 0, 1]])

Feeding any other shape into the nn causes an error.
I strongly believe in PyTorch. On my first task, where I analyzed financial reports of Russian companies to determine target capitalization, it performed really well, slightly better than TensorFlow. Keras overfit.
I also strongly believe that I am a monkey playing with a collider, and I want to learn.
Considering our case with seq_len, it seems that it doesn’t really matter whether batch or seq_len comes first.
But it’s an nn, and it should fit the data it is supposed to learn and give me accuracy at least on the training data. Of course my dataset is really small, but… I said “one, two and three” into a mic and am trying to classify the recordings into three categories. I did it 30 times. Keras is fine, and I reached 100% val acc and 100% test acc in 6 epochs, but I want to use a flexible instrument such as PyTorch for my tasks.
model(input) in PyTorch gives me the output below after 3000 epochs. My suspicion is that I encoded the labels incorrectly:

tensor([[[ -7.7228,   3.0183,  11.2289],
         [  0.1328,  -3.4348,   4.6932],
         [-10.8275, -10.2396,   0.1168],
         ...,
         [-10.7704, -10.0659,   0.1782],
         [-10.8403, -10.1490,   0.1689],
         [-10.7978, -10.1629,   0.1400]],

        [[-11.1168,   4.3190,  15.7578],
         [  0.3338,  -5.1119,   6.4511],
         [-15.0107, -15.3557,  -0.3449],
         ...,
         [-14.9699, -15.3302,  -0.3514],
         [-14.9923, -15.2761,  -0.3178],
         [-14.9932, -15.3604,  -0.3547]],

        [[-11.9856,   4.5941,  16.8911],
         [  0.4712,  -5.5082,   6.8625],
         [-16.2449, -17.0981,  -0.5629],
         ...,
         [-16.2539, -17.0999,  -0.5594],
         [-16.2275, -17.0364,  -0.5457],
         [-16.2737, -17.1193,  -0.5595]],

        ...,

        [[  0.2076,  -5.2667,  17.3286],
         [ -7.1703, -13.7633,  25.5701],
         [-17.3713, -18.6646,  -0.7112],
         ...,
         [-17.3655, -18.6440,  -0.7082],
         [-17.3683, -18.6579,  -0.7098],
         [-17.3624, -18.6468,  -0.7087]],

        [[  0.2093,  -5.2638,  17.3253],
         [ -7.1709, -13.7636,  25.5766],
         [-17.3682, -18.6617,  -0.7112],
         ...,
         [-17.3644, -18.6427,  -0.7081],
         [-17.3678, -18.6579,  -0.7099],
         [-17.3617, -18.6466,  -0.7088]],

        [[  0.2100,  -5.2618,  17.3232],
         [ -7.1747, -13.7550,  25.5766],
         [-17.3694, -18.6632,  -0.7113],
         ...,
         [-17.3630, -18.6415,  -0.7081],
         [-17.3680, -18.6584,  -0.7100],
         [-17.3609, -18.6461,  -0.7089]]], grad_fn=<ThAddBackward>)

Visual inspection and calculation give me 0% acc on the training set and 33% acc on the test set.
And a big thank you for the answer, ptrblck.
Hope to find the truth in my problem.
Ready to post any data for the task at my disposal.
Full code:

import librosa
from os import listdir
import numpy as np
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.nn.functional as F
def loadSound(path):
    soundList = listdir(path)
    loadedSound = []
    for sound in soundList:
        Y, sr = librosa.load(path + sound)
        loadedSound.append(librosa.feature.mfcc(y=Y, sr=sr))
    return np.array(loadedSound)
one = loadSound('./voice_123/one/')
two = loadSound('./voice_123/two/')
three = loadSound('./voice_123/three/')
X = np.concatenate((one, two, three), axis=0)
one_label = np.concatenate((np.ones(10), np.zeros(10), np.zeros(10)))
two_label = np.concatenate((np.zeros(10), np.ones(10), np.zeros(10)))
three_label = np.concatenate((np.zeros(10), np.zeros(10), np.ones(10)))
y = np.concatenate((one_label[:, None], two_label[:, None], three_label[:, None]), axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42, shuffle=False)
# X_train = X_train.swapaxes(1,0)
# X_test = X_test.swapaxes(1,0)
X_train_torch = torch.from_numpy(X_train).float()
X_test_torch = torch.from_numpy(X_test).float()
y_train_torch = torch.from_numpy(y_train).long()
y_test_torch = torch.from_numpy(y_test).long()
class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.lstm1 = nn.LSTM(input_size=87, hidden_size=256)
        self.lstm2 = nn.LSTM(input_size=256, hidden_size=128)
        self.lstm3 = nn.LSTM(input_size=128, hidden_size=64)
        self.lstm4 = nn.LSTM(input_size=64, hidden_size=32)
        self.fc1 = nn.Linear(in_features=32, out_features=128)
        self.fc2 = nn.Linear(in_features=128, out_features=64)
        self.fc3 = nn.Linear(in_features=64, out_features=32)
        self.fc4 = nn.Linear(in_features=32, out_features=3)

    def forward(self, x):
        x = torch.tanh(self.lstm1(x)[0])
        x = torch.tanh(self.lstm2(x)[0])
        x = torch.tanh(self.lstm3(x)[0])
        x = torch.tanh(self.lstm4(x)[0])
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x
model = RNN()
model(X_train_torch)
loss_fn = torch.nn.CrossEntropyLoss()
learning_rate = 0.00001
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(3000):
    y_pred = model(X_train_torch)
    loss = loss_fn(y_pred, y_train_torch)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
for t in range(3000):
    y_pred = model(X_train_torch)
    loss = loss_fn(y_pred, y_train_torch)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
learning_rate = 0.0001
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(3000):
    y_pred = model(X_train_torch)
    loss = loss_fn(y_pred, y_train_torch)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

And I can provide my “one, two and three” recordings.

Based on your description, it seems you are trying to classify your dataset into one of three classes.
You are right about the docs of nn.CrossEntropyLoss, i.e. the target might be multi-dimensional.
However, if you compare the shapes of the input (model output) and the target, you see that the channel dimension is missing (C in the docs).
In your case, your target should have the shape [batch_size] and contain the class indices, i.e. values in the range [0, 2].
Just call y_train_torch = torch.argmax(y_train_torch) and you should be fine.

The multi-dimensional use case is useful for e.g. segmentation tasks, where each pixel belongs to one particular class.
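
To make the shape rules concrete, here is a hedged sketch with random tensors (N = 27 and C = 3 as in this thread, the extra dimension of 20 is just an example). It runs the plain and the multi-dimensional case and reproduces the kind of shape mismatch from the first post.

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()
N, C = 27, 3

# Plain classification: input [N, C], target [N] with class indices in [0, C-1].
logits = torch.randn(N, C)
target = torch.randint(0, C, (N,))
loss = loss_fn(logits, target)

# Multi-dimensional case: input [N, C, d1], target [N, d1].
logits_seq = torch.randn(N, C, 20)
target_seq = torch.randint(0, C, (N, 20))
loss_seq = loss_fn(logits_seq, target_seq)

# An output shaped [20, 27, 3] is read as N=20, C=27, d1=3,
# so a target with 27 entries no longer matches the batch dimension.
bad_logits = torch.randn(20, 27, 3)
raised = False
try:
    loss_fn(bad_logits, target)
except (ValueError, RuntimeError) as err:
    raised = True
    print("shape mismatch:", err)
```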

What you mean is that:

tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2,
        2, 2, 2])

I did it around 10 times and couldn’t believe my eyes till I read docs.
Actually [batch_size, *] is possible, I believe.
On my y_train_torch, torch.argmax(y_train_torch) gives:

tensor(20)

Okay, I am closer to the truth. I checked the shapes of the input and output tensors, and they are fine.
But:

model_for = model(X_train_torch)
for number in range(27):
    print(model_for[number].argmax())

Gives:

tensor(1)
tensor(1)
tensor(1)
tensor(1)
tensor(1)
tensor(1)
tensor(1)
tensor(1)
tensor(1)
tensor(2)
tensor(2)
tensor(2)
tensor(2)
tensor(2)
tensor(2)
tensor(2)
tensor(2)
tensor(10)
tensor(0)
tensor(0)
tensor(11)
tensor(5)
tensor(11)
tensor(11)
tensor(5)
tensor(5)
tensor(5)

Why 5, 10 and 11? I didn’t say that :slight_smile:
The labels are fine, I think. With any other encoding it does not work and asks me for the exact dimensions of the labels.
What if the problem is in seq_len and batch?
But PyTorch gives me the error precisely when I set up the model correctly, with batch as the second dimension.
So, even if I mess up seq_len, I have a bunch of Linear layers that should play the game and make everything perfect, even if there is a mistake. Correct me if I am wrong.
Therefore, the problem might be in seq_len.
Therefore, I have to change the slicing of the lstm output to something else for the model to be happy.
How?
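
A possible explanation for the 5, 10 and 11 above, sketched with a made-up tensor: if model_for[number] is a 2-D slice such as [20, 3], calling argmax() without dim returns an index into the flattened 60 values rather than a class index. The scores below are invented so that the maximum is unique.

```python
import torch

# Stand-in for model_for[number]: a [20, 3] slice of the model output.
scores = torch.zeros(20, 3)
scores[3, 2] = 1.0  # unique maximum at row 3, column 2

# Without dim, argmax works on the flattened tensor (60 elements here).
flat_index = torch.argmax(scores)
print(flat_index)  # tensor(11), because 3 * 3 + 2 = 11

# With dim=1, argmax returns one class index per row.
per_row = torch.argmax(scores, dim=1)
print(per_row[3])  # tensor(2)
```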

Sorry, my bad. It should be torch.argmax(y_train_torch, dim=1).
This will give you the right class indices.
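
As a quick check of that call, here is a tiny sketch on a four-row one-hot tensor (a shortened, made-up version of the labels posted above):

```python
import torch

# Shortened stand-in for the one-hot y_train_torch from the thread.
one_hot = torch.tensor([[1, 0, 0],
                        [0, 1, 0],
                        [0, 1, 0],
                        [0, 0, 1]])

# dim=1 picks the position of the 1 in every row, i.e. the class index.
class_idx = torch.argmax(one_hot, dim=1)
print(class_idx)        # tensor([0, 1, 1, 2])
print(class_idx.shape)  # torch.Size([4])
```

The result has the shape [batch_size] and an integer dtype, which is the form nn.CrossEntropyLoss expects for class-index targets.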

model(X_train_torch)[0] gives:

tensor([[-355.4695, -424.0195, -681.5226],
        [-339.5056, -432.3004, -696.3847],
        [-359.5927, -438.7826, -705.6765],
        [-346.2557, -439.3127, -707.5528],
        [-359.3715, -437.6559, -703.8282],
        [-356.7079, -443.8343, -714.3834],
        [-357.2831, -442.1734, -711.5266],
        [-355.7645, -442.3993, -712.0264],
        [-359.3816, -440.9185, -709.2794],
        [-357.2108, -443.9724, -714.5976],
        [-355.9514, -443.8340, -714.3595],
        [-359.4459, -440.7137, -708.9507],
        [-359.6165, -440.2903, -708.2195],
        [-357.3955, -443.4591, -713.6996],
        [-359.5545, -442.3593, -711.6683],
        [-356.2823, -443.2863, -713.4790],
        [-359.6244, -439.4549, -706.8243],
        [-358.2084, -445.1720, -716.4835],
        [-359.5984, -437.4826, -703.5777],
        [-356.8849, -442.9969, -712.9752]], grad_fn=<SelectBackward>)

which is shape:

torch.Size([20, 3])

Made the change ptrblck suggested, and the output is now closer to the truth. At least it looks better. Digging deeper into what ptrblck said.

tensor([[0.0890, 0.0380, 0.0673],
        [0.0890, 0.0379, 0.0674],
        [0.0891, 0.0380, 0.0674],
        [0.0891, 0.0379, 0.0674],
        [0.0891, 0.0379, 0.0674],
        [0.0892, 0.0380, 0.0675],
        [0.0891, 0.0380, 0.0675],
        [0.0891, 0.0378, 0.0673],
        [0.0891, 0.0378, 0.0673],
        [0.0891, 0.0379, 0.0676],
        [0.0890, 0.0379, 0.0674],
        [0.0890, 0.0379, 0.0675],
        [0.0891, 0.0379, 0.0674],
        [0.0892, 0.0379, 0.0675],
        [0.0891, 0.0379, 0.0674],
        [0.0892, 0.0380, 0.0675],
        [0.0891, 0.0380, 0.0675],
        [0.0891, 0.0379, 0.0675],
        [0.0891, 0.0379, 0.0674],
        [0.0892, 0.0379, 0.0675]], grad_fn=<SelectBackward>)

Obviously there is a problem with seq_len, because 20 is seq_len, not batch_size:

torch.Size([20, 3])

Trying to change the input data to match the manual’s [seq_len, batch_size, *]. Got an error:

ValueError: Expected input batch_size (20) to match target batch_size (27).

Will try to debug.
Shape of output from model(input) is still:

torch.Size([20, 27, 3])

Strange: we did self.lstm1(x)[0], which should destroy dimension 20.
Obviously, I should change the code in the lstm part of my model to match the dimensions. It seems that tanh makes the output fit the next lstm, which is not right for the last one, because I want to feed it to Linear, as was suggested to me earlier. Will try to debug the model.
According to the manual, tanh doesn’t change the shape of the output:

input (Tensor) – the input tensor
out (Tensor, optional) – the output tensor

Will look at the manual for lstm.
Changing model to:

        x = torch.tanh(self.lstm1(x))
        x = torch.tanh(self.lstm2(x))
        x = torch.tanh(self.lstm3(x))
        x = torch.tanh(self.lstm4(x)[0])

Didn’t help. Got error:

TypeError: tanh(): argument 'input' (position 1) must be Tensor, not tuple
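
The TypeError follows from what nn.LSTM returns. A minimal sketch with random input (sizes as in lstm1 above): the call yields a tuple, and indexing it with [0] selects the full output sequence, not a single time step.

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=87, hidden_size=256)
x = torch.randn(20, 27, 87)  # [seq_len, batch, input_size]

# The call returns (output, (h_n, c_n)), so it cannot go into tanh directly.
result = lstm(x)
print(type(result))  # <class 'tuple'>

output, (h_n, c_n) = result
print(output.shape)  # torch.Size([20, 27, 256])
print(h_n.shape)     # torch.Size([1, 27, 256])

# tanh is elementwise, so the shape stays [20, 27, 256].
activated = torch.tanh(output)
```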

What if I try:

class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.lstm1 = nn.LSTM(input_size=87, hidden_size=256)
        self.lstm2 = nn.LSTM(input_size=256, hidden_size=128)
        self.lstm3 = nn.LSTM(input_size=128, hidden_size=64)
        self.lstm4 = nn.LSTM(input_size=64, hidden_size=32)
        self.fc1 = nn.Linear(in_features=32, out_features=128)
        self.fc2 = nn.Linear(in_features=128, out_features=64)
        self.fc3 = nn.Linear(in_features=64, out_features=32)
        self.fc4 = nn.Linear(in_features=32, out_features=3)

    def forward(self, x):
        x = torch.tanh(self.lstm1(x)[0])
        x = torch.tanh(self.lstm2(x)[0])
        x = torch.tanh(self.lstm3(x)[0])
        x = torch.tanh(self.lstm4(x)[0][0])
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

Oh my God, it finally worked :smile:
And acc:

Training accuracy: 100.0%
Testing accuracy: 100.0%

Thank you very much, ptrblck, all is good!
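
A note on why the extra [0] makes the shapes line up, as a sketch rather than the exact model: it assumes the input was swapped to [20, 27, ...] and replaces fc1 to fc4 with one hypothetical Linear layer. The second [0] selects the first time step of the sequence; [-1] would select the last one, which may be closer to the earlier suggestion.

```python
import torch
import torch.nn as nn

lstm4 = nn.LSTM(input_size=64, hidden_size=32)
fc = nn.Linear(32, 3)  # hypothetical stand-in for fc1..fc4

x = torch.randn(20, 27, 64)          # [seq_len, batch, features] entering lstm4
target = torch.randint(0, 3, (27,))  # one class index per sample

seq_out = lstm4(x)[0]    # full output sequence: [20, 27, 32]
first_step = seq_out[0]  # what lstm4(x)[0][0] selects: [27, 32]
last_step = seq_out[-1]  # alternative, the last time step: [27, 32]

# Either way the linear layers now see [batch, features] ...
logits = fc(torch.tanh(first_step))  # [27, 3]

# ... so the output matches a target of shape [27].
loss = nn.CrossEntropyLoss()(logits, target)
```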

Awesome @andreiliphd!
It was a pleasure to see how you managed to get rid of all the bugs! :slight_smile:

It’s good. I have the same problem. Thanks, you helped me solve the problem.