I changed to 1-D (class-index) labels and got this error:

```
ValueError: Expected target size (27, 3), got torch.Size([27])
```

The manual says:

```
Target: (N) where each value is 0 <= targets[i] <= C-1, or
(N, d1, d2, ..., dK) with K >= 2 in the case of K-dimensional loss.
```

So, the manual allows [batch_size, *] for the target.
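If I read the manual right, for a plain classification the input is (N, C) logits and the target is a (N,) vector of class indices, not one-hot rows. A minimal sketch with random values, matching my 27 samples and 3 classes:

```python
import torch
import torch.nn as nn

loss_fn = nn.CrossEntropyLoss()

# Logits for a batch of 27 samples over 3 classes: shape (N, C) = (27, 3)
logits = torch.randn(27, 3)

# Target is a vector of class indices in [0, C-1]: shape (N,) = (27,)
target = torch.randint(0, 3, (27,))

loss = loss_fn(logits, target)
print(loss.shape)  # torch.Size([]) -- a scalar
```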

The question is how it should be encoded. I used the following data to encode the labels, and the nn runs:

```
y_train_torch.shape
torch.Size([27, 3])
tensor([[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 1, 0],
[0, 0, 1],
[0, 0, 1],
[0, 0, 1],
[0, 0, 1],
[0, 0, 1],
[0, 0, 1],
[0, 0, 1]])
```
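Such one-hot rows can be collapsed back into class indices with argmax, which is the shape the error message above seems to ask for. A tiny sketch with three toy rows:

```python
import torch

# Toy one-hot rows, one per class
y_onehot = torch.tensor([[1, 0, 0],
                         [0, 1, 0],
                         [0, 0, 1]])

# argmax over the class dimension gives the index encoding
y_idx = y_onehot.argmax(dim=1)
print(y_idx)  # tensor([0, 1, 2])
```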

Feeding any other shape into the nn causes an error.

I strongly believe in PyTorch: on my first task, where I analyzed financial reports of Russian companies to predict target capitalization, it performed really well, slightly better than TensorFlow. Keras overfit.

I also strongly believe that I am a monkey playing with a collider, and I want to learn.

Considering our case with seq_len, it seems it doesn't really matter whether batch or seq_len comes first.
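At least the output shapes come out the same either way for my data, which may be why it looks like the order doesn't matter; the interpretation of the dimensions differs, though. A sketch with a dummy tensor of my batch shape (hidden_size=32 chosen just for the example):

```python
import torch
import torch.nn as nn

x = torch.randn(27, 20, 87)  # my MFCC batch

# Default layout: nn.LSTM expects (seq_len, batch, input_size),
# so dim0=27 is read as seq_len and dim1=20 as batch
lstm_seq_first = nn.LSTM(input_size=87, hidden_size=32)
out1, _ = lstm_seq_first(x)
print(out1.shape)  # torch.Size([27, 20, 32])

# With batch_first=True the layout is (batch, seq_len, input_size),
# so dim0=27 is read as batch and dim1=20 as seq_len
lstm_batch_first = nn.LSTM(input_size=87, hidden_size=32, batch_first=True)
out2, _ = lstm_batch_first(x)
print(out2.shape)  # torch.Size([27, 20, 32])
```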

But it's an nn: it should fit the data, it should learn and give me accuracy at least on the training data. Of course my dataset is really small, but… I said "one", "two" and "three" into a mic and am trying to classify the recordings into three categories. I made 30 recordings. Keras is fine: I reached 100% val acc and 100% test acc after 6 epochs, but I want to use a flexible instrument such as PyTorch for my tasks.

model(input) in PyTorch gives me the output below after 3000 epochs; my suspicion is that I encoded the labels incorrectly:

```
tensor([[[ -7.7228, 3.0183, 11.2289],
[ 0.1328, -3.4348, 4.6932],
[-10.8275, -10.2396, 0.1168],
...,
[-10.7704, -10.0659, 0.1782],
[-10.8403, -10.1490, 0.1689],
[-10.7978, -10.1629, 0.1400]],
[[-11.1168, 4.3190, 15.7578],
[ 0.3338, -5.1119, 6.4511],
[-15.0107, -15.3557, -0.3449],
...,
[-14.9699, -15.3302, -0.3514],
[-14.9923, -15.2761, -0.3178],
[-14.9932, -15.3604, -0.3547]],
[[-11.9856, 4.5941, 16.8911],
[ 0.4712, -5.5082, 6.8625],
[-16.2449, -17.0981, -0.5629],
...,
[-16.2539, -17.0999, -0.5594],
[-16.2275, -17.0364, -0.5457],
[-16.2737, -17.1193, -0.5595]],
...,
[[ 0.2076, -5.2667, 17.3286],
[ -7.1703, -13.7633, 25.5701],
[-17.3713, -18.6646, -0.7112],
...,
[-17.3655, -18.6440, -0.7082],
[-17.3683, -18.6579, -0.7098],
[-17.3624, -18.6468, -0.7087]],
[[ 0.2093, -5.2638, 17.3253],
[ -7.1709, -13.7636, 25.5766],
[-17.3682, -18.6617, -0.7112],
...,
[-17.3644, -18.6427, -0.7081],
[-17.3678, -18.6579, -0.7099],
[-17.3617, -18.6466, -0.7088]],
[[ 0.2100, -5.2618, 17.3232],
[ -7.1747, -13.7550, 25.5766],
[-17.3694, -18.6632, -0.7113],
...,
[-17.3630, -18.6415, -0.7081],
[-17.3680, -18.6584, -0.7100],
[-17.3609, -18.6461, -0.7089]]], grad_fn=<ThAddBackward>)
```

Visual inspection and calculation give me 0% accuracy on the training set and 33% accuracy on the test set.
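For reference, this is roughly how I compute the accuracy: take the argmax over the class dimension and compare against class-index labels. A toy sketch with made-up logits (not my real model output):

```python
import torch

# Hypothetical logits for 3 samples over 3 classes
logits = torch.tensor([[ 2.0, -1.0,  0.5],
                       [ 0.1,  3.0, -2.0],
                       [-1.0,  0.0,  4.0]])
labels = torch.tensor([0, 1, 2])  # class indices, not one-hot

pred = logits.argmax(dim=1)                   # predicted class per sample
acc = (pred == labels).float().mean().item()  # fraction of correct predictions
print(acc)  # 1.0 for this toy example
```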

And a big thank you for your answer, ptrblck.

Hope to find the truth in my problem.

I'm ready to post any data for the task at my disposal.

Full code:

```
import librosa
from os import listdir
import numpy as np
from sklearn.model_selection import train_test_split
import torch
import torch.nn as nn
import torch.nn.functional as F

def loadSound(path):
    # Load every file in the folder and extract its MFCC features
    soundList = listdir(path)
    loadedSound = []
    for sound in soundList:
        Y, sr = librosa.load(path + sound)
        loadedSound.append(librosa.feature.mfcc(Y, sr=sr))
    return np.array(loadedSound)

one = loadSound('./voice_123/one/')
two = loadSound('./voice_123/two/')
three = loadSound('./voice_123/three/')
X = np.concatenate((one, two, three), axis=0)

# One-hot labels: 10 recordings per word
one_label = np.concatenate((np.ones(10), np.zeros(10), np.zeros(10)))
two_label = np.concatenate((np.zeros(10), np.ones(10), np.zeros(10)))
three_label = np.concatenate((np.zeros(10), np.zeros(10), np.ones(10)))
y = np.concatenate((one_label[:, None], two_label[:, None], three_label[:, None]), axis=1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.1, random_state=42, shuffle=False)
# X_train = X_train.swapaxes(1, 0)
# X_test = X_test.swapaxes(1, 0)
X_train_torch = torch.from_numpy(X_train).float()
X_test_torch = torch.from_numpy(X_test).float()
y_train_torch = torch.from_numpy(y_train).long()
y_test_torch = torch.from_numpy(y_test).long()

class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.lstm1 = nn.LSTM(input_size=87, hidden_size=256)
        self.lstm2 = nn.LSTM(input_size=256, hidden_size=128)
        self.lstm3 = nn.LSTM(input_size=128, hidden_size=64)
        self.lstm4 = nn.LSTM(input_size=64, hidden_size=32)
        self.fc1 = nn.Linear(in_features=32, out_features=128)
        self.fc2 = nn.Linear(in_features=128, out_features=64)
        self.fc3 = nn.Linear(in_features=64, out_features=32)
        self.fc4 = nn.Linear(in_features=32, out_features=3)

    def forward(self, x):
        x = torch.tanh(self.lstm1(x)[0])
        x = torch.tanh(self.lstm2(x)[0])
        x = torch.tanh(self.lstm3(x)[0])
        x = torch.tanh(self.lstm4(x)[0])
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = F.relu(self.fc3(x))
        x = self.fc4(x)
        return x

model = RNN()
model(X_train_torch)

loss_fn = torch.nn.CrossEntropyLoss()
learning_rate = 0.00001
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(3000):
    y_pred = model(X_train_torch)
    loss = loss_fn(y_pred, y_train_torch)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
for t in range(3000):
    y_pred = model(X_train_torch)
    loss = loss_fn(y_pred, y_train_torch)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

learning_rate = 0.0001
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
for t in range(3000):
    y_pred = model(X_train_torch)
    loss = loss_fn(y_pred, y_train_torch)
    print(t, loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
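For completeness, here is a sketch of the class-index label encoding I suspect CrossEntropyLoss actually wants, instead of the one-hot rows above (assuming 10 recordings per word, in the same order I build X):

```python
import numpy as np
import torch

# Class indices instead of one-hot rows: word "one" -> 0, "two" -> 1, "three" -> 2
y = np.concatenate((np.zeros(10), np.ones(10), np.full(10, 2))).astype(np.int64)
y_torch = torch.from_numpy(y)
print(y_torch.shape)  # torch.Size([30]) -- a flat vector, one index per sample
```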

And I can provide my "one, two and three" recordings.