Since Y has shape (1000, 1, 2), indexing it with target = Y[i] gives target the shape (1, 2). However, torch.nn.CrossEntropyLoss expects the input to have shape (N, C) and the target to have shape (N), where N is the batch size and C is the number of classes.
So I think you should reshape both the target and the output of your model accordingly: the target should have shape (1) and the output should have shape (1, 2). It also looks like you are using one-hot vectors for the target, so these need to be converted to a tensor of class labels from {0, 1}.
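To make those shape requirements concrete, here is a minimal sketch with made-up values. The tensors `logits` and `one_hot` are illustrative assumptions, not taken from your code. It turns a one-hot target into a class index with argmax before calling the loss:

```python
import torch
import torch.nn as nn

loss_func = nn.CrossEntropyLoss()

# Illustrative values: a batch of N=1 sample with C=2 classes.
logits = torch.tensor([[0.3, 1.2]])   # model output, shape (N, C) = (1, 2)
one_hot = torch.tensor([[0.0, 1.0]])  # one-hot target, shape (1, 2)

# Convert the one-hot row into a class index.
# The result has shape (N,) and dtype int64, which is what the loss expects.
target = one_hot.argmax(dim=1)

loss = loss_func(logits, target)
print(logits.shape, target.shape, loss.shape)
```

The loss comes back as a scalar (zero-dimensional) tensor because the default reduction averages over the batch.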
Given that the shape of output is (2) and the shape of target is (1, 2), I think the following changes may solve the issue:
output = rnn(item)
target = y[i]
# reshape the output to (1, 2):
output = output.reshape(-1, 2)
# take the second column of the one-hot target;
# with two classes it equals the class label (0 or 1)
target = target[:, 1].long()
# now compute the loss
loss = loss_func(output, target)
Here is a complete version that runs:

import torch
import torch.nn as nn


class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()
        self.rnn = nn.LSTM(
            input_size=6,
            hidden_size=6,
            num_layers=2,
            batch_first=True,
        )
        self.fc = nn.Linear(6, 2)

    def forward(self, x):
        out, (h_n, h_c) = self.rnn(x, None)
        # Take the output at the last time-step and map it to 2 class scores
        return self.fc(out[:, -1, :])


X = torch.randn(10, 1, 6)
# Random stand-in for the one-hot targets; argmax still gives a label in {0, 1}
y = torch.randn(10, 1, 2)

rnn = RNN()
optimizer = torch.optim.Adam(rnn.parameters(), lr=0.001)
loss_func = nn.CrossEntropyLoss()

for j in range(2):
    for i, item in enumerate(X):
        item = item.unsqueeze(0)  # (1, 6) -> (1, 1, 6): batch of one sequence
        output = rnn(item)        # shape (1, 2)
        target = y[i]             # shape (1, 2)
        # print(target)
        # target = target.squeeze_()
        # print("output shape:", output.shape, "target shape:", target.shape)
        # argmax turns the (1, 2) target into a class index of shape (1,)
        loss = loss_func(output, target.argmax(dim=1))
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
Hope it helps. Still, your code doesn't make much sense to me. First, your input shape is (1000, 1, 6), which says that the batch size is 1000, the sequence length is 1, and the feature dimension is 6. If you are using an RNN, why is the sequence length 1? Second, your labels have shape (batch_size, sequence_len, num_of_classes), which is odd. I think you want one label per sequence, so why is sequence_len involved? Should the shape be (batch_size, num_of_classes)? Third, in your RNN network you defined the fc linear layer in __init__(), but you forgot to call it in forward().
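To illustrate those three points, here is a rough sketch of how the setup could look. Everything in it is an assumption on my side rather than your actual data: the class name `SeqClassifier`, the sequence length of 5, and the random tensors are all hypothetical. It uses sequences longer than one step, one class index per sequence, and applies the linear layer in forward():

```python
import torch
import torch.nn as nn


class SeqClassifier(nn.Module):  # hypothetical name, for illustration only
    def __init__(self, input_size=6, hidden_size=6, num_classes=2):
        super().__init__()
        self.rnn = nn.LSTM(input_size, hidden_size, num_layers=2, batch_first=True)
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        out, _ = self.rnn(x)           # out: (batch, seq_len, hidden_size)
        return self.fc(out[:, -1, :])  # logits: (batch, num_classes)


# Assumed shapes: 10 sequences, 5 time-steps each, 6 features per step.
X = torch.randn(10, 5, 6)
# One class index per sequence, shape (batch,), values in {0, 1}.
y = torch.randint(0, 2, (10,))

model = SeqClassifier()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
loss_func = nn.CrossEntropyLoss()

logits = model(X)            # process the whole batch at once
loss = loss_func(logits, y)  # input (N, C) = (10, 2), target (N,) = (10,)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(logits.shape, loss.item())
```

With labels stored as class indices from the start, no argmax or reshape is needed before the loss, and the whole batch goes through the network in a single call.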