Hi,

I’m trying to train a language model using a BiLSTM, but I’m getting really weird values for the test loss.

A training epoch looks like this:

```
for batch in tqdm(get_batches(train, BATCH_SIZE, shuffle=True)):
    model.zero_grad()
    X, y, lengths = batch
    batch_size, seq_len = X.size()
    # Fresh hidden state for every batch
    hidden = model.init_hidden(batch_size)
    yhat, hidden = model(X, lengths, hidden)
    # Move the class dimension to position 1, as CrossEntropyLoss expects
    yhat = yhat.permute(1, 2, 0)
    loss = loss_function(yhat, y)
    loss.backward()
    optimizer.step()
    # Accumulate the per-batch loss
    total_train_loss += loss.item()
    hidden = (hidden[0].detach(), hidden[1].detach())
```

and my test loop looks like this:

```
model.eval()
with torch.no_grad():
    # Batch size 1, no shuffling
    for batch in tqdm(get_batches(test, 1, shuffle=False)):
        X, y, lengths = batch
        batch_size, seq_len = X.size()
        hidden = model.init_hidden(batch_size)
        yhat, hidden = model(X, lengths, hidden)
        # Move the class dimension to position 1, as CrossEntropyLoss expects
        yhat = yhat.permute(1, 2, 0)
        loss = loss_function(yhat, y)
        # Accumulate the per-batch loss
        total_test_loss += loss.item()
        hidden = (hidden[0].detach(), hidden[1].detach())
```

I’m getting a total test loss of 1.43, and that’s the sum over all test batches, without dividing it by the number of batches.
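To make clear what I mean by dividing: as far as I understand, `loss.item()` is already a mean within each batch under the default reduction, so the running total is a sum of per-batch means. Here is a minimal pure-Python sketch of how I would expect to turn such per-batch values into a per-token average and a perplexity. The helper `summarize_losses` and the numbers in the usage lines are hypothetical, made up for illustration, and not taken from my run.

```python
import math


def summarize_losses(batch_mean_losses, batch_token_counts):
    """Combine per-batch mean losses into a per-token average and perplexity.

    batch_mean_losses: mean loss of each batch (what loss.item() gives
        under a mean reduction).
    batch_token_counts: number of non-padding target tokens in each batch.
    """
    total_tokens = sum(batch_token_counts)
    # Undo each batch's mean to recover the summed loss over its tokens.
    total_loss = sum(
        loss * count
        for loss, count in zip(batch_mean_losses, batch_token_counts)
    )
    avg_loss = total_loss / total_tokens
    return avg_loss, math.exp(avg_loss)


# Made-up values purely for illustration (assumption, not real results).
avg, ppl = summarize_losses([0.5, 0.25], [10, 30])
print(avg, ppl)
```

With a test batch size of 1, dividing the total by the number of batches gives the average per-sentence loss, which differs from the per-token average whenever sentence lengths differ.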

Does anyone have any idea why that would happen? I’m using `nn.CrossEntropyLoss(ignore_index=PAD)` as the loss function.
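For reference, this is my understanding of what that loss computes with its default mean reduction: padded positions are skipped, and the negative log-likelihood is averaged over the remaining targets. The sketch below is a plain-Python illustration under that assumption, not the library's implementation. The function name `masked_cross_entropy` and the value `PAD = 0` are hypothetical.

```python
import math

PAD = 0  # assumed padding index for this sketch; the real PAD may differ


def masked_cross_entropy(logits, targets, ignore_index=PAD):
    """Mean negative log-likelihood over positions whose target != ignore_index.

    logits: list of per-position score lists (one score per vocabulary item).
    targets: list of target indices, one per position.
    """
    total, count = 0.0, 0
    for scores, target in zip(logits, targets):
        if target == ignore_index:
            continue  # padding adds nothing to the loss or to the count
        # Numerically stable log-sum-exp for the softmax normalizer.
        m = max(scores)
        log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_norm - scores[target]
        count += 1
    # In this sketch, a batch with only padding yields NaN.
    return total / count if count else float("nan")


# Two real positions with uniform scores over 4 classes plus one padded
# position, so the result equals log(4).
loss = masked_cross_entropy(
    [[0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], [5.0, 1.0, 1.0, 1.0]],
    [1, 2, PAD],
)
print(loss)
```

If that reading is right, a uniform guess over the vocabulary would give a loss of log(vocabulary size) per token, which is the baseline I am comparing my numbers against.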

Thanks!