Simple IMDB Binary classification not working

I'm trying to switch from TensorFlow (Keras) to PyTorch and am testing a basic binary classification model with an embedding layer on the IMDB movie review data. In TensorFlow everything works fine: the model is easy to build and converges very fast. I tried to replicate the same model in PyTorch, but I'm struggling to make it work.

I searched online and didn't see anyone using sigmoid for binary classification; some used log_softmax, and I'm not sure why.

Is there anything I'm doing wrong? Please help, thank you very much!

import torch
import torchtext
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim

TEXT = torchtext.data.Field(tokenize = 'spacy', fix_length=40)
LABEL = torchtext.data.LabelField(dtype = torch.float)

train_dataset, test_dataset = torchtext.datasets.IMDB.splits(TEXT, LABEL)

MAX_VOCAB_SIZE = 10000

TEXT.build_vocab(train_dataset, max_size = MAX_VOCAB_SIZE, vectors = "glove.6B.100d")
LABEL.build_vocab(train_dataset)

BATCH_SIZE = 32
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

train_loader, test_loader = torchtext.data.BucketIterator.splits(
    (train_dataset, test_dataset),
    batch_size = BATCH_SIZE,
    device = device)




class RNN(nn.Module):
    def __init__(self):
        super(RNN, self).__init__()

        self.embedding = nn.Embedding(len(TEXT.vocab), 100, padding_idx=TEXT.vocab.stoi[TEXT.pad_token])
        self.lstm = nn.LSTM(100, 100, batch_first = True)
        self.fc1 = nn.Linear(100, 1)


    def forward(self, x):
        x = self.embedding(x)
        x, hidden = self.lstm(x)
        x = x[-1,:,:]
        x = torch.sigmoid(self.fc1(x))
        return x


net = RNN()
net.embedding.weight.data.copy_(TEXT.vocab.vectors)

for param in net.embedding.parameters():
  param.requires_grad = False

net = net.to(device)
criterion = nn.BCELoss()
optimizer = optim.Adam(net.parameters()) 

for epoch in range(2): 
    print("epoch: {}".format(epoch+1))
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        inputs, labels = data
        optimizer.zero_grad()
        outputs = net(inputs)
        loss = criterion(outputs.squeeze(1), labels)
        loss.backward()
        optimizer.step()

        running_loss += loss.item()
        if i % 200 == 199:
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 200))
            running_loss = 0.0
print('Finished Training')

correct = 0
total = 0
with torch.no_grad():
    for data in test_loader:
        inputs, labels = data
        outputs = net(inputs)
        predicted = torch.round(outputs).reshape(-1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

print('Accuracy of the test data: {}'.format(100 * correct / total))

The code looks generally alright.
We recommend using raw logits and nn.BCEWithLogitsLoss, as it is more numerically stable than sigmoid + nn.BCELoss.
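
For example, a minimal sketch of that change against the model above (the tensors here are just stand-ins for illustration):

import torch
import torch.nn as nn

# In the model's forward, return raw logits instead of probabilities:
#     return self.fc1(x)        # no torch.sigmoid here

criterion = nn.BCEWithLogitsLoss()   # fuses sigmoid + BCE in a numerically stable way

logits = torch.randn(32, 1)          # stand-in for the model output
labels = torch.randint(0, 2, (32,)).float()
loss = criterion(logits.squeeze(1), labels)

# Apply sigmoid only where you need probabilities, e.g. for accuracy:
predicted = torch.round(torch.sigmoid(logits)).reshape(-1)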

F.log_softmax is used together with e.g. nn.NLLLoss for a multi-class classification use case.
You could rewrite the binary classification as a 2-class multi-class classification, but your approach should work.
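
Purely as an illustration of that alternative (not a change you need), a sketch of the 2-class formulation, assuming the final layer is widened to 2 outputs and the labels are class indices (torch.long):

import torch
import torch.nn as nn
import torch.nn.functional as F

fc = nn.Linear(100, 2)                 # 2 output units instead of 1
criterion = nn.NLLLoss()

features = torch.randn(32, 100)        # stand-in for the LSTM output features
labels = torch.randint(0, 2, (32,))    # class indices, dtype torch.long

log_probs = F.log_softmax(fc(features), dim=1)
loss = criterion(log_probs, labels)
# nn.CrossEntropyLoss does the same thing directly on raw logits.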
If you want to try to reproduce your TensorFlow model, I would recommend also checking the parameter initializations, as they might differ for the layers used.
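
For example, a sketch of how you could inspect the parameters and re-initialize them roughly along the lines of Keras' defaults (Glorot/Xavier for input weights, orthogonal for recurrent weights, zeros for biases); the exact scheme is an assumption you should verify against your TensorFlow model:

import torch.nn as nn

def init_like_keras(model):
    # Inspect what is there first
    for name, param in model.named_parameters():
        print(name, tuple(param.shape))

    # Roughly mimic Keras' LSTM/Dense initialization
    for name, param in model.lstm.named_parameters():
        if 'weight_ih' in name:
            nn.init.xavier_uniform_(param)
        elif 'weight_hh' in name:
            nn.init.orthogonal_(param)
        elif 'bias' in name:
            nn.init.zeros_(param)
    nn.init.xavier_uniform_(model.fc1.weight)
    nn.init.zeros_(model.fc1.bias)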

Thanks ptrblck, I wish there were something wrong with the code, but it seems you also agree there is no major problem. I thought this simple example would converge very fast, since the embedding should be powerful enough for simple text classification, but the above code never gets the loss below 0.6 even after many epochs, and the accuracy is just a bit above 50%.
Anyway, I will keep looking into other example scripts online to see if I can reproduce the expected good result.
Thanks again for the quick reply, ptrblck!

Oh, wait a moment.
x = x[-1,:,:] looks fishy.
Since you are using batch_first=True, you would be slicing the tensor in the batch dimension and thus only using the last sample.
If you want to use the last output in the seq dimension, you would have to index the tensor in dim1.
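
For example, a minimal sketch of the two usual options with batch_first=True:

import torch
import torch.nn as nn

lstm = nn.LSTM(100, 100, batch_first=True)
x = torch.randn(32, 40, 100)      # [batch, seq_len, features]

out, (h_n, c_n) = lstm(x)         # out: [batch, seq_len, hidden]
last_step = out[:, -1, :]         # index dim1 (the seq dimension) -> [batch, hidden]
last_hidden = h_n[-1]             # equivalent here: final hidden state, [batch, hidden]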

Could you print the shape of output, please?


Sure, no problem.

before x = x[-1,:,:], x.shape = torch.Size([40, 32, 100])
after x = x[-1,:,:], x.shape = torch.Size([32, 100])
after fully connected layer, x.shape = torch.Size([32, 1])

If the slice were in the batch dimension, the code should have thrown an error, since 40 (the max sentence length) != 32. Thanks.

Thanks for the information.
It seems as if the output is returned as [seq_len, batch_size, features].
Could you also check the shape of the input?
As you can see, I'm not very familiar with torchtext, but since you are using batch_first=True in your nn.LSTM, the batch dimension should be in dim0, which is apparently not the case here.
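
If that turns out to be the case, a minimal sketch of two ways to make the layouts consistent (assuming the torchtext Field API used in the code above):

import torchtext

# Option 1: let torchtext return [batch, seq_len] so it matches batch_first=True
TEXT = torchtext.data.Field(tokenize='spacy', fix_length=40, batch_first=True)

# Option 2: keep the default [seq_len, batch] layout and drop batch_first in the model
#     self.lstm = nn.LSTM(100, 100)        # in __init__
#     x, (hidden, cell) = self.lstm(x)     # x: [seq_len, batch, hidden]
#     x = x[-1, :, :]                      # now this really is the last timestep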


The input shape of the model is torch.Size([40, 32]).
Yes, you are right, batch_first seems to be the cause of the problem!
Thank you very much, ptrblck!