torchtext IMDB dataset returns only sentiment label 1

IljaAvadiev · March 14, 2023, 9:08pm

Hello community,

I was trying to create a simple sentiment analysis model in PyTorch, and I wanted to use torchtext.
import torchtext
from torch.utils.data import DataLoader

train_dataset = torchtext.datasets.IMDB(root='./datasets', split='train')
test_dataset = torchtext.datasets.IMDB(root='./datasets', split='test')
dl = DataLoader(train_dataset, batch_size=1000, shuffle=True)
next(iter(dl))[0]  # labels of the first batch

When I run the code above, I expected to get a mixture of 0s and 1s as sentiment labels, but for some reason I get only 1s.

Is there something wrong on my end?

raj-rishav · March 14, 2023, 11:22pm

I don't know for sure what's going on, but if you iterate over the whole DataLoader with a for loop, you will find that both sentiments appear across the batches.
Also, the dataset provided by torchtext has labels 1 and 2, not 0 and 1. I don't think there's anything wrong on your end: when I tried it, I got the same results.

for labels, texts in dl:
    print(labels)  # label tensor of each batch

You will see that there are instances of label 2 too.
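Since the labels come back as 1 and 2 rather than 0 and 1, one option is to remap them while batching and to count labels over every batch instead of only the first. The sketch below uses hypothetical helpers `remap_collate` and `count_labels` (they are not part of torchtext) and assumes each sample is a `(label, text)` pair with integer labels 1 and 2. Subtracting 1 keeps the order of the labels; which of the two means "positive" should be checked against the torchtext documentation for your version.

```python
from collections import Counter


def remap_collate(batch):
    """Hypothetical collate_fn: split (label, text) pairs into two lists and
    shift the labels from {1, 2} to {0, 1}, keeping their order."""
    labels, texts = [], []
    for label, text in batch:
        if label not in (1, 2):
            raise ValueError(f"unexpected label: {label!r}")
        labels.append(label - 1)  # 1 -> 0, 2 -> 1
        texts.append(text)
    return labels, texts


def count_labels(batches):
    """Count labels across every batch, not just the first one."""
    counts = Counter()
    for labels, _texts in batches:
        counts.update(labels)
    return counts


# Made-up samples standing in for IMDB (label, text) pairs.
fake_batch = [(1, "dull and slow"), (2, "a joy to watch"), (2, "great cast")]
labels, texts = remap_collate(fake_batch)
print(labels)  # [0, 1, 1]
print(count_labels([remap_collate(fake_batch)]))  # Counter({1: 2, 0: 1})
```

With the loader from the question this would look like `DataLoader(train_dataset, batch_size=1000, collate_fn=remap_collate)` followed by `count_labels(dl)`; the label lists can be turned into tensors with `torch.tensor(labels)` before they reach a model.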

IljaAvadiev · March 15, 2023, 5:46pm

Thanks for the answer @raj-rishav.

I think this is an issue with torchtext.
GitHub Issue
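One plausible explanation, which this thread does not confirm: the IMDB datapipe may yield all reviews of one class before the other, and a streaming shuffle only mixes items inside a bounded buffer (torchdata's `Shuffler` documents a default `buffer_size` of 10000). If the first class has more items than the buffer, every sample in an early batch comes from that class. The pure-Python simulation below is a sketch under those assumptions, not torchtext's code: `buffer_shuffle` is a hypothetical stand-in for a buffered shuffler, and the sorted 12,500/12,500 label list is an assumed ordering of the training split.

```python
import random
from collections import Counter
from itertools import islice


def buffer_shuffle(items, buffer_size, seed=0):
    """Hypothetical buffered shuffle: keep at most `buffer_size` items, and for
    each new item emit a random buffered one and put the new item in its slot."""
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)  # still filling the buffer, emit nothing yet
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]
        buffer[idx] = item
    rng.shuffle(buffer)  # input exhausted: drain the rest in random order
    yield from buffer


# Assumed ordering: all label-1 reviews first, then all label-2 reviews.
labels = [1] * 12_500 + [2] * 12_500

# First batch of 1000 from the buffered shuffle (buffer smaller than one class).
streamed = Counter(islice(buffer_shuffle(labels, buffer_size=10_000), 1000))

# First batch of 1000 after shuffling the whole list at once.
full = labels[:]
random.Random(0).shuffle(full)
fully_shuffled = Counter(full[:1000])

print("buffered shuffle:", dict(streamed))  # {1: 1000}
print("full shuffle:", dict(fully_shuffled))
```

If this is the cause, a common workaround is to materialize the split into a map-style dataset, for example with `torchtext.data.functional.to_map_style_dataset(train_dataset)`, so that `DataLoader(..., shuffle=True)` permutes all samples instead of a sliding window of them.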