IMDB returns only sentiment of 1

Hello community,

I am trying to create a simple sentiment analysis model in PyTorch and want to use torchtext.

import torchtext
from torch.utils.data import DataLoader

train_dataset = torchtext.datasets.IMDB(root='./datasets', split='train')
test_dataset = torchtext.datasets.IMDB(root='./datasets', split='test')
dl = DataLoader(train_dataset, batch_size=1000, shuffle=True)
next(iter(dl))[0]  # labels of the first batch

When I run the code above, I expect a mixture of 0’s and 1’s as the sentiment values, but for some reason I get only 1’s.

Is there something wrong on my end?

I don’t know for sure what’s going on. But if you iterate over the dataloader with a for loop, you will find that both sentiments show up in the batches.
Also, the dataset provided by torchtext has labels 1 and 2. I don’t think there’s anything wrong on your end; when I tried it, I got the same results.

for i, j in dl:
    print(i)

You will see that there are instances of label 2 too.
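
Rather than reading the printed batches by eye, the labels can be counted. This is only a minimal sketch: it assumes each item is a (label, text) pair with labels 1 and 2 as described above, and it uses a small made-up list (`toy_dataset`) in place of the real IMDB split so it runs without a download. The helper names `count_labels` and `to_binary` are hypothetical, and the ordering of the toy data (all of one label first) is an assumption for illustration, not something confirmed about torchtext.

```python
from collections import Counter
from itertools import islice


def count_labels(pairs):
    """Count how often each label occurs in an iterable of (label, text) pairs."""
    return Counter(label for label, _ in pairs)


def to_binary(label):
    """Shift the labels 1 and 2 down to 0 and 1 (assumes only those two labels occur)."""
    return label - 1


# Hypothetical stand-in for the dataset: label-1 rows first, then label-2 rows.
toy_dataset = [(1, f"review {i}") for i in range(5)] + [(2, f"review {i}") for i in range(5, 10)]

# Looking only at the first few rows shows a single label ...
first_batch = list(islice(toy_dataset, 4))
print(count_labels(first_batch))

# ... while counting over everything shows both.
print(count_labels(toy_dataset))

# Remap to the 0/1 labels a binary classifier usually expects.
binary_labels = [to_binary(label) for label, _ in toy_dataset]
print(sorted(set(binary_labels)))
```

With the real data, the same `count_labels` call could be given `train_dataset` directly, as long as it yields (label, text) pairs.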


Thanks for the answer @raj-rishav.

I think this is an issue with torchtext.
GitHub Issue