[Solved] Possible bug in dataloader when returning list

nivter · April 24, 2018, 6:09am

I have raised an issue on GitHub, but I'm posting it here as well.

A Dataset that is supposed to return a list apparently has that list altered by the DataLoader. Here's an example:

from torch.utils.data import Dataset, DataLoader
import numpy as np

sents = ['This is a sentence.', 'Yet another sentence.', 'Worry not, another sequence of words is here']
a = []
for sent in sents:
    words = sent.strip().split(' ')
    a.append(words)

Now a is:

[['This', 'is', 'a', 'sentence.'], ['Yet', 'another', 'sentence.'], ['Worry', 'not,', 'another', 'sequence', 'of', 'words', 'is', 'here']]

Then I create a Dataset class:

class Loader(Dataset):
    def __init__(self):
        self.n = len(a)
        
    def __getitem__(self, index):
        return a[index]
    
    def __len__(self):
        return self.n

Then I create a DataLoader and fetch the first batch:

loader = DataLoader(Loader(), batch_size=1)
i = iter(loader)
print(next(i))  # built-in next() works on Python 3; i.next() is Python 2 style

What I expect:

['This', 'is', 'a', 'sentence.']

What I get:

[('This',), ('is',), ('a',), ('sentence.',)]

Is this intentional (I doubt it) or a bug?

nivter · April 24, 2018, 8:03am

Found the answer on GitHub: this is the expected behaviour of the default collate function, not a bug.
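A short sketch of what is going on, under stated assumptions: the DataLoader's default collate step treats a list sample as a sequence and transposes the batch, which is where the one-element tuples come from. `mimic_default_collate` below is a hypothetical, heavily simplified stand-in written only to reproduce the output above (it is not PyTorch's code), and `keep_as_list` is a hypothetical name for a custom collate function that hands the batch back unchanged.

```python
# Hypothetical, simplified stand-in for PyTorch's default collate step.
# It only handles the cases that matter here (strings and lists/tuples)
# and is NOT the real torch implementation.
def mimic_default_collate(batch):
    elem = batch[0]
    if isinstance(elem, str):
        # Strings are returned unchanged, so a group of words stays a tuple.
        return batch
    if isinstance(elem, (list, tuple)):
        # Recent PyTorch versions reject list samples of unequal length.
        if any(len(sample) != len(elem) for sample in batch):
            raise RuntimeError("each element in list of batch should be of equal size")
        # Transpose: the i-th output groups the i-th item of every sample.
        return [mimic_default_collate(group) for group in zip(*batch)]
    raise TypeError(f"unsupported element type: {type(elem).__name__}")


# Hypothetical custom collate function: return the batch untouched,
# i.e. a plain list with one entry (a list of words) per sample.
def keep_as_list(batch):
    return batch


batch = [['This', 'is', 'a', 'sentence.']]  # what batch_size=1 hands to collate
print(mimic_default_collate(batch))  # [('This',), ('is',), ('a',), ('sentence.',)]
print(keep_as_list(batch)[0])        # ['This', 'is', 'a', 'sentence.']
```

To use the second function with the real loader, pass it through the `collate_fn` argument, e.g. `DataLoader(Loader(), batch_size=1, collate_fn=keep_as_list)`. Each batch is then a plain list of samples, so `next(iter(loader))[0]` should be `['This', 'is', 'a', 'sentence.']`, and larger batches of sentences with different lengths are no longer transposed.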