I am getting a batch of strings from the dataloader.
The way I get it is by setting batch_size=1 and creating buckets in Dataset.__init__(), so to the dataloader every batch is only "one file", but it is actually a list of 8 strings.
When using CPU, the dataloader wraps every string in a tuple like this: ("some string",).
So we get a list of 8 tuples.
When using GPU, the dataloader wraps every string in a list like this: ["some string"].
So we get a list of 8 lists.
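For reference, here is a minimal pure-Python sketch of what I believe the default collate_fn does with batches of string lists (the helper name collate_like_default is mine; the real implementation lives inside torch.utils.data):

```python
def collate_like_default(batch):
    """Sketch of how (I assume) the default collate_fn handles string samples:
    strings are left alone, while lists/tuples are transposed across the batch."""
    elem = batch[0]
    if isinstance(elem, str):
        return batch  # strings are not converted to tensors
    if isinstance(elem, (list, tuple)):
        # zip(*batch) transposes the batch: one group per position,
        # where each group is a tuple of the values from every sample
        return [collate_like_default(group) for group in zip(*batch)]
    raise TypeError(f"unsupported sample type in this sketch: {type(elem)}")

# two samples, each a list of two strings
print(collate_like_default([["a", "b"], ["c", "d"]]))
# -> [('a', 'c'), ('b', 'd')]
```

If this sketch is right, the transposition (and the tuple wrapping) comes from the collate step, not from the Dataset itself.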
Is that something that is known to happen?
Can you please post some sample code?
With this code:

import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self):
        self.data = 100 * [['a', 'b']]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

dataset = MyDataset()
loader = DataLoader(
    dataset,
    batch_size=5,
    num_workers=2,
    shuffle=True,
)

for data in loader:
    print(data)
I get
[('a', 'a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b', 'b')]
[('a', 'a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b', 'b')]
(the same line repeated for all 20 batches)
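Side note: one way I found to keep each sample intact, instead of getting the transposed tuples above, is to pass an identity collate_fn (assuming raw Python lists per batch are acceptable downstream):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self):
        self.data = 100 * [['a', 'b']]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

# identity collate_fn: each batch stays a plain list of the raw samples,
# so no transposition or tuple-wrapping happens
loader = DataLoader(MyDataset(), batch_size=5, collate_fn=lambda batch: batch)

print(next(iter(loader)))
# -> [['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'b']]
```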
Any clue @Isaac_Kargar ?