I am getting a batch of strings from the dataloader.
The way I get it is by setting batch_size=1 and creating buckets in Dataset.__init__(), so to the dataloader every batch is only "one file", but it is actually a list of 8 strings.
When using CPU, the dataloader wraps every string in a tuple like this: ("some string",).
So we get a list of 8 tuples.
When using GPU, the dataloader wraps every string in a list like this: ["some string"].
So we get a list of 8 lists.
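For reference, here is a minimal pure-Python sketch of what I believe the default collate_fn does with batches of string lists (the helper name collate_like_default is mine; the real implementation lives inside torch.utils.data):

```python
def collate_like_default(batch):
    """Sketch of how (I assume) the default collate_fn handles string samples:
    strings are left alone, while lists/tuples are transposed across the batch."""
    elem = batch[0]
    if isinstance(elem, str):
        return batch  # strings are not converted to tensors
    if isinstance(elem, (list, tuple)):
        # zip(*batch) transposes the batch: one group per position,
        # where each group is a tuple of the values from every sample
        return [collate_like_default(group) for group in zip(*batch)]
    raise TypeError(f"unsupported sample type in this sketch: {type(elem)}")

# two samples, each a list of two strings
print(collate_like_default([["a", "b"], ["c", "d"]]))
# -> [('a', 'c'), ('b', 'd')]
```

If this sketch is right, the transposition (and the tuple wrapping) comes from the collate step, not from the Dataset itself.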
Is that something that is known to happen?
Can you please post some sample code?
With this code:

import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self):
        self.data = 100 * [['a', 'b']]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

dataset = MyDataset()
loader = DataLoader(
    dataset,
    batch_size=5,
    num_workers=2,
    shuffle=True,
)

for data in loader:
    print(data)
I get
[('a', 'a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b', 'b')]
[('a', 'a', 'a', 'a', 'a'), ('b', 'b', 'b', 'b', 'b')]
(the same line repeated for all 20 batches)
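Side note: one way I found to keep each sample intact, instead of getting the transposed tuples above, is to pass an identity collate_fn (assuming raw Python lists per batch are acceptable downstream):

```python
import torch
from torch.utils.data import DataLoader, Dataset

class MyDataset(Dataset):
    def __init__(self):
        self.data = 100 * [['a', 'b']]

    def __getitem__(self, index):
        return self.data[index]

    def __len__(self):
        return len(self.data)

# identity collate_fn: each batch stays a plain list of the raw samples,
# so no transposition or tuple-wrapping happens
loader = DataLoader(MyDataset(), batch_size=5, collate_fn=lambda batch: batch)

print(next(iter(loader)))
# -> [['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'b'], ['a', 'b']]
```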
Any clue @Isaac_Kargar ?